How machine learning and “computer vision” will transform our cities
In 1969, William H. “Holly” Whyte decided to analyze, and eventually decode, New York City’s rambunctious street life. A famed author, Whyte, along with a handful of collaborators, was recruited by the city’s planning commission to set up cameras and surreptitiously track human activity.
Whyte and his team spent countless afternoons filming parks, plazas, and crosswalks, and even more time counting, crossing out, analyzing, and quantifying footage. Notations were made for how people met and shook hands. Pedestrian movement was mapped on pads of graph paper. To get accurate assessments of activity at a street corner, Whyte’s researchers manually screened people caught waiting for lights to change. Imagine how much time it took to figure out that at the garden of St. Bartholomew’s Church, the average density at lunchtime was 12 to 14 people per 1,000 square feet.
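For a sense of the arithmetic behind a figure like that one, a density is just a head count scaled to a standard area, averaged over repeated observations. The short Python sketch below illustrates the idea. The function names, the 5,000-square-foot area, and the four head counts are all made up for illustration; they are not Whyte's data or his method.

```python
def density_per_1000_sqft(people_counted: int, area_sqft: float) -> float:
    """Return people per 1,000 square feet for a single head count."""
    if area_sqft <= 0:
        raise ValueError("area_sqft must be positive")
    return people_counted * 1000 / area_sqft


def average_density(counts: list[int], area_sqft: float) -> float:
    """Average the density over several head counts of the same space."""
    if not counts:
        raise ValueError("counts must not be empty")
    densities = [density_per_1000_sqft(c, area_sqft) for c in counts]
    return sum(densities) / len(densities)


# Hypothetical example: a 5,000 sq ft garden counted at four lunchtime moments.
lunch_counts = [60, 65, 70, 65]
print(average_density(lunch_counts, 5000))
```

Every number fed into a calculation like this once had to be tallied by hand from film, which is where the time went.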
Observe a city street corner, crosswalk, or plaza long enough, and eventually, energy and entropy give way to understanding. The public greeted Whyte’s work with curiosity and amusement. “One thing he has discovered is where people schmooze,” deadpanned a 1974 New York Times article. “The other thing he has discovered is that they like it.”
Whyte’s Street Life Project was a revelation. Whyte offered nuggets not of gold, but of actionable data, which helped shape city policy: peak versus off-peak activity, average densities, walking patterns. Called “one of America’s most influential observers of the city,” Whyte’s insights and hard-earned wisdom informed New York’s 1969 city plan, helped revise its zoning code, and turned once-squalid Bryant Park into a prized public space.
What’s inspiring and a little mind-boggling about Whyte’s process is that until relatively recently, planners still practiced that type of time-consuming manual observation. Infrared cameras and other technologies have been around for years to make data-gathering easier. But often, going beyond surveys, personal observations, and educated guesses required hand counts and film study.
With smartphones in our pockets, and smart city technology increasingly embraced by local leaders, it may seem like we’re already awash in a flood of urban data. But that’s a drizzle next to the oncoming downpour that may radically transform our understanding of cities and how they function. Two rapidly rising technologies—computer vision and machine learning—offer the potential to revolutionize understanding of urban life.
“The ability to transmit images into data, without human intervention, is the single most powerful thing,” says Rohit Aggarwala, chief policy officer at Sidewalk Labs, the Google urban technology spinoff that is building its own smart neighborhood in Toronto.
With the advent of ever-cheaper cameras, computer vision analysis to turn images into data, and machine learning to turn data into patterns, predictions, and plans, suddenly every city is on the verge of being able to do what William H. Whyte did, without the staff. Technological advancement seems guaranteed: In 2016 alone, venture capital firms invested half a billion dollars in computer vision companies, while estimated global spending on machine learning ranged between $4.8 billion and $7.2 billion. The Cities of Data project at New York University expects the urban science and informatics field to grow to a $2.5 billion enterprise by 2030, and at the Consumer Electronics Show in Las Vegas earlier this month, more vendors self-identified as peddling “smart city” tech than gaming or drones. As a younger generation of digitally native city planners steps into office, seeing automation and autonomous vehicles on the horizon, the hunger and sense of urgency for improving municipal technology has never been greater.
Combine that with eye tracking—already widely used in the retail environment—and the data transmitted by ever-present smartphones, and suddenly, the ability to make relevant, even reactive, urban spaces, informed by real-time information, isn’t science fiction. Planners will use all that data to ask questions, and make decisions, about people, says Justin Hollander, a professor at Tufts University who runs the Urban Attitudes Lab and explores the intersection of design and technology. Human-centered design, as pioneered by urbanists such as Jan Gehl, will enter a new phase. It will threaten traditionally analog methods of design, turning planning into more of a science.
“When I worked as an urban planner, we did the best that we could to shape buildings, streets, and sidewalks to meet environmental and economic development goals,” says Hollander. “But we never got into the head of the people who used these spaces.”
To understand the potential of computer vision and machine learning, head to the mall. Marketers, already cashing in on the data and purchasing habits we reveal online, want to use these new technologies in stores to make the lucrative connection between online and real-world behavior.
Retailers have long been tracking customers for security purposes, but improvements in computer vision, facial recognition, and machine learning have spurred attempts at data mining and data analysis, according to Joan Insel, a vice president at CallisonRTKL, a global design and architecture consulting firm. Many companies are making significant investments in the nascent technology, she says—Walmart, which is pursuing cashierless checkout through its Project Kepler initiative, patented a facial-recognition system that tracks consumer mood and alerts store associates to assist those deemed unhappy—but few want to discuss plans due to privacy concerns.
“In retail today, it’s not about segmentation, it’s personal targeting,” she says. “Everyone is trying to figure out the secret sauce.”
Standard Cognition, a San Francisco-based machine vision startup, has some ideas about what that secret sauce might be. The company’s autonomous checkout system, which relies on a series of overhead cameras, makes it possible to shop sans cashier or standard check-out. But while the high-tech dream of shopping without a cashier is attention-getting, the company is fielding dozens of offers from top retail brands and raising millions in funding because it traffics in a different kind of data.
“Imagine having a photographic memory and recalling everything that happens in the store,” says co-founder Michael Suswal.
Standard Cognition can do things other systems can’t. It can count inventory with near-perfect precision, and even provide staffers with the most efficient route to return missing products to the shelf. But more importantly, the technology tracks intent and activity: what people pick up and read without purchasing, and even what they look at from across the store. Standard Cognition can follow shoppers in real time, across different cameras and from multiple perspectives simultaneously. (The NBA uses a similar system to track players, Suswal says, but it cheats somewhat by using jersey numbers as a reference.)
The company plans on opening a demo location in early 2018, with an assortment of products to help “teach” the machine-learning system to identify different items and behaviors. It’s joining a wave of investment in stores without staff, including Chinese online retailer JD.com, which plans to open hundreds of “unmanned” convenience stores. Standard Cognition’s next move, Suswal says, is security. Just as employees will “teach” the system to recognize the difference between fruits and vegetables, feeding it images and information to analyze and make connections, they also can educate the machine on the difference between a wallet and a gun. Think of it like teaching a child, Suswal says.
And, as with a child, many privacy experts fear the technology could unintentionally pick up biases and prejudices from its role models. An October 2016 report by the Georgetown Law Center on Privacy & Technology, The Perpetual Line-Up, found that national law-enforcement networks using facial-recognition technology include photos for half of all adults in the United States, and the technology was most likely to make mistakes on women, young children, and African Americans, “precisely the communities on which the technology is most likely to be used.”
Nor will there be clear ways to avoid interacting with this technology. It “will become cheaper and cheaper as machine learning gets better,” Suswal says. “It’ll become ubiquitous. There won’t be any place you go that doesn’t have computers watching. It’ll actually be more private, since only the machines will be watching.”
Taken one step further, analysis of intent can be used to reshape the built environment.
WeWork, the coworking giant with more than 200 global locations, utilizes video analysis and machine learning, along with many other data sources, including heat maps and the company’s app, to create better workspaces. Daniel Davis, the lead researcher on the company’s R&D team, sees this technology creating an endless feedback loop.
“The architecture industry has always been based on intuition and reverence for individual geniuses,” he says. “What’s shifting today is the understanding that we can test and evaluate design solutions. Whether it’s designing traffic plans and counting cars, or analyzing how a meeting room is used, there’s suddenly all this information that wasn’t obvious, or even accessible, a few years ago.”
WeWork’s level of vertical integration—the same company designs, remodels, and operates the space—explains why they’ve embraced this technology in ways that standard architecture firms haven’t. As owners, they can react to the data and fix areas that are underperforming, a luxury available to few other designers. They can also anticipate user needs: By feeding data through machine-learning algorithms, they can predict how much a particular proposed meeting room will be used before it’s even built.
Capturing intent, and then creating a circular relationship between designing and building—analysis, design, evaluation, then redesign—suggests how this technology can lead to more human-focused design and urban planning.
Historically, Davis says, computational design referred to the work of architects such as Frank Gehry and Zaha Hadid—twisting, wildly creative buildings that owe part of their existence to processing power. In the future, computational design will refer to something more human-scale.
“If you walk into a WeWork space, it doesn’t look like that kind of high-tech space, it looks homely, inviting, welcoming,” says Davis. “Nothing suggests a sophisticated process behind it. The space is performing in a way that suits your needs, without overtly being a space of the digital.”
Urban planning ostensibly has the same goals as Davis’s team: How do you make shared space work better? As Sidewalk Labs’ Aggarwala says, it’s all about designing in the service of improvements that aren’t primarily digital.
As Aggarwala’s company begins outreach, planning, and eventually design for its smart city project in Toronto, the most high-profile effort to build a neighborhood “from the internet up,” he says one of the guiding factors is designing a natural space for pedestrians. Crossings should feel safe. Pavement with embedded LED lights could change color based on changing uses, offering subtle cues. Dynamic wayfinding and signage that shows directions to coffee shops in the morning will switch in the evening to highlight nearby restaurants and bars. Adaptive traffic signals will recognize pedestrians, cyclists, and transit vehicles at intersections to improve safety, and an autonomous shuttle might ferry residents across the neighborhood. He wants to design something so interactive and understanding that people will put down their phones.
“If it feels tech-forward, we’ve probably done a few things wrong,” he says. “This is going to be a real neighborhood. It’s got to be usable.
“There’s a huge gap between WeWork and those who manage public realm,” he says, but the principle of usability still applies to urban planning. And even without research departments, park services could benefit from the information these new tools gather.
He used a neighborhood park as an example. As the surrounding area’s demographics change, and perhaps the bulk of the younger population ages from toddlers to teens, playspace can be redesigned to make way for more skateparks and basketball courts. Park amenities wouldn’t just address the community as it was during the last build-out or renovation. Over a multiyear time frame, public space would evolve as the neighborhood does.
This obviously isn’t revolutionary in terms of technology. But if changes to the city’s physical landscape are made in parallel with changes in usage and demographics, it would represent a shift in urban planning, policy, and budgeting. A team of researchers from MIT, Harvard, and the National Bureau of Economic Research published a study that used years of Google Street View imagery and machine learning to identify the factors associated with increased perceptions of neighborhood safety over time (including population density, a higher proportion of college-educated adults, and a higher proportion of Hispanic residents living in the neighborhood). Apply those findings in reverse, and cities could track neighborhoods and adjust to future safety issues before they become serious problems.
“Now, we’ll finally be able to adjust capital plans and budgets with actual data,” he says. “You can win arguments because you have the numbers. In the past, it was just about doing what’s been done in the past, because that’s safe. Nobody could attack you for that, until now.”
Scaling these technologies to the city level, and blanketing an entire neighborhood with cameras and sensors (what Aggarwala and others have described as a “digital layer”), requires extensive infrastructure spending and bandwidth costs. Toronto has the advantage of Sidewalk Labs funding development and data collection—and the privacy concerns that come with a private company gathering unprecedented amounts of information about the public. Numina, a Brooklyn-based startup, wants to make this technology more entry-level and less intrusive.
The company recently designed its own monitor and camera—housed in a PVC pipe, it looks a bit like a cup dispenser—that can be affixed to any utility pole or street sign. A low-cost solution, which recently won an award as one of the top five most promising technologies at the international Smart City Expo World Congress, it’s already been installed in four U.S. cities, with three more regions in the planning stages.
Tara Pham, one of the co-founders, started the company with a personal vendetta: She and her co-founder were both hit by vehicles while riding bikes in 2013, and decided they needed to build a better bicycle and pedestrian counter to give more cities access to data-based decision-making.
At that time, to get real transportation data, planners would pick up a clipboard and a clicker and get counting. Four years later, the all-purpose, easy-to-install, solar-powered public realm measurement tool their company created gives cities actionable intelligence without surveillance. The camera takes pictures three times a second and processes images on the device, only delivering anonymous data about object classes, such as bikes or pedestrians, instead of storing and sharing photos with identifiable faces. The system can track street usage, measure new safety metrics such as near-misses, and potentially even trigger city services such as garbage pick-up, a capability being tested in pilots this spring.
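The privacy-preserving part of that design, processing on the device and sending out only anonymous class counts, can be sketched in a few lines of Python. This is an illustrative sketch only, not Numina's software: the `Detection` record, the `summarize_window` function, and the 0.5 confidence threshold are all hypothetical, and the detector that would produce the detections is assumed rather than shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object reported by a (hypothetical) on-device detector."""
    label: str         # object class, e.g. "pedestrian" or "bicycle"
    confidence: float  # detector score between 0 and 1


def summarize_window(frames, min_confidence=0.5):
    """Reduce a window of frames to the peak simultaneous count per class.

    Only class labels and counts leave this function. No pixels, faces,
    or per-person identifiers are kept.
    """
    peak = Counter()
    for detections in frames:
        # Count confident detections of each class within this one frame.
        in_frame = Counter(
            d.label for d in detections if d.confidence >= min_confidence
        )
        for label, n in in_frame.items():
            peak[label] = max(peak[label], n)
    return dict(peak)


# Three frames, roughly one second of footage at three frames per second.
frames = [
    [Detection("pedestrian", 0.9), Detection("pedestrian", 0.8),
     Detection("bicycle", 0.7)],
    [Detection("pedestrian", 0.9), Detection("bicycle", 0.3)],  # weak bicycle hit is dropped
    [],
]
print(summarize_window(frames))
```

The sketch reports the peak number of objects seen at once rather than a running total, because the same person appears in many consecutive frames. Counting distinct passers-by would require tracking objects from frame to frame, which is left out here.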
“Since we started 18 months ago, machine learning and computer vision have become pretty democratized,” Pham says. “The processing power we can build into the sensor to process images in the device is significantly cheaper than it’s ever been.”
In Numina’s short existence, it’s already helped cities start that data-design feedback loop. In Jacksonville, Florida, a city with one of the highest pedestrian fatality rates in the U.S., Numina sensors were set up at a dozen intersections. At one site, near a bus station, constant monitoring discovered that, amid the bustle of passengers arriving and boarding, there was one pathway pedestrians repeatedly took to jaywalk. The city thought it might need to redesign the entire intersection. Instead, data showed the quickest and most effective fix was creating a mid-block crossing with $30 worth of paint.
Today’s new eyes on the street may lead to a cityscape much like the one we use today, just more efficient, accessible, and, ultimately, human. Ann Sussman, an architect and researcher, has spent years using eye-tracking software to look at how humans react to architecture and urban design. She believes that technology can understand human intent and even subconscious desires, leading to an “age of biology” in design.
“When I went to architecture school, there was no mention of cognitive science,” she says. “Today, I can measure people’s instantaneous reaction to layouts, and measure their hormone levels when I make changes to a building facade.”
Sussman has been focused on figuring out how humans react to architecture on a more unconscious level. By staging photo comparisons, and tracking minute facial reactions, she’s gained a better understanding of the kinds of design that make us happy: active and busy fenestration patterns, like the ones found in Paris and Boston, engage viewers. Symmetry, like the canals of Amsterdam, calms, while large, blank facades, like those found on some Brutalist buildings such as Boston’s City Hall, confuse, since they don’t offer more information when viewers get closer, an innate expectation of our reptile brains.
Hollander, who collaborates with Sussman, has taken this line of experimentation and inquiry even further, with experiments that tested the health and well-being of people in certain areas and neighborhoods.
In 2016, New York City wanted to find out how certain design features impacted community health, and hired Hollander to look at about a dozen different public buildings around the city, including museums, libraries, and community health centers. Using an array of biometrics, including facial analysis and electroencephalograms (EEGs) to measure brain activity, he tracked whether certain improvements and renovations made any difference in how people felt about the buildings. The city has not shared how it has acted on Hollander’s findings, but he argues this type of analysis adds another layer of information to public decision-making that, until recently, wasn’t available.
Sussman’s and Hollander’s research suggests another possibility of the more reactive city. It can respond, and be designed, to our actions and intents. Can it even react to our unconscious? Perhaps, as MIT professor Carlo Ratti suggests, we can use this kind of analysis to create an architecture that adapts to human need, instead of the other way around.
“Architecture has often been described as a kind of ‘third skin’ in addition to our own biological one and our clothing,” he says. “However, for too long it has functioned more like a corset: a rigid and uncompromising addition to our body. I think that new digital technologies and distributed intelligence have the potential to transform it.”
The groundwork for smart cities is already being laid. In addition to tech companies such as Sidewalk Labs and Panasonic investing in test neighborhoods, large telecom companies in the United States, including Comcast and Verizon, are investing in the creation of new, large-scale Internet of Things networks, which will make connected devices more prevalent and citywide deployment more affordable.
The technology’s power and rapid adoption have led civil-liberties groups to warn of its potential for upending ideas of privacy, security, and even government surveillance powers. Last month, the Georgetown Law Center on Privacy & Technology found video and machine-learning systems were being used at airports. And more and more law enforcement agencies plan on upgrading and adopting this technology. Ekin, an international company that sells patrol cars with facial recognition technology, “has extensive plans to expand to the U.S. in 2018,” according to a spokesperson. It launched sales on January 8, offering American law enforcement a suite of products, including Ekin Face, a face-detection and -analysis system that makes sure “suspicious, guilty, or wanted people even the human eye can miss can be detected, and appropriate actions will be taken in time.”
The proponents of smart city tech agree that privacy and anonymity can’t be sacrificed for data. Sidewalk Labs’ Aggarwala says that while targeted ads can be useful (he does work for a spinoff of Alphabet, the parent company of Google), there is absolutely no way these systems should be designed to “hoover up data” or be used for fishing expeditions. In a recent Reddit AMA, Dan Doctoroff, head of Sidewalk Labs, said “privacy should be designed into the very foundation of the tools we develop.”
Aggarwala argues that systems being developed now would actually have much more potential privacy protection than the video projects of William H. Whyte: All they need for analysis is a figure’s outline, which can provide information without compromising anybody’s individual identification. Planners and designers can still create cities and spaces that feel “like any other place, but better,” without violating privacy.
As the physical world becomes more digital, we will find ourselves facing the same issues exploring the sidewalks as we do using a web browser: What’s the right balance between privacy and convenience, or personalization and surveillance?
“What’s creepy is that the technology is comprehensive,” Aggarwala says.
It’s a question of power and perspective. William Whyte, for all his skilled observations, was ultimately a bystander with better memory. New technology is creating not just an observer, but an omniscient narrator.
“Think about ants,” says Standard Cognition’s Suswal. “From a human perspective, we can look down on an ant colony and predict what’s going to happen. But ants, they just crawl around over each other. Right now, we’re the ants. These cameras, with predictive analytics, will allow us to see things much differently.”
Editor: Sara Polsky