An alert comes in over the scanner: A burglary’s taking place on the corner of 16th and Washington streets, and police rush to the scene. They get there just in time to apprehend the thief as he comes out of the house. They put him in cuffs and drive him away in the squad car.
It’s a scene that happens every day and that’s played out in virtually every police procedural drama on TV. But what if that scene played out another way — powered by data and technology?
Take two: An alert comes in warning that a burglary may occur on the corner of 16th and Washington streets, and police drive to the scene. They get there long before the thief, who sees the police and slinks away empty-handed, burglary averted. The officers check the computer again, find an alert for another crime that has yet to occur, and speed off to a new location to prevent it.
The second scenario may seem like science fiction, perhaps reminiscent of the premise of tech noir Philip K. Dick story-turned-Tom-Cruise-vehicle Minority Report, in which psychic “precogs” identify “precrimes.” But the scenario is playing out in cities across the country, where researchers and police departments are using machine-learning techniques to forecast where and when crime is likely to occur and to drive down crime rates.
Searching for the Patterns in Data
“What we are really trying to do is find patterns in the data,” says Professor Michael D. Porter, who holds a joint appointment at the University of Virginia’s School of Engineering and Applied Science and Darden School of Business. Porter is at the forefront of using big data to predict everything from Yelp reviews to cyberattacks to terrorism.
Police departments have long used data to track crimes, most famously in the Compstat approach to policing pioneered in New York City in the mid-1990s, in which geolocated crime statistics were used to identify crime “hotspots.”
Porter’s approach takes that tactic to the next level by using historical data to identify hotspots of the future. “Let’s say we had three car thefts, on the same street, in one week,” Porter says. “Past crime patterns might suggest that all streets within five blocks might not only be a hotspot for more car thefts, but also a hotspot for burglaries.” The method uses a principle in data science known as contagion, in which a phenomenon spreads and grows, just like an infectious disease.
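The article does not give the model's equations. One standard way to formalize contagion in crime data is a self-exciting (Hawkes) point process, in which every past crime temporarily raises the risk nearby, and that boost fades with time and distance. The sketch below is an illustration under that assumption, not Porter's model. The function names `hawkes_intensity` and `top_hotspots` and the parameter values (background rate `MU`, boost size `THETA`, time decay `OMEGA`, spatial spread `SIGMA`, with time in days and distance in blocks) are all hypothetical.

```python
import math

# Hypothetical parameters (assumptions): time in days, distance in city blocks.
MU = 0.1      # background crime rate, constant everywhere
THETA = 0.5   # expected number of follow-on crimes each crime triggers
OMEGA = 0.2   # how fast the triggered risk decays over time (per day)
SIGMA = 0.5   # how far the triggered risk spreads (blocks)


def hawkes_intensity(t, x, y, events, mu=MU, theta=THETA, omega=OMEGA, sigma=SIGMA):
    """Conditional crime rate at location (x, y) and time t.

    `events` is a list of past crimes as (t_i, x_i, y_i). The rate is the
    background level plus a boost from each earlier crime that decays
    exponentially in time and as a 2-D Gaussian in space.
    """
    rate = mu
    for ti, xi, yi in events:
        if ti >= t:
            continue  # only crimes that already happened can raise future risk
        dt = t - ti
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        temporal = omega * math.exp(-omega * dt)
        spatial = math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
        rate += theta * temporal * spatial
    return rate


def top_hotspots(t, cells, events, k=1):
    """Return the k candidate grid cells (x, y) with the highest rate at time t."""
    ranked = sorted(
        cells,
        key=lambda c: hawkes_intensity(t, c[0], c[1], events),
        reverse=True,
    )
    return ranked[:k]


# Three car thefts on the same street (x = 0) within one week.
thefts = [(1.0, 0.0, 0.0), (3.0, 0.0, 1.0), (5.0, 0.0, 2.0)]
cells = [(0.0, 1.0), (10.0, 10.0), (-10.0, 5.0)]
print(top_hotspots(7.0, cells, thefts, k=1))  # → [(0.0, 1.0)]
```

Cells far from the thefts fall back to the background rate, while the cell on the same street ranks first. Porter's example also has car thefts raising burglary risk; a multi-type version of this process, with a separate boost for each pair of crime types, would capture that, but it is omitted here.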
In 2017, Porter and George Mohler, a computer scientist at Indiana University–Purdue University Indianapolis, used that principle to develop a model that they entered in a crime forecasting competition run by the National Institute of Justice. The contest used records of nearly 1 million crime calls in Portland, Oregon, between 2012 and 2017 and asked contestants to predict hotspots for future periods ranging from one week to three months.
Porter and Mohler’s team won nine categories and tied for first overall.
Mohler has used similar models to create commercial software called PredPol, which is currently used by the City of Los Angeles and dozens of smaller cities, from Santa Cruz, California, to Decatur, Georgia, to decide where to send their patrols.
Unique and Persistent Patterns
Porter studied for his Ph.D. at UVA’s engineering school under Don Brown, W.S. Calcott Professor and founding director of UVA’s Data Science Institute.
Building on Brown’s techniques, Porter used statistical analysis to link serial crimes, such as a string of burglaries, to their perpetrators. “You want to find out if a collection of crimes was committed by the same offender or possibly a group of co-offenders,” Porter says.
To do that, he looked for patterns in elements of the crimes that were both unique and persistent. “If you are looking for burglary, and you find someone went through an unlocked door, that’s not very helpful, but if you have two crimes in which someone goes through the roof, that’s unique,” says Porter. He tested the technique on five years of burglaries in Baltimore County, Maryland, where it achieved 91 percent accuracy.
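The article does not describe Porter's linkage model in detail. One simple way to encode the "unique and persistent" idea is to give each crime feature a weight that grows as the feature gets rarer (an inverse-frequency weight, borrowed from text retrieval) and to score a pair of crimes by the total weight of the features they share. The sketch below assumes each crime is recorded as a set of descriptive tags; the tag names and the functions `feature_weights` and `linkage_score` are hypothetical.

```python
import math
from collections import Counter


def feature_weights(crimes):
    """Inverse-frequency weight per tag: log(N / number of crimes with the tag).

    A tag present in every crime gets weight 0; rare tags get large weights.
    """
    n = len(crimes)
    counts = Counter(tag for crime in crimes for tag in set(crime))
    return {tag: math.log(n / k) for tag, k in counts.items()}


def linkage_score(a, b, weights):
    """Sum of the weights of the tags two crimes have in common."""
    shared = set(a) & set(b)
    return sum(weights.get(tag, 0.0) for tag in shared)


# Hypothetical burglary records.
crimes = [
    {"entry:unlocked_door", "time:night"},
    {"entry:unlocked_door", "time:day"},
    {"entry:unlocked_door", "time:night"},
    {"entry:roof", "time:night", "loot:jewelry"},
    {"entry:roof", "time:night", "loot:jewelry"},
    {"entry:unlocked_door", "time:day"},
]

w = feature_weights(crimes)
print("unlocked-door pair:", round(linkage_score(crimes[0], crimes[2], w), 3))
print("roof-entry pair:   ", round(linkage_score(crimes[3], crimes[4], w), 3))
```

A shared unlocked door adds little, because most of these burglaries involve one, while a shared roof entry adds a lot, so the roof pair scores higher. Persistence appears here only as the requirement that a tag show up in both crimes; a fuller model would also weigh how consistently a tag recurs across a whole series.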
“Offenders may not be making conscious choices to say, ‘I am only going to neighborhoods that have trees and don’t have streetlights.’ They may just see a place and say, ‘This place looks good,’” Porter says. “But if you collect enough data, you can get into the mind of the offenders and discover things they may not know about themselves.”
It’s these kinds of predictions that create uneasiness about data prediction being a form of Minority Report-esque precrime. The difference, Porter says, is causality. “We aren’t saying we know what people will do, and we are going to go out and intercept them,” he says. Rather, the models estimate the probability that crimes will occur. “We are saying that there is an elevated risk of crime at certain locations and times; police can use this information to better schedule patrols.”
Accounting for Bias in Data and Predictive Policing
There is a legitimate criticism that predictive policing can lead to racial profiling if the neighborhoods the model flags tend to share a particular racial or ethnic makeup. If police then make more arrests there because of racial bias, those arrests feed back into the model's data and can skew future predictions, creating a feedback loop.
One solution is to routinely allocate patrols to other areas of the city with different ethnic makeups, so that discrimination doesn’t infect the model. A more sophisticated solution would be a model that can distinguish arrests driven by racial bias from those that fit larger patterns of crime, so that the model becomes self-correcting.
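The feedback loop and the first remedy can be illustrated with a toy simulation. It assumes two areas with identical true crime rates, that crime is only recorded where police patrol, and that most patrol time goes to whichever area has the most recorded crime. The function `simulate_patrols` and its `explore_share` parameter (the fraction of patrol time routinely spread across all areas) are hypothetical, and this is not a model any department is known to use.

```python
def simulate_patrols(days, true_rates, recorded, explore_share=0.0):
    """Toy feedback loop: crime is only recorded where police patrol.

    Each day, (1 - explore_share) of patrol time goes to the area with the
    most recorded crime, and explore_share is split evenly across all areas.
    Returns the recorded crime count per area after `days` days.
    """
    recorded = list(recorded)
    n = len(recorded)
    for _ in range(days):
        # Pick today's priority area from the data recorded so far.
        top = max(range(n), key=lambda i: recorded[i])
        for i in range(n):
            share = explore_share / n
            if i == top:
                share += 1.0 - explore_share
            # Recorded crime grows with the true rate times patrol presence.
            recorded[i] += true_rates[i] * share
    return recorded


# Two areas with equal true crime rates; area 0 starts with one extra
# recorded incident, e.g. from biased past enforcement.
greedy = simulate_patrols(100, [1.0, 1.0], [2.0, 1.0])
mixed = simulate_patrols(100, [1.0, 1.0], [2.0, 1.0], explore_share=0.2)
print(greedy)  # → [102.0, 1.0]
print(mixed)   # area 1 keeps receiving patrols and recorded crime
```

Under purely greedy allocation, area 1's recorded crime never changes even though its true rate matches area 0's, so the model never learns otherwise. Reserving a share of patrols for every area keeps data flowing from area 1, but the recorded gap still widens in this toy setup, which is why a model that corrects for bias itself would be the stronger fix.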
Applications for Predictive Models Across Industries
Policing isn’t the only area in which Porter and other researchers have used contagion models to predict future events. In other work, Porter and colleagues have applied similar techniques to acts of terrorism in the Middle East. In a paper currently under review, Porter found a relationship between the size of a terrorist attack and the short-term risk of another attack. Attacks in Iraq and Israel that killed a large number of people were followed by a reduction in the risk of future attacks; in Afghanistan, however, he found no such effect.
Porter has also applied his techniques to cybersecurity, creating a model to predict whether employees might cause a future security breach. Using factors such as an individual’s morality, beliefs and attitudes, the model predicted with 85 percent accuracy whether an employee would become a security risk.
Porter foresees a wide range of possible applications for these machine-learning techniques.
“What I am really interested in is understanding human patterns and behavior,” he says. “Humans are complex, but we have strong patterns as well — in how we speak, how we engage in social media, even how we carry out crime. By understanding these patterns and how we generate them, we can find ways to benefit society.”