Ecometrics in the Age of Big Data

Daniel Tumminelli O’Brien, Robert J. Sampson & Christopher Winship

APA
O’Brien, D. T., Sampson, R. J., & Winship, C. (2015). Ecometrics in the Age of Big Data. Sociological Methodology, 45(1), 101–147. http://dx.doi.org/10.1177/0081175015576601

Keywords
311 hotlines, big data, broken windows, computational social science, ecometrics, physical disorder, urban sociology

Abstract
The collection of large-scale administrative records in electronic form by many cities provides a new opportunity for the measurement and longitudinal tracking of neighborhood characteristics, but one that will require novel methodologies that convert such data into research-relevant measures. The authors illustrate these challenges by developing measures of “broken windows” from Boston’s constituent relationship management (CRM) system (aka 311 hotline). A 16-month archive of the CRM database contains more than 300,000 address-based requests for city services, many of which reference physical incivilities (e.g., graffiti removal). The authors carry out three ecometric analyses, each building on the previous one. Analysis 1 examines the content of the measure, identifying 28 items that constitute two independent constructs, private neglect and public denigration. Analysis 2 assesses the validity of the measure by using investigator-initiated neighborhood audits to examine the “civic response rate” across neighborhoods. Indicators of civic response were then extracted from the CRM database so that measurement adjustments could be automated. These adjustments were calibrated against measures of litter from the objective audits. Analysis 3 examines the reliability of the composite measure of physical disorder at different spatiotemporal windows, finding that census tracts can be measured at two-month intervals and census block groups at six-month intervals. The final measures are highly detailed, can be tracked longitudinally, and are virtually costless. This framework thus provides an example of how new forms of large-scale administrative data can yield ecometric measurement for urban science while illustrating the methodological challenges that must be addressed.

Main finding
Citizen-initiated administrative databases like Boston’s 311 hotline can provide a nearly costless, multidimensional measure of physical disorder that can be more precise than other measures and can be tracked longitudinally. The measure comprises five factors of physical disorder, grouped into two constructs: denigration of public space and negligence in private space. Counts of these events per census block group were adjusted for differences across neighborhoods in the probability that an issue is reported. The hotline database itself supplied the data for a behavioral model that predicts a neighborhood’s “civic response rate.” Field audits were cross-referenced with the hotline data to produce adjustment measures for the raw counts in each census block group. Criteria for reliability were established through multilevel modeling, both for one-time measures and for cross-time trajectories. Census tracts could be measured reliably at two-month intervals and census block groups at six-month intervals. The programming code was published alongside the paper so that the measure can be reproduced for similar databases. The recent abundance of this type of big data creates a significant opportunity for research on physical disorder and other ecometrics.

Description of method used in the article
The first analysis measured physical disorder through reports in the Boston constituent relationship management (CRM) system from 2010–2012. Of 178 recorded case types, 28 that indicate physical disorder were categorized into five factors under two main groupings: denigration of public space, including trash and graffiti, and negligence in private space, including big-building complaints, housing issues, and uncivil use of space. The main measures were counts of events per census block group. The second analysis attempted to account for bias in the first, namely that the probability of an issue being reported to the CRM system differs by neighborhood. A simple behavioral model was created from the 2011 CRM data to predict each census block group's "civic response rate." Concurrently, an audit of 72 of Boston's 156 census tracts was completed in 2011 to identify streetlight outages and the level of street garbage. Boston's Public Works Department also audited the quality of all city sidewalks from 2009–2012. These audits were cross-referenced with CRM reports to predict each block group's likelihood of reporting an issue to the CRM system. This produced an adjustment measure, proportional to the case count, for each block group, which was applied to create the final physical disorder measures. The construct validity of these measures was tested against 2005–2009 American Community Survey demographic data, 2008–2010 Boston Neighborhood Survey data on perceived physical disorder and collective efficacy, and 2011 reports to 911 of gun-related incidents. The third and final analysis used multilevel models to assess the measures over different time intervals, using the intraclass correlation coefficient to gauge their consistency and the reliability coefficient to gauge their ability to distinguish statistically between neighborhoods.
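The two statistics named in the third analysis can be illustrated with a minimal sketch. The article estimates them from multilevel models; the code below instead substitutes a simpler one-way ANOVA variance decomposition and assumes balanced data (every neighborhood measured in the same number of time windows). The function name `icc_and_reliability` and the input layout are hypothetical and are not taken from the authors' published code.

```python
from statistics import mean


def icc_and_reliability(measures):
    """Estimate the ICC and the reliability of neighborhood means.

    measures: dict mapping a neighborhood id to a list of repeated
    measurements (e.g., adjusted case counts per time window).
    Assumption: every neighborhood has the same number n >= 2 of
    measurements, and there are at least two neighborhoods.
    """
    groups = list(measures.values())
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two neighborhoods")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need balanced data with n >= 2 per neighborhood")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Mean squares from a one-way ANOVA (not a fitted multilevel model).
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    # Between-neighborhood variance component, truncated at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)

    # ICC: share of total variance that lies between neighborhoods.
    total = var_between + ms_within
    icc = var_between / total if total > 0 else 0.0

    # Reliability of a neighborhood mean based on n measurements.
    denom = var_between + ms_within / n
    reliability = var_between / denom if denom > 0 else 0.0
    return icc, reliability
```

Calling `icc_and_reliability({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})` returns the pair (ICC, reliability). The reliability rises with the number of measurements per neighborhood, which is one way to see why a choice of time window involves a trade-off between temporal detail and the precision of each neighborhood's measure.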

Verdict
Of some practical use if combined with other research

Organising categories

Activity
Other or N/A
Method
3D / Digital / Datasets, Spatial Methods
Discipline
Sociology
Physical types
Sidewalks, Streets
Geographic locations