Methodology: How we collected and analyzed the data

Date published: June 3, 2020

EDF and our partners collected and analyzed the air quality data and developed an interactive map that displays on-road air pollution, population characteristics, health conditions, pollution sources and other environmental health information.

Data collection

EDF partnered with Google Earth Outreach, Rice University and Sonoma Technology to outfit two Google Street View cars with mobile sensing equipment to collect and analyze the data. Detailed information about data collection and analytical methods is available in this recently published article in Environmental Science & Technology. This Houston study covers a much larger geographic area and improves upon data collection and analytical methods compared to our first monitoring campaign in Oakland, CA.

Sonoma Technology equipped the cars with fast-response, laboratory-grade instruments to precisely measure pollution concentrations every few seconds. We gathered data for 9 months (between July 2017 and March 2018), mostly weekdays, typically between 7am and 4pm, with some limited early morning, late evening and weekend drive periods. The cars repeatedly measured pollution on every street and highway in 35 census tracts in the Houston area (over a 30 square mile area covering 800 total road miles). We chose census tracts based on varying population characteristics, health conditions, and types and quantities of emission sources. Some of these census tracts have existing regulatory air quality monitors; however, many do not.

The research team designed the daily driving plan to ensure we systematically sampled each neighborhood at different times of the day, week and year to minimize sampling biases. We aimed to sample areas using three priority levels for the number of drive passes ranging from 15 to 44 drive passes of each location. We analyzed PN concentrations and also collected CO₂ measurements. In total, the cars collected 30 million unique measurements, representing the largest mobile air pollution data set of its kind in the U.S. to date.

Vehicles and instrumentation

The research team deployed two custom-modified, gas-powered Google Street View cars equipped with fast-response laboratory-grade instruments to measure black carbon (BC), particle number (PN), nitric oxide (NO) and nitrogen dioxide (NO₂). The instruments included an aethalometer for BC, a water-based condensation particle counter for PN, a chemiluminescence analyzer for NO, and a cavity-attenuated phase shift spectrometer for NO₂. In addition, the mobile platform included a GPS unit and an on-board data management system. We also measured PM2.5, ozone (O₃) and carbon dioxide (CO₂), and are conducting further analysis of these pollutants. The data underwent quality assurance and quality control processes for valid operating conditions. The team ensured data quality through periodic calibrations and noise quantification.

Data analysis

To generate the pollution maps, scientists at Environmental Defense Fund and Rice University used a refined algorithm for aggregating and summarizing the repeated measurements made on each street, based on Messier et al. (2018) research methodology. In our study, we aggregated repeated measurements to compute the mean concentration of each drive pass through a road segment (drive pass means), followed by the median of these drive pass means over a unique 4-hour sampling period (drive period median).

The drive pass mean aggregation reduces the influence of oversampling extremely high concentrations at one location, which can happen when a car is stopped in traffic for a long period of time and/or is close to a high-emitting vehicle tailpipe. This method allows us to estimate the "typical" level of pollution over a 9-month period at each location via the median of drive period median concentrations. Over the sampling time period, some locations show consistently higher median pollution levels. These patterns arise from major pollution sources and drivers (e.g., traffic, industries, wind and other atmospheric processes).
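The three aggregation steps described above (drive pass mean, drive period median, median of drive period medians) can be sketched as follows. This is a minimal illustration, not the study's code: the record layout, the identifiers (seg_A, p1, pass1) and the concentration values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: (segment_id, drive_period_id, drive_pass_id, concentration).
# All names and values are made up for illustration.
measurements = [
    ("seg_A", "p1", "pass1", 10.0),
    ("seg_A", "p1", "pass1", 14.0),
    ("seg_A", "p1", "pass2", 20.0),
    ("seg_A", "p2", "pass3", 30.0),
    ("seg_A", "p2", "pass3", 50.0),
    ("seg_A", "p3", "pass4", 8.0),
]


def typical_concentration(records):
    """Return {segment_id: median of drive period medians}."""
    # Step 1: mean of all measurements within each drive pass through a segment.
    by_pass = defaultdict(list)
    for seg, period, drive_pass, value in records:
        by_pass[(seg, period, drive_pass)].append(value)
    pass_means = {key: mean(vals) for key, vals in by_pass.items()}

    # Step 2: median of the drive pass means within each drive period.
    by_period = defaultdict(list)
    for (seg, period, _drive_pass), pass_mean in pass_means.items():
        by_period[(seg, period)].append(pass_mean)
    period_medians = {key: median(vals) for key, vals in by_period.items()}

    # Step 3: median of the drive period medians for each segment.
    by_segment = defaultdict(list)
    for (seg, _period), period_median in period_medians.items():
        by_segment[seg].append(period_median)
    return {seg: median(vals) for seg, vals in by_segment.items()}


result = typical_concentration(measurements)
print(result)  # {'seg_A': 16.0}
```

Because each drive pass contributes a single mean regardless of how many readings it contains, a pass spent idling behind a high-emitting vehicle counts once rather than once per reading.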

The main updates in this study include improvements to geolocation of mobile measurements and statistical approaches for quantifying elevated pollution levels. Mobile measurements were assigned to road segments of two lengths (50 m or 90 m) corresponding to the response time of the air pollution instrumentation used. Dead-ends were not included in the road segment assignment because the Google Street View car samples its own exhaust while turning around. Confidence intervals of the median concentration were estimated at each road segment, accounting for atmospheric variability, sampling errors associated with pollution changes over time, and instrument drift. The researchers required a minimum number of measurements at a road location to ensure a representative picture of pollution levels, with sufficiently low uncertainties relative to spatial pollutant patterns. This limited the analyses to 22 census tracts that have between 15 and 44 repeated drive visits.
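As a rough sketch of the segment assignment and screening described above, the snippet below bins a measurement by its distance along a road into fixed-length segments and screens segments by visit count. The segment lengths (50 m on surface streets, 90 m on highways) come from this page; the distance-along-road input, the function names and the 15-visit default threshold are assumptions made for illustration, not the study's actual geolocation method.

```python
# Segment lengths stated on this page: 50 m (surface streets), 90 m (highways).
SEGMENT_LENGTH_M = {"surface": 50.0, "highway": 90.0}


def segment_index(distance_along_road_m, road_type):
    """Zero-based index of the fixed-length segment containing a measurement.

    Assumes the measurement has already been matched to a road and expressed
    as a distance from the start of that road (hypothetical input).
    """
    if distance_along_road_m < 0:
        raise ValueError("distance must be non-negative")
    return int(distance_along_road_m // SEGMENT_LENGTH_M[road_type])


def keep_segment(n_drive_visits, is_dead_end, min_visits=15):
    """Keep a segment only if it is not a dead-end and has enough visits.

    min_visits=15 is an assumed threshold taken from the low end of the
    15 to 44 drive-visit range quoted in the text.
    """
    return (not is_dead_end) and n_drive_visits >= min_visits


print(segment_index(185.0, "highway"))   # 2
print(keep_segment(20, is_dead_end=True))  # False
```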

Hotspot analysis

The previous Oakland study used a semi-quantitative approach to identify hotspots, or locations where the pollution level is considered elevated. (In that work, hotspots were defined as locations where concentrations of multiple pollutants exceed nearby ambient levels by 50% or more.) This study applied a statistical approach to hotspot identification and quantification. Researchers compared the median concentration of each road segment with the median concentration of all road segments across the analysis area (22 census tracts). A road segment's pollution level is considered elevated if its median concentration is higher than the median across the analysis area and the two confidence bounds do not overlap. Applying the non-overlapping confidence bounds criterion means we can be confident that the pollution level at a particular road segment is truly elevated.
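The elevated-pollution criterion above reduces to a simple comparison once the medians and their confidence bounds are available. The sketch below assumes those bounds have already been estimated; the Estimate type, the function name and the numbers are hypothetical.

```python
from typing import NamedTuple


class Estimate(NamedTuple):
    """A median concentration with its confidence bounds (hypothetical type)."""
    median: float
    lower: float
    upper: float


def is_elevated(segment: Estimate, area: Estimate) -> bool:
    """True if the segment median exceeds the area-wide median and the
    two confidence intervals do not overlap."""
    return segment.median > area.median and segment.lower > area.upper


area_wide = Estimate(median=10.0, lower=9.0, upper=11.0)
print(is_elevated(Estimate(15.0, 12.0, 18.0), area_wide))  # True
print(is_elevated(Estimate(12.0, 10.5, 13.5), area_wide))  # False: intervals overlap
```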

In addition to the median concentration for each road segment, the researchers also calculated other summary statistics which further our understanding of intermittent and high extreme pollution behaviors and patterns. These additional results are included in the downloadable data set. We show the 90th percentile summary statistics for BC on the interactive map.

Mobile air pollution data is available for download from the Air Quality Data Commons (AQDC). The AQDC provides centralized, open data storage and enables analysis and visualization of neighborhood-level air quality data.

Interactive map

The interactive map displays mobile air pollution data from this study collected between July 2017 and March 2018, as well as data on population characteristics, health conditions, pollution sources and other environmental health information from the Houston-Galveston-Brazoria EnviroScreen Tool developed by scientists at EDF and Texas A&M University (and published in this paper) using existing data sets.

Pollution concentrations

EDF chose the air pollution concentration scales shown to display the full range of surface street median concentrations measured. Per Miller et al. (2020), the concentration increments represented by the shades of color in the color bar are based upon the BC instrument precision at 3500 ng m⁻³; for NO, NO₂ and PN they are based on the 95% confidence interval of the highest (90th percentile) median concentration across the domain. For BC and NO, the first step on the concentration scale (palest yellow) denotes the instrument's minimum detection limit (MDL). The MDLs for BC and NO are 1.6 µg/m³ and 4 ppbv, respectively. BC and NO concentrations below the MDL are reported as MDL/2, per standard convention. The MDLs for NO₂ and PN are 0.95 ppbv and 450 particles/cm³, respectively. All NO₂ and PN concentrations reported were above the MDL. The map displays air pollution concentrations at 50-meter road segments on surface streets and 90-meter road segments on highways. The concentration level at each road segment is based on aggregating all 1- to 5-second measurements assigned to that location over repeated drive visits. For NO, NO₂ and PN, we show the median of drive period median concentrations at each road segment, which represents the typical pollution condition at that location. For BC, we show the 90th percentile of drive period median concentrations, because ~75% of measurements were below the minimum detection limit of the BC instrument, and the median of drive period median BC concentrations could not be derived.
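To illustrate why the map shows a high percentile rather than the median for BC, the sketch below applies the MDL/2 substitution and then compares the 50th and 90th percentiles. The BC detection limit (1.6 µg/m³) is taken from the text; the sample values, the function names, the choice to keep values exactly at the MDL, and the nearest-rank percentile method are assumptions for illustration, not the study's procedure.

```python
import math

BC_MDL = 1.6  # µg/m³, minimum detection limit for BC as stated in the text


def substitute_below_mdl(values, mdl):
    """Replace every value below the detection limit with MDL/2.

    Values exactly at the MDL are kept (an assumption of this sketch).
    """
    return [v if v >= mdl else mdl / 2 for v in values]


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed method, chosen for simplicity)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical BC readings in µg/m³; 7 of the 10 fall below the MDL.
bc = [0.3, 0.5, 0.9, 1.0, 1.2, 1.4, 1.5, 2.0, 2.4, 3.1]
cleaned = substitute_below_mdl(bc, BC_MDL)

print(percentile_nearest_rank(cleaned, 50))  # 0.8, i.e. just the MDL/2 placeholder
print(percentile_nearest_rank(cleaned, 90))  # 2.4, a value that was actually measured
```

When most readings sit below the detection limit, the median collapses to the substituted placeholder, while the 90th percentile still reflects measured concentrations.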

Pollution sources

Pollution sources data layers show a subset of industrial facilities that are important sources of air pollution in the Greater Houston area. While the presence of these emission sources does not necessarily indicate areas of higher exposure levels, it provides contextual information about potential environmental health risks to residents living nearby.

We include metal recycling and concrete processing plants, petrochemical facilities and power plants. Many of these facilities are magnet sources for heavy-duty and diesel-fueled vehicles. Some on-site pollution sources (e.g. diesel operating equipment, concrete/metal processing) at these facilities could also be detected via on-road air monitoring. These emissions sources have both localized air pollution impact and contribute to regional air quality conditions. We have included Environmental Protection Agency (EPA) Risk Management Plan facilities as an indicator for presence of high environmental risk. These are industrial facilities that use large amounts of extremely hazardous substances and are federally required to file risk management plans.

Data sources

Metal recycling plant locations from the City of Houston Health Department.
Concrete processing plant locations compiled from New Source Review Air Permits (Texas Commission on Environmental Quality) and a list of operating plants by the Houston Chronicle. Duplicates and older renewal permits were removed, and locations and years of operation were verified using Google Earth imagery.
Petrochemical facility location data extracted from the 2015 U.S. EPA Toxic Release Inventory (TRI) program database (NAICS codes 324 and 325).
Power plant locations from U.S. Energy Information Administration.
EPA risk management plan facility locations accessed from the Houston Chronicle's Right-to-Know Network Database.

Vulnerable people and places

We also display sensitive locations including schools, childcare centers and hospitals, as people who frequent these locations tend to be more susceptible to the adverse impact of air pollution and other environmental hazards.

Our map also includes the Social Vulnerability Index developed by the Centers for Disease Control and Prevention. The index quantifies the ability of communities to respond to, or withstand, external stress factors that impact human health. The index takes into account socioeconomic and demographic factors such as income levels, educational attainment, race and ethnicity, household composition, access to housing and mobility, and others. The index is expressed as a nationwide percentile, with higher values indicating greater vulnerability.

We display population density information, as well as population proportions of children under 5 years old and individuals over 65. These population groups are known to be more susceptible to the impacts of air pollution. We also display population proportion of non-white residents, as minority populations are known to have multiple underlying vulnerabilities that can impact health outcomes.

Data sources

Schools from Texas Education Agency.
Childcare centers from Texas Department of Health and Human Services.
Hospitals with ERs (including major general hospitals with emergency rooms and large standalone emergency rooms) from Texas Department of State Health Services.
Social vulnerability index (2016, nationwide percentile) from the Centers for Disease Control and Prevention.
Demographic data layers from American Community Survey 2013-2017 5-year estimates.

Health metrics

The map also shows the prevalence of asthma, chronic obstructive pulmonary disease (COPD), coronary heart disease (CHD) and stroke in adults over the age of 18. These conditions increase susceptibility to the adverse impacts of air pollution. In addition to disease prevalence, the map shows average life expectancy, which is another important indicator of the baseline health condition of a community.

Data sources

All data layers: Centers for Disease Control and Prevention's 500 Cities: Local Data for Better Health, 2018 release.