Introduction

As upcoming data scientists, we are taught to consider spatial resolution when working with geospatial data. Essentially, this is the spatial scale at which geospatial data are aggregated. The units could be regular grid cells of 1 km x 1 km or administrative boundaries such as census tracts and city neighborhoods. Each unit represents a portion of space within which environmental health studies try to relate environmental risk factors to health outcomes.

Environmental health studies often combine data from different sources, each aggregated at a different spatial scale. This spatial mismatch has been documented as a severe limitation in environmental epidemiology (Sheppard et al. 2012). For example, environmental data may be aggregated on a fine 100-meter grid, while socioeconomic data are grouped at the census tract level and health data at the neighborhood level. When researchers combine these data layers, the statistical relationships between them can change or disappear entirely, simply because of how the data were re-aggregated to a common resolution.

This is the problem that Shafran-Nathan et al. (2017) discuss, and it is the focus of this blog’s case study: ecological bias in environmental health studies. The way we choose to aggregate data, including the polygons and resolutions we use, can alter conclusions about which communities face environmental health risks. Understanding the nuances of this bias is crucial because environmental health studies inform policy decisions that affect real communities.

Case Study Description

Shafran-Nathan et al. (2017) studied the Tel Aviv metropolitan area in Israel, analyzing the relationship between nitrogen oxides (\(NO_x\)) concentrations and socioeconomic status (SES). They created \(NO_x\) concentration maps at four grid resolutions: 50m, 200m, 500m, and 1000m. For the SES data, they also used four different spatial representations: municipal borders, census tracts, ordered grids (OGR), and clustered grids (CGR).

When using traditional administrative boundaries like municipal borders and census tracts, the researchers found no significant correlation between \(NO_x\) levels and SES rankings. However, when they used grid-based spatial representations (OGR, CGR) that matched the resolution of the air pollution data, significant correlations emerged. This change in significance stems from the mismatch that arises when large SES polygons are combined with fine-resolution environmental data. The pollution values get averaged over large areas, which removes much of the spatial variance and makes it appear that there is no relationship between SES and pollution exposure.
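The averaging effect described above can be illustrated with a small synthetic sketch. Nothing here comes from the Tel Aviv data: the grid size, the block size, and the assumption that pollution is the sum of a fine-scale component tied to SES and a regional background unrelated to SES are all made up for illustration, and `block_mean` and `pearson_r` are hypothetical helper names.

```python
import numpy as np

def block_mean(grid, factor):
    """Average a square 2-D grid over non-overlapping factor x factor blocks."""
    n = grid.shape[0] // factor
    trimmed = grid[:n * factor, :n * factor]
    return trimmed.reshape(n, factor, n, factor).mean(axis=(1, 3))

def pearson_r(a, b):
    """Pearson correlation between two arrays of the same shape."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

rng = np.random.default_rng(0)
size, factor = 128, 16  # assumed: 128 x 128 fine cells, 16 x 16 cells per large unit

# Assumed synthetic surfaces (not real data):
# SES varies from cell to cell; pollution follows SES locally and also has a
# regional background that is constant within each large unit.
ses = rng.normal(size=(size, size))
regional = np.kron(
    rng.normal(scale=0.5, size=(size // factor, size // factor)),
    np.ones((factor, factor)),
)
nox = -ses + regional

# Aggregate both layers to the large units, as coarse polygons would.
ses_coarse = block_mean(ses, factor)
nox_coarse = block_mean(nox, factor)

r_fine = pearson_r(ses, nox)
r_coarse = pearson_r(ses_coarse, nox_coarse)

print(f"fine cells:  n={ses.size}, var(NOx)={nox.var():.3f}, r={r_fine:+.2f}")
print(f"large units: n={ses_coarse.size}, var(NOx)={nox_coarse.var():.3f}, r={r_coarse:+.2f}")
```

In this toy setup the relationship lives in the within-unit variation, so averaging over large units discards most of it and leaves far fewer units for a correlation test. Real data will not be this clean, and aggregation can also inflate correlations in other settings.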

All environmental health studies that combine data layers with different spatial resolutions face this fundamental challenge. If the data are aggregated inappropriately during processing, a real relationship can be altered or weakened. Researchers must therefore make deliberate choices about how to represent and aggregate spatial data in order to draw accurate conclusions.

Reflection

The problem shown in this analysis is also known as the modifiable areal unit problem (MAUP). It is well documented that the MAUP introduces statistical biases that significantly impact hypothesis testing in environmental health studies (Scott and Cutter 1998; Swift, Liu, and Uber 2008; Parenteau and Sawada 2011; Shafran-Nathan et al. 2017).

In class, we read a short introduction to the MAUP in Scott and Cutter (1998). The paper addresses the social implications of GIS technologies, but its points apply to all spatial analyses. The authors note that the ideal scale for environmental equity studies is the individual level, where each person’s data can be related to social characteristics (Scott and Cutter 1998). Because this is rarely possible, researchers must use aggregated population data, such as census tracts. This aggregation introduces a social bias of its own: densely populated areas tend to have smaller aggregation units and sparsely populated areas larger ones, which can skew results.

In unit 4, we learned about the difficulties of science-policy communication, specifically in biodiversity conservation (Young et al. 2014). The authors propose a framework for bridging the gap between science and policy, mainly by improving dialogue and creating a shared space between the two fields that promotes better decision making. Because the MAUP can change whether results are significant, it can strongly affect government policy if its biases are not accounted for. Given the existing science-policy communication barrier, it is all the more important that the science is not only clear and accurate but also effectively communicated, so that it supports equitable environmental policies.

Solutions & Conclusion

Researchers can reduce these biases by matching the spatial resolutions of their data layers. Instead of defaulting to available administrative units like census tracts, they can create custom spatial grids that match the resolution of the other variables in the study, which better preserves statistical relationships. Swift, Liu, and Uber (2008) took a similar approach and estimated that MAUP bias was reduced by 41%. Both studies compare multiple spatial scales rather than relying on a single aggregation method based on pre-defined boundaries. By testing how results vary across different spatial representations, researchers can better assess how much uncertainty spatial aggregation choices introduce into their results.
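One way to test how results vary across representations is to recompute the same statistic under several zonings and report all of them. The sketch below is a minimal, hypothetical version of that idea on synthetic data. It is not the procedure used in either study; the raster size, the block sizes, and the helpers `zonal_mean` and `grid_labels` are assumptions. Zones are encoded as integer ids per cell, so rasterized census tracts or other polygons could in principle be added as another entry.

```python
import numpy as np

def zonal_mean(values, labels):
    """Mean of `values` within each zone; `labels` gives one integer zone id per cell."""
    flat = labels.ravel()
    sums = np.bincount(flat, weights=values.ravel())
    counts = np.bincount(flat)
    return sums / counts

def grid_labels(size, factor):
    """Zone ids for regular factor x factor blocks (assumes size is divisible by factor)."""
    rows, cols = np.indices((size, size))
    blocks_per_row = size // factor
    return (rows // factor) * blocks_per_row + (cols // factor)

rng = np.random.default_rng(1)
size = 120  # assumed raster size, divisible by every block size used below

# Assumed synthetic layers (not real data): pollution tracks SES at the cell
# level and carries a regional background that is constant over 20 x 20 cells.
ses = rng.normal(size=(size, size))
background = np.kron(rng.normal(scale=0.5, size=(6, 6)), np.ones((20, 20)))
nox = -ses + background

# Each representation is just a label raster; only regular grids are used here.
representations = {f"{f}x{f} cells": grid_labels(size, f) for f in (1, 4, 10, 20)}

results = {}
for name, labels in representations.items():
    ses_z = zonal_mean(ses, labels)
    nox_z = zonal_mean(nox, labels)
    r = float(np.corrcoef(ses_z, nox_z)[0, 1])
    results[name] = (ses_z.size, r)
    print(f"{name:>12}: units={ses_z.size:5d}, r={r:+.2f}")
```

Reporting the full table, rather than a single correlation, makes the sensitivity to the aggregation choice visible to readers and policy makers.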

This case study highlights how decisions about aggregating data can lead to different conclusions in environmental health studies. As Shafran-Nathan et al. (2017) show, mismatches in spatial scale can obscure real environmental inequities, leading to inaccurate results with direct implications for policy makers. For studies to correctly inform policy, researchers must evaluate different spatial representations, test multiple scales, and communicate the uncertainty introduced by their methods. This will ultimately strengthen the validity of individual studies, reinforce their role in science-policy decision making, and help create more environmentally just communities.

References

Parenteau, Marie-Pierre, and Michael C. Sawada. 2011. “The Modifiable Areal Unit Problem (MAUP) in the Relationship Between Exposure to NO2 and Respiratory Health.” International Journal of Health Geographics 10 (1): 58. https://doi.org/10.1186/1476-072X-10-58.

Scott, M., and S. Cutter. 1998. “GIS and Environmental Equity: An Analysis of the Assumptions.” NCGIA Initiative 19 – GIS and Society: The Social Implications of How People, Space, and Environment Are Represented in GIS, 1–4.

Shafran-Nathan, Rakefet, Ilan Levy, Noam Levin, and David M. Broday. 2017. “Ecological Bias in Environmental Health Studies: The Problem of Aggregation of Multiple Data Sources.” Air Quality, Atmosphere & Health 10 (4): 411–20. https://doi.org/10.1007/s11869-016-0436-x.

Sheppard, Lianne, Richard T. Burnett, Adam A. Szpiro, Sun-Young Kim, Michael Jerrett, C. Arden Pope, and Bert Brunekreef. 2012. “Confounding and Exposure Measurement Error in Air Pollution Epidemiology.” Air Quality, Atmosphere & Health 5 (2): 203–16. https://doi.org/10.1007/s11869-011-0140-9.

Swift, Andrew, Lin Liu, and James Uber. 2008. “Reducing MAUP Bias of Correlation Statistics Between Water Quality and GI Illness.” Computers, Environment and Urban Systems 32 (2): 134–48. https://doi.org/10.1016/j.compenvurbsys.2008.01.002.

Young, Juliette C., Kerry A. Waylen, Simo Sarkki, Steve Albon, Ian Bainbridge, Estelle Balian, James Davidson, et al. 2014. “Improving the Science-Policy Dialogue to Meet the Challenges of Biodiversity Conservation: Having Conversations Rather Than Talking at One-Another.” Biodiversity and Conservation 23 (2): 387–404. https://doi.org/10.1007/s10531-013-0607-0.