“Urban-Satellite” estimates in the ABCD Study: Linking Neuroimaging and Mental Health to Satellite Imagery Measurements of Macro Environmental Factors

While numerous studies over the last decade have highlighted the important influence of environmental factors on mental health, globally applicable data on physical surroundings are still limited. Access to such data and the possibility to link them to epidemiological studies is critical to unlocking the relationship of environment, brain and behaviour and promoting positive future mental health outcomes. The Adolescent Brain Cognitive Development (ABCD) Study is the largest ongoing longitudinal and observational study exploring brain development and child health among children from 21 sites across the United States. Here we describe the linking of the ABCD study data with satellite-based “Urban-Satellite” (UrbanSat) variables consisting of 11 satellite-data derived environmental indicators associated with each subject’s residential address at their baseline visit, including land cover and land use, nighttime lights, and population characteristics. We present these UrbanSat variables and provide a review of the current literature that links environmental indicators with mental health, as well as key aspects that must be considered when using satellite data for mental health research. We also highlight and discuss significant links of the satellite data variables to the default mode network clustering coefficient and cognition. This comprehensive dataset provides the foundation for large-scale environmental epidemiology research.


Socio-economic and demographic characteristics
Besides capturing Earth's physical properties, remotely sensed products also map human activity and population distribution by combining of satellite data with socio-economic information (e.g., census data), to infer pixel-level demographic properties, exemplified by products such as the Gridded Population of the World (GPWv4.11)[28], Global Rural Urban Mapping Project (GRUMPv1) [29], LandScan Global Population Database (Landscan Global) [30], WorldPop [31], among others.

Sample description
Address 1 was treated as the primary address where participants spent the most time.Secondary and tertiary addresses were also available if the percentage of time spent in primary address was less than 80%.

Data analysis
To unify the various data sources of the Urban-Satellite variables, we aggregate each input source into new raster files with identical extent, pixel size, and pixel locations.The aggregated rasters are built with a custom Python script employing the Geopandas and Rasterio libraries.Output pixel locations are based on a regular grid with 30 arc-second spacing (approximately 1km) covering the 48 contiguous US states from 124.8° W to 66.9° W and 24.4° N to 49.4° N.For each cell in this grid, we find every input pixel whose center lies within the cell bounds, then calculate the sum, mean, or percent value for each product as specified.
Input data for each dataset were obtained for the year 2017 to align with the baseline ABCD Study visit timing (October 2016 thru October 2018) and comprised LULC (Figure 2 and Table 2), NTL and population data (Figure 3) and spectral indices (Figure 4).

Land Use and Land Cover (LULC)
The land use/ land cover data from 2017,derived from the Copernicus Global Land Service [32] categorizes urban build-up, forest, crop, grass, and water areas using satellite-based discrete classifications at 100m resolution globally (Figure 2) .Urban build-up, forest, crop, grass, and water areas are identified using the Copernicus satellite-based discrete classifications (Table 2).This classification assigns a single category to each pixel at 100m resolution, globally.The forest product linked to the ABCD Study includes twelve different forest classifications.Each of the following classes: grassland, cropland, built-up area, and water corresponds to one Copernicus classification.Each aggregated raster for these categories then signifies the percent of pixels in each grid cell that belong to each classification.Figure 2 in the main paper shows a sample of these land use and land cover products at the national level.
Seasonal water data are also derived from the Copernicus Global Land Service.Unlike the classification-based data above, the source seasonal water data from Copernicus is depicted as a percentage of each 100 m input pixel.Therefore, the input 100 m pixels from Copernicus signifies both a classification belonging to one of the groups above and partial seasonal water coverage.For the seasonal water product linked to the ABCD Study, each output pixel indicates the overall percent of seasonal water coverage.
Data from the Copernicus Global Land Service are obtained as separate 20° by 20° tiles that are mosaiced into a single raster covering the 48 contiguous US states1 .

Nighttime Lights
The nighttime light data sourced from the Earth Observation Group (EOG) Annual VNL V2 product [33] provide average monthly radiance at about 500m resolution, aggregated to sum annual radiance values within each 1km output pixel.These data provide an average monthly radiance at an original resolution of 15 arc-seconds (approximately 500 m).The VNL 2 data are based on VIIRS satellite observations and include filtering for clouds, removal of fires, and background isolation.Our aggregated nighttime light product provides the sum of annual nighttime light radiance values within each 1 km output pixel (see Figure 3a).

Population
Population data from 2017 are based on WorldPop Population Counts [34], specifically the US unconstrained top-down 100 m resolution dataset.These data take population census counts and use other geospatial data to disaggregate census tract information into 100 m by 100 m pixels.Our aggregated population raster sums the WorldPop populations within each 1 km pixel.Figure 3b presents the population datasets at the national and regional levels.

Spectral Indices
We calculated three different spectral indices using 2017 Sentinel-2 Multispectral Instrument Level-1C data accessed through Google Earth Engine (GEE).These indices include the Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI), and Normalized Difference Builtup Index (NDBI) (Figure 4).We calculate the NDWI [36] to monitor surface water bodies.NDWI is calculated using the Green (Green -Band 3) and Near Infrared (NIR-Band 8A) bands of Sentinel-2 satellite image collections.NDWI values range from -1 to 1.Following [37], who finds that NDWI values over 0.3 are indicative of permanent water bodies, our aggregated product presents the percentage of each pixel with NDWI over 0.  [38,39].As demonstrated in previous studies [40], the transition from vegetated to built-up areas often occurs with NDBI values just below 0. Figure 4 clearly demonstrates that the highest values of NDBI are present in relatively barren areas such as observed in the western US.
Because built-up land cover is represented by neither the high nor low end of the NDBI spectrum, one cannot use a single threshold value to represent built-up areas as with the approach to NDBI or NDVI.Our aggregated product therefore presents the average NDBI value within each 1 km pixel.
Although NDBI does not allow for a strict threshold indicating built-up land area, NDBI values do provide details to help identify the relative concentration of built-up area.For instance, where vegetated areas and water bodies are mixed within urban areas, built-up areas are represented by relatively higher NDBI values.Figure 4 shows, as an illustration, the aggregated NDBI product alongside the percent of forest cover in and around Washington DC, USA.Within the city, NDBI tends to be relatively higher in heavily built-up areas downtown than the NDBI observed around forests, parks, and rivers.
It should be noted that the Sentinel-2 input data for these spectral indices contain some notable discontinuities at the borders between satellite passes and is the result of differences in surface and/or atmospheric conditions during individual passes of the satellite.These patterns may carry over into the NDVI, NDWI, and NDBI rasters used here.Users should remain conscious of this when comparing values between widely separated geographic locations.Furthermore, some areas of the Sentinel-2 input are also missing data in a few isolated areas scattered around the country, due to processes for masking out cloudy pixels.These areas of missing data are on the order of 100 km2 in size.These manifest as isolated no-data cells within our spectral indices products.

UrbanSat characterization and association with behavioral, cognition and brain function in the ABCD Study
The UrbanSat data in ABCD release 5.0 include 11 variables for 3 addresses collected at baseline: Address 1 (primary address) for 11226 participants, Address 2 for 1617 participants and Adress 3 for 472 participants.Each variable indicates different aspects of regional environment profiles.Through the histogram plots of these indicators associated with the primary address (Supplemental Figure 1 We selected one variable for each category from the full comprehensive assessments available in the ABCD Study from the baseline visit when the children were 9-10 years old and tested their associations with UrbanSat exposures.Child Behavior Checklist (CBCL) is a parent-rating component of the Achenbach System of Empirically Based Assessment (ASEBA) that detects behavioral and emotional problems in children and adolescents [41].The assessment consists of 119 items describing childhood behavior corresponding to eight syndrome scales (anxious/depressed, withdrawn/depressed, somatic complaints, social problems, thought problems, attention problems, rule-breaking, aggressive behavior).The total problem count (t-score standardized, meanSTD: 45.85 11.34, range: 24-83) was used as the representative behavioral measure in our analyses.The NIH Toolbox® cognition battery consist of seven different tasks covering episodic memory, executive function, attention, working memory, processing speed, and language abilities [42].These measures form three composite scores: a total score composite, a crystalized intelligence composite and a fluid Intelligence composite.We selected the total score composite (raw, uncorrected, meanSTD: 86.22 9.14, range: 44-117) to represent overall cognition in our analyses.We also confirmed that the T-score distribution did not show significant skew from symmetry.

Resting state functional MRI data
For brain function, we utilized resting state functional MRI data with satisfying quality recommended by ABCD Consortium and applied the neuromark full automated spatially constrained independent component analysis framework [43] to extract 53 robust intrinsic connectivity networks (ICNs) using the Neuromark_fmri_1.0 template.The Neuromark framework leverages an adaptive-ICA technique that automates the estimation of comparable brain markers across subjects, datasets, and studies.A set of component templates (see supplemental Table 1 and supplemental Figure 2) were used as references to guide the estimation of single-scan components for the ABCD data.These component templates were created via a unified ICA pipeline.They were constructed using the resting-state fMRI data with large samples of healthy subjects from the human connectome project (HCP) and the genomics superstruct project (GSP).
The HCP data Include 823 subjects' scans and the GSP data include 1,005 subjects' scans that passed the data QC.High model order (order = 100) group ICA was performed on each data respectively, and then the independent components (ICs) from two data were matched by examining the similarity of their spatial maps (Smith et al., 2009).The IC pairs are considered consistent and reproducible across GSP and HCP data if their spatial correlation ≥ 0.4.A correlation value ≥ 0.25 has been shown to represent a significant correspondence (p < 0.005, corrected) between components, and here we used a higher threshold because we would like to capture more reliable and consistent ICs as the templates.The matched IC pairs were labeled as meaningful component templates or noise components by five experts in the ICA field based on expectations that ICNs should exhibit peak activations in grey matter, low spatial overlap with known vascular, ventricular, motion, and susceptibility artifacts, and should have TCs dominated by lowfrequency fluctuations inspecting the locations of the peak activations of their spatial maps and the lowfrequency fluctuations of their TCs.The less noisy ICs from the GSP data were chosen as the component templates for the estimation of the single-scan components and TCs.More details of the Neuromark framework and its implementation can be found at [43].
In brief, these ICNs were organized into seven functional domains according to their anatomical locations and functional information, including subcortical, auditory, visual, sensorimotor, cognitive control, default mode, and cerebellar domains.Functional network connectivity (FNC) was computed as Pearson correlation between time courses of ICNs.Treating ICNs as nodes and FNC as weighted edges, we can represent the brain as a connected graph.In graph theory, the local clustering coefficient is a measure that quantifies the degree a node is close to its neighbors in the form of closed triplet graph.Its value ranges from 0 to 1.When a node has a high clustering coefficient, it tends to form a dense cluster with its neighbors, whereas a node with a low clustering coefficient is loosely connected to its neighbors.
For a weighted graph, we implemented a clustering coefficient defined in Equation 1 proposed by Onnela, et al [44].Focusing on default mode network (DMN), it has seven ICNs as shown in Figure 5  where Ki is the degree of node I,   is the weighted edge between node i and node j.

Data Analysis
For the association analyses of UrbanSat indicators we utilized data from independent participants by keeping one sibling per family, to control for the impact of family structure on the results.We also filtered out data with missing values in social economic status (SES), which resulted in a dataset of 8,949 participants.
We tested the correlations between UrbanSat indicators with SES indicators (household income and parental education level).Household income is to measure the total combined family income for the past 12 months, coded as 1 to 10 from less than US $5,000 to $200,000 and greater.Parental education was coded as 0 to 21 from never attended/kindergarten only to doctoral degree.If both parents' education levels are available, the averaged value is used.
NDVI is calculated at the level of the pixel based on the Near Infrared (NIR-Band 8A) and Red (RED-Band 4) bands of Sentinel-2 satellite image collections.As a normalized difference index, NDVI values range from -1 to 1, with higher values indicating generally denser, thicker, and greener vegetation.NDVI values depend on specific vegetation types, but in general, values below 0.2 indicate bare land, values from 0.2 to 0.5 indicate sparse vegetation, and values over 0.5 indicate dense, leafy vegetation [35].We therefore use the threshold of 0.2 as an indicator of vegetation, and our aggregated raster indicates the percentage of each pixel with NDVI over 0.2.

Supplemental Figure 1 :
), we demonstrated their unique data characters with remarked different shapes of distribution.In parallel, these indicators have also shown varying levels of relatedness.We present Pearson correlation across all indicators in Figure 5 (left), which lists correlations ranging from -0.84 to 0.78.The significant interrelationship among variables is clearly visible.For instance, forest land use is most significantly negatively related to built-up land use, and most significantly positively related to NDVI, both sharing more than 50% of variance.It also relates to NDBI, nighttime lights, and population with strong correlations (absolute r>0.4).In contrast, NDWI relates to other indicators with moderate effect sizes except for permanent inland water.Histogram of 11 UrbanSat indicators.The UrbanSat dataset consists of 11 variables from a total of 11,232 participants associated with three addresses.These indicators are built-up land use, crop land use, forest land use, normalized difference built-up index (NDBI), normalized difference vegetation index (NDVI), normalized difference water index (NDWI), nighttime lights, population, permanent inland water and seasonal water area.After removing the missing values associated with the primary address, the final number of participants was 11,006 for the histogram distribution and cross correlation of UrbanSat indicators of the primary address 4.3.1 Behavior and cognition

3 𝑗𝑘
(right), including anterior cingulate cortex, posterior cingulate cortex, precuneus.We computed the average clustering coefficient of seven ICNs and used it as a representation of brain function in our UrbanSat association analyses.  _ = ∑ (      ) 1/ 3. We calculate NDBI from the Sentinel-2 Shortwave Infrared (SWIR-Band 11) and Near Infrared (NIR-Band 8A) bands.NDBI values range between -1 and +1.In general, the lowest NDBI values represent water bodies and vegetation, intermediate values often indicate buildings (or built up land cover), and the highest values represent open or barren areas

Table S1 .
Peak Coordinates of Components Supplemental Figure 2. Component templates constructed by the Neuromark framework.Spatial maps of the 53 IC templates are arranged into 7 functional networks according to the anatomic and functional prior knowledge.Component templates are thresholded at |t|>10, where one-sample t-statistics have been computed across the single-subject spatial maps.Sagittal, coronal, and axial slices are shown at the maximal t-statistic for clusters larger than 3 cm 3 .
Table S2 lists out the correlation results between UrbanSat indicators with education and household income using 8,716 samples.The associations between UrbanSat indicators and behavior, cognition, and brain function were tested using linear mixed effect models.The dependent variable was the CBCL total problem, the cognitive total score composite, or DMN clustering coefficient, separately.The independent predictors included UrbanSat indicators (tested individually), age, sex, with or without household income and parental education as fixed effects, and the collection site as a random effect.Bonferroni correction was applied for multiple comparison correction for 11 UrbanSat indicators.