Introduction

Using surrogates of brain activity such as the blood-oxygen-level dependent (BOLD) signal obtained using functional magnetic resonance imaging (fMRI), whole-brain functional networks (i.e., connectomes) can be estimated in vivo. The brain functional connectome is organized at multiple spatial scales, one of which is the mesoscale. It has been recently shown that a full repertoire of functional communities—groups of nodes that are densely connected internally—can be consistently decoded, even at rest. In the brain, these communities are known to represent subsystems and mediate distinct neurophysiological functions (e.g., the brain’s visual subnetwork)1,2,3. Moreover, this scale is highly sensitive to disease, where several psychiatric disorders have shown selective disruption in particular brain communities4,5,6,7,8. This apparent importance has prompted interest in this line of investigation; while a vast wealth of knowledge has been gained from these efforts, several methodological pitfalls remain.

To map the mesoscale architecture of the normal brain, previous studies have generally applied the community detection algorithm on a group-representative network obtained by averaging networks from a group of individuals. However, it is becoming increasingly clear that important features may be lost by such averaging (including some that are present across individuals9), leading to a representation that may not resemble a true central tendency in the group10,11,12,13,14,15.

Further, the analysis of networks derived from time series (i.e., correlation networks) is challenging. First, unlike prototypical networks where edges are either present or absent16, edges in correlation networks represent the magnitude of statistical association, and so are on a continuum. Second, methods that are commonly used to index association—most commonly Pearson correlation—also produces negative values (anticorrelations). Third, these networks contain numerous indirect dependencies by virtue of the transitivity inherent in correlations17,18,19. This is particularly salient in large networks, as these indirect effects may be compounded with higher-order interactions.

Previous investigations have generally made use of thresholds prior to analysis in order to treat functional networks like usual sparse graphs. The rationale is that the strong connections that would be retained would likely be the most relevant and least likely to be artifactual, also leading to the elimination of negative edges. Recently, the relevance of weak network edges has become increasingly recognized, as they appear to convey unique information not encoded in strong edges20,21,22,23,24. Further, the choice of the threshold is also challenging and can give rise to heterogeneity in the findings25,26. Like weak edges, negative edges (i.e., anticorrelations) have been also shown to have a substantial physiological basis27,28,29,30,31.

While several methods exist for community detection in networks, most are not compatible with weighted networks containing both positive and negative links of the type observed in correlation networks32,33. One of the most commonly used and best-performing methods in community detection is the Louvain algorithm33,34,35. The Louvain algorithm is versatile and can be used with different null models, some of which have been extended to allow for positive and negative interactions33. Importantly, null models that are typically used with the Louvain algorithm are based on permutations (rewiring) of the original networks, preserving the total weight and degree while randomizing connections36. This is known as the Newman–Girvan (NG) null model, which has also been extended to signed networks37,38,39. However, such an approach is problematic in the case of correlation matrices, as it assumes that the entries are independent, thereby violating the correlation transitivity40,41. Therefore, the community detection may not be accurate, particularly if there is heterogeneity in the size of the communities40—which is known to be biologically plausible in the case of the brain.

A number of recent developments in network science and neuroimaging have paved the way for frameworks that can be used to address these challenges. Here we used the Louvain algorithm, with a null model based on random matrix theory designed explicitly for correlation networks (we refer to it throughout as the RMT null model)41. While it was initially introduced in finance41, it has been recently successfully applied to neurophysiological recordings42. This method, which does not violate transitivity, is compatible with weighted and signed networks, forgoing the need to perform any thresholding. The introduction of more efficient algorithms, of reclustering frameworks, and the increased availability of high-performance computing, have made it practical to perform the clustering at the level of the individual network. As a neuroimaging dataset, we used a recent release from the Human Connectome Project (S1200 release; March 2017)43. This relatively-large sample of healthy individuals consists of state-of-the-art MRI data and addresses several of the limitations present in older datasets44,45.

We hypothesized that determining the mesoscale architecture of the average human cortex can be better achieved by first mapping the mesoscale in each subject and then arriving at a meaningful central tendency, rather than by mapping the mesoscale of an averaged brain network. To this aim, we perform the clustering at the subject level followed by a reclustering procedure to reach a population consensus mesoscale architecture (we refer to this as subject-derived consensus)46,47. We compare this subject clustering-reclustering framework to the traditional group-representative method (we refer to it as group-derived consensus) to assess which results in solutions that map best to the individuals’ data.

Materials and Methods

Neuroimaging dataset

The data that we use in this work is from the Washington University-Minnesota Consortium Human Connectome Project (HCP)43. The latest release at the time of writing was adopted (S1200; March 2017). Details regarding this dataset have been previously published43,45. Briefly, it included state-of-the-art whole-brain MRI acquisition with structural, functional, and diffusion-weighted imaging; the scanner was a customized Siemens Skyra 3 T scanner with slice-accelerated sequences for fMRI48,49,50,51. Whole-brain functional data were acquired in two sessions. Each session consisted of two phase-encoding directions (left-right and right-left) each a 15 min multiband gradient echo-planar resting-state run (Voxel size = 2.0 × 2.0 × 2.0 mm; TR = 720 ms; TE = 33.1 ms; Flip angle = 52; 72 slices; Bandwidth = 2,290 Hz/pixel; FOV = 208 × 180 mm). Informed consents were obtained from all subjects. The study procedures were approved by the institutional review board at Washington University in St. Louis. All experiments were performed in accordance with relevant guidelines and regulations. Permission was obtained from the HCP to use the Open Access data for the present study, which abides by the Data Use Terms (http://www.humanconnectome.org/data/data-use-terms). The S1200 release includes 1097 subjects with resting-state fMRI scans. Our analysis was restricted to subjects who completed all four resting-state scan runs; namely, two sessions, each with two encoding runs. This resulted in n = 1003 individuals; 534 females and 469 males. The age range was 22 to 37, with a mean of 28.7.

First-level processing

The HCP minimal processing pipeline was used51,52. Briefly, this included projection to the surface space, 2 mm FWHM smoothing, ICA + FIX denoising with minimal high-pass filtering, and surface registration using MSMall53,54,55,56,57,58,59. To define our ROIs, we used a newly-developed multimodal parcellation (MMP)60. We chose this parcellation because it is neuroanatomically informed, with data from cortical architecture, connectivity, and topography. This group parcellation consisted of 360 ROIs (180 per hemisphere) that cover the entirety of the cerebral cortex (but does not include the subcortex or cerebellum), which we mapped to the individuals. In the standard surface space, we calculated the mean time series from all voxels in each ROI. Resting-state fMRI time courses from the left-right and right-left phase-encoding runs and the two sessions were concatenated (total duration 60 minutes; 4800 time point). Functional connections between nodes i and j were defined as the Pearson product-moment correlation of their respective time series.

Workflow overview

The first step consisted of extracting the multiresolution network community structure from each individual subject using a variation of the Louvain algorithm (detailed in sections 2.4 and 2.5). To address the near-degeneracy of the algorithm, we used multiple runs and a partition similarity-based framework to pick a stable output (detailed in section 2.6). The resulting outputs from all individuals were used to populate a co-classification matrix. We then used a hierarchical multiresolution method46 to extract a consensus that is representative of the group (detailed in 2.7).

Community detection

This consisted of using the Louvain community detection algorithm34; specifically, an iterated version as implemented in the GenLouvain package61. This versatile method is based on finding communities with a high degree of intra- and low degree of inter-connectedness. This is achieved heuristically by optimizing the modularity metric Q, which captures the degree of connectedness within communities compared to connectedness expected under a null model34,36:

$$Q=\frac{1}{2m}\,\sum _{ij}\,[{A}_{ij}-{P}_{ij}]\,\delta ({c}_{i},{c}_{j})$$
(1)

where A and P are the observed and null adjacency matrices, respectively; and \(m=\frac{1}{2}\,{\sum }_{ij}\,{A}_{ij}\) is the total strength of the network. The Kronecker \(\delta ({c}_{i},{c}_{j})\) function is equal to 1 when i and j are in the same community and 0 otherwise. As mentioned in the introduction, classical formulation of P is that of a permutation null model36: \({P}_{ij}=\frac{{s}_{i}{s}_{j}}{2m}\) where \({s}_{i}={\sum }_{j}\,{A}_{ij}\) is the strength of node i. Although this definition was extended to weighted signed networks37,38,39, it remains inappropriate for use with networks generated from correlation of time series. Namely because entries in the correlation matrices are not independent, which leads to a bias in community detection17,18,40,41,42. We avoided this by adopting a null model that was specifically designed for correlation-based networks41 (detailed below). The networks were not thresholded, retaining all weights (both positive and negative).

Methods related to modularity maximization are known to suffer from the resolution limit—the inability to detect communities smaller than a certain size62. Several methods have been used to address this32,63,64,65. Here we adopt a recursive hierarchical approach to recover the community structure at multiple scales46,66. After detecting the first-level (i.e., coarsest) communities in the initial run of the algorithm, we define each detected community as a separate subgraph and run the algorithm recursively on each. This is repeated until no more statistically significant subcommunities can be detected. This iterative procedure allows the resolution of a hierarchically nested structure. Indeed, this is consistent with notions of hierarchical organization in the cortex from microscopic tracer studies in other mammals67,68. We use this hierarchical framework at the subject-level, and a slightly different one at the reclustering stages (see below).

Due to the nearly degenerate outputs of the Louvain algorithm69, for each subject-level run, we performed 100 iterations and adopted the partition that is most similar to the ensemble, borrowing the approach from70,71,72. The rationale is that similarity to the ensemble can be used as a surrogate for stability, and the most stable partition solution is likely the closest to the ground truth33,72. To evaluate partition similarity, we used the z-score of the Rand coefficient (described below) and calculated the pairwise similarity between all 100 partition solutions, selecting for each subject the partition with the highest mean similarity to the ensemble. Since the community detection procedure was applied recursively in a hierarchical manner, 100 iterations and subsequent selection of the representative partition based on similarity was also applied to each subcommunity. This was done in order to avoid adopting an arbitrary solution from each subject.

A random matrix null model for correlation networks

To generate null models that are appropriate for use with correlation networks, we adopted a method based on random matrix theory (RMT). It is described in full detail in the original publication41. Briefly, this method builds null networks using a modified Wishart distribution with the same common trend and noise as the observed network, but without the community structure (sum of the random component and the dominant positive component41). This method does not require thresholding and is compatible with negative values. The end result being that positive interactions are maximally concentrated within modules and negative interactions expelled outside.

For the hierarchical recursive application, after identification of a community c in the initial network, we regenerate a null model from time series of nodes belonging to this community, and the community detection is then applied to this subgraph. One criticism of such recursive procedures relates to the fact that there are no clear stopping rules32. Here, it is important to mention that with the RMT method, only network components that stand out from the ones predicted by the community-free null model in a “statistically significant” manner are kept in the filtered networks41,42. Therefore, when the community detection algorithm is ran on the RMT-filtered networks, all detected communities are statistically significant41,42. If no communities are detected, the recursive algorithm stops.

Partition similarity

In order to measure the similarity between two partitions, we adopted the z-score of the Rand coefficient70,71. In the original definition, the Rand coefficient of two partitions is calculated as the ratio of the number of node pairs classified in agreement in both partitions, to the total number of pairs73. Similar to Doron et al.71, Akiki et al.6, Betzel et al.74, Bassett et al.72, here we use the z-score variant introduced in Traud et al.70, which can be interpreted as a measure of how similar two partitions are, beyond what might arise at random33,70,75. An analytical formula for the z-score of the Rand coefficient between partition a and b can be written as:

$${R}_{ab}=\frac{1}{{\sigma }_{{w}_{ab}}}({w}_{ab}-\frac{{M}_{a}{M}_{b}}{M})$$
(2)

where M is the total number of node pairs, Ma and Mb the numbers of node pairs in the same community in a and b respectively, wab the number of pairs that figure in the same community in both a and b, and \({\sigma }_{{w}_{ab}}\) the standard deviation of wab (as in Traud et al.70). For more information, see SI.

Hierarchical consensus reclustering

For the hierarchical consensus reclustering step, we adopted a recently developed method by Jeub et al.46. Our multiresolution co-classification matrices embed information from the different hierarchical levels of community structures detected from the clustering results at the level of the subjects’ networks. The method developed by Jeub et al.46 extends the classical formulation of consensus reclustering47 by allowing a hierarchical multiresolution output and has built-in tests for statistical significance46. Here, the quality function was modularity-like as in Eq. 1, with a “local” variant of the permutation null model. Briefly, in addition to the constraint of the traditional permutation model of a fixed size and number of communities, this local model also assumes that node i is fixed and node j is random when calculating \({P}_{ij}^{local}(\alpha )\). This means that the null network only receives contributions from nodes that are less frequently co-classified together compared to the local permutation model, at a statistical significance level α. More detail can be found in Jeub et al.46.

The same recursive principle and Louvain implementation described above were used here. To ensure a stable output, 100 iterations are used at each level (cases of output instability are dealt by meta-reclustering as in Lancichinetti and Fortunato47, Jeub et al.46); the threshold for statistical significance is set at α = 0.05.

Partition task homogeneity

Task fMRI data from the Human Connectome Project span several domains (social cognition, motor, gambling, working memory, language processing, emotional processing, and relational processing)45,76. Here we used the effect size activation maps over 86 task contrasts (group average over 997 subjects from the S1200 release).

We adopted the framework of module functional homogeneity12,77,78 to gauge the quality of partition solutions across the hierarchical levels as another metric independent of partition similarity. Nodes within well-defined modules should, in principle, respond in agreement across tasks. That is, modules should show a high degree of homogeneity (e.g., uniformly activated or deactivated when completing a certain task). Similar to Schaefer et al.77, Kong et al.78, for each task, we quantify homogeneity by calculating the negative of the standard deviation of nodal activation (Cohen’s d) values for each module (the sign was flipped so that lower, more negative values indicate lower functional homogeneity, while greater, less negative values closer to zero indicate higher homogeneity). Of the 86 task contrasts we excluded redundant ones (e.g., the standard deviation from FacesShapes contrast is identical to the ShapesFaces contrast), retaining 48 contrasts in total.

However, because we were to compare partitions of different size and numbers, this simple definition would not be sufficient79,80,81. For example, modules with a smaller number of nodes may be more homogeneous. To account for this bias, we standardized it against the null distribution obtained by randomly permuting node assignments to modules 10,000 times (keeping the number and size of the communities constant), and expressed the homogeneity as a z-score. For each hierarchy, the z-scores were averaged over all modules and then over all task contrasts. This resulted in a summary measure how functionally homogeneous each partition solution is, beyond what is expected by the size and number of modules.

Analysis of nodal consistency

To quantify how consistently each nodes is assigned to its community, we calculated the nodal consistency, using the hierarchical consensus partitions as a reference. For each consensus partition level, nodal consistency was calculated by counting the number of times (across subjects) a node is co-classified with other nodes that belong to the same community (using the hierarchical consensus partitions as a reference), divided by the total number of times that the node has been classified (i.e., to same or to a different community).

To better understand the variability in consistency across nodes, we calculated the following: 1) the nodal strength as a measure of network hubness calculated as \({s}_{i}={\sum }_{j}\,{A}_{ij}\) as the strength (hubness) of node i, 2) the signal-to-noise ratio defined as \(SN{R}_{i}=\tfrac{{\mu }_{BOL{D}_{i}}}{{\sigma }_{BOL{D}_{i}}}\) of the nodal BOLD time course, and 3) a measure of nodal activity variation across the task fMRI contrasts, here defined as the coefficient of variation \(C{V}_{i}=\tfrac{{\sigma }_{Task{s}_{i}}}{{\mu }_{Task{s}_{i}}}\). For simplicity, these were calculated from group-averaged data (average connectivity matrix for hubness, group-representative time series for the S1200 sample82 for SNR, and group-average task fMRI contrast maps for the task fMRI coefficient of variation). We then used these terms in a multiple linear regression model as predictors of nodal consistency.

Robustness analyses

In-scanner head motion has been reported to confound the estimation of functional connectivity83. For the main analysis, we adopted the strategy proposed by the HCP group51,52 which includes ICA + FIX58,59. We chose not to regress out the “global signal” [mean grey-matter time course regression (MGTR)] as there is concern that the process may distort the correlation structure by shifting the connections to an approximately zero-centered distribution, causing an artifactual increase in computed anticorrelations, and induce a shift in areal boundaries and a distance dependence in functional connections60,84,85,86. The first point is particularly salient in our case as we do not threshold negative connections prior to the community detection analyses. MGTR may therefore spuriously shift (weakly) “positive” corrections into anticorrelations, which would directly impact how they are treated by the RMT-based method, which is based on expelling negative connections outside of modules41. However, it has been argued that the standard HCP denoising methods do not fully remove motion artifacts and that regression of the “global signal” remains an effective strategy in reducing the dependence of correlations on motion27,87,88,89. To assess the robustness of our community detection results, we have repeated the main analysis after incorporating MGTR into the preprocessing pipeline. To avoid the influence of “artifactually” induced anticorrelations on the community detection, we removed negative values from the correlation matrices prior to the RMT decomposition [note: an example of the non-thresholded MGTR approach can be found in the accompanying Supporting Information (SI) document]. To compare the consistency between the partitions obtained with and without MGTR, we correlated the co-classification matrices obtained from the subject-level clustering of the two methods, and, for interpretability, the percentage of nodes that differ in the subject-derived consensus partitions(i.e., the Hamming distance).

In our analysis, we used a multi-modal cortical parcellation60 to define the network nodes. To ensure that this did not lead to idiosyncratic results, we have repeated the main analyses with a cortical parcellation by Shaefer et al.77 which is based solely on function, and may be more functionally coherent (see SI).

Statistical analysis

Statistical tests for community detection are described above. To compare the similarity measures from the subject-derived vs. group-derived consensus, we used two-tailed permutation hypothesis tests (10,000,000 permutations), and Cohen’s d for the effect size. Fisher’s combined probability test was used to calculate a summary measure of the combined results from the individual permutation tests (at each hierarchical level) bearing on the same hypothesis.

Software and code

The analyses were conducted using MATLAB 2017b (MathWorks Inc., MA, USA). The hierarchical consensus framework was adapted from Jeub et al.46 (https://github.com/LJeub/HierarchicalConsensus). We adopted the Louvain implementation from Jeub et al.61 (https://github.com/GenLouvain/GenLouvain). The Random Matrix Theory method was adopted from MacMahon et al.41 (https://mathworks.com/matlabcentral/fileexchange/49011). Miscellaneous network tools were used from the Network Community Toolbox (http://commdetect.weebly.com) and the Brain Connectivity Toolbox90 (https://sites.google.com/site/bctnet/Home). The brain plots were vizualized with the Connectome Workbench software91 (https://github.com/Washington-University/workbench). Our code and hierarchical brain maps are available on GitHub (https://github.com/emergelab/hierarchical-brain-networks).

Results

Mesoscale organization revealed by subject-derived consensus

We first performed the community detection at the level of the individual subjects’ scans. Using the multi-modal parcellation (MMP) atlas (180 cortical areas per hemisphere, excluding subcortical structures)60, regional fMRI time series were used to generate functional networks and corresponding random matrix null models after appropriate preprocessing (see Materials and Methods). These were then used with the Louvain community detection algorithm. To index the full range of spatial resolutions, we applied the algorithm recursively: each daughter community was treated as a new network and the process was repeated until no statistically significant communities were found under the null model. Near-degeneracy of the Louvain algorithm was addressed by considering 100 runs of the algorithm and picking the most stable output before proceeding to the next hierarchical level (see Materials and Methods).

This resulted in a median of 5 hierarchical levels for each subject (range 1–8). The number of communities across all hierarchical levels ranged between 2 and 134. To identify a representative partitioning for the group based on the information from the subject-level partitioning, we adopted a consensus clustering approach33,46,47,72. This method consists of summarizing the outputs of the subject clustering results by quantifying the number of times nodes i and j were assigned to the same partition across subjects and hierarchies and populating a co-classification matrix Cij with these values. This co-classification matrix was then subjected to a recursive clustering algorithm recently introduced for multiresolution consensus reclustering46. This resulted in a subject-derived consensus hierarchical tree with 103 levels, ranging from 3 communities at the first hierarchical split to 112 communities at the finest-grained level (Fig. 1a). Thus, the finest branches of the tree contain about 3 nodes (areas).

Figure 1
figure 1

Subject-derived consensus hierarchical partitioning. (a) Co-classification matrix summarizing the results of the subject-level clustering, sorted by community affiliation. The dendrogram represents the hierarchical organization of the nested communities. The length of the arms of the dendrogram are proportional to the average value of the local null model. The background colors represent the candidate division (see below). (b) Similarity plot showing the mean similarity between the partitioning in each hierarchical level in the dendrogram and the clustering at the subject-level quantified by the z-score of the Rand coefficient (blue), and the average z-scored functional homogeneity (purple; values of z > 1.645 represent values that are significantly more homogeneous than the null model at a one-sided \(\alpha < 0.05\)). The local maximum in similarity corresponds to the partitioning of the cortex into 6 communities (dashed red line). (c) Brain surface plots of the 6 communities corresponding to the local maximum: visual (purple); somatomotor (blue); default mode (green); central executive (red); ventral salience (orange); and dorsal salience (yellow).

By means of the methods that we used, all partition solutions in the group hierarchy were statistically significant under the respective null models (see Materials and Methods). To identify partition solutions in the consensus hierarchy that are most expressed at the level of individual subjects, we calculated the average similarity between each hierarchical consensus partition level and the subject-level partition solutions (first averaged within each subject across hierarchies, then across subjects). This allowed us to gauge the “representativeness” of the different levels, with those corresponding to local maxima considered to be of particular importance (Fig. 1b).

As an additional independent partition quality metric, we used a measure of functional homogeneity derived from task fMRI. Interestingly, homogeneity peaks appeared to coincide with the similarity peaks, suggesting a convergence between a partition’s “representativeness” and its functional homogeneity across tasks (see SI).

To facilitate the interpretability of the modular organization, here we focus on the first local maximum yielding 6 communities (Fig. 1c). At this level, the organization consisted of an occipital community corresponding to the cortical visual system (visual); a community centered around the central fissure and extending to the transverse temporal gyrus, corresponding to the somatosensory, auditory, motor and supplementary cortices (somatomotor); a community with anterior and posterior midline components (medial prefrontal and posterior cingulate cortices) as well as a middle temporal component, collectively known as the default mode; a community with nodes predominantly in the frontoparietal cortex that is more expressed in the right hemisphere (central executive); a cingulo-opercular community spanning the ventral part of the salience system (ventral salience); and a more dorsal community spanning dorsal parts of the salience system (dorsal salience).

Interestingly, other prominent local maxima appear to have occurred at neurobiologically relevant divisions (Fig. 2). The level yielding 11 communities corresponds to the split of the default mode community into a midline core community and middle temporal lobe community (Fig. 2a). The level resulting in 13 communities represents the split of the auditory community from the somatomotor community (Fig. 2b). The level yielding 19 communities represents the delineation of the language community, more expressed on the left (Fig. 2c). The level yielding 30 communities represents a hemispheric split of the central executive community (Fig. 2d).

Figure 2
figure 2

Top: similarity plot showing the mean similarity between the partitioning in each hierarchical level and the clustering at the subject-level quantified by the z-score of the Rand coefficient. The local maxima in similarity are denoted by the dashed red lines. Bottom: emerging communities at local maxima, (a) The level of 11 communities is characterized by splitting of the default mode community into a mainly midline core community (dark green) and mainly middle temporal lobe community (light green), compared to the preceding level. (b) The level of 13 communities is characterized by the splitting of the auditory community (light blue) from the somatomotor community (dark blue). (c) The level of 19 communities is characterized by the emergence of the language community (turquoise) from lateral default mode (light green). (d) The level of 30 communities is characterized by the hemispheric split of the left and right central executive community (red and pink).

Subject-derived consensus partitions are more accurate representation of individuals’ mesoscale features

As a control to the subject-level method, here we perform the community detection on a group-representative network. We concatenated the fMRI regional time series from all participants and generated connectivity and corresponding random matrix null networks. For a meaningful comparison with the subject-level method, we iterated the hierarchical algorithm an equal number of times as the subject-level method (1003 times) and used the multiresolution output from these runs to populate a co-classification matrix. Since the first-level clustering here was performed on a single group-representative network, a slight variability in the outputs of the 1003 runs was deemed necessary to achieve a rich co-classification matrix. For this reason, the procedure to control for the near-degeneracy was applied at the hierarchical consensus clustering only (see Materials and Methods). Each run resulted with a median of 5 hierarchical levels (range 4–5). We then built a co-classification matrix Cij using the outputs and performed the hierarchical consensus reclustering step. This resulted in a group-derived consensus hierarchy of 101 levels, with a number of communities ranging between 4 and 143 (Fig. 3a). See SI for more details.

Figure 3
figure 3

Group-derived consensus hierarchical partitioning. (a) Co-classification matrix summarizing the results of the group-level clustering, sorted by community affiliation. The dendrogram represents the hierarchical organization of the nested communities. The length of the arms of the dendrogram are proportional to the average value of the local null model. The background colors and key are superimposed from the subject-level consensus clustering that yielded 6 modules in Fig. 1a–c. (b) Similarity plot showing the mean similarity between the clustering at the subject-level, and the partitioning in each hierarchical level (number of communities) in the group-derived consensus partitioning (red) and the subject-derived consensus partitioning (blue); similarity is quantified using the z-score of the Rand coefficient.

One of the goals of this study was to assess whether applying the clustering algorithms at the subject level followed by meta-reclustering to reach a group consensus (i.e., the method in the previous section) confers a meaningful advantage over applying the clustering algorithm to a group-representative network as is generally done in the literature. For this reason, we calculated the average similarity between the group average-derived consensus partitions (Fig. 3a) and the partitions at the level of the individual subjects. We compared these values with the similarity between the subject-derived consensus partitions and the partitions at the level of the individual subjects (Fig. 3b). Compared to the group-derived consensus partitions, the subject-derived consensus partitions were more similar to the individuals’ partitioning throughout the 65 hierarchical levels that are in common (two-tailed permutation tests; \(n=1003\); 10−7 < p < 2.073 × 10−4). The mean effect size was \(d=0.5381\pm 0.2477\,{\rm{SD}}\) (Fisher’s combined probability test, p < 10−16). One exception is the partitioning that yielded 4 communities, where the group-derived consensus showed higher similarity to individuals’ partitioning two-tailed permutation test; \(n=1003\); p = 1.8 × 10−5; \(d=0.1905\)). We also conducted this analysis with the normalized mutual information92 instead of the z-score of the Rand coefficient, and obtained similar results (see SI).

Nodal variability in co-classification consistency partially explained by its topological role

Nodal consistency values ranged between 0.5200 and 0.8817 across hierarchies (mean = 0.7609 ± 0.0582 SD) (Fig. 4a). At the level of 6 communities (Fig. 4b), the multiple regression model that included nodal hubness, nodal signal to noise, and task covariation significantly predicted nodal consistency scores (\({R}^{2}=0.07613\), \(p=1.0337\times {10}^{-7}\), \(df=356\)) (Fig. 4b). Nodal hubness was the only term that significantly contributed to the model (standardized estimate = 0.2731, \(p=1.657\times {10}^{7}\)) (Fig. 4c).

Figure 4
figure 4

Nodal consistency. (a) Nodal consistency scores across the hierarchies. Dashed line represents the hierarchy that resulted in 6 communities shown in Fig. 1c, and plotted in (b). (c) Multiple regression model that included nodal hubness, nodal SNR, and task coefficient of variation as predictors of nodal consistency. (d) Breakdown of the predictors and their standardized estimates.

Partitions are robust to prepossessing methods

Subject-derived consensus partitioning after MGTR yielded 115 hierarchical levels, with an organization that is similar to the main analyses (Fig. 5a). The vectorized upper triangles of the co-classification matrix generated from the data with MGTR and without MGTR were strongly correlated (\(r=0.9538\), \(p{\mathrm{ < 10}}^{-307}\), \(df=64618\)) (Fig. 5b). Similarly, the partition similarities were high across all hierarchies as evidenced by the z-score of the Rand coefficient (corresponding to \(p < {10}^{-307}\)), and the percent agreement ranged between 75.56% and 96.11% (mean = 82.20%, SD = 3.49) for the other hierarchical levels (Fig. 5c).

Figure 5
figure 5

Subject-derived consensus hierarchical partitioning after MGTR. (a) Co-classification matrix and dendrogram. Background colors represent the partitioning from Fig. 1. (b) Scatterplot showing the vectorized entries of the co-classification matrices with and without MGTR. (c) Percent consistency between the subject-derived consensus partitions with and without MGTR. (d) Similarity between the consensus partitioning and the clustering at the subject-level (blue), and the average z-scored functional homogeneity (purple; values of z > 1.645 represent values that are significantly more homogeneous than the null model at a one-sided \(\alpha < 0.05\)). Similar to Fig. 1, there is local maximum corresponding to the level of 6 communities. (e) Brain surface plots of the 6 communities after MGTR.

As in the main analyses, the similarity of the consensus hierarchies to the subject-level partitioning as well as the homogeneity peaked at the level of 6 communities (Fig. 5d). This partitioning revealed the same subsystems identified in the main analyses (Fig. 5e), with 93.06% agreement (dashed red line in Fig. 5c).

As an example partition, we also included the partitioning that yielded 22 communities (corresponding to the highest similarity peak) (Fig. 6). At this relatively fine-grained modular organization, there are five visual communities (medial, inferolateral, para, midlateral, superolateral), four somatomotor communities (central, paracentral, inferior, auditory), three ventral salience communities (superior, posterior, inferior) two dorsal salience communities (superior, inferior), four central executive communities (right, left, limbic, dorsal anterior cingulate), and four default mode communities (medial, temporal, limbic, language).

Figure 6
figure 6

Subject-derived consensus hierarchical partitioning at the level yielding 22 communities. This figure corresponds to the Rand global maximum in Fig. 5d. (a) The dendrogram (from Fig. 5a) and background colors highlighting the fractionation of the 6 main communities into the 22 subcommunities. (b) Brain surface plots of the 22 communities.

Discussion

Using subject-level clustering of functional networks and a reclustering framework, we mapped the mesoscale architecture of the cortex at multiple scales. Confirming the study hypothesis, the subject-derived consensus framework yielded results that were more similar to the individual subjects compared to the group-derived consensus, despite the fact that both were obtained from the same initial data. While it is inevitable that any type of average will obscure individual features, obtaining a population-level representation that is as faithful as possible to the mesoscale of individuals is preferable. The subject-level consensus enables a representation that better represents a central tendency, resembling individuals to a greater extent. In previous studies, resting-state fMRI acquisitions typically consisted of few minutes (e.g., 5–10 min; TR = 3000 ms), therefore, averaging may have been a good strategy to increase the signal-to-noise ratio. However, in the current study, the acquisitions consisted of 60 min (TR = 720 ms) potentially mitigating this issue.

The implicit assumption that the individual subjects’ clustering results are an appropriate reference is reinforced by a body of literature showing that subject-per-subject extraction of community organization better agrees with the individuals’ behavioral characteristics78. Further, the convergence of the two independent metrics that we used to assess the partition solutions—similarity and task functional homogeneity—gives credence to the notion that the similarity between the consensus partitions and those at the level of the individual is a relevant quality metric (see SI).

Data science approaches are being increasingly implemented to study the relationship between brain connectivity, behavior, and psychopathology. However, a major challenge in this field is the large number of features compared to the number of observations per study. One important aspect of the multiresolution modular atlas that we developed in this study is that it can be used to reduce the number of features in a neuroanatomically informed manner while accounting for the connectome architecture.

Several important neuroanatomical observations can be made. The representative partition solution that yielded 6 clusters (level 4; Fig. 1c) corresponds to a well-described set of functional systems1,2,93. The visual system can be seen represented in a large occipital community, ranging from early and higher-order visual cortices to the dorsal and ventral streams. Even at the coarsest scale, the visual system comprised a separate community than the somatosensory system—likely, a parallel of the heavy anatomical differentiation in that area and consistent with electrophysiological studies in humans and primates67. The somatomotor community included both early and higher somatosensory and motor cortices, and the supplementary motor cortex. Relatively early on in the hierarchy, there is a separation of the auditory system (Fig. 2b). The rest of the somatomotor system remains relatively cohesive. The cluster that spans several midline (medial prefrontal cortex and posterior cingulate cortex) and middle temporal structures, is known in the literature as the default mode, and is known to be involved in task-negative processes94,95. At the level of 6 communities (Fig. 1c), the default mode constitutes a single entity, but subsequently branches into a midline cluster (anterior medial prefrontal and posterior cingulate cortices) and a middle temporal cluster (Fig. 2a). This division is consistent both functionally and anatomically; evidence from task-imaging showing that the midline structures are functionally specialized for self-relevant decisions and the inference of other people’s mental states (i.e., theory of mind), whereas the temporal components are implicated in autobiographical memory94. At the level of 4 communities (Fig. 1a), there is a large “salience” community that is spread over several higher-order associative regions implicated in salience detection and response, which subsequently branches into specialized ventral (also known as the cingulo-opercular system) and dorsal salience systems (Fig. 1c), an organization that has been well described previously96. Interestingly, in Fig. 2c, there is branching of a community that is mostly expressed in the left hemisphere and includes Broca’s area and the superior temporal gyrus, and bears resemblance to areas of activation during language processing tasks45,60,97,98. This community, which has traditionally evaded most1,2 (but not all97,98) mesoscale investigations, may represent a language system97,98. It is notable that this putative language community is substantially more in agreement with the one described in Langs et al.98 than Spronk et al.97, despite the fact that the former used a markedly different analytic approach (voxel-based, clustering in embedding space, etc).

Another interesting observation is the absence of a parallel to the “limbic” network described by Yeo et al.1 which was localized to the ventral surface of the brain. A possible explanation is that the older data that was used in their study had marked signal dropout at the base of the brain, where this network was identified. The HCP data used in the current study had significant improvements in terms of a reduction in signal dropout from the ventral brain and other regions due to air sinuses; notably the orbitofrontal, inferior temporal, and lateral mid-temporal cortices99. In our analyses, parcels in this region were assigned to the default mode and the central executive communities. Another possible explanation for this discrepancy could be the varying definition of what a “cluster” means across studies. Here clusters refer to communities, whereas in Yeo et al.1, clusters were identified based on connectivity profiles. A recent study by Spronk et al.97, that used data from the HCP and the MMP cortical parcellation, identified an “oribito-affective” community that corresponds to posterior orbitofrontal parts of the limbic network described in Yeo et al.1, though the authors note that it had the lowest “confidence score” of community assignment among the identified networks. Much of the nodes in the described orbito-affective community (i.e., posterior orbitofrontal) are included in our central executive community (Fig. 1c).

Disagreements between our community node assignments and those in Spronk et al.97 could be due to several methodological differences, such as our use of a consensus procedure to avoid a network group average, a multiscale community detection framework, and a null model based on RMT for community detection that is more compatible with functional networks (compared to the more prevalent permutation null models). Because functional networks are estimated using correlation measures, they are subject to indirect effects that manifest themselves as artifactual connections. These indirect effects are due to second-, third- and higher-order interactions that are not present in the real (i.e., direct) network [for example, if nodes 1 and 2 and nodes 1 and 3 are strongly correlated, a correlation will also be observed between node 2 and 3 even where none exists (or, if a true relationship exists, here it would be overestimated)]17,18. This often leads to an inaccurate estimation of the network modular structure, if not accounted for by the null model40,41. The RMT-based community detection method that we employed mitigates these effects. In a recent study, a permutation null model failed to recover the ground truth community from neurophysiological recordings in the suprachiasmatic nucleus in mice, while an RMT-based null model was successful42. Another potential contributor to the discrepancy is the choice of the spatial scale of interest: while the current study characterized a wide range of spatial scales, and opted for a data-driven method to focus on those that are most expressed at the level of individuals in addition to functional homogeneity, Spronk et al.97 described the organization at the scale yielding 12 clusters, based on a priori “hard criteria” (e.g., a requirement for a separate auditory community), in addition to stability metrics. It is therefore conceivable that exploring different spatial resolutions in their data could show more consistent parallels.

Calculating the nodal consistency revealed that certain nodes were less consistent than others in terms of community assignments (Fig. 4). To ensure that this was not caused by signal dropout in these particular regions, we calculated the SNR from the time course. This analysis did not reveal an association between the nodal SNR and consistency scores. Another potential reason that we wanted to rule out was that low consistency nodes are those that are highly task-specific. To quantify that, we used the group-wise task fMRI activation map contrasts and calculated a measure of how much variability each node is exhibiting in terms of activation across different tasks. That also did not turn out to contribute to the variance of the nodal consistency scores. The other potential explanation was that these nodes play a specific—hub-like—topological role, integrating information from different modules, and because their connectivity profiles is not primarily limited to a single module, they are less consistently classified. While there are numerous metrics that index hubness90, we opted for the simplest definition—nodal strength. This hubness metric was found to partially account for the variability in consistency (contributing approximately 7.6% of the variance). Although it only explained a small part of the variance, it remains entirely possible that the nodal strength is insufficient to capture hubness. In fact, indexing hubness in correlation networks is a known challenge and traditional metrics do not rigorously identify important hubs100. In fact, one of the alternate definitions proposed in Power et al.100, which is based on finding nodes in which multiple modules are represented, is conceptually similar to our consistency metric.

Processing the data with and without MGTR resulted in broadly similar partitions (Fig. 5). We were able to say that the subject-derived consensus partitions resemble individual-level partitions more than the group-derived consensus because both were derived from the same data. However, the same framework could not be used to compare consensus partitions derived with and without MGTR as the comparator is not the same. Despite the fact that MGTR removes some relevant physiological neural data present in the global signal and may alter the properties of the networks especially the negative edges [which we had to remove (see SI)], it is very effective at removing motion-related artifact87,88. Evidence that supports “sacrificing” some neural data in favor of a less contaminated signal includes a recent observation that MGTR strengthens the associations between connectivity and behavioral measures101. Therefore, we release all atlases without taking a stance regarding which method is superior.

While we used a multimodal gradient-based parcellation because it is believed to be the most neuroanatomically accurate to date60, other atlases based on unimodal analyses of functional connectivity may be more functionally-coherent77,79,102. We repeated the main analysis using the Schaefer et al.77 atlas and recovered a broadly similar set of modules (see SI). However, a quantitative analysis103 was not within the scope of this article, but should be pursued in the future. In this study, we constrained the analysis to the cerebral cortex, though we recognize the importance of extending the mapping of these communities to the subcortex and cerebellum104,105. The multimodal brain atlas that we adopted does not include any subcortical or cerebellar nodes. Some groups have added community affiliations for the subcortex or cerebellum on a post hoc basis97,106,107. The challenge in our case is to keep the definition of a community consistent. This will likely require the treatment of the nodes of the whole brain at the same time. Another point that should be addressed in the future relates to the fact that we got a varying number of hierarchical levels across subjects (since we did not constrain the number of communities or hierarchical levels); this means that subjects did not all contribute an equal amount of information to the co-classification matrix. The decision to define nodes as brain parcels instead of a voxel-wise analysis was motivated by factors that include using neurobiologically-meaningful building blocks, mitigating MRI-related limitations, and computational intractability108. However, voxel-wise investigations (e.g., Barga et al.13, Gordon et al.12) are equally important in brain mapping endeavors as they naturally provide a finer level of granularity. In this study, we used the RMT null model41. Another null model that can be used with correlation networks without violating their properties is known as the uniform null model40,74. However, it requires the use of a tunable parameter, does not address the global mode (i.e., common trend), and can lead to a large number of singleton communities40,42,109. Finally, other mesoscale organizations should also be explored, such methods include stochastic block modeling110, overlapping communities111, and approaches that allow for varying spatial configuration of functional brain regions112.