Abstract
Transmission bottlenecks introduce selection pressures on HIV-1 that vary substantially with the mode of transmission. Recent studies on small cohorts have suggested that stronger selection pressures lead to fitter transmitted/founder (T/F) strains. Manifestations of this selection bias at the population level have remained elusive. Here, we analysed early CD4 cell count measurements reported from ∼340,000 infected heterosexual individuals (HSX) and men-who-have-sex-with-men (MSM), across geographies, ethnicities and calendar years and found them to be consistently lower in HSX than MSM (P<0.05). The corresponding average reduction in CD4 counts relative to healthy adults was 86.5% in HSX and 67.8% in MSM (P<10⁻⁴). This difference could not be attributed to differences in age, HIV-1 subtype, viral load, gender, ethnicity, time of transmission, or diagnosis delay across the groups. We concluded that the different selection pressures arising from the different predominant transmission modes have resulted in more pathogenic T/F strains in HSX than MSM.
Introduction
The bottlenecks in HIV-1 transmission result in a ‘selection bias’ favoring fitter transmitted/founder (T/F) viruses over less fit ones1,2. Several recent studies have presented evidence of genetic, phenotypic, and clinical manifestations of this selection bias in small cohorts1,3–6. In 137 heterosexual (HSX) donor-recipient pairs, T/F viruses were found to carry higher than average frequencies of amino acids associated with high in vivo fitness1. Similarly, in 127 discordant couples, lower viral replication capacity (vRC) early in infection, indicative of lower viral fitness, was associated with slower decline of CD4 T cell counts4,6. The selection bias varies with the mode of transmission3. The stronger the bottleneck, the fitter the corresponding T/F viruses are likely to be1,2. Anal intercourse is, on average, over 10-fold more permissive to transmission than penile-vaginal intercourse7. Analysis of T/F genomes from 131 subjects revealed that they were under greater positive selection in HSX, in whom the penile-vaginal mode predominates8, than in homosexual men, or men-who-have-sex-with-men (MSM), who transmit predominantly through anal intercourse3. Among HSX, men had T/F viruses with higher predicted in vivo fitness than women1, consistent with the asymmetry of the bottlenecks between insertive and receptive penile-vaginal intercourse7.
An important question that follows is whether the differential selection bias across modes of transmission is manifested at the wider population level. Such differential bias could contribute to variations in disease progression and treatment outcomes and underlie the diverse trajectories of the HIV-1 pandemic across infected groups in which different modes of transmission predominate.
Results and Discussion
To answer this question, we compared early CD4 T cell count measurements between HSX and MSM. Immediately following infection, CD4 T cell counts fall steeply, recover partially, and then settle within a few weeks to months at a value lower than in the pre-infection state9 (Fig. 1(a)). Subsequent changes in the CD4 counts occur slowly, over many months to years. Thus, CD4 count measurements made early in infection tend to be close to the value at which the counts settle after the initial dynamics. These early CD4 counts are expected to be minimally affected by host-specific adaptive mutations1 and therefore to be representative of the fitness of the T/F strain in the recipient: the fitter the strain, the lower the CD4 count. The CD4 count is also a more robust marker of disease state than other commonly used markers such as set-point viral load (SPVL). High vRC of the T/F viruses was associated with low CD4 counts at 3 months post-infection (roughly coinciding with seroconversion) and with rapid CD4 count decline over ∼5 years, independently of SPVL4,6.
HSX and MSM are the two major groups driving the global HIV-1 epidemic9. Their predominant modes of transmission differ substantially in the selection bias they impose7. Importantly, the two groups display little inter-mixing in most geographical regions, which we inferred from the distinct prevalence of HIV-1 subtypes in the two groups across geographical regions and calendar years (Fig. 2; Text S1; Tables S1 and S2). Together, these characteristics allow the difference in selection bias to be sustained long-term, potentially amplified, and manifested in sample sizes large enough to be detected with statistical significance. We thus hypothesized that the stronger selection bias associated with penile-vaginal than with anal transmission would result in lower early CD4 counts in HSX than in MSM.
To test this hypothesis, we collated available data of CD4 count measurements either at seroconversion or at diagnosis from all large studies, amounting to a total of ∼340,000 patients across four geographical regions followed over a total period of nearly four decades, and examined the differences between HSX and MSM (Methods; Table 1). We found that HSX consistently had lower CD4 counts than MSM (Fig. 1(b); Table 1 and Tables S3–S5). For instance, measurements from ∼120,000 patients across 21 countries in the European Union and European Economic Area (EU/EEA) indicated, following population-weighted averaging of yearly data during 2010–2018, that the mean CD4 count at diagnosis was ∼440 cells/μL in MSM but substantially lower, ∼300 cells/μL, in HSX (P<10⁻⁴)10. The numbers were similar in an earlier period (2002–2007) reported by a smaller study involving a few thousand patients11. In the UK, measurements from close to 9,000 patients during 1990–1998 showed that the counts at diagnosis were ∼330 cells/μL in MSM and ∼230 cells/μL in HSX (P<10⁻³)12. In China, during 2006–2012, the mean CD4 counts at diagnosis from ∼180,000 patients were ∼370 cells/μL in MSM and ∼270 cells/μL in HSX (P<10⁻⁴)13. Similarly, in the US, from over 25,000 patients during 2006–2015, the counts at diagnosis were ∼400 cells/μL in MSM and ∼300 cells/μL in HSX (P<10⁻⁴)14. We also examined the counts at seroconversion where they were reported, and estimated them where possible. In the CASCADE study, involving ∼4,000 patients during 1979–2000 in Europe and Australia, the mean cell counts at seroconversion were ∼620 cells/μL in MSM and ∼590 cells/μL in HSX (P = 0.027)15. Further, using the reported diagnosis delays and the slopes of CD4 count decline in the US population above14, we estimated that the cell counts at seroconversion, for the age group 13–29 years, were ∼550 cells/μL in MSM and ∼480 cells/μL in HSX (P<10⁻⁴) (Methods).
Remarkably, we did not find any large study (sample size ≳ 1,000) that reported higher early CD4 cell counts in HSX than MSM.
While the evidence from absolute CD4 count comparisons was thus overwhelming, differences in CD4 counts among healthy (uninfected) individuals across genders, ethnicities, and geographical regions could render absolute CD4 counts only an approximate measure of the fitness of the T/F strains. Two individuals may have similar early CD4 counts yet have been infected by T/F strains of different fitness if their pre-infection CD4 counts differed; the individual with the higher pre-infection count would then have been infected by the fitter T/F strain. To overcome this limitation, we constructed a metric, R, quantifying the relative reduction in the CD4 cell count corresponding to an absolute CD4 count T: $R = \frac{T_{\text{healthy}} - T}{T_{\text{healthy}} - T_{\text{AIDS}}} \times 100\%$, where $T_{\text{healthy}}$ is the pre-infection count and $T_{\text{AIDS}} = 200$ cells/μL is the count defining AIDS. Thus, R is 0% when $T = T_{\text{healthy}}$ and 100% when $T = T_{\text{AIDS}}$, and decreases linearly with T between these extremes. Choosing $T_{\text{healthy}}$ specific to the respective geographies, ethnicities, and genders (Table S6), we estimated R corresponding to the early cell count measurements above, which we denoted RT/F, indicative of the relative reduction in CD4 count due to the T/F virus (Fig. 1(c)). The higher the RT/F, the fitter the T/F strain, regardless of the pre-infection CD4 count, rendering RT/F a more robust marker of T/F viral fitness than the associated early absolute CD4 counts. (Note that RT/F is a static measure and is not indicative of the ‘speed’ of disease progression; cell count decline can be faster in MSM than in HSX despite higher early CD4 counts in MSM15,16.)
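The metric R defined above is a simple linear rescaling of the early count. A minimal Python sketch follows, assuming counts in cells/μL and $T_{\text{AIDS}} = 200$ cells/μL as in the text; the function name and the baseline of 900 cells/μL in the example are hypothetical and are not values from Table S6.

```python
T_AIDS = 200.0  # CD4 count (cells/uL) defining AIDS, as in the text


def relative_reduction(t: float, t_healthy: float, t_aids: float = T_AIDS) -> float:
    """Relative CD4 reduction R, in percent, for an early CD4 count t.

    R = 0 when t equals t_healthy and 100 when t equals t_aids; in between,
    R falls linearly as t rises. Counts outside this range extrapolate.
    """
    if t_healthy <= t_aids:
        raise ValueError("t_healthy must exceed t_aids")
    return 100.0 * (t_healthy - t) / (t_healthy - t_aids)


# Hypothetical pre-infection count, NOT a value from Table S6.
T_HEALTHY_EXAMPLE = 900.0

for label, t in [("early count 300", 300.0), ("early count 440", 440.0)]:
    print(label, "->", round(relative_reduction(t, T_HEALTHY_EXAMPLE), 1), "%")
```

With this hypothetical baseline, early counts of 300 and 440 cells/μL map to R of about 85.7% and 65.7%; the published RT/F values instead use population-specific baselines.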
We found that in the EU/EEA during 2010–18, RT/F was 86.2% in HSX and 66.2% in MSM (P<10⁻⁴). During 2002–07, the corresponding values were 88.8% and 67.8% (P<10⁻⁴). In the UK, they were 96.0% and 78.2% (P<10⁻⁴), and in China 86.7% and 68.0% (P<10⁻⁴). In the US, the difference was smaller but still substantial, with RT/F of 85.8% in HSX and 73.7% in MSM (P<10⁻⁴). At seroconversion, these values were 64.7% and 51.0%, respectively (P<10⁻⁴). For the seroconverters from the CASCADE study, the trend was consistent, with RT/F of 47.1% in HSX and 40.7% in MSM (P<10⁻³). Overall, RT/F comparisons thus showed more significant differences between MSM and HSX than absolute CD4 count comparisons did (Fig. 1(b) and (c)). RT/F also allowed comparison across the different datasets: whereas all HSX datasets had RT/F >85% at diagnosis, the MSM datasets ranged from ∼65% to a little under 80%. We could also combine the datasets, including those at diagnosis and seroconversion, and estimate an overall RT/F. Using a population-weighted average across the datasets, we estimated the overall RT/F to be 86.5% in HSX and 67.8% in MSM (P<10⁻⁴) (Fig. 1(d)). This overall comparison provides strong evidence of greater cell count reduction due to, and hence greater pathogenicity of, the T/F viruses in HSX than in MSM.
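The pooling step can be illustrated as a weighted mean of per-dataset RT/F values. The Python sketch below assumes each dataset is weighted by its sample size in the group of interest; the helper name and the toy inputs are hypothetical, and the authors' actual weights (Methods; Table 1) are not reproduced here.

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-dataset values (e.g., RT/F in %)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Toy inputs (NOT the study's data): RT/F per dataset and its sample size.
r_values = [80.0, 90.0]
sizes = [100, 300]
print(weighted_mean(r_values, sizes))  # 87.5
```

Weighting by sample size lets the largest datasets (here, the EU/EEA and China) dominate the pooled estimate, which is the intent of a population-weighted average.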
To attribute the differences in RT/F between HSX and MSM to the differential selection bias at transmission in the two groups, we considered and ruled out all the major potential confounding factors. First, MSM are typically diagnosed at a younger age than HSX. In the two European studies, MSM were on average 5 (Table S5)10 and 1.6 years11 younger than HSX at diagnosis. Given the decrease in cell count of ∼7 cells/μL per year of age at diagnosis12, the CD4 counts should have been higher in MSM by only ∼35 and ∼11 cells/μL, respectively, whereas they were higher by 135 and 143 cells/μL (Fig. 1(b)); age at diagnosis therefore cannot explain these differences. Second, MSM are often predominantly infected by subtype B17, whereas HSX are infected mainly by subtypes B and C (Fig. 2; Text S1). This subtype difference should have resulted in lower CD4 counts in MSM than HSX because of the higher virulence of subtype B18,19, the opposite of the observed trend. Moreover, in the US, where subtype B dominates in both HSX and MSM (Text S1), RT/F was lower among MSM (Fig. 1(c)). In agreement, an independent study found that subtype B T/F viruses had higher fitness among HSX than MSM3.
Third, the CD4 counts could not be explained as an indirect manifestation of variations in SPVL; in the European study, CD4 counts were higher in MSM despite higher SPVL in MSM than HSX (Table S3). Fourth, healthy men had lower CD4 counts than healthy women everywhere except China (Table S6), and infected HSX men displayed higher RT/F than MSM (Table 1 and Fig. S1); these are two reasons to rule out gender as the cause of lower RT/F in MSM. Fifth, in Europe (EU/EEA), while MSM are predominantly Caucasian, 30–35% of infected HSX are of sub-Saharan African origin10,11. In China, however, where MSM and HSX do not differ in ethnicity, a substantial difference in RT/F is seen between them (Fig. 1(c)), ruling out ethnicity as a confounding factor. Further, accounting for baseline CD4 count differences across ethnicities in EU/EEA did not alter our findings (Table 1). Sixth, early onward transmission may limit donor-specific adaptations in the T/F strain and allow it to cause more severe cell count reduction in the recipient. Early transmissions, however, are more common among MSM than HSX18,20, in keeping with the greater association of MSM with transmission clusters17 (Fig. 3; Table S7), and should have led to higher RT/F in MSM than HSX, contrary to our findings. Seventh, although MSM tend to be diagnosed earlier than HSX14 and may thus have lost fewer CD4 cells by diagnosis, the differences are also seen in CD4 counts at seroconversion14,15, which occurs at similar times post-infection in the two groups. Besides, MSM had higher cell counts than HSX in China too, where, owing to social stigma, MSM may not get diagnosed earlier than HSX13. The difference in RT/F between MSM and HSX was thus not attributable to any of the above factors. We therefore concluded that the difference originated from the variations in the fitness of the T/F strains in the two groups arising from the different selection biases at transmission.
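The age argument in the first point above is simple arithmetic. The Python sketch below reproduces it under the assumption, implicit in the text, that the effect of age at diagnosis on CD4 count is linear and additive at ∼7 cells/μL per year; the numbers are those quoted above, and the helper name is hypothetical.

```python
SLOPE = 7.0  # ~cells/uL lower CD4 count per year of age at diagnosis (ref. 12)


def age_attributable_gap(age_gap_years: float, slope: float = SLOPE) -> float:
    """CD4 difference (cells/uL) expected from an age difference alone,
    assuming a linear, additive effect of age at diagnosis."""
    return slope * age_gap_years


# Study: (years by which MSM were younger, observed MSM-minus-HSX gap in cells/uL),
# numbers as quoted in the text.
STUDIES = {"ref. 10": (5.0, 135.0), "ref. 11": (1.6, 143.0)}

for name, (age_gap, observed) in STUDIES.items():
    expected = age_attributable_gap(age_gap)
    print(f"{name}: age alone predicts ~{expected:.0f} cells/uL, "
          f"observed {observed:.0f} ({100 * expected / observed:.0f}% explained)")
```

Under this assumption, age accounts for roughly a quarter of the gap in the larger study and under a tenth in the smaller one.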
Our findings establish the selection bias at transmission as an important underlying factor shaping HIV-1 adaptation at the population level. The differential adaptation of HIV-1 to MSM and HSX, which in most geographical regions show little inter-mixing, may have led over the years to the selection and, possibly, fixation of different adaptive mutations in the T/F viruses in the two groups. Genetic differences have been observed between T/F strains in MSM and HSX in small cohorts3. Future studies may establish such differences at the population level as sequencing technologies that allow facile identification of T/F viruses emerge. These technologies may also help elucidate similar differences between other infected groups, which are likely to be smaller than between MSM and HSX, depending on the differences in selection bias between the groups, the exclusivity of the associated modes of transmission, and the extent of mixing between the groups. Our findings also suggest that heritable viral traits such as SPVL21 may have evolved differently in MSM and HSX, potentially driving differential spread of the HIV-1 epidemic in the two groups. The extent of these differences may determine whether intervention strategies, including the development and use of preventive vaccines, need to be tailored to individual infected groups.
Methods
Data of CD4 counts
To test our hypothesis that early CD4 counts in HSX would be lower than in MSM at the population level, we collated data from all large studies (n ≳ 1,000) that reported CD4 counts either at diagnosis or at seroconversion in both groups. The data are summarized along with our analysis in Table 1, and details are in Tables S3–S5. From reports on countries in the EU/EEA and China10,13, we digitized the median CD4 counts using WebPlotDigitizer (https://automeris.io/WebPlotDigitizer). For our analysis, we averaged the data over the study duration. To obtain sample sizes, we multiplied the number of diagnosed cases by the reported fraction of diagnoses contributing to the annual CD4 counts in the entire EU/EEA (Table S5). This fraction was assumed to be the same across the risk groups and the set of 21 countries studied. We also assumed the proportions of men and women in HSX to remain the same during 2010–18. In the CASCADE study15, which segregated data into age groups, we averaged over age groups. To obtain the population-weighted average CD4 counts, we assumed that the proportions of the populations in the different transmission categories were the same across age groups and that the fractions of men and women remained conserved (except in MSM and hemophiliacs) (Table S3). To calculate RT/F, we also collated data of CD4 counts from healthy, uninfected adults in the USA, UK, Italy (used for the three studies involving European populations), Tanzania, and China, which are listed in Table S6. For RT/F calculations pertaining to the UK, CD4 counts from healthy MSM and HSX were available, and we used them. We found the counts in healthy MSM comparable to those in healthy HSX men. Therefore, for other populations, where counts from healthy MSM were unavailable, we used the counts for healthy HSX men.
Estimation of mean CD4 counts and their standard deviations
When the median, m, and interquartile range (IQR), $(q_l, q_u)$, of CD4 counts were available, we estimated the corresponding mean, μ, and standard deviation (SD), σ, using $\mu \approx (q_l + m + q_u)/3$ and $\sigma \approx (q_u - q_l)/1.35$, following a widely used method22 applicable to large sample sizes, as considered here. When 95% confidence intervals (CIs), $(c_l, c_u)$, were available instead of the IQR, we evaluated the SD using another method23, which yields $\sigma = \sqrt{n}\,(c_u - c_l)/3.92$ when the sample size $n \gtrsim 100$. When the IQR was unavailable, we approximated the medians as the means, assuming the distributions to be normal. For data from China and EU/EEA, where σ was available only for the total population comprising all transmission categories, we estimated σ for MSM and HSX using the ratios between σ for MSM or HSX and σ for the total population reported in other studies (see footnote in Table S4). Similarly, to obtain σ for HSX men and women, we employed the corresponding ratios of maximum σ from the CASCADE study. When the information necessary to estimate σ was unavailable, we used the highest σ available from the most relatable dataset, as with the UK and the CASCADE study. To estimate the SD of RT/F, we employed the error propagation equation24 and derived $\sigma(R_{T/F}) = \frac{100\%}{(\mu_{\text{healthy}} - T_{\text{AIDS}})^2}\sqrt{(\mu_{\text{healthy}} - T_{\text{AIDS}})^2\,\sigma_T^2 + (\mu_T - T_{\text{AIDS}})^2\,\sigma_{\text{healthy}}^2}$, where $\mu_T, \sigma_T$ (early CD4 counts) and $\mu_{\text{healthy}}, \sigma_{\text{healthy}}$ (healthy adults) are given in Tables 1 and S6, respectively. For σ(RT/F) of all the data combined, we chose σ from EU/EEA, involving data from 21 countries.
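The three estimators above can be sketched in a few lines of Python. This is a minimal illustration assuming large samples (the 1.35 and 3.92 constants are the large-n limits) and, for the error propagation, independent early and healthy counts; the function names and example inputs are hypothetical, not values from Tables 1 or S6.

```python
import math
from typing import Tuple

T_AIDS = 200.0  # cells/uL


def mean_sd_from_median_iqr(m: float, q_l: float, q_u: float) -> Tuple[float, float]:
    """Large-sample estimates: mean ~ (q_l + m + q_u)/3, SD ~ (q_u - q_l)/1.35."""
    return (q_l + m + q_u) / 3.0, (q_u - q_l) / 1.35


def sd_from_ci95(c_l: float, c_u: float, n: int) -> float:
    """SD from a 95% CI of the mean; valid for large n (n >~ 100)."""
    return math.sqrt(n) * (c_u - c_l) / 3.92


def sd_of_r(mu_t: float, sd_t: float, mu_h: float, sd_h: float,
            t_aids: float = T_AIDS) -> float:
    """First-order error propagation for R = 100 (T_h - T)/(T_h - T_AIDS), in %.

    Treats the early count T and the healthy count T_h as independent.
    """
    d = mu_h - t_aids
    dr_dt = -1.0 / d                  # dR/dT (before scaling by 100)
    dr_dh = (mu_t - t_aids) / d ** 2  # dR/dT_h (before scaling by 100)
    return 100.0 * math.sqrt((dr_dt * sd_t) ** 2 + (dr_dh * sd_h) ** 2)


# Hypothetical inputs for illustration only (not from Tables 1 or S6).
mu, sd = mean_sd_from_median_iqr(m=300.0, q_l=150.0, q_u=480.0)
print(round(mu, 1), round(sd, 1), round(sd_of_r(mu, sd, 900.0, 250.0), 1))
```

The two partial derivatives in `sd_of_r` are exactly the coefficients that appear inside the square root of the σ(RT/F) expression.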
Estimation of CD4 counts at seroconversion
In the US study14, a model of CD4 count decline following seroconversion has been proposed, which allowed us to estimate CD4 counts at seroconversion from measurements at diagnosis. According to the model, the CD4 count T in an untreated individual at time t from seroconversion follows $\sqrt{T} = a_0 + b_1 t + e_{1t}$, where $a_0$ and $b_1$ are constants and $e_{1t}$ is an error term. The CD4 count at seroconversion, $T_0$, was obtained by setting $t = 0$, so that $\sqrt{T_0} = a_0 + e_{10}$. Assuming that $e_{1t} = e_{10}$, it followed that $\sqrt{T} = \sqrt{T_0} + b_1 t$. The values of $b_1$ for different age groups and transmission categories were available25. Also, the median delays (and IQR) in diagnosis following seroconversion, $t_d$, have been estimated14, from which we calculated the corresponding mean and SD. For MSM and HSX, we took the mid-value of the means of $t_d$ in 2006 and 2015, chose the largest SD, and obtained $t_d = 4.05 \pm 6.67$ and $t_d = 5.40 \pm 9.04$ years, respectively, for the duration 2006–15. If $T_d$ is the CD4 count at diagnosis, then $T_0 = (\sqrt{T_d} - b_1 t_d)^2$. We applied the analysis to data from the most populated age group (13–29 years) and used its mid-value, 21 years, for which $b_1$ was −0.93, −0.77, and −0.80 year⁻¹ for MSM, HSX men, and HSX women, respectively. Furthermore, to obtain the population sizes of MSM and HSX in this age group, we assumed the fraction of females to be the same among all non-MSM groups (footnote in Table S3). Correspondingly, we obtained $b_1 = -0.79$ year⁻¹ for HSX. To obtain uncertainties in the estimates of $T_0$, we repeated the above analysis with $T_d$ and $t_d$ set at values ±σ away from their respective means, ensuring that their lower bounds were ≥ 0 and omitting terms that are second order in σ. Half the difference between the resulting maximum and minimum values of $T_0$ yielded the σ corresponding to seroconversion.
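The back-calculation can be sketched in Python as follows. This is a minimal illustration of the square-root decline model as written above, assuming $b_1 < 0$ (so $T_0$ grows with both $T_d$ and $t_d$); the function names are hypothetical, and, unlike the text, the square is evaluated exactly rather than dropping second-order terms in σ. The delay and slope for MSM come from the text, whereas the diagnosis count and its SD are placeholders.

```python
import math
from typing import Tuple


def cd4_at_seroconversion(t_d: float, delay: float, b1: float) -> float:
    """Invert sqrt(T_d) = sqrt(T_0) + b1 * delay, i.e. T_0 = (sqrt(T_d) - b1*delay)^2.

    b1 is the (negative) yearly slope of sqrt(CD4); delay is in years.
    """
    return (math.sqrt(t_d) - b1 * delay) ** 2


def t0_with_uncertainty(t_d: float, sd_t_d: float, delay: float,
                        sd_delay: float, b1: float) -> Tuple[float, float]:
    """Central T_0 and half the spread of T_0 over +/- 1 SD in T_d and delay.

    Assumes b1 < 0, so T_0 increases with both T_d and delay; lower bounds
    are clipped at zero. Second-order terms in sigma are kept, not dropped.
    """
    t0 = cd4_at_seroconversion(t_d, delay, b1)
    high = cd4_at_seroconversion(t_d + sd_t_d, delay + sd_delay, b1)
    low = cd4_at_seroconversion(max(t_d - sd_t_d, 0.0), max(delay - sd_delay, 0.0), b1)
    return t0, (high - low) / 2.0


# Delay and slope for MSM (13-29 years) are from the text; T_d and its SD
# below are hypothetical placeholders, not the study's age-group values.
t0, sigma = t0_with_uncertainty(t_d=400.0, sd_t_d=250.0, delay=4.05, sd_delay=6.67, b1=-0.93)
print(round(t0), round(sigma))
```

Because $T_0$ is monotone in both inputs when $b_1 < 0$, the extreme values occur at the ±σ corners, which is why only two extra evaluations are needed.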
Statistical analysis
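To examine whether the mean CD4 counts (or mean RT/F) were significantly higher (or lower) in MSM than HSX, we employed the one-tailed t-test with unequal variances, with the test statistic $t = \frac{\mu_{\text{MSM}} - \mu_{\text{HSX}}}{\sqrt{\sigma_{\text{MSM}}^2/n_{\text{MSM}} + \sigma_{\text{HSX}}^2/n_{\text{HSX}}}}$ and degrees of freedom $\nu = \frac{\left(\sigma_{\text{MSM}}^2/n_{\text{MSM}} + \sigma_{\text{HSX}}^2/n_{\text{HSX}}\right)^2}{\left(\sigma_{\text{MSM}}^2/n_{\text{MSM}}\right)^2/(n_{\text{MSM}} - 1) + \left(\sigma_{\text{HSX}}^2/n_{\text{HSX}}\right)^2/(n_{\text{HSX}} - 1)}$, where $n_{\text{MSM}}$ and $n_{\text{HSX}}$ are the two sample sizes and μ and σ the corresponding means and SDs26. The tests were performed in R27, which yielded the corresponding P values.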
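The test above can be run from summary statistics alone. Below is a minimal Python sketch (function name hypothetical) that computes the Welch statistic and degrees of freedom as written, with the one-tailed P value approximated by the standard normal upper tail; this approximation is our simplification and is reasonable only because the degrees of freedom here are in the thousands. For exact t-distribution P values, SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False, alternative="greater")` (SciPy ≥ 1.6) could be used instead. Group a is the one hypothesized to have the larger mean (MSM for CD4 counts, HSX for RT/F).

```python
import math
from typing import Tuple


def welch_one_tailed(mean_a: float, sd_a: float, n_a: int,
                     mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float, float]:
    """One-tailed Welch test of H1: mean_a > mean_b from summary statistics.

    Returns (t, df, p). The p value uses the standard normal upper tail as an
    approximation to the t distribution, adequate only for large df.
    """
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b  # squared standard errors
    t = (mean_a - mean_b) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    p = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(Z > t) for standard normal Z
    return t, df, p


# Hypothetical summary statistics (not from Table 1): group a = MSM, b = HSX.
t, df, p = welch_one_tailed(440.0, 280.0, 5000, 300.0, 250.0, 3000)
print(f"t = {t:.1f}, df = {df:.0f}, one-tailed P ~ {p:.1e}")
```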
Data of HIV-1 subtype prevalence
To assess the extent of mixing between MSM and HSX, we collated data of the prevalence of HIV-1 subtypes in the two groups across relevant geographical regions and calendar years. The data are summarized in Fig. 2 and Tables S1-S2 and discussed in Text S1.
Data of association with transmission clusters
Finally, we considered the extent of association of MSM and HSX with transmission clusters as an indicator of the time of onward transmission post-infection. The data we collated on this association, together with the compositions of the largest transmission clusters in different settings, are shown in Fig. 3 and Table S7 and discussed in Text S2.
Data Availability
All the data used in this study have been previously published, and the sources are indicated in the manuscript. No new data were generated as part of this study.
SUPPLEMENTARY INFORMATION
Text S1: Distinct subtype prevalences indicate minimal mixing between MSM and HSX
In many geographical locations, mixing between MSM and HSX appears minimal. This is evident from the different prevalences of HIV-1 subtypes in the two groups. Infections among MSM in western nations are dominated by HIV-1 subtype B, whereas HSX carry a mixture of subtypes1, with subtypes B and C predominating2. For instance, in the United Kingdom during 2002–2010, nearly 90% of infections in MSM were subtype B, compared with a little over 10% in HSX. Mixing between the two groups would have led to more similar subtype distributions. The two groups thus appear to have sustained their respective infections over the years in near-complete isolation. The difference in subtype prevalences also holds in Canada, Spain, France, and other nations (Fig. 2(a); Table S1). In China, the dominant subtype is CRF01_AE, which is present in MSM at a frequency of >50% but in HSX at <40% (Fig. 2(b); Table S2)3, perhaps indicating more mixing than in Europe. In Korea, the extent of mixing could not be assessed using subtypes because over 80% of all infections were subtype B4. In the USA, although subtype B dominates in both MSM and HSX5,6, mixing between the groups has been argued to be uncommon7. In the Nordic states, some mixing between MSM and HSX is evident8. Overall, little mixing between MSM and HSX is evident in most geographical settings, suggesting that the different selection biases in the two groups may have been sustained over the course of the epidemic.
Text S2: Clustering and transmission patterns
MSM are known to engage in sexual contact patterns different from those of HSX. They tend to have more partners than HSX9,10. They are also far more likely than HSX to belong to transmission clusters1. A transmission cluster comprises individuals carrying viral genomes that cluster together in a phylogenetic tree11, suggesting that the viral sequences isolated from these individuals are closely related. In Japan and China, an infected MSM had a nearly 40% chance of being part of a cluster, whereas an infected HSX had a <10% chance12. In France, the corresponding numbers were ∼35% and ∼4%, respectively13. This trend held in all countries with available data except the Netherlands (Fig. 3(a); Table S7). MSM also formed larger clusters than HSX. The largest clusters reported in Belgium and Spain comprised nearly 100 individuals each, with the Belgian cluster containing ∼70 MSM and the Spanish cluster exclusively MSM (Fig. 3(b); Table S7)14,15. Together, these data suggest greater similarity among the viral strains in MSM than in HSX. One way in which this greater similarity could arise is through onward transmission occurring sooner after infection in MSM than in HSX, allowing less host-specific adaptation before transmission.
Supplementary Figures
Supplementary Tables
Acknowledgments
We thank Pranesh Padmanabhan, Rajat Desikan, and Pradeep Nagaraja for comments. This work was supported by the DBT/Wellcome Trust India Alliance Senior Fellowship IA/S/14/1/501307 (NMD).