ABSTRACT
Molecular epidemiology using genomic data can help identify relationships between malaria parasite population structure and malaria transmission intensity, and ultimately generate actionable data to assess the effectiveness of malaria control strategies. Genomic data, coupled with geographic information systems (GIS) data, can further identify clusters or hotspots of malaria transmission, genetic and spatial connectivity between parasites, and parasite movement through human or mosquito mobility over time and space. In this study, we performed four years of longitudinal genomic surveillance in a cohort of 70 participants from different neighborhoods and households in Thiès, Senegal, a region of exceptionally low malaria transmission (entomological inoculation rate (EIR) less than 1). Genetic identity (identity by state) was established using a 24-SNP (single nucleotide polymorphism) molecular barcode, and a multivariable linear regression model was used to establish genetic and spatial relationships. Our results show clustering of genetically similar parasites within households and a decline in the genetic similarity of parasites with increasing distance. One household showed extremely high diversity and warrants further investigation into the source of these diverse genetic types. This study illustrates the utility of combining genomic data with traditional epidemiological approaches for surveillance and for detecting trends and patterns in malaria transmission, not only by neighborhood but also by household. This approach can be implemented regionally and countrywide to strengthen and support malaria control and elimination efforts.
Introduction
As malaria transmission declines and population-level immunity wanes, spatiotemporal variation in malaria incidence is likely to become more pronounced1–4. This heterogeneity is reflected in hotspots of malaria transmission, which can sustain and amplify malaria transmission chains5, 6. Targeting and eliminating malaria hotspots can play an important role in malaria elimination3, 6. Studies using conventional epidemiologic tools have identified hotspots of malaria transmission. These tools include malaria incidence and prevalence rates, morbidity and mortality rates, and entomological inoculation rates (EIR). However, some of these metrics are better suited to highly endemic regions7–9. As regions move toward elimination and disease incidence falls, these conventional metrics may become less informative10.
Sensitive molecular tools can detect low-level parasitemia. The genomic data they generate also provide information on parasite population structure that can be used to measure dynamic changes in malaria transmission10, 11. Genomic epidemiology has been used to detect associations between malaria parasite genetic diversity, dynamic changes in transmission intensity, and the impact of malaria control programs12–15. We have previously validated a 24-SNP molecular barcode for monitoring changes in transmission intensity and for tracking specific parasite types in the population16. 

More specifically, genomic tools can distinguish between two transmission landscapes10, 17:
1) hotspots dominated by specific genotypes because of focal local transmission of individual strains; or
2) increased genetic diversity, with significant potential for outcrossing, resulting from sustained transmission or from importation of multiple genotypes.

Genomic data can be coupled with mapping data, such as GIS, to visualize spatial epidemiology, and these combined data types can inform the evaluation of control measures. For example, GIS and epidemiologic data were used to map clusters of malaria transmission in The Gambia, Mali and Senegal and to guide optimal malaria control interventions18. GIS can also provide important household-level information for studies of micro-epidemiology and ecology19. This combined approach was used in Papua, Indonesia, where microsatellite data from P. falciparum and P. vivax were coupled with household-level GIS information to study malaria micro-epidemiology20. In Senegal, few studies have used genomic data to understand Plasmodium parasite genetic diversity and spatiotemporal dynamics. Here, we seek to bridge this gap by applying genomic epidemiology and ecology at the city, neighborhood, and household levels.
The aim of the present study was to understand P. falciparum parasite population structure in Thiès, Senegal. To do so, we applied the 24-SNP molecular barcode in a longitudinal cohort enrolled between 2014 and 2017, with each participant followed for 2 years after enrollment. We determined the spatiotemporal distribution of parasite haplotypes. We then used multivariable linear regression models to assess the association between physical distance and parasite genetic distance, and the interconnectivity between parasites. These analyses helped generate hypotheses about possible reasons for transmission hotspots. The overall goal of this study is to help inform malaria control by integrating genomic data into decision-making.
Results
Patient demographics and characteristics of the cohort
A total of 70 participants were enrolled, following informed consent, over 4 years (2014 = 2, 2015 = 32, 2016 = 23 and 2017 = 13), and each was followed for two years post-enrollment. Patients were recruited through passive case detection. They presented at the Service de Lutte Anti Parasitaire (SLAP) clinic with malaria-like symptoms and tested positive for P. falciparum mono-infection by malaria rapid diagnostic test (PfHRP2 antigen RDT) and by microscopy. Patients were residents of 6 different neighborhoods in Thiès (Cité Senghor, Diakhao, Escale, Nguinth, Thialy and Takhikao) and 10 different households (BD, BS, DL, DMS, GB, MS, OB, OD, SD and SN). The majority of participants were from Diakhao (51/70, 72.9%), and the household with the most participants was MS (37/70, 52.9%) (Table 1). Enrollment was open to all genders and ages, yet all participants in this study were male, aged 5 to 16 years. When this gender bias was evaluated after enrollment, we found that many participants were living in “daaras”, the equivalent of religious boarding schools. The mean age was 10.91 years, and 25.7% (18/70) of the participants were under 10 years. The mean parasitemia was 0.77% (minimum 0.03%, maximum 4.89%) (Table 1).
Limited genetic diversity and high frequency of monogenomic infection
Among the 70 participants, there were a total of 74 distinct infections. All participants were parasitemic at day 0, and 4 participants experienced a subsequent re-infection over the 2 years of follow-up (8 visits, 4 each year). Overall, monogenomic infections predominated (90.5%, 67/74) and polygenomic infections were rare (9.5%, 7/74), as expected for the region14. The majority of polygenomic infections were observed in 2015 (5/7). Overall, 41.9% (31/74) of infections carried a parasite genetic type shared with at least one other participant (2014 = 0/2, 2015 = 11/32, 2016 = 12/27 and 2017 = 8/13). Twenty-seven haplotypes previously described in Thiès were detected in 56 of the 74 isolates. Nine unique genotypes were observed in this cohort, detected in eleven participants (11/70). Among the 27 previously described haplotypes, we found 8 clusters (haplotypes shared by at least two participants): haplotypes 25, 83, 759, 796, 804, 846, 873, and 1001. Among the 19 unique parasites, 2 clusters (U2 and U4) comprising a total of 4 infections were found (Figure 1). A heatmap reveals parasite clustering by barcode similarity (Figure 2).
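The cluster definition used above (a haplotype shared by at least two participants) and the "shared genetic type" proportion can be made concrete in a minimal Python sketch. It assumes each monogenomic infection is recorded as a hypothetical (participant_id, haplotype_id) pair, with polygenomic infections excluded beforehand. The records below are invented for illustration and are not study data.

```python
from collections import defaultdict


def shared_haplotypes(records):
    """Return haplotypes carried by at least two distinct participants.

    records: iterable of (participant_id, haplotype_id) pairs, one per
    monogenomic infection (polygenomic infections excluded beforehand).
    """
    carriers = defaultdict(set)
    for participant, haplotype in records:
        carriers[haplotype].add(participant)
    # A "cluster" here means a haplotype shared by >= 2 different participants.
    return {h: sorted(p) for h, p in carriers.items() if len(p) >= 2}


def percent(part, whole, digits=1):
    """Express part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


# Hypothetical records, not the study dataset.
records = [("P01", "759"), ("P02", "759"), ("P03", "796"),
           ("P04", "U2"), ("P05", "U2"), ("P06", "83")]
clusters = shared_haplotypes(records)
shared = sum(1 for _, h in records if h in clusters)
print(clusters)                        # {'759': ['P01', 'P02'], 'U2': ['P04', 'P05']}
print(percent(shared, len(records)))   # 66.7
```

Using a set of participants per haplotype means that two infections in the same person with the same haplotype do not, on their own, create a cluster.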
Genomic haplotypes are shared within years and persist across malaria transmission seasons
Genetic types were shared within a transmission season and also across transmission seasons (lineage persistence). The dominant haplotypes that persisted over multiple seasons were haplotypes 759 and 796. Haplotype 759 (n = 10) was detected in 2015 in the MS household in Diakhao and in 2017 in the GB household in Nguinth. Haplotype 796 (n = 11) was observed in 2014 (n = 1), 2015 (n = 2, both in MS) and 2016 (n = 8), all in the Diakhao neighborhood, in three different households (MS, OB and OD). Haplotype 83 was found in 2016 in two different neighborhoods (Diakhao and Takhikao) and three different households (MS, OB and BD) (Figure 3).
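Lineage persistence, as used here, means the same haplotype is detected in more than one transmission season. The short Python sketch below flags such haplotypes from hypothetical (haplotype_id, year) records. The records loosely mirror the pattern described for haplotypes 796, 759 and 83 but are illustrative only, and the `min_years` parameter is an assumption of this sketch.

```python
from collections import defaultdict


def persistent_haplotypes(records, min_years=2):
    """Return haplotypes detected in at least `min_years` distinct years.

    records: iterable of (haplotype_id, year) pairs, one per infection.
    """
    years = defaultdict(set)
    for haplotype, year in records:
        years[haplotype].add(year)
    # Keep only haplotypes seen across multiple seasons, with years sorted.
    return {h: sorted(y) for h, y in years.items() if len(y) >= min_years}


# Hypothetical records patterned on the text above (not the study dataset).
records = [("796", 2014), ("796", 2015), ("796", 2016),
           ("759", 2015), ("759", 2017), ("83", 2016), ("83", 2016)]
print(persistent_haplotypes(records))
# {'796': [2014, 2015, 2016], '759': [2015, 2017]}
```

Haplotype 83 is not flagged: it is shared between participants, but only within a single season.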
Genomic signatures among initial and re-infection
During follow-up, 4 participants recruited in 2015 (Th078.15, Th120.15, Th130.15 and Th151.15) were re-infected in the following year during the peak malaria transmission period. Unscheduled visits at which study participants had symptomatic infections are designated “re-infections” or R1 (Th078.15.R1, Th120.15.R1, Th130.15.R1 and Th151.15.R1). All re-infected participants were from the same household (MS). Parasite genotypes from the first infection were genetically distinct from the re-infecting parasites (Figure 4). Three of the re-infecting genotypes (U5, U6 and U7) had not been described previously, either in Thiès or in multiple regions of Senegal sampled from 2006 to the present14, 16, 21.
Effect of spatial distance on parasite genomic similarity
We calculated the genetic similarity between each unique pair of participants, as well as the geographic distance between their respective households. This allowed us to determine whether increasing physical distance between households is associated with greater genetic difference. When analyzed by year (to normalize for the circulating genetic variants present within a season), we observed a significant positive association between physical and genetic distance in 2015 and 2016 (Table 2). As the physical distance between two households increased, so did the genetic distance between the parasites infecting participants in those households. When data were analyzed within a year (to account for differences in the genetic types most prevalent between years), we observed the same trend. Further, for all households except one (MS), living in the same household was associated with being infected with similar genomes, although these findings were not statistically significant for all households (Table 2). In all years, there were signs of household clustering effects, with genetic distance significantly lower for participants living together in certain houses.
Discussion
As malaria control progresses toward elimination, genomic data have proven essential for assessing control and surveillance22. In this study, we combined parasite genetic diversity indices, individual GPS information at the household and neighborhood levels, and multivariable linear regression models to understand parasite diversity and connectivity over time and space in Thiès, Senegal. The main findings of this study were:
1) household clustering of genetic types;
2) an association between genetic distance and physical distance; and
3) parasite sharing between participants from the same household, or from different but geographically proximal households.
Interestingly, past analyses from cross-sectional studies in Thiès did not observe this spatial clustering of identical genotypes23.
Overall, the low level of Plasmodium parasite genetic diversity and the high frequency of monogenomic infection observed over the study years are consistent with previous observations from cross-sectional sampling over time in Thiès14, 23 and in Dielmo and Ndiop in Senegal16. In these localities, 24-SNP molecular barcoding revealed a predominance of monogenomic infection and a significant percentage of shared genomic haplotypes in the population. These observations have been hypothesized to result from a significant reduction in malaria transmission due to the effectiveness of malaria interventions after 2008. Malaria parasites are known to evolve quickly, in part because of sexual recombination in mosquitoes, which plays an important role in parasite population structure and complexity of infection17. As expected in this low-transmission setting, we see less evidence of outcrossing and more selfing of identical parasites than in populations with higher transmission intensity and a higher complexity of infection14, 24.
Consistent with previous studies in Thiès14, 23, we also observed haplotypes persisting over several years. Our study adds GIS data, which allowed us to monitor genotype frequencies in different households and neighborhoods within the same year and across seasons. For example, haplotype 796 was observed in the same neighborhood (Diakhao) in 3 successive years (2014-2016) in three different households (OB, OD and MS). When identical genotypes are observed within a household, there are two possibilities:
1) continued local transmission of a single parasite genotype; or
2) a single infected mosquito biting and infecting multiple individuals within the household.
The timing of the infections can help distinguish which hypothesis is more likely. A previous study has also shown that simultaneous transmission of the same genotype is frequent in Thiès25.
We observed household clustering of parasite genotypes, and genetic differences between parasites increased with the distance between individuals. During this study, one household (MS) served as an example of a household-level malaria transmission hotspot, both in number of cases and in the genetic diversity of its parasites. There are two possible explanations for such different parasites in the same household:
1) a hotspot of intense local transmission17, coupled with genetic recombination (outcrossing) within the Anopheles mosquito26 and subsequent transmission of the new genetic combinations17; or
2) importation of diverse genotypes through human or mosquito mobility21, 27–30.

Studies of malaria incidence and prevalence have demonstrated the existence of malaria transmission hotspots at the village level in Senegal7, 8. In such villages, the following have all been identified as risk factors for a household being in a hotspot31:
- human density;
- human behavior;
- inconsistent bed-net use;
- substandard housing construction; and
- an ecological environment favorable to mosquito proliferation (presence of mosquito breeding sites).

The added value of our approach is that it identifies hotspots of transmission in terms of cases and also determines their genotypic composition, which has further implications for control measures. If hotspots are populated by similar genotypes, local transmission is more likely to be occurring. If multiple diverse genotypes are present, the hotspot could serve as a hub for infections imported by humans or mosquitoes. Identifying transmission clusters at the household level could play an important role in interrupting malaria transmission chains5, 32. In our study, the interventions needed to combat malaria in a household with a homogeneous genetic haplotype may be very different from those needed in a household (such as MS) where many genetically diverse parasites are prevalent.
This study provides new evidence that molecular tools can help identify malaria hotspots.
Because P. falciparum is a sexually recombining organism, precise mapping of phylogeny and transmission chains is not possible. However, the 24-SNP barcode has been shown to be a proxy for whole-genome data that resolves highly similar parasite types in particular14. The 24-SNP barcode provides less complete information about genetic relatedness (identity by descent) than whole-genome sequencing or large SNP arrays33. Even so, it has been estimated to confidently detect parasites that share greater than 70% genome similarity (identity by state)14.
Because pairwise genetic distance in the 24-SNP barcode is not linearly associated with whole-genome genetic distance, our finding of significant associations with physical distance is all the more noteworthy. Our statistical model demonstrated that genetic variation between parasite pairs increases with physical distance. Here, we used the number of SNP differences between paired individuals as the genetic distance (identity by state). Studies in The Gambia and Kenya have demonstrated that variation between parasite genotypes increases with geographical distance34, 35. Such findings will help clarify how the parasite population in Thiès is structured and how its parasites are connected. This is so even though a previous study in Thiès suggested a mixed parasite population with no hidden population structure25. In this study, the limited number of samples may not reflect the overall parasite population captured by passive case detection. Notably, we found no asymptomatic infections at any follow-up time point in the cohort.
Many of the enrolled participants in our study live in “daaras”, religious boarding schools where “talibe” (student followers) live together in large numbers. Our cohort was entirely male, although enrollment was open to and encouraged for both males and females. These limitations may affect the generalizability of the study beyond these populations. Nonetheless, our methods provide important information on the micro-epidemiology of parasite population structure in space and time in Thiès. The study also provides evidence of the feasibility and power of including genomic analyses in public health decision-making.
In this study, we observed spatiotemporal clustering of malaria at the household and neighborhood levels, along with increasing genetic distance between parasites as a function of physical distance. This longitudinal study shows the importance of combining molecular surveillance with spatial and temporal modeling to detect hotspots of malaria transmission. This approach can be applied at a small scale, as well as regionally and countrywide, to detect and target malaria foci and to assess malaria control and intervention initiatives.
Methods
Ethics Statement
Ethical approval for this study was granted by the National Ethics Committee of the Ministry of Health in Senegal (Protocol SEN 14/49), the Institutional Review Board of the Harvard T.H. Chan School of Public Health (IRB 14-2830), and the Human Investigation Committee of Yale University (Protocol 2000023287). All samples were collected with informed consent and in accordance with all ethical requirements of the National Ethics Committee of Senegal, the Institutional Review Board of the Harvard T.H. Chan School of Public Health, and the Human Investigation Committee of Yale University.
Inclusion and Exclusion Criteria
Samples were collected from patients older than 6 months with uncomplicated P. falciparum malaria. Infection was confirmed by peripheral blood smear and rapid diagnostic test (RDT) by the local health officer of the Thiès health clinic (SLAP). Patients were asked to return to the health post, regardless of health status, at days 1, 2 and 3, and at 2 weeks, 4 weeks, 3 months, 6 months, 12 months, 18 months, and 24 months. Patients were also asked to return for an unscheduled visit if they experienced malaria-like symptoms. At the visits on days 1-3, parasite clearance was monitored by finger prick, with a blood smear read by microscopy and an RDT performed. At scheduled follow-up visits at 2 weeks, 4 weeks (1 month), 3 months, 6 months, 12 months, 18 months, and 24 months, 5 mL of blood was drawn for plasma and PBMCs. On day 0, and at unscheduled visits where a patient was confirmed positive for P. falciparum, blood was also cryopreserved. Parasite DNA was extracted from whole blood with the QIAamp DNA Blood Mini Kit (Qiagen Inc., Valencia, CA, USA).
Molecular Barcoding genotyping
The 24-SNP molecular barcodes were determined using a previously described assay36. Barcode assays were run on the Roche LightCycler 96 system. SNPs were amplified with 2.0 µL of LightScanner Master Mix (BioFire Defense), 2.5 µL of a 1:100 dilution of DNA template, and 0.5 µL of primers and probes. Genomic DNA from cultured P. falciparum strains (3D7, Dd2, 7G8, Tm90) was used for assay validation and as genotyping controls on all reaction plates. Molecular barcode assays 10, 11, 13, 21, and 24 performed optimally with an asymmetric forward-to-reverse primer ratio of 5:1; all other assays required a 1:5 primer asymmetry. Amplification conditions were:
- denaturation at 95°C for 2 minutes;
- 50 cycles of 94°C for 5 seconds and 66°C for 30 seconds; and
- a pre-melt cycle of 5 seconds each at 95°C and 37°C.
Two or more N calls among the 24 SNPs assayed were taken to indicate that more than one P. falciparum genome was present (a polygenomic infection). Ambiguous calls and calls with “X” were repeated 3 times in independent experiments before validation36.
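The polygenomic calling rule above (two or more N calls among the 24 SNPs) can be expressed as a small Python function. This is a minimal sketch under stated assumptions. The barcode is assumed to be stored as a hypothetical 24-character string with A/C/G/T for a single allele, N for a mixed call, and X for a failed call. Treating a barcode with a single N as monogenomic is also an assumption, because the text does not say how a lone N was handled.

```python
def classify_barcode(barcode, n_threshold=2):
    """Classify a 24-SNP barcode with the two-or-more-N rule.

    Assumed (hypothetical) encoding: one character per SNP, A/C/G/T for a
    single allele, N for a mixed call, X for a failed call.
    Returns (status, positions_to_repeat); positions are 1-based.
    """
    barcode = barcode.upper()
    if len(barcode) != 24:
        raise ValueError(f"expected 24 calls, got {len(barcode)}")
    n_mixed = barcode.count("N")
    # Two or more mixed calls => more than one genome present.
    # A single N falls below the threshold and is reported as monogenomic here.
    status = "polygenomic" if n_mixed >= n_threshold else "monogenomic"
    # X calls are flagged for repeat assays rather than interpreted.
    repeat = [i + 1 for i, call in enumerate(barcode) if call == "X"]
    return status, repeat


mono = "ACGT" * 6                          # hypothetical clean barcode
poly = "N" + mono[1:12] + "N" + mono[13:]  # two mixed calls
failed = mono[:23] + "X"                   # last assay failed
print(classify_barcode(mono))    # ('monogenomic', [])
print(classify_barcode(poly))    # ('polygenomic', [])
print(classify_barcode(failed))  # ('monogenomic', [24])
```

Returning the X positions separately keeps the complexity call apart from the decision to re-run individual assays, mirroring the repeat-before-validation step described above.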
GIS analysis and statistical modeling
GPS coordinates of participants’ households and neighborhoods were used to produce maps with QGIS 3 (http://www.qgis.osgeo.org), without revealing individual participant addresses or identifiable locations. We used multivariable linear regression models to determine whether the genetic similarity between pairs of participants is related to the geographic distance that separates them. The number of 24-SNP barcode differences between each unique pair of participants was used to describe their genetic similarity. We used two metrics to describe spatial proximity. First, for each unique pair of participants, we determined whether the individuals lived in the same house and, if so, noted which house. Next, we calculated the geographic distance between the house centroids for each pair. In this way, we explored the impact of geographic distance on genetic similarity in two ways:
1) whether participants clustered in the same house are more genetically similar; and
2) whether participants in houses that are geographically closer together are more genetically similar.
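The pairwise table described above can be sketched in Python. For each unique pair, it records the number of barcode differences, the distance between house centroids, and the shared house, if any. The text does not state how inter-house distance was computed or how N/X calls were treated, so this sketch makes two labeled assumptions. First, distance is great-circle (haversine) distance on a spherical Earth. Second, positions with an N or X call in either barcode are skipped. All identifiers and coordinates below are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical-Earth assumption
CALLED = set("ACGT")


def snp_differences(a, b):
    """Count positions where both barcodes have a definite call and differ.

    Skipping positions with N (mixed) or X (failed) is an assumption.
    """
    return sum(1 for x, y in zip(a, b) if x in CALLED and y in CALLED and x != y)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def pairwise_table(participants):
    """participants: {id: (barcode, house_id, (lat, lon) of house centroid)}.

    Returns one row per unique unordered pair of participants.
    """
    rows = []
    for i, j in combinations(sorted(participants), 2):
        bar_i, house_i, loc_i = participants[i]
        bar_j, house_j, loc_j = participants[j]
        rows.append({
            "i": i,
            "j": j,
            "genetic_distance": snp_differences(bar_i, bar_j),
            "geo_km": haversine_km(*loc_i, *loc_j),
            # Shared house (input to the per-house clustering indicator), or None.
            "same_house": house_i if house_i == house_j else None,
        })
    return rows


# Hypothetical participants and coordinates (not study data).
participants = {
    "A": ("ACGT" * 6, "H1", (0.0, 0.0)),
    "B": ("ACGT" * 5 + "ACGA", "H1", (0.0, 0.0)),
    "C": ("TCGT" * 6, "H2", (0.0, 0.01)),
}
for r in pairwise_table(participants):
    print(r["i"], r["j"], r["genetic_distance"], round(r["geo_km"], 3), r["same_house"])
# A B 1 0.0 H1
# A C 6 1.112 None
# B C 7 1.112 None
```

Each row corresponds to one observation (Yij, dij, shared house) in the regression described in the next paragraph.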
We then modeled the genetic distance between each pair of participants as a function of two terms: the spatial distance between their houses, and a clustering indicator for the specific house, where each house has its own regression parameter. The model is given as:

$$Y_{ij} = \beta_0 + \beta_1 d_{ij} + \sum_{k=1}^{m} \alpha_k \, I(\text{participants } i \text{ and } j \text{ both live in house } k) + \varepsilon_{ij}$$

The terms are defined as follows:
- Yij is the genetic distance between participants i and j;
- β0 is the intercept;
- dij is the geographic distance between the house centroids of participants i and j;
- m is the total number of unique houses in the analyzed dataset;
- I(.) is an indicator function that equals one if the input statement is true and zero otherwise;
- αk is the clustering effect specific to house k; and
- εij ∼ N(0, σ²).

This model relaxes the assumption that clustering in any house has the same impact on genetic similarity, and allows this effect to vary across houses (i.e., αk). The parameter β1 describes the association between genetic and geographic distance between houses. We fit this model to each year of data separately and present inference for the model parameters37, 38.
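The regression described above can be fit by ordinary least squares once the pairwise table is built. In matrix form, the design has a column for the intercept (β0), one for dij (β1), and one indicator column per house (αk), and the OLS estimate solves the normal equations:

$$\hat{\theta} = (X^\top X)^{-1} X^\top Y, \qquad \theta = (\beta_0, \beta_1, \alpha_1, \ldots, \alpha_m).$$

The Python/NumPy sketch below assumes this OLS formulation. The function name, the parameter names, and the synthetic data are hypothetical, and the authors' actual software and inference procedure37, 38 are not specified here.

```python
import numpy as np


def fit_pairwise_model(genetic, geo, same_house, houses):
    """OLS fit of
        Y_ij = beta0 + beta1 * d_ij + sum_k alpha_k * I(i, j both in house k) + e_ij

    genetic:    Y_ij, barcode differences for each participant pair
    geo:        d_ij, distance between the pair's house centroids
    same_house: shared house id for each pair, or None if in different houses
    houses:     house ids that receive their own alpha_k parameter
    Returns {parameter name: (estimate, standard error)}.
    """
    y = np.asarray(genetic, dtype=float)
    d = np.asarray(geo, dtype=float)
    # One indicator column per house: 1.0 when both members of the pair live there.
    ind = np.array([[1.0 if s == h else 0.0 for h in houses] for s in same_house])
    ind = ind.reshape(len(y), len(houses))
    X = np.column_stack([np.ones_like(d), d, ind])
    n, p = X.shape
    if n <= p:
        raise ValueError("need more pairs than parameters")
    coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < p:
        raise ValueError("design matrix is rank deficient")
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)        # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical OLS covariance
    se = np.sqrt(np.diag(cov))
    names = ["beta0", "beta1"] + [f"alpha_{h}" for h in houses]
    return {name: (float(c), float(s)) for name, c, s in zip(names, coef, se)}


# Synthetic, noise-free pairs (not study data) generated with
# beta0 = 2, beta1 = 1.5, alpha_H1 = -1, alpha_H2 = -0.5.
geo = [0, 0, 0, 1, 2, 3, 4]
same = ["H1", "H2", "H1", None, None, None, None]
genetic = [1.0, 1.5, 1.0, 3.5, 5.0, 6.5, 8.0]
fit = fit_pairwise_model(genetic, geo, same, ["H1", "H2"])
for name, (est, se) in fit.items():
    print(name, round(est, 3))  # recovers the generating coefficients
```

Fitting each year separately, as described, amounts to calling the function once per year's set of pairs. Each participant appears in many pairs, so pairwise observations are not independent. The classical standard errors computed here ignore that dependence and are shown only for illustration.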
Data Availability
Data associated with this manuscript can be found at: https://doi.org/10.5061/dryad.wh70rxwmk
Author contributions statement
A.K.B., D.N. and D.F.W. conceived and designed the study. B.D., P.I.N., Y.D. and A.M.M. collected and processed samples and managed patient databases. M.S., A.B.D. and A.K.B. performed molecular barcoding and genomic analysis and interpretation. M.S. and A.K.B. performed mapping and data visualization. J.W. performed data modeling and analysis. M.S., J.W., S.K.V., D.L.H. and A.K.B. revised this manuscript. All authors read and approved the final manuscript.
Additional information
Competing interests
The authors declare that they have no competing interests.
Acknowledgements
We would like to express our deep gratitude to the population of Thiès, Senegal, and to the health workers of the SLAP clinic for their invaluable collaboration and contribution to this work. We sincerely appreciate the helpful discussions with Daniel Neafsey and his useful input to this study. We would like to thank Sidiya Mbodj and Fatoumata Dabo for assistance with geolocation. We sincerely thank all colleagues who contributed to this work. Funding for the study was provided by the NIH (K01 TW010496) to Amy K. Bei, and by the Bill and Melinda Gates Foundation (OPP1053604) to Dyann F. Wirth.