Abstract
In HIV prevention trials, precise identification of infection time is critical to quantify drug efficacy but difficult to estimate as trials may have relatively sparse visit schedules. The last negative visit does not guarantee a boundary on infection time because viral nucleic acid is not present in the blood during early infection. Here, we developed a framework that combines stochastic and deterministic within-host mathematical modeling of viral dynamics accounting for the early unobservable viral load phase until it reaches a high chronic set point. The infection time estimation is based on a population non-linear mixed effects (pNLME) framework that includes the with-in host modeling. We applied this framework to viral load data from the RV217 trial and found a parsimonious model capable of recapitulating the viral loads. When adding the stochastic and deterministic portion of the best model, the estimated infection time for the RV217 data had an average of 2 weeks between infecting exposure and first positive. We assessed the sensitivity of the infection time estimation by conducting in silico studies with varying viral load sampling schemes before and after infection. pNLME accurately estimates infection times for a daily sampling scheme and is fairly robust to sparser schemes. For a monthly sampling scheme before and after first positive bias increases to -7 days. For pragmatic trial design, we found sampling weekly before and monthly after first positive allows accurate pNLME estimation. Our estimates can be used in parallel with other approaches that rely on viral sequencing, and because the model is mechanistic, it is primed for future application to infection timing for specific interventions.
Introduction
A key challenge for HIV prevention trials is to identify the timing of the exposure that ultimately led to breakthrough infection. Estimation of infection time subsequently allows inference of the concentration of the protective agent at exposure, which is critical to understanding why HIV acquisition was not prevented. Early infection is difficult to study in practice; even if prospective sampling were available, HIV RNA is not detectable in blood during early HIV infection and participants cannot necessarily point to potential recent exposure events with accuracy. Therefore, to estimate time of infection, a model or inference technique is required.
Estimation techniques have been described previously. Several use viral sequence data and evolutionary models to trace time back to the founder sequence1-3. Others use viral load data prior to viral peak and retrace using log-linear regression (average or maximum upslope)3. Others apply diagnostic ‘window times’ that leverage Fiebig staging4 and prior knowledge of eclipse phase duration5,6, where eclipse phase is defined as the period of time between HIV acquisition and first detectable viral load. Finally, combinations of these approaches have been organized into a statistical framework7.
Here, we introduce an approach applying such viral dynamics models with statistical inference on viral load data. Model development was achieved through fitting to longitudinally sampled viral loads from 46 participants in the RV217 study8. Population nonlinear mixed-effects (pNLME) modeling was used to determine the optimal model parameters for each individual, given a population distribution. We tested 30 candidate models informed by previous viral load modeling, and the best model was selected by parsimony. The very early moments of HIV infection are thought to be stochastic9, and have been modeled with stochastic viral dynamics10,11. Therefore, we used our best model in a stochastic formulation to simulate early HIV dynamics, allowing for fluctuations and extinction by chance. Together, the stochastic and deterministic models provide an estimate and associated uncertainty interval for the infection time of each individual in the study.
The pNLME modeling approach provides several advantages. By using a population model, it is possible to estimate infection times in individuals with sparse viral load data, including those without any measurements during viral upslope. Viral dynamics are not as sensitive to multiple founder infections as evolutionary methods. And finally, mechanistic models have been used to describe viral dynamics during broadly neutralizing antibodies therapies12, suggesting our methodology might be applicable to infection time estimation from emerging trial data including a therapeutic prevention modality.
Results
A framework for estimating infection time using viral dynamics
Using experimental data and modeling, we set out to develop a framework for estimating HIV infection time from viral load data. Using observed first positive viral load, we worked backwards toward infection time and defined several precise moments during HIV primary infection for modeling (Fig 1). HIV infection begins with an infecting exposure, the target time of estimation. Starting with this exposure event, there is a brief “black-box” phase encompassing biology not captured with past viral dynamic models. For example, the virus may need to diffuse or clear mucosal and anatomical barriers before beginning viral replication as described by mechanistic models. Animal challenge studies and human cases where infecting exposure is almost certainly known suggest this period is brief, from a few hours to 1 day9,13,14, but given the lack of information we note it here as a fundamental uncertainty and potential bias in our estimates.
Next, we assumed that viral kinetics can be described by a dynamic model unifying observable and unobservable viral loads. At this point, we assumed bottlenecking has resulted in at most a few infected cells starting a productive infection in the human body. Viral expansion from these few cells begins a stochastic phase lasting a duration defined as the stochastic phase, tsto. The stochastic phase likely encompasses the initial viral replication at the infection foci and the transition from a localized infection to an infection that has reached germinal centers. The stochastic phase ends when the viral load crosses a deterministic threshold. We estimated this threshold through repeated stochastic simulations finding the minimum viral load where the 1) the slope of stochastic viral loads were nearly log-linear and 2) there was effectively no chance of stochastic burn out. We determined that a value of 0.01 copy/mL sufficiently satisfied both criteria. The time between viral load crossing the deterministic threshold and reaching the first positive viral load observation was defined as the deterministic phase tdet. Finally, the time between infecting exposure and first positive viral load, comprising these three phases, we cumulatively refer to as t0. This time interval has been referred to as the eclipse phase5.
Experimental data for model development
We used viral load observations from the RV217 study8 including 46 individuals out of 155 total diagnosed acute HIV-1 infections in the study. Individuals had twice-weekly HIV tests before diagnosis using the APTIMA HIV-1 RNA Qualitative Assay (Hologic)—a fingerstick device testing small blood collection (0.5 mL). Once diagnosed (2 APTIMA positive visits), quantitative PCR was used to quantitate HIV RNA twice weekly in these individuals, who did not initiate antiretroviral treatment (ART) and had ~10 study visits in the first month after diagnosis. From this cohort, we assembled viral loads from Thai and Ugandan men, women and transgender individuals. Only individuals with more than 3 detectable longitudinal viral load observations were included. We found that in very early infection, APTIMA and viral load were strongly correlated (Fig 2A) and APTIMA measurements could be used to impute viral load at diagnosis times for individuals without measured viral loads (Fig 2B). Using this relationship, we imputed viral load at APTIMA diagnosis for 28 participants, which adjusted the first positive viral load by a few days. Several individuals were not diagnosed until later acute infection, meaning that peak and upslope of viral load are not obviously detected. We do not exclude these individuals, instead relying on our population modeling approach and borrowing strength across the cohort to make estimates. These estimates are particularly useful because such data sets provide significant challenges with other modalities.
Inference of tdet from a parsimonious model to the RV217 cohort data
The first step in estimating tdet was developing a model that best-described the observed data. Thus, we selected four distinct and previously applied mechanistic models of HIV primary infection and varied their population parameterizations (the number and type of parameters estimated). This resulted in a total of 30 models (see Supplementary Table 1). The four mechanistic models included the canonical viral dynamics model15, two models recently fit to SHIV/SIV viral dynamics16,17, and our own simplified model based upon Ref18. We found that the most parsimonious model to the RV217 cohort data (Supplementary figure 1 and Table 1) includes susceptible target cells (S) that are born and die naturally and virus (V) that infects these cells and creates productively infected cells that produce viable virus (I). Infected cell death rate depends on their own density powered by an exponent (h). This term semi-mechanistically encapsulates natural cytopathic cell death during viral production, as well as innate or acquired immunity against HIV infected cells that escalates as the number of infected cells increases (Fig 3A, see also Methods Eq. 1). In this way, an explicit immune effector compartment is not needed, and the model is simplified substantially.
The model output is congruent with previous data for other model compartments. For example, it predicts a susceptible cell drop between 40—80%19 (which may relate to the CD4+ T cell depletion during peak viremia20) and allows for the large observed inter-participant variation of viral peak (Supplementary figure 2A).
The best fit model for each individual is displayed in Fig 3B. We used population nonlinear mixed-effects (pNLME) modeling to estimate parameters, such that each individual has their own estimated parameters, but these estimates are constrained to be drawn from population distributions of each parameter; the population distribution is simultaneously estimated. All distributions of parameter estimates are shown in Fig 3C and values are quoted for each individual in Supplementary Table 2. From this model, the estimated time between deterministic threshold and first positive viral load, tdet, (Fig 1) ranged from 2.5—32.6 days across the 46 participants with a median of 10.1 days. We also verified that models with comparable AIC (<10 difference from the best model AIC score) predict similar individual values for tdet. Summary statistics of viral load (peak and set point) were not correlated with deterministic time tdet; rather they were strongly correlated with estimated infectivity (β), viral production rate (π) and the nonlinear death exponent (h) (Supplementary figure 2B). The magnitude of the first positive viral load was significantly, but not strongly, correlated with tdet (Supplementary figure 3). These results suggest other estimated parameters are mostly independent of infection timing and that the model predictions are informative beyond upslope regression-i.e. nonlinear estimation enhances our predictive power.
Stochastic simulations until the deterministic threshold
Evidence from modeling other viruses suggests that early stochastic events are linked to later deterministic kinetics21. For example, for cytomegalovirus (CMV) infection, extinction probabilities, duration, and magnitude of transient stochastic infections are consistent with primary infection mathematical model parameters22. Therefore, based on the individual best fit parameter sets, we performed stochastic simulations to determine the time-window between the introduction of a single infected cell and the deterministic threshold (tsto in Fig 1). Simulations were initialized with a single infected cell per μL 1(0) = 1 and at the viral free equilibrium between susceptible cell birth and death S(0) = vol × aS/δS. Scaling up to realistic volumes allows for a discretized stochastic simulation; vol was chosen to be 5 × 108 μL, or 5 L of blood (typical for adult human) at approximately 100-fold concentration based on the finding that the majority of lymphocytes reside in lymphoid tissues where infection is assumed to initiate before spilling over into blood9,23.
For each individual, the best fit parameters of the deterministic model were used to conduct 10 stochastic simulations via the tau-leap method24. Because HIV transmission is a rare per coital event25 and we are interested in infection time estimation, we conditioned upon successful infection10 by only using simulations from stochastic runs that did not go extinct. The simulations were halted when viral load crossed the deterministic threshold (0.01 copies/mL) and the time to reach that viral level (tsto) was recorded. Simulated viral loads from a single stochastic simulation of each individual are shown in Fig 4A. The distribution of stochastic times (tsto) is vizualized above the viral load panel, indicating a slightly asymmetric time to crossing the deterministic threshold with median ~5 days in this single stochastic simulation. There is substantial variability in the slope of these viral load trajectories based on the range of parameters inferred from the deterministic model for each individual. We also performed replicate simulations for single individuals (10 replicates for participant 10428 are shown in Fig 4B). In this case, viral load slopes are nearly identical by the time the deterministic threshold is crossed, but the early stochastic events introduce some variability in tsto. For this individual, the median time between infection and deterministic threshold was 5 days, with total range between 3-5 days in these 10 simulations. In summary, viral load upslope varies highly across subjects but minimally within-subjects. Variability introduced by the stochastic phase is predominantly a shift, rather than a scaling of infection time. This agrees with modeling of barcoded virus data early in infection (recently reported by Docken et al. during Dynamics & Evolution of HIV and Other Viruses 2020).
An important parameter for these simulations is the initial number of infected cells. We show estimates of tsto are inversely correlated with 1(0). For example, as 1(0) was increased from 1, 10, 100, to 1000, the median estimate of tsto across individuals decreased 5-1 days (Supplementary figure 4). As a result, this difficult-to-measure biological parameter only adjusts estimates by a few days.
Combining the stochastic and deterministic phases to estimate infection time
Next, we integrated the stochastic and deterministic timing estimates to complete the estimation of t0, the time between infecting exposure and first positive viral load, or the eclipse phase (Fig 1). Here as an example, we present this procedure for individual 10066 (Fig 5). First, we used the best fit parameters and performed 100 replicate stochastic simulations to estimate a distribution of tsto; the mean was approximately 6 days, and the distribution was skewed, with 95% uncertainty interval ranging between 4-9 days. Second, we drew values of tdet from a constructed conditional distribution using Markov-Chain Monte-Carlo given the population and random effect estimates of tdet (mean 10, 95% uncertainty interval between 7-13 days). The infection time, t0, was then calculated by drawing and summing tsto and tdet from their respective distributions. This was repeated 10000 times to generate an average t0 with associated 95% uncertainty interval (see estimates of tsto, tdet, and t0 for this individual in Fig 5). We estimated that this individual’s infection occurred 16 days prior to first positive viral load with 95% uncertainty interval ranging between 12 and 20 days. This procedure was performed for all individuals.
Direct comparison to previously applied infection timing estimation tools
Rolland et al. used several viral load and phylogenetic inference techniques to estimate infection times using the RV217 data3. These methods are the maximum slope of any two points on the upslope (max_slope), the best log-linear regression slope (linear_model), self-reported entries from trial participants (self_report), Bayesian phylogenetic inference of median time to most-recent common ancestor (BEAST)26, and Poisson fitter27 diversity estimator based on envelope sequences sampled at three time points in the first six months of infection. We compared our viral dynamics population non-linear mixed effects (pNLME) modeling approach estimations directly against all methods in Fig 6. In general, our deterministic estimates were in the same relative range of the other estimates. However, concordance correlation coefficients (CCC), which score how close data lie to the line y=x, are in general fairly weak (CCC<0.4) between pNLME and other methods (Fig 6A). This is driven by the fact that the complete estimator finds infection time earlier than most other methods, perhaps due to the additional stochastic phase. Adjusting the initial number of infected cells from 1 to 1000 (see Supplementary figure 4), or removing the stochastic phase decreases the average eclipse time, closer to previous estimates. However, we show correlation between pNLME with and without the stochastic component (final panel in Fig 6A) to illustrate this relationship is not necessarily linear.
We also compared all previous point estimates to one another (Fig 6B). No approaches were very strongly correlated by CCC. Hierarchical clustering of previous methods shows there are two distinct groups that include genetic estimators (BEAST, PFitter) and viral dynamic estimators (max-slope, linear_model). Self-report diary entries and our method (pNLME) fall roughly in between.
Wide applicability of pNLME to sparse data
The pNLME approach is widely applicable to data that challenge other methods. It can provide estimates for individuals for whom viral upslope is completely undetected. It also does not produce large outliers and never estimated the time of infection to be after first positive, as Max-slope, BEAST, and PFitter do in a few cases. pNLME also does not appear to be sensitive to multiple founder infections (which are particularly difficult for genetic estimators). For example, Rolland et al. identified some individuals in this cohort infected with multiple founder viruses based on the sequence analysis; for these infections, estimates of time to most recent common ancestor often gave estimates preceding the date of last negative test by many months (reflecting divergence in the transmitting partner rather than divergence after transmission)3. Infection with multiple founders has been associated with higher set-point viral loads28. Therefore, we tested to see if our model parameter estimates were different in single versus multi-founder infections (as differentiated by Rolland et al., see Supplementary figure 5). We observed no obvious patterns distinguishing single and multi-founder participants and found no significant differences among our parameters (Mann-Whitney p>0.1) but note the limited sample size with these data (n=9 multiple founders in this set). While beyond the scope of this paper, applying multi-founder status as a pNLME model covariate might admit more power given the small sample size. Importantly, the estimate of the time of theoretically crossing the detection limit was not affected by the distinction of multiple founders, meaning our estimates are robust to this challenge for phylogenetic inference.
Proof of concept study on synthetic data with realistic study protocols
To assess the accuracy of pNLME estimates, we performed a simulation study using several different sampling schemes. We simulated viral loads from 20 randomly chosen RV217 participants and sampled these viral loads with 5 different theoretical protocols. The first we refer to as “gold” meaning daily sampling before and after first positive. We refer to tight as weekly sampling visits, and sparse as monthly sampling visits (every 4 weeks). Infection was assumed to occur uniformly between study visits. If viral load was above 20 copies/mL at a visit that sample was called first positive (or diagnosis, dx), and measurements occurred subsequently. In Fig 7A we illustrate an example infection (red x), viral load (blue line) and observations (orange circles) for each protocol: “tight pre / tight post”, “tight pre / sparse post”, “sparse pre / tight post”, “sparse pre / sparse post”, and “gold”. We took these synthetic data observations and estimated t0 with pNLME. In this step, we used the RV217-trained model, meaning that we fixed the population distributions (as we would with new test data), and arrived at a new conditional distribution of individual parameters for each synthetic data set. We applied those parameters to the stochastic modeling step, completing the estimate of t0 on each synthetic data set.
Fig 7B shows the absolute error (days difference between truth from the synthetic data and inferred t0 from pNLME applied to those data) and the % error: (true-inferred)/true x 100%. Gold standard and tight/tight predictably admitted the lowest errors. Very few individuals were overestimated, meaning that when inference was incorrect, their infection time was usually closer to first positive than inferred. The exception to this occurred for some individuals with sparse post sampling.
All schemes other than gold had an obvious bias. Absolute and percent error was higher in individuals for whom true infection time was earlier. This means that uncertainty rises with estimation time farther from first positive. Put another way, our confidence decreases as the estimator projects farther into the past- an intuitively satisfying, albeit challenging finding. That this effect was fairly linear hints that it might be corrected. However, this may be an artifact of our synthetic data exercise, so we opted not to follow through with any correction. Rather, we focus on individuals who appear to have been infected within 20 days since first positive. For all sampling schemes, error on these estimates has a median of +/-10 days. A corollary of this finding is that sparse sampling after diagnosis was less detrimental than sparse sampling before diagnosis, because of the growing uncertainty with time and the likelihood of missing upslope, peak, and downslope.
This exercise illustrates one of the most practical applications for this method: estimating infection time in clinical trials of HIV pre-exposure prophylaxis agents. The “sparse/sparse” case represents a protocol comparable to that of the AMP (antibody mediated prevention) studies. In that study, visits occur every 4 weeks, and after first positive (week 0) visits occur at week 2, 4, 8, 12, and 24 weeks12,29,30. Thus, given the data generation distribution produced under our modeling assumptions, and the additional assumption that HIV dynamics in participants in the AMP study are comparable to participants in RV217, we expect our approach would provide reasonably accurate estimates for individuals who appear to have been infected within 20 days of first positive visit (95% uncertainty interval ~5 days), and less confident estimates for others. A secondary result of our modeling is that more sensitive detection could be crucial to avoiding the ultimately challenging case of individuals infected >4 weeks before first positive visit. APTIMA testing and viral load prediction as in Fig 2 would help this substantially.
Discussion
Estimating infection time is especially critical in HIV prevention trials. If drug levels at the precise time of infection can also be estimated then required drug levels for protection may be identified. Here, we have demonstrated estimation of HIV infection time using non-linear viral dynamics. Specifically, we assess the viral load trajectory, an established endpoint for HIV trials. We developed a two-step procedure: 1) using population non-linear mixed effects (pNLME) modeling to estimate parameters for individuals who were infected with HIV and then 2) using these same parameters to repeatedly simulate stochastic infections. In step one, we estimate the time between first detected positive viral load and the viral load reaching some level theoretically considered deterministic. In step two, we quantify the time of stochastic viral growth until the deterministic threshold. Combining steps 1 and 2 completes the estimate of the time between exposure/acquisition and viral load detectability, sometimes called the “eclipse time”.
We applied our method to data from the RV217 observational cohort, an acute HIV infection study with highly granular measures of viral load early during infection. For the first step we performed extensive model selection and found a mechanistic model that recapitulates viral loads in the RV217 trial from first positive until viral set point. A trained pNLME model using data from multiple individuals that includes observations during several stages of viral infection (e.g. expansion, peak and set point) allows confidence in parameter estimation from individuals who may not have data from all stages.
We compared our technique to other techniques that were applied to the same data set. We find that the individual level estimates are not concordant. On the population level, average values of our deterministic model agree with average values from other approaches. The additional stochastic phase in our model drives our estimates slightly farther from time of first positive, extending the range of the eclipse time. Concordance of our model is strongest (CCC=0.2-0.4) with other approaches that use viral load dynamics. Sequence-based approaches are the least concordant, and self-report diaries are somewhat middling. We note that the study group remains was too small to evaluate certain variables that differed across individuals, such as viral subtype, sex, age or ethnicity. In cases where viral dynamics and sequencing data exist, it may be optimal to try all approaches and developing uncertainty intervals extending across all methods. A future solution would be to include evolution into our mechanistic model and fit to both types of data.
Compared to other methods, our approach has several advantages. It allows estimation of infection time in individuals without well resolved viral upslope or even viral peaks. Specifically, without incorporating the population data, it is not possible to estimate infection time when viral upslope is missed. It is also relatively insensitive to founder multiplicity, a challenge for phylogenetic methods that sometimes results in unrealistic infection time estimates after the first positive viral load. The RV217 study is unlikely to be repeated. Thus, our model can be considered a trained model for future trials. Moreover, our synthetic data sampling study illustrates what such trials might look like.
While the true time of infection cannot be known other than in challenge experiments. We verified that our RV217-trained pNLME model works on simulated data (from the same mechanistic model). Even given a sparse sampling scheme (0,2,4,8,12,24 weeks after first positive) as in the antibody mediated prevention (AMP) studies, the approach generally works well for individuals for whom infection estimates are less than 20 days before first positive. This means that protocols sampling with ~2-3 week intervals typically have <20% error, or at worst 7 days off. However, our uncertainty grows for infections occurring further before first positive. This reflects the challenge of estimating data such as sparse/sparse in Fig 7A. Collecting only setpoint, or partial downslope means the estimate relies heavily on the population model, and with the heterogeneity of individuals, can be relatively error prone. Tighter sampling after first positive does not drastically improve accuracy. Indeed, for trial design, accuracy is better-enhanced by tighter sampling prior to diagnosis.
The largest challenge for the approach occurs when individuals are infected at a study visit (these occur every 4 weeks) but are not diagnosed because their viral loads are below detection, meaning that the first positive viral load will be not detected until >4 weeks after infection. This challenge is not unique to our approach, and we stress diagnostic-focused assays such as APTIMA to make diagnoses as soon as possible in situations where infection timing is crucial. Thus, we have shown that 1) it is possible to leverage and impute viral loads based on the finding that early APTIMA measurements are correlated to qPCR measurements, and 2) that borrowing strength using population modeling may be the best option to overcome sparse sampling.
There are several limitations to our study. It remains unknown, and will be extremely hard to test, whether early HIV dynamics can be described by the same mechanistic model as deterministic viral dynamics. However, in CMV transmission the probability of infection has been related to post-infection viral kinetics, suggesting stochastic behaviors may be linked to subsequent deterministic kinetics22. We speculate an early lag period in HIV infection that could be described by localized exposure and viral escape from anatomical barriers before initiating systemic infection. The duration of this period is unknown and we effectively assumed that it is negligible compared to the stochastic and deterministic phases, and compared to our window of estimation (i.e., less than a day). To account for this lag period, a further non-mechanistic window might also be added to reconcile wider estimates of eclipse time found in some studies6. Of note, it is not clear if any timing method can directly account for this period; for example, the founder sequence may describe the virion that escapes the early barriers in our schematic.
Our choice of the initial simulation conditions 1(0) inversely correlates with time of infection. That is, if we assume viral infection begins at a lower level, our estimates are further back in time. However, in what we consider to be a plausible range of initial conditions (ranging from 1 to 1000 infected cells initiating infection), the estimation varies by ~5 days. Interestingly, Rolland et al. found that if using a log-linear upslope modeling approach, a viral load of 1 copy/mL gave the best estimates in non-human primate infection where the date of infection was known perfectly3. One might therefore choose this value for the deterministic threshold, but the translation of this estimate is limited by the NHP experimental model, challenge virus species, and differences from viral load exposures in human transmission.
This approach should be generally applicable to other viruses. For example in Hepatitis C mechanistic models have been developed and some prior parameter estimates have been recorded31. Recently a similar method was applied to estimate the time of SARS-COV-2 infection32.
In future work, we plan to explore modifications due to preventative interventions, such that timing estimation, and therefore drug efficacy can be better estimated in upcoming clinical trials using broadly neutralizing antibodies.
Data Availability
All data and computational code are available from the authors and will be posted to github after acceptance.
Methods
Most parsimonious mathematical model
The set of ordinary differential equations for the model that is selected for this approach can be written as
The selected model contains 8 free parameters θ = (αS, δS, β, k, h, π, γ,tdet). The model we ultimately select is a slightly modified basic viral dynamics model that incorporates a nonlinear death term. The model tracks the concentration [cells mL-1] of HIV-susceptible cells S, infected cells I, and plasma viral load V [viral RNA copies mL-1]. The deterministic system is expressed (using the partial ∂t to denote derivative in time) with αS [cells μL-1 day-1] the constant growth rate of susceptible cells, δs [day-1] the death rate of susceptible cells, and β [μL virus-1 day-1] a mass-action viral infectivity. The viral production rate is π [virions cells-1 day-1], and γ [day-1] is the clearance rate of virus. The death and killing of infected cells is governed by the rate of κ [cells-h day-1], with the exponential factor h adjusting the nonlinear density dependent death rate. This approach coarsely approximates adaptive immunity such that higher numbers of infected cells engenders faster killing.
Population nonlinear mixed effects (pNLME) approach
We modeled the plasma viral load using a nonlinear mixed-effects approach (pNLME). In this approach an observed plasma viral load for individual i at time j is modeled as log10 yij = fV(tij, θi) + ϵV. Here, fV is the solution of the nonlinear mechanistic model for the variable describing the virus (V) given the individual parameter vector θi and is the measurement error for the logged viral load. We assumed that the individual-specific parameter θi is drawn from a probability distribution with median or fixed effects θpop and random effects , being Ω the variance-covariance matrix. Except otherwise specified we modeled parameters βj and πj as and remaining parameters as .
Model fitting
We explored four different mechanistic models with different statistical complexities, for a total of 30 models (See Supplementary Table 1 for details). For each model we obtained the Maximum Likelihood Estimation (MLE) of the measurement error standard deviation σv, the fixed effects vector θpop and the elements of matrix Ω using the Stochastic Approximation of the Expectation Maximization (SAEM) algorithm embedded in the Monolix software (www.lixoft.eu). We run the SAEM algorithm 15 times (assessments) for each model using randomly selected initial guesses for the parameters to estimate. For all model fits we assumed tij = 0 as the time of first positive viral load. However, we defined the initial value as the time -tdet when V(−tdet) = 0.01 copies/mL. We fixed other initial values as and cells/μL. Per Ref 31, we fixed parameter γ = 23 day-1. We estimated the remaining parameters of the mechanistic model including tdet. Individual parameters were selected using the mode of the conditional distribution constructed by the MCMC algorithm in the Monolix software. The conditional distribution of tdet for each individual is used to compute the time of infection t0.
Model selection
To determine the most parsimonious model we calculated the log-likelihood (log L) for all 15 assessments for each one of the 30 models. We then computed the Akaike Information Criteria for the model with highest likelihood among the 15 assessments , where m Is the number of parameters estimated). We assumed a model has similar support from the data if the difference between the AIC for its best assessment and the best one for the model with lowest AIC is less than two33.
Stochastic simulation scheme
We adapted the ordinary differential equation system Eq 1 to simulate stochastically12. Our implementation in Python, which employs the T-leap approach24, is publicly available. A time interval Δt = 0.0001 days is chosen for step size, in which a Poisson number of each transition type occurs. Initial conditions are changed to discrete values by multiplying by a volume. We choose this volume to be 108 μL based on the observation that there is approximately 1-10 L of blood in an adult human and that there are approximately 10-100 times more T cells in lymph tissue than blood. A single infected cell is assumed (other than in sensitivity analyses in Supplementary figure 4).
APTIMA analysis
APTIMA was the primary diagnostic assay in the RV217 study, of which 43 participants in our analysis were diagnosed by a positive APTIMA quantitative measurement. The remaining participants were diagnosed via a qualitative APTIMA response or directly with viral load. Among the 43 participants, only 6 had concurrent measurements of viral load for analysis. Comparing concurrent measurements APTIMA and viral load we found 1) substantial Pearson correlation between APTIMA and viral load at first positive viral load; and 2) diminished correlation later in the study at higher values of both measurements (Figure 2A).
Given the high correlation between the two measurements, we sought to investigate whether we could use APTIMA measurements at diagnosis to predict the unmeasured viral load for our model. This was accomplished using linear regression models predicting log first positive viral load with concurrent APTIMA as the predictor, evaluating both untransformed and log-transformed inputs (Supplementary figure 6A). One participant had an APTIMA measurement of 3, an outlier more than 2-fold lower than the next lowest value, and were removed from the model. To determine the appropriate upper range for APTIMA input for prediction, linear regression models were fit applying different upper bounds. Model performance was evaluated using residual mean square error (RMSE) predicting log viral load (Supplementary figure 6B). We found the best model used raw APTIMA measurements as the input with an upper bound of 34 (Supplementary figure 6B&C). This model was applied to the data to impute first positives for participants where APTIMA was measured for diagnosis without viral load (Supplementary figure 6D).
Supplementary figures and tables
Acknowledgements
DBR thanks Paul Edlefsen for motivating this work, as well as Raabya Rossenkhan, Philip Labuschange, Dobromir Dimitrov, James Moore, Holly Janes, and Yunda Huang for help and valuable conversations. DBR is supported by the Washington Research Foundation and an NIH K25.
Footnotes
Added some funding information and fixed a middle initial.