The natural history of TB disease: a synthesis of data to quantify progression and regression across the spectrum

Background: Prevalence surveys have found a substantial burden of subclinical (asymptomatic but infectious) TB, from which individuals can progress, regress or even persist in a chronic disease state. We aimed to quantify these pathways across the spectrum of TB disease.
Methods: We created a deterministic framework of TB disease with progression and regression between three states of pulmonary TB disease: minimal (non-infectious), subclinical, and clinical (symptomatic and infectious) disease. We estimated ranges for each parameter by considering all data from a systematic review in a Bayesian framework, enabling quantitative estimation of TB disease pathways.
Findings: Twenty-four studies contributed data from 6030 individuals. Results suggested that, after five years, 24.7% (95% uncertainty interval, UI, 21.3%-28.6%) of individuals with prevalent subclinical disease at baseline had either progressed to clinical disease or died from TB, whereas 16.1% (95%UI, 13.8%-18.5%) had recovered after regressing to minimal disease. Over the course of five years, 30% (95%UI, 27.2%-32.6%) of the subclinical cohort never developed symptoms. For those with clinical disease at baseline, 39% (95%UI, 35.8%-41.9%) and 10.3% (95%UI, 8.5%-12.4%) had died or recovered from TB, respectively, with the remainder in, or undulating between, the three disease states. The ten-year mortality of people with untreated prevalent infectious disease was 38%.
Interpretation: Our results show that for people with subclinical disease, classic clinical disease is neither an inevitable nor an irreversible outcome. As such, reliance on symptom-based screening means a large proportion of people with infectious disease may never be detected.

Funding: TB Modelling and Analysis Consortium and European Research Council
medRxiv preprint doi: https://doi.org/10.1101/2021.09.13.21263499; this version posted September 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Evidence before this study
In recent years the existence of a spectrum of TB disease has been re-accepted. The classic paradigm of disease is one active state of symptomatic presentation with bacteriologically positive sputum, now referred to as clinical disease. Within the spectrum, a subclinical phase (where people do not report symptoms but have bacteriologically positive sputum) has been widely accepted, due to prevalence surveys using chest radiography screening in addition to symptom screening. On average these prevalence surveys have found around 50% of people with prevalent infectious TB had subclinical disease. There is also another state of minimal disease, or non-infectious disease, that is the earliest point on the disease spectrum after progression from infection. The likelihood or speed of natural progression, regression, or persistence of individuals across this spectrum remains unknown. As a consequence, the ability to accurately predict the impact of interventions has been limited. As individuals with bacteriologically-positive TB now receive treatment, contemporary data to inform the required transitions is highly limited. However, a large number of cohorts of patients were described in the pre-chemotherapy era. Until now, these data have not been synthesised to inform parameters to describe the natural history of TB disease.

Added value of this study
We synthesised data from historical and contemporary literature to explore the expected trajectories of individuals across the spectrum of TB disease. We considered a cohort of people with prevalent bacteriologically positive disease, with a 50/50 split of people with subclinical and clinical disease at baseline. We found that within five years, 13.3% of people recover from TB, with no chance of progressing to active disease without reinfection. However, we also found that 26.3% are still spending time infectious at the end of the five years. Our estimates for 10-year mortality and duration of symptoms before treatment aligned with the known and accepted values.
We also show that regression from subclinical disease results in a large reservoir of people with minimal disease, from which they can permanently recover, but can also progress again to subclinical disease. The undulating pathways that lead to regression and progression mean that 30% (27.2%-32.6%) of individuals with prevalent subclinical disease do not experience symptoms over the course of five years. This shows that clinical disease is neither a rapid nor an inevitable outcome of subclinical disease.

Implications of the available evidence
With these data-driven estimates of parameters, informed projections of the relative value of addressing minimal, subclinical, or clinical disease can now be provided. Given the known reservoir of prevalent subclinical disease and its contribution to transmission, efforts to diagnose and treat people with "earlier" stages of TB are likely to have a larger impact than strategies targeting clinical disease, particularly on individuals who never would have progressed to clinical disease.

Introduction
Despite effective treatment regimens being discovered in the 1950s, tuberculosis (TB) is still a major cause of morbidity and mortality globally. In 2019, there were an estimated 10 million people who fell ill with TB, and 1.4 million people died from TB. 1 The current paradigm of TB disease assumes that there is a single state of active disease, with only progression to active disease from infection. 1 In reality people can move in both directions across a spectrum of disease. 2,3 After infection, individuals are likely to progress through a state of minimal disease, where pathological changes due to Mycobacterium tuberculosis (Mtb) are visible on imaging techniques such as chest radiography (CXR) or computed tomography (CT), but individuals are not infectious (bacteriologically negative sputum). 4,5 Further progression leads to infectious disease (bacteriologically positive), within which there is a distinction between clinical and subclinical disease where individuals with subclinical disease do not report symptoms, but individuals with clinical disease report a prolonged cough or seek treatment due to their symptoms. 2,4,6 A recent review of national TB prevalence surveys found that around 50% of people with prevalent infectious disease have subclinical disease, and therefore will not be diagnosed by policies that rely on reported symptoms. 7 While it is likely that not all individuals with minimal or subclinical disease will progress to clinical disease, the range or relative significance of alternative disease pathways is effectively unknown. 5,8,9 Efforts to quantify these pathways have remained limited by the absence of directly applicable parameter estimates. 10-12 A comprehensive review of literature has shown that many data sources exist, both historical and contemporary, which observed cohorts transitioning across the spectrum of disease. 13 However, no single study provides the overview of all trajectories across the different states, and with studies having varying durations, follow-up structures, and approaches to define and report disease states, the resulting heterogeneity complicates a simple comprehensive analysis.
Here we use a Bayesian framework to synthesise all available data to inform estimates of the natural rate of transition between minimal, subclinical, and clinical TB disease. We use these rates to simulate disease pathways in individuals, which we categorise to compare the frequency of different disease pathways in the population.

Data
The systematic review collected data that describe untreated cohorts at a minimum of two time points. At each time point, the cohort disease state was reported with a combination of CXR, bacteriology, and symptoms, with all studies required to directly report bacteriology or use standards set by the National Tuberculosis Association (NTA) that include bacteriology in the definitions. 13,14 The first time point described the state of a baseline group, and the second (and further) time points described the states of a subgroup after a recorded time. To enable synthesised analysis, two study types were included. In time-to-event studies, individuals were closely followed and it was cumulatively recorded whether they had transitioned to a new state, at either a single reporting point or multiple consecutive reporting points. After transitioning, individuals were excluded from follow-up. In cross-sectional studies, individuals were followed up at the single reported time point. Only their final state was recorded, without knowledge of any additional transitions that occurred before the study end. For inclusion in the analysis, a study needed to report on at least one cohort that transitioned between states, and included individuals needed to have, as a minimum, a CXR with signs interpreted as TB activity to fit in the minimal disease category. Details of the inclusion and exclusion criteria are given in the appendix (section S4).
Minimal disease was defined as bacteriologically negative, regardless of symptoms, based on observation from numerous studies that did not report differing progression rates and the poor specificity of symptoms in bacteriologically positive TB (see appendix section S3). 8, 15 We adjusted the cohort size for people starting with minimal disease based on using tuberculin skin tests (TST) as a proxy for radiography changes that were truly caused by Mtb infection (see appendix, section S5.4).
For classification of outcomes, we assumed that when symptoms were only reported at enrolment, the symptom status persisted over the course of the study. Where symptom status was unknown at both time points, we classified people with bacteriologically positive disease as "infectious", as symptoms could not be used to differentiate between subclinical and clinical disease. If a paper referenced the NTA standards and used disease terminology of arrested, quiescent, or active from these standards, we interpreted these to mean minimal, subclinical, and clinical respectively (see appendix sections S1 and S4). 14 We used a fixed value for recovery (see appendix section S5.2). 13 For mortality from clinical disease we used the estimated rate based on empirical data for mortality from "open" TB, which has a similar definition to clinical disease (see appendix section S2). 14,16 In addition, we fixed the median duration of infectious disease to two years (see appendix, section S5.3). 17 With those parameters fixed, the data could then inform the remaining progression and regression parameters (see table 1 for values and priors).

Data synthesis
To bring the data together we created a deterministic framework of TB disease, including the potential to move between the three disease states, as well as recovery from minimal disease and death from clinical TB disease (figure 1, top row).
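As an illustration of the structure described above, the following is a minimal sketch of a deterministic three-state model with recovery from minimal disease and death from clinical disease. The function names, the Euler time-stepping, and every rate value are assumptions made for illustration; they are not the fitted values from table 1 and not necessarily how the framework was implemented.

```python
def euler_step(state, rates, dt):
    """Advance the compartment proportions by one Euler step of dt years."""
    minimal, subclinical, clinical, recovered, dead = state
    min_to_sub = rates["min_to_sub"] * minimal * dt
    sub_to_min = rates["sub_to_min"] * subclinical * dt
    sub_to_clin = rates["sub_to_clin"] * subclinical * dt
    clin_to_sub = rates["clin_to_sub"] * clinical * dt
    recovery = rates["recovery"] * minimal * dt      # only from minimal disease
    tb_death = rates["tb_death"] * clinical * dt     # only from clinical disease
    return (
        minimal - min_to_sub - recovery + sub_to_min,
        subclinical + min_to_sub - sub_to_min - sub_to_clin + clin_to_sub,
        clinical + sub_to_clin - clin_to_sub - tb_death,
        recovered + recovery,
        dead + tb_death,
    )


def run_model(start, rates, years, steps_per_year=120):
    """Run the model forward and return the final compartment proportions."""
    state = start
    dt = 1.0 / steps_per_year
    for _ in range(years * steps_per_year):
        state = euler_step(state, rates, dt)
    return state


# Hypothetical annual rates chosen only to make the sketch runnable.
example_rates = {
    "min_to_sub": 0.3, "sub_to_min": 0.6,
    "sub_to_clin": 0.4, "clin_to_sub": 0.5,
    "recovery": 0.2, "tb_death": 0.3,
}

# A cohort that starts entirely in the subclinical state, followed for five years.
five_year = run_model((0.0, 1.0, 0.0, 0.0, 0.0), example_rates, years=5)
```

Because every flow leaving one compartment enters another, the five proportions always sum to one, which is a useful check on any implementation of this structure.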
The transition rates were estimated by fitting to the data in a Bayesian framework. All data were considered simultaneously and a binomial distribution was chosen for the likelihood which allowed weighting by cohort size. Data points from time-to-event studies were down-weighted so that the multiple data points of a cohort contributed as a single study (see appendix section S5.1).
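A weighted binomial likelihood of this kind can be sketched as follows. This is an assumption-laden illustration: the weighting is applied by multiplying each data point's log-likelihood by a weight, which is one common way to down-weight repeated observations and may differ from the scheme in appendix section S5.1. The data values and the constant-rate prediction function are hypothetical.

```python
import math


def binomial_log_likelihood(k, n, p, weight=1.0):
    """Weighted log-probability of observing k transitions among n people."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return weight * (log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p))


def total_log_likelihood(data_points, predict):
    """Sum the weighted log-likelihood over all data points at once."""
    return sum(
        binomial_log_likelihood(d["k"], d["n"], predict(d), d.get("weight", 1.0))
        for d in data_points
    )


# Hypothetical data: k people transitioned out of n after a given follow-up.
toy_data = [
    {"k": 12, "n": 100, "years": 1.0},
    {"k": 40, "n": 200, "years": 2.0, "weight": 0.5},  # down-weighted point
]


def toy_log_likelihood(rate):
    """Log-likelihood of the toy data under a single constant annual rate."""
    return total_log_likelihood(
        toy_data, lambda d: 1.0 - math.exp(-rate * d["years"])
    )
```

Larger cohorts contribute more strongly because the binomial term scales with n, which is the sense in which the likelihood weights by cohort size.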
We sampled the posterior values using a sequential Markov chain Monte Carlo (MCMC) method. An initial burn-in phase was used to find an optimal acceptance level of between 25% and 35%, which was achieved by adapting the proposal distributions in both shape and scale. The burn-in was then discarded, leaving chains with 10,000 iterations, which were visually inspected for convergence. The parameter estimates came directly from the output of these chains, including the distribution, median, and equal-tailed 95% uncertainty intervals.
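The general idea of tuning a proposal during burn-in and then summarising the retained chain can be sketched with a simple one-parameter random-walk Metropolis sampler. This is a simplified stand-in, not the sampler used in the study: it adapts only the proposal scale (not its shape), the batch size and adjustment factors are arbitrary assumptions, and the target is a toy standard normal distribution.

```python
import math
import random
import statistics


def adaptive_metropolis(log_post, start, n_burn=2000, n_keep=10000, seed=1):
    """Random-walk Metropolis with proposal scale tuned during burn-in."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    scale, accepted = 1.0, 0
    chain = []
    for i in range(n_burn + n_keep):
        proposal = x + rng.gauss(0.0, scale)
        lp_new = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
            accepted += 1
        if i < n_burn:
            # Every 100 burn-in iterations, nudge the scale towards 25-35% acceptance.
            if (i + 1) % 100 == 0:
                rate = accepted / 100
                if rate < 0.25:
                    scale *= 0.8
                elif rate > 0.35:
                    scale *= 1.25
                accepted = 0
        else:
            chain.append(x)  # burn-in samples are discarded
    return chain


def equal_tailed_interval(samples, level=0.95):
    """Return the lower and upper quantiles that leave equal mass in each tail."""
    ordered = sorted(samples)
    lo_idx = int((1.0 - level) / 2.0 * (len(ordered) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Toy target: standard normal log-density (up to a constant).
chain = adaptive_metropolis(lambda x: -0.5 * x * x, start=0.0)
median = statistics.median(chain)
lower, upper = equal_tailed_interval(chain)
```

In a real analysis the chain would still need to be checked for convergence, for example by visual inspection of trace plots as described above.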

TB disease pathways
To quantify the different pathways through disease, we applied the parameter values from the Bayesian fitting to a cohort model that tracked individuals through their TB disease history. Once recovered or treated, individuals exited the model. As we were interested in the natural trajectory of an existing disease episode, we did not include re-infection. Each cohort tracked 1,000 people over 10 years and was repeated 1,000 times to capture uncertainty. We considered three cohort types; subclinical cohorts where all individuals initially had subclinical disease, clinical cohorts where all individuals initially had clinical disease, and mixed cohorts where half the individuals had subclinical disease and the other half had clinical disease. 7 We ran the cohorts with and without the possibility of diagnosis and treatment. When included, treatment was implemented as a 70% annual chance of being diagnosed and successfully treated while symptomatic, as an approximation of a 70% case detection rate in a care system reliant on self-reported symptoms to initiate the care pathway.
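A cohort model of this kind could be sketched as an individual-level simulation with monthly time steps. Everything below is illustrative: the rate values are hypothetical rather than the fitted estimates, at most one transition is allowed per month, and the 70% annual chance of treatment is converted to an equivalent constant hazard, which is an assumption about implementation rather than a description of the authors' code.

```python
import math
import random

# Annual hazard equivalent to a 70% chance of treatment over one year.
TREATMENT_HAZARD = -math.log(1.0 - 0.70)

# Hypothetical annual exit rates from each state (not the table 1 values).
example_rates = {
    "minimal": {"subclinical": 0.3, "recovered": 0.2},
    "subclinical": {"minimal": 0.6, "clinical": 0.4},
    "clinical": {"subclinical": 0.5, "dead": 0.3, "treated": TREATMENT_HAZARD},
}


def simulate_person(start, rates, months, rng):
    """Return the monthly state history of one individual (month 0 included)."""
    state = start
    history = [state]
    for _ in range(months):
        exits = rates.get(state, {})  # recovered, dead and treated have no exits
        total = sum(exits.values())
        if total > 0 and rng.random() < 1.0 - math.exp(-total / 12.0):
            targets = list(exits)
            # Choose the destination in proportion to the competing rates.
            state = rng.choices(targets, weights=[exits[t] for t in targets])[0]
        history.append(state)
    return history


def simulate_cohort(starts, rates, months, seed):
    """Simulate every individual in a cohort from their starting state."""
    rng = random.Random(seed)
    return [simulate_person(s, rates, months, rng) for s in starts]


# Mixed cohort: half subclinical and half clinical at baseline, ten years.
mixed_starts = ["subclinical"] * 500 + ["clinical"] * 500
cohort = simulate_cohort(mixed_starts, example_rates, months=120, seed=2021)
```

Repeating such a run many times with parameter sets drawn from the posterior is one way to carry parameter uncertainty through to the pathway results.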
We categorised the different pathways of disease observed over 12-month intervals. Where an individual received treatment, died, or recovered during those 12 months, they were classified as such. Individuals not classified as one of those outcomes could either have a static disease state or were classified as undulating (i.e. moving between two or more disease states). The cohort model reports disease state monthly, and if fewer than nine of the 12 months were spent in a single state, or an individual transitioned between states three or more times, the disease pathway was classified as undulating. Otherwise, with nine months or more in a single disease state and fewer than three transitions, the pathway was classified as static in the dominant state for that interval. See appendix section S6 for examples of these trajectories.
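The classification rule above can be written as a short function. This is a sketch under stated assumptions: state labels are hypothetical strings, the interval is represented by exactly 12 monthly records, and transitions are counted only between consecutive months within the interval.

```python
from collections import Counter

TERMINAL_OUTCOMES = ("treated", "dead", "recovered")


def classify_year(monthly_states):
    """Classify one 12-month interval of monthly disease states."""
    if len(monthly_states) != 12:
        raise ValueError("expected exactly 12 monthly states")
    # Treatment, death or recovery during the interval takes precedence.
    for state in monthly_states:
        if state in TERMINAL_OUTCOMES:
            return state
    transitions = sum(a != b for a, b in zip(monthly_states, monthly_states[1:]))
    dominant, months_in_dominant = Counter(monthly_states).most_common(1)[0]
    if months_in_dominant < 9 or transitions >= 3:
        return "undulating"
    return "static " + dominant


S, M, C = "subclinical", "minimal", "clinical"
examples = {
    "all subclinical": classify_year([S] * 12),
    "six and six": classify_year([S] * 6 + [M] * 6),
    "ten then two": classify_year([S] * 10 + [C] * 2),
    "four switches": classify_year([S, S, S, M, S, S, C, S, S, S, S, S]),
    "death": classify_year([C] * 5 + ["dead"] * 7),
}
```

Note that the fourth example spends ten months in the subclinical state but is still undulating, because it changes state four times.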
We report two durations of disease, one for infectious disease (subclinical and clinical), and one for all TB disease (minimal, subclinical, and clinical). The median duration of disease was calculated as the first point after the start of the simulation that fewer than 50% of the original cohort are present in one of the relevant states. We also recorded the number of months an individual spent with clinical disease before treatment or death, as well as throughout their disease episode, regardless of outcome.
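The median duration definition above translates directly into code. The sketch below assumes each individual's history is a list of monthly state labels starting at month 0; the function name and the toy histories are hypothetical.

```python
def median_duration(histories, relevant_states):
    """First month at which fewer than half the original cohort is in a relevant state.

    Returns None if that point is never reached within the simulated period.
    """
    cohort_size = len(histories)
    n_months = len(histories[0])
    for month in range(1, n_months):
        present = sum(1 for h in histories if h[month] in relevant_states)
        if present < 0.5 * cohort_size:
            return month
    return None


# Toy cohort of four people followed for four months (month 0 is baseline).
toy_histories = [
    ["subclinical", "subclinical", "minimal", "minimal", "minimal"],
    ["clinical", "clinical", "clinical", "dead", "dead"],
    ["subclinical", "minimal", "minimal", "recovered", "recovered"],
    ["clinical", "clinical", "subclinical", "subclinical", "subclinical"],
]

infectious = median_duration(toy_histories, {"subclinical", "clinical"})
all_tb = median_duration(toy_histories, {"minimal", "subclinical", "clinical"})
```

In the toy cohort, fewer than half are infectious for the first time at month 3, while exactly half still have some form of TB disease at the end of follow-up, so the second duration is not reached.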
Cumulative mortality from infectious TB disease in the absence of treatment was recorded at 10 years to allow comparison with existing estimates based on historical data. 18

Sensitivity analyses
To test the robustness of the data synthesis results, we explored the impact of removing data provided from each study one at a time. In addition, the rate of recovery, and the median duration of infectious disease were varied. For studies where symptoms were only provided in the start state of minimal, we re-ran the analysis with the transition for those

Data synthesis
Twenty-four studies were included from the systematic review, providing 57 data points, describing 6030 people and 1053 transitions. These cohorts were followed for intervals between 1923 and 2004, with studies conducted in North America (7), Europe (7), Asia (8), and one each from South America and Africa. In total there were 5 data points from minimal to subclinical, 15 data points from minimal to clinical, 15 data points from clinical to minimal, 20 data points from minimal to infectious, and 2 data points from infectious to minimal. Figure 1 shows the data points, including the relative weight of each data point, as indicated by the error bars. The best fit and uncertainty intervals to the data are shown by the lines and shaded area respectively in each plot. These data and the fitting are described in more detail in the appendix, sections S4 and S5.
Table 1 gives the median parameter estimate for each model transition, with the 95% uncertainty interval. Uncertainty intervals for the parameters reflect the restricted parameter space when considering all the data simultaneously. Regression parameters were consistently higher than progression parameters.
Looking at how many individuals with subclinical disease at baseline never develop clinical disease, of those who completely recover within five years, 74% (95%UI, 66.7%-80.8%) never developed symptoms. This drops to 45.6% (95%UI, 40.2%-50.3%) in those with minimal disease at the end of five years, and further still to 10.1% (95%UI, 5.3%-15.3%) for those with subclinical disease at the end of five years. In total, in a cohort of individuals with subclinical disease at baseline, we estimate that 30% (95%UI, 27.2%-32.6%) would never develop symptoms.

Figure 1 (caption, opening words missing): 1) minimal and subclinical, 2) subclinical and clinical, 3) minimal and clinical, 4) minimal and infectious. The middle column is a visual description of the transition being fitted on each row. The dots in each graph are the point values provided from each study, with the error bars representing the weighting of that point value as provided in the fit (see appendix section S5.1). The solid line represents the median trajectory of that transition with the cloud covering 95% of the simulated trajectories.
In the absence of treatment, for a mixed cohort, the median duration of infectious disease (subclinical and clinical) was 23 months, which follows the range set during the data synthesis. If diagnosis and treatment were included, the median infectious period dropped to 15 months. If we consider all TB disease, including the minimal state from which individuals can progress to infectious disease, median durations of disease without and with treatment were 75 and 38 months respectively. Illustrative figures can be found in the appendix, section S8.
In a cohort with treatment available, the duration of symptoms before death, regression to subclinical disease, or treatment varied between individuals from 1 month to 35 months, with a median of 4 months for each. For more on this distribution, see appendix section S7.

Sensitivity analyses
We found that no single study had a qualitative effect on either the estimated parameter values or a selection of key results from the full analysis. Where changes in median parameter values were observed, these did not transfer through to the other key output metrics. In-depth comparisons of the sensitivity analyses can be found in the appendix, section S9.

Summary of findings
We synthesised available data from a systematic review of untreated cohorts of TB disease to parameterise progression and regression between minimal, subclinical, and clinical TB disease. With these parameters we quantified the pathways of individuals across the spectrum. Our results show that non-linear disease trajectories are not only common, but actually the norm. This is demonstrated by the high proportion of individuals at five years who are undulating between states, and the high proportion of people with subclinical disease at baseline who never progress to clinical disease or even recover. Where symptom screening is used to detect people with infectious disease, many people may not be offered timely treatment, meaning either they progress to more severe disease or unknowingly contribute to further transmission of TB.

Interpretations
Of the four parameters estimated through fitting to the data, we found the regression rates (subclinical to minimal, and clinical to subclinical) exceed the respective progression rates (minimal to subclinical, and subclinical to clinical). While this could suggest that the majority of clinical TB disease resolves without intervention, the relative sizes of the states as well as the high mortality rate from clinical disease will likely counteract this trend. For example, given many more individuals become infected with Mtb than develop infectious disease, it is reasonable to assume that the population with minimal disease is much larger than that with clinical disease, which could mean the absolute number of individuals progressing towards infectious disease will exceed those regressing. 1 This suggests new hypotheses about the population effect of treatment. For example, the rapid decline of prevalent infectious TB in China could be explained in part by the extensive treatment of individuals based on CXR rather than bacteriological diagnosis, which would have reduced the reservoir of minimal disease. 22 The uncertainty intervals around the four parameters are narrow, particularly the two between minimal and subclinical. This reflects the limited space left when all data points are considered simultaneously, as each new data point further restricts the potential values. In practice, progression and regression will vary more widely between individuals and between populations, driven by factors such as HIV status, diabetes, malnutrition, and gender. 26 Our cohorts mostly comprised HIV negative people, and the prevalence of other variables was unknown (e.g. malnutrition). It is possible that these factors affect all or a subset of parameters, but our results are based on a range of populations, times, and geography, and so provide an improvement over the limited data currently supporting estimates of TB progression and regression parameters. 21,27 Whilst the parameters were estimated by fitting to the data and a two-year infectious disease duration, our model matched empirical estimates of cumulative 10-year mortality for all infectious TB of around 40%. 18,21 We also extracted duration of symptoms before treatment. Systematic reviews of self-reported symptom duration usually cite between one and three months of symptoms prior to treatment, whereas we found a median of four months. 28 While marginally longer, our results are similar, and self-reported symptom duration is more likely to be under- than over-estimated.
We cannot directly compare our parameter estimates with current models, as no other models have split the disease spectrum into three states. Ku et al used the proportion subclinical in prevalence surveys to divide total duration of disease, and did not consider undulation between states. 29 While their median estimated duration of symptoms is higher, our estimate of a median of 4 months falls within the range. 29 The WHO technical appendix includes regression from "active" disease for those who self-cure or die before treatment as a single parameter with no further consideration of a spectrum of disease. 30 Salvatore et al have represented disease as progression and regression along a single continuum of disease burden, defined as a composite of bacteriology, pathology, and symptoms. 10 The potential rates of progression and regression were wide and overlapped with our data-driven estimates. 10 A recent systematic review from Menzies et al on progression only considered a single disease state. 27 Some of these studies split the active disease state by bacteriological load (smear positive or smear negative, whilst still bacteriologically positive); however, we have instead focused on bacteriological positivity alone, in line with the current reporting framework. 1 We reported undulating disease based on a fixed threshold of nine months, which is a subjective choice. While a shift in threshold would change the proportion qualified as "undulating", the underlying movement between states will remain the same (see appendix, sections S6 and S9.2.3).
An important finding is the large proportion of people with subclinical disease who may never develop symptoms, i.e. clinical disease. Although many of these would naturally regress to having minimal disease, or even recover completely, this does not mitigate the time these people spend with subclinical, and hence infectious, disease or the time they spend with an active Mtb infection. As we show, in a population without treatment, the majority of people who had subclinical disease at baseline still have TB disease five years later. Many of these have minimal disease and so have spent a long time with an active Mtb infection, but there are also many who have, or have had, subclinical disease or an undulating trajectory, and so there may be a large amount of undetected transmission.

Limitations
Despite the extensive literature review, few data points could directly inform parameters. By including data on transitions between minimal and clinical, and between minimal and infectious, we were able to restrict the likely parameter space. For example, the time-to-event minimal to infectious data provide a lower bound for the minimal to subclinical transition. While the chosen model structure will drive some of the results, limitations in the available data prohibit a more complicated model structure. In addition, our three-state linear model structure is in line with historical and recent conceptualisations of the spectrum of TB disease. 2,3,7,12,14 Both our data and simulations start from prevalent disease (minimal, subclinical, and clinical) without knowledge of previous disease trajectory and with a single rate of transition for all. As such, the parameters represent a mix of both recent and more distal Mtb infections, where some individuals are rapidly progressing, as well as individuals who are undulating, or on their way to recovery. However, this is a reflection of current prevalent TB states in a population, as found in prevalence surveys. 7,8 Prevalent disease is the immediate driver of TB morbidity, mortality, and transmission, and as such the population that TB policies look to address.

Conclusions
We estimate that around one third of people with active asymptomatic TB disease do not progress to symptomatic clinical disease. As such we show a flaw in the assumption that targeting clinical disease will enable care for all individuals suffering from TB disease, or interrupt transmission from infectious disease. Our work also highlights an important question: where should the threshold be set for TB disease that requires treatment? While the current threshold of infectious disease can relatively easily be confirmed, minimal (i.e. bacteriologically-negative) disease is an important reservoir of potential future transmission in the population and has a substantial risk of progression to more advanced disease. To comprehensively interrupt current and future transmission, we need to expand our interventions to those which can detect and treat subclinical and even minimal disease if possible. These may have substantial individual and population benefits which, with these parameter estimates, we can now more reliably quantify. 9,12,31

Acknowledgements
ASR, JCE, KCH and RMGJH received funding from the European Research Council (ERC) under the European Union's Horizon 2020 programme (Starting Grant Action Number 757699). KCH is supported by the UK FCDO (Leaving no-one behind: transforming gendered pathways to health for TB). This research has been partially funded by UK aid from the UK government (to KCH); however the views expressed do not necessarily reflect the UK government's official policies. HE and BS received funding from the TB Modelling and Analysis Consortium (TB MAC). NM is funded by the Wellcome Trust (218261/Z/19/Z).