Model-based Bayesian inference of the ventilation distribution in patients with Cystic Fibrosis from multiple breath washout, with comparison to ventilation MRI

Background: Indices of ventilation heterogeneity (VH) from multiple breath washout (MBW) have been shown to correlate well with VH indices derived from hyperpolarised gas ventilation MRI. Here we report the prediction of ventilation distributions from MBW data using a mathematical model, and the comparison of these predictions with imaging data. Methods: We developed computer simulations of the ventilation distribution in the lungs to model MBW measurement with 3 parameters: {sigma}V determining the extent of VH; V0, the lung volume; and VD, the dead-space volume. These were inferred for each individual from supine MBW data recorded from 25 patients with cystic fibrosis (CF) using approximate Bayesian computation. The fitted models were used to predict the distribution of gas imaged by 3He ventilation MRI measurements collected from the same visit. Results: The MRI indices measured (I1/3, the fraction of pixels below one-third of the mean intensity and ICV, the coefficient of variation of pixel intensity) correlated strongly with those predicted by the MBW model fits (r=0.93,0.87 respectively). There was also good agreement between predicted and measured MRI indices (mean bias {+/-} limits of agreement: I1/3: 0.002 {+/-} 0.112 and ICV:-0.001 {+/-} 0.293). Fitted model parameters were robust to truncation of MBW data. Conclusion: We have shown that the ventilation distribution in the lung can be inferred from an MBW signal, and verified this using ventilation MRI. The Bayesian method employed extracts this information with fewer breath cycles than required for LCI, reducing acquisition time required, and gives uncertainty bounds, which are important for clinical decision making.


Introduction
Ventilation heterogeneity (VH) refers to the unevenness of inspired air distribution in different lung regions during breathing. It is an early and prominent feature of lung diseases such as cystic fibrosis (CF), bronchiectasis, asthma and COPD [1][2][3][4] . Clinical assessment of VH in the lung is performed by the multiple breath washout test (MBW), from which the most commonly used primary outcome is the lung clearance index (LCI) 5 . LCI is now well established, particularly in CF where it is a sensitive, robust measure of early disease, and responsive to clinical status 6 . However, LCI utilises a relatively small proportion of the gas washout data collected: the alveolar gas concentrations, and specifically those preceding the first and last washout breaths. Alternative VH indices from MBW, such as moment ratios 7 and phase-III slopes 8 , have been developed but are less commonly used as they are more sensitive than LCI to other factors besides ventilation heterogeneity 5,9 , including variations in tidal volume and gas diffusivity. One current limitation to widespread clinical use of LCI as a pulmonary function test is the time required for data collection 5 . Although LCI appears robust to earlier test thresholds to some extent, this may be at the cost of reduced sensitivity and the practice is not widespread [10][11][12][13] .
Hyperpolarised gas ventilation MRI allows for visualisation of the ventilation distribution in the lung 14 . Patients inhale a bolus of hyperpolarised tracer gas ( 129 Xe or 3 He) which is imaged during a short breath hold (as in this study), dynamically during the respiratory cycle 15 or over several cycles 16 . This enables the identification of small ventilation defects that are not detected by spirometry (FEV1) or LCI, and the method is therefore highly sensitive to early lung disease progression 1,17,18 . Ventilation MRI also provides a 3dimensional representation of the distribution of lung disease, allowing regional changes to be identified and tracked. Furthermore, MRI can identify lung regions where tidal flow is entirely obstructed, which would not normally contribute to the MBW signal. This powerful technique is therefore potentially more informative about the nature and severity of lung disease. We have previously reported a good correlation between MRI markers of airway obstruction and LCI in patients with CF 19 . The availability of this paired data, previously published and analysed in 17,19 , presents a unique opportunity to link clinically usable measures of gas washout with detailed lung imaging in order to better inform understanding of MBW and to develop more sophisticated washout metrics.
Mathematical models have been used previously to predict the lung ventilation distribution from inert gas washout tests. Continuous models [20][21][22] used functional fitting of end-tidal nitrogen, but were constrained by the requirement for equal breath sizes during the test. The widely-employed method of Lewis et al. 22 involving 50-compartments with specific-ventilation equally spaced on a log-scale. The volume of each compartment is fitted using a LASSO-type (least absolute shrinkage and selection operator) approach, with an empirical smoothing term between neighbouring compartments. An initial estimate is calculated by linear least-squares fit, then a local gradient descent method is deployed with the constraint that no compartments have negative volume. The multiple inert gas elimination technique 23 (MIGET) involves an estimation of the ventilation distribution using similar methods to those originally proposed by Lewis et al. but with several measurement gases. Similarly, Prisk et al. used the Lewis method to measure the effects of micro-gravity on VH 24 . Two-compartment models have also been developed 25 , which allow all parameters to be identifiable without smoothing assumptions, but lack the resolution to recover a realistic ventilation distribution.
More recent algorithms have refined these approaches through a variety of techniques, including systematic variation in the number of compartments and model assumptions 26 , and the introduction of dead-space compartments and acinar-mixing effects 27 . A more detailed compartmental model of ventilation has also been developed that accounts for CO2 and N2 exchange 28 to better estimate unventilated space. However, this particular model is currently too computationally intensive for implementation with Bayesian fitting methods (and thus lacks robust uncertainty bounds for the parameters), and was developed for a bespoke high-precision device. Furthermore, Bayesian inference has shown promise in extracting more useful indices from MBW data using only the end-tidal concentrations 29 .
While several studies have shown that washout curves can be predicted from structurefunction measurements including imaging [30][31][32] , and the model of Lewis et al. has been refined through comparison with benchtop experiments 33 , no studies have yet validated predictions of the ventilation distribution in a clinical setting using direct imaging measurement in the same patients.
The software we have developed addresses the identified shortcomings. First, the algorithm can be used on directly-measured tracer gas and flow data from the InnocorÔ SF6 device (Innovision, Glamsbjerg, Denmark) 34 , which has already been used in several studies, and can applied to measurements from other devices. Secondly, we have used Bayesian parameter identification to fit the compartmental model to data for individual patients, allowing for robust prediction estimates of uncertainty based on approximating the probability of all plausible solutions. Finally, we have also used independent measurements of VH from ventilation MRI to directly test the validity of the inferred parameters. This makes the physiological assumption that the heterogeneity of the hyperpolarised gas distribution is ventilation-dominated, and only negligibly affected by collateral diffusive mixing.
The aim of this study was to develop computational software to predict the ventilation distribution in individual subjects using raw MBW data and to use this to produce robust indices of VH that directly correspond to those measured directly with ventilation MRI. A secondary aim was to test whether the measures derived using this new method were robust at shorter test times.

Methods
Tables 1, 2, and 3 introduce and the mathematical symbols that are used in this section prior to their first use in the text. Table 1 shows the subscript indices used in the modelling and data analysis and so represent integer quantities. Table 2 shows quantities associated with the MBW and MRI data processing, while Table 3 shows simulation parameters and variables of the compartmental lung model,

Index
Use Lung units and dead space compartments in the compartmental lung model. Volume elements in the compartmental lung model. Breath number in processed MBW data. Time-step in processed MBW data. Voxel in MRI data (simulated or observed).

Study design and recruitment
As part of a longitudinal observational study, children and adults with CF were recruited from three UK specialist centres 18,19 . At recruitment, patients had to be over the age of five years, be clinically stable for four weeks prior to their visit and achieve an FEV1 >30% predicted within the previous six months. This study was approved by the Yorkshire and Humber -Leeds West Research Ethics Committee (REC reference: 16/YH/0339). Parents/guardians of children and all adult patients provided written informed consent. For this analysis we have used data from a single visit when patients completed both MRI and MBW, including supine MBW (since this corresponds to the same position as the MRI procedure, and so minimises the effect of changes due to body position or gravity 35,36 ).
All of the experimental data in this paper is from a single visit of the studies 17,19 , and therefore an extensive analysis of the MBW and MRI data has been published previously.

%/'
Proportion of the masked He 3 MRI image with voxel brightness less than 1/3 of the mean.
)* Coefficient of variation in brightness of masked He 3 MRI image. LCI + Lung clearance index measured by MBW (mean from 3 test repeats), = 2.5, 5, 10, 20, 40 is the termination threshold used in % of equilibrium SF6 concentration.
,./ Fowler dead-space (L) measured from CO2 curve during MBW (mean from 3 test repeats -each test is the median value of all washout breaths).
Functional residual capacity (L) measured by plethysmography @ # , @ $ Initial guess for model parameters # and $ , used to set model priors.

Ventilation MRI
Ventilation MRI was performed on a 1.5T GE HDx scanner (GE, Milwaukee, WI, USA) using hyperpolarised helium-3 ( 3 He) using a transmit-receive vest coil (CMRS, Milwaukee, WI, USA) and a three-dimensional (3D) steady state free precession ventilation imaging sequence as described previously 16 . Images used in this study were acquired during a breath-hold at end-inspiratory tidal volume following the inhalation of a predetermined fixed volume of test gas from their resting functional residual capacity (FRC). The volume of gas was titrated based on the subject's height and consisted of scaled doses of 3 He balanced with nitrogen 19 .
Contiguous ventilation MR images of the coronal plane were acquired with slice thickness of 5mm and pixel size ranging from 2.73 x 2.73mm to 3.28 x 3.28 mm (depending on patient lung size). For each slice, a mask was manually derived from 1 H images (acquired during the same breath hold as the 3 He MRI) to determine which pixels corresponded to positions inside the lung cavity (excluding visible airways). The masked images were eroded by one pixel to avoid edge effects. The intensity (brightness) of each pixel is taken as a relative measure of the gas concentration at that point. This is used as a proxy for the local ventilation rate, and is used to characterise VH in two ways (see figure 1): • Measuring the 'poorly ventilated' fraction of the lung by %/' . That is, the fraction of pixels with intensity less than one-third of the mean (as used previously to define ventilation defects 37 ). • Calculating the total coefficient of variation of the normalised pixel intensity, )* . Note that both of these indices are measures of global ventilation heterogeneity, and we do not use more common indices such as <3=9 38 or ∆ 39 that quantify spatial heterogeneity, or any of the information regarding regional ventilation patterns. This is because the aim of this study is to recover measures of global VH from MBW data, and to use MR to verify these.
Since the MBW data is collected at the mouth, it is a mixed signal from all of the lung regions, and hence distinguishing regional contributions is practically impossible unless washouts are performed in a range of postures 40 . Extending this method to account for posture changes and regional ventilation differences is a promising area of future research.

Pulmonary function tests
MBW was performed using a modified open-circuit Innocor device using 0.2% sulphur hexafluoride (SF6) trace gas in air. MBW tests were collected in triplicate in the supine position. Spirometry and body plethysmography were performed to international standards 41,42 using a PFT Pro (Vyaire, Basingstoke, UK). All tests were performed on the same day. Either MBW or MRI was performed first, followed by the other. Spirometry was always performed last to minimise the influence of VH redistribution due to a forced manoeuvre.

Symbols
Description Compartmental lung model parameters setting the FRC, deadspace volume and VH respectively.
Properties of the volume element j in dead-space compartment : gas volume, inert gas concentrations (distal end and proximal end), and inert gas volumes respectively.
Properties of the lung unit : inert gas concentration, gas volume and inert gas volume respectively. Re-discretisation volume used for mixing-step ( <=: = used throughout) is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  Figure 2 provides schematic overview of the model-fitting process, which this section describes in detail.

Processing of MBW data for model fitting:
The MBW data we have used was collected using the Innocor SF G device (Innovision, Odense, Denmark) and converted to plain-text format. Processing of the raw data was carried out using a custom-built C++ program in line with recommendations 5 . Corrections for re-breathed SF6 were not applied as these are simulated in the model. In summary: 1. Gas traces from each MBW are corrected for the flow-gas delay (as measured during device calibration), which is taken to be the same for the CO H and SF G datasets.
2. The data are separated into inhalations and exhalations, with a filtering step to ensure that flow fluctuations near to zero are not counted as separate breaths. 5. Finally, exhalations are down-sampled to 10 points per breath: these were the start and end points, 4 evenly spaced points in phase II (defined as exhaled volume between K $ /2 and 3 K $ /2), and 4 in phase III (defined as between 3 K $ /2 and 0.95 7 (IJK) ), where 7 (IJK) is the total exhalation volume of breath . This downsampling was achieved by linearly interpolating from the nearest concentration measurements in the dataset. Inhalations were down-sampled to a single step between inhalations, as the gas trace is approximately zero during washout and therefore not used in the fitting process. We label the time-steps with index and the associated volume change as Δ L ′. 27 , the computational lung model comprises F = 50 lung units of equal size with total volume # at FRC, with each compartment connected to independent dead-space compartments of equal size with total volume $ . As in Mountain et al. 28 , the relative inflation rate of each compartment is drawn from a lognormal probability distribution with unit mean

Compartmental lung model: As in Bates and Peters
where ! is the VH parameter. Once the values have been drawn ( B~ ( ) for = 1, . . , F ), they are normalised so that the mean is ̅ = 1. The model simulates advection of the inhaled/exhaled volumes measured from MBW through the dead-space and into/out of the lung units via the transport of discrete volume elements. These volume elements therefore effectively form a Lagrangian grid for the 1D network of dead-space components. Each element is indexed sequentially from distal to proximal end in each dead-space . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted October 1, 2021. ; compartment = 0, … F (where = 0 is the common-dead space and = 1, . . , F the private dead-space compartments corresponding to the acinar units with the same index). An element has volume BC , and concentration values BC (D) and BC (E) defined at its proximal (mouth) and distal (acinar) ends respectively. The elements are assumed linear in concentration and so have inert gas volume BC = BC ( BC (D) + BC (E) )/2. Each acinar unit exhibits instantaneous gas-mixing, so the gas concentration in the unit at time-step is where B is the volume of inert gas in the unit, and B is the total gas volume of the unit. Synchronous ventilation of the units is also assumed.
At each time-step , the volume flux is perturbed by random measurement noise such that  Following MBW simulations, the same model is used to simulate hyperpolarised gas MRI measurement. A single inhalation step is simulated of size P=Q (corresponding to the volume of the bag of ' He and N H mixture used in the MRI imaging) with unit concentration. We assume that the majority of dead-space has been successfully excluded by the airway imaging mask, and take a volume-weighted sample of the concentration in the lung units for 1000 representative datapoints, such that and R~( 0,1) for = 1, … ,1000. This means that average number of samples taken of unit is proportional to the fraction of volume this unit takes up at endinspiration. The measurement noise ~(0, ; ) is chosen to equate to a signal-to-noise ratio of 50 such that ; = 0.02 ∑ BS% . Finally, the measurements are normalised to have unit mean (since absolute pixel intensity values are meaningless), so This model has a number of shared features with previous methods for similar applications, as cited above. It is a stochastic model so simulations with the same distribution parameter ( A ), will have slightly different ventilation distributions, as in Mountain et al. 28 . The number of compartments in the model ( F ) then dictates how similar an individual realisation of the ventilation distribution is to the lognormal distribution used to generate it. As F is decreased, extra uncertainty around the parameter A is introduced, since the correlation between A and the generated ventilation distribution is weakened, and this can result in greater numbers of simulations being rejected during the ABC algorithm, making the fitting process less efficient. Furthermore, reducing the number of compartments also results in washout curves that are less smooth, as the concentration jumps up as gas from different compartments reach the mouth, which can result in worse model fits to the data. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  44 was used for each individual's data. The flow data from MBW is taken as a model input, and the SF6 concentration is the output to be fitted against. Parameters sets are drawn and evaluated to build a (posterior) probability distribution for their actual values through an iterative refinement process (identical to the procedure in Toni et al. 2009 44 (algorithm S)). The algorithm is given in Appendix B.
The parameter prior distributions we assumed uniform and independent as: The range bounds are defined using their estimated quantities from MBW processing (see 2.4.1) where K # is the estimated FRC (mean of three tests) and K $ the test median Fowler dead-space (mean of three tests), whereas ! is given a broad plausible range.
The posterior parameter sets are used to predict the alveolar ventilation distribution measured by imaging (as detailed in the previous section) and compared directly to those measurements in the same patient. The final number of iterations required in the ABC-SMC algorithm was set adaptively (see Appendix B for full details).

Statistical Analysis
Data were analysed in Matlab (v. R2017a) 45 . The Pearson correlation was used to quantify correlation between measured and predicted values of the same quantities. Agreement of these values was measured by Bland-Altman analysis. Variability in MBW derived indices ( >?( and LCI) was quantified by the standard deviation over 3 tests. To compare measures of different properties we have used Spearman's rank correlation coefficient , these are labelled as such in the figures. A p-value of <0.05 was considered statistically significant.
Receiver operator characteristic (AUCROC) analysis was used to compare the reproducibility of different indices in clustering/grouping patients. Since this dataset does not contain healthy volunteers, we split the patients into sub-groups based on severity of VH as measured by the imaging indices %/' and )* . This was performed using a k-means clustering algorithm on the two indices. The optimum number of clusters was found using the gap statistic.

Parameter identifiability and convergence of the fitting model
We varied the discretisation parameters ;; , ;;; and F to test for convergence of the inferred parameters to those used to generate artificial data from a single model realisation.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted October 1, 2021. ; https://doi.org/10.1101/2021.10.01.21264402 doi: medRxiv preprint (https://doi.org/10.5281/zenodo.5497647) shows the results of fitting models with varying F to a model with F = 500. We see that F = 50 recovers the parameters well from data generated using the model with 500 compartments. We also found that model run-time scales linearly with number of compartments.
Using the values of ;; , ;;; and F from tables 2 and 3, we generated and processed artificial data using models with a range of parameters. We then used the ABC-SMC to recover the model parameters, as show in figure 4. We see that, for the full range of ! tested (up to 1.5), this heterogeneity parameter is recovered well (within the uncertainty limits). However, we observe that the volume parameters are poorly estimated at very high VH. This is primarily due to a large portion of the ventilation distribution not contributing much to the washout (because it rarely clears the dead-space). This leads to greater uncertainty because, above ! ~1 solutions with larger FRC and larger VH, or smaller FRC and smaller VH, can have similar fitness. In gradient-descent fitting methods, the chosen fit is therefore likely to be sensitive to measurement error and initial guess for these cases, a major benefit of the Bayesian method here is that this uncertainty is quantified.

Patient population
Paired MRI and LCI data were available for 25 children and adults with CF, with FEV1 z-scores ranging from -5.32 to 1.10. Table 4 shows the patient demographics and lung function of all subjects. In the original study, the patient population was sub-divided by clinical status, which included LCI as a determinant. In this study, our aim is to compare how well the model recovers MRI observed VH, and compare its predictive value to LCI. Therefore, we construct patient clusters based on MRI measurements alone, in order to compare AUC -U) measures for both the model fits and LCI. Thus, patients were divided into 2 clusters based on %/' and )* as outlined in section 2.5, the results are shown in Table 1. Note that there is also a strongly significant difference in the LCI and FEV1 scores between the two groups, indicating that %/' and )* are consistent with these indicators of VH. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint   Figure S2(c) shows that there is strong correlation between the VH parameter ! and LCI measured by MBW. In figure S2(a), it is also shown that model inferred FRC values ( # ) are better correlated and agree more consistently with plethysmography measurement of FRC than MBW measured FRC volumes ( >?( ) do (r-value = 0.93 vs. 0.83 and mean bias −1.21 ± 0.93L vs. −1.26 ± 1.47L, respectively).  Figure S4(a) shows that increased ventilation heterogeneity ! results in an increase of the fitted FRC volume ( # ) relative to the value directly computed by plethysmography, while the exact opposite effect is seen for MBW vs. plethysmography measured FRC. Similarly, figure S4(b) shows that the model overestimates dead-space for large VH cases. Supplementary figure S4(c) shows that the uncertainty in predictions increases with VH. All of the effects are inter-related, as explained in the discussion.

Prediction of physiological parameters and MRI measured ventilation distribution from MBW data
Nonetheless, in the majority of cases, the imaged ventilation distributions show good agreement with those predicted by the model (Figure 6). These fits are quantified in Supplementary Table S1 (https://doi.org/10.5281/zenodo.5497647), where it is shown that they fit the ventilation distributions better than a single parameter distribution fit directly to the imaging data.

Impact of shorter washout
To measure the effect of shortening the MBW test time, we computed the LCI and model fits for increasing MBW termination thresholds (retaining the same model priors for # and $ ). Figure 7 shows that LCI values correlate much less strongly with LCI2.5 values as the threshold is increased, however the model parameter σ * remains practically unchanged up to the 20% threshold, and still correlates strongly with its initial values even at 40% termination threshold (r=0.92 for σ * vs. r=0.69 for LCI). is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted October 1, 2021. ; Table 5 shows that both the LCI and model predictions are good at classifying subjects into the 2 clusters as MBW test threshold limits are changed. At low threshold values, LCI2.5 was an ideal classifier for this dataset (AUCROC = 1). There was a similar decline in the AUCROC for all measures of VH as the termination threshold was raised. In this case, the model parameter ! was the best classifier when the termination threshold was raised to 20% or above.

Discussion
We have developed a method to predict the ventilation distribution in the lungs of people with CF from MBW data. The results show good agreement with the distribution measured by hyper-polarised gas ventilation MRI in the same cohort (figure 5). The shape of the ventilation distribution was also well characterised in the majority of cases ( figure 6 and Supplementary table S1 https://doi.org/10.5281/zenodo.5497647). The parameters generated are consistent with those measured directly by MBW and body plethysmography (Supplementary figure S3 https://doi.org/10.5281/zenodo.5497647), reinforcing that the algorithms work as expected. The simulation method introduced is an efficient and flexible representation of ventilation heterogeneity, that is faster than similar models 28 . This has enabled the use of Approximate Bayesian Computation, which has in turn allowed us to generate confidence intervals around the predicted outcomes, as well more consistent and robust inferences of the model parameters.
This model has used to infer the distribution of ventilation from the MBW data of CF patients. While similar models have been used previously to infer the distribution of ventilation for inert-gas washout, they have not been compared directly to VH measurements from imaging. The measure we label )* has been used to quantify VH in a number of previous studies, while %/' represents the poorly ventilated regions of the lung. This method therefore makes the MBW and MRI results more directly comparable.
In addition, we have shown that the model predictions of VH remained consistent as the test-data was truncated, remaining more strongly correlated with the predictions from the full dataset than LCI ( figure 7). This demonstrates that we can extract sufficient information about VH from the early breaths of washout curves to characterise the full ventilation distribution. It should be noted that the prior distributions of the model parameters were still . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted October 1, 2021. ; generated using estimated FRC from the full washout ( >?( ), so the model predictions are not entirely independent of classic MBW indices. However, closed-circuit wash-in can be used to calculate more accurate priors for FRC during wash-in 46 . Thus, combining our model with closed-circuit wash-in could greatly reduce overall test time while retaining the sensitivity to VH. Even in the case of nitrogen washout, the shortened washout time would also allow a quicker recovery between tests.
Measured LCI remained relatively robust despite increasing the end point threshold (table 5). Other studies have shown that LCI specificity and sensitivity decreases with increasing termination threshold 10 . The LCI sensitivity and specificity are much higher in this cohort since the clustering is based imaging measurements of VH rather than clinical status. Another reason is that, for a threshold of 2.5%, there is a large gap between the largest LCI value in cluster 1 and the smallest LCI value in cluster 2 (even though the clusters were based on %/' and )* ). This is an idiosyncrasy of this small dataset that means that even though LCI40 is only moderately correlated with %/' and )* (r =0.66, r=0.63 respectively), the AUCROC remains high (0.854). The model predicted indices correlate better at the 40% threshold (r=0.85 and r=0.75 for model predicted vs. MRI-measured values of %/' and )* respectively) and have better AUCROC values (see table 5). Therefore, although it appears that the model predictions are no more sensitive than LCI for a full washout, they retain their predictive ability better when washout is truncated. Furthermore, this method incorporates the interdependence of the inferred FRC, dead-space and ventilation heterogeneity, resulting in improved FRC volume predictions from the model vs. MBW (Supplementary figure S3(a) https://doi.org/10.5281/zenodo.5497647). Further testing in larger datasets and refinements of the model may be able to improve performance and provide additional useful insights.
The main innovation in the compartmental model used here is its flexibility and efficiency. Unlike older fitting methods 21,22,25 , we use multiple points per breath to characterise the washout curve, meaning we can accurately fit dead-space volume as well as ventilation heterogeneity. However, unlike previous models 27, 28 , we discretise exhalations into large time-steps (10 per exhalation), to increase the model efficiency. Also, unlike most older methods, but similar to Mountain et al. 28 we use the actual flow rates measured from MBW to run the simulations, which means that the model is accurate even when tidal volumes fluctuate between breaths (which is common in young subjects or those who have trouble regulating their breath volume). Finally, the method is implemented in a modular fashion using object-oriented C++. This means that adapted models can be built on top of the existing structure with relative ease. Examples of this are given in the Github repository 47 .
The model also has some limitations. First, it assumes that gas transport to the lung units occurs in parallel, which is not entirely true in the branching airway network. This was chosen because we and others have found that a "common" dead-space parameter was not always identifiable in this model 28 . Second, diffusion in the alveolar region was modelled here as instantaneous. Part of the mechanism to generate phase-III slopes 48 , particularly for gases with high molecular weight such as SF6, was therefore missed. In particular, =V89 effects appear to impact on the model performance, with patients with higher =V89 values 4 showing worse fits to both the MBW trace and the MRI distributions (Supplementary Figure  S5 https://doi.org/10.5281/zenodo.5497647). Future improvements of the model will be aimed at addressing these issues. Third, it is implicit in the model that the ventilation distribution is the same on inhalation and exhalation, regardless of breath volume or flow rate. These factors both affect airway closure and reopening, which may explain some of the discrepancy between predictions and outcomes. Fourth, the model in its current form is . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted October 1, 2021. ; designed to model inert exogenous gases, such as SF6, extensions will be required to simulate exchange of N2 or CO2 as measured by certain MBW devices. Finally, an inherent difficulty of parameter identifiability when VH is large enough to mean that some lung regions have very low specific ventilation, and it appears that the actual size of this unventilated region is occasionally poorly accounted for in this model (see Supplementary figure S7 https://doi.org/10.5281/zenodo.5497647 for a detailed example). This leads to poorer estimates of the lung volumes (both FRC and dead-space) and greater parameter uncertainty when VH is large, as seen in figure 4. Related to this is the assumption of an underlying continuous distribution that is used to generate the discrete ventilation distribution in the model (in this case lognormal), which places a restrictive prior on the shapes of distribution that can be predicted (e.g. multi-modal distributions are much less likely to be predicted).
Notwithstanding these limitations, the concept of this approach has been justified. Future refinement will be required to address the issues raised around including more realistic acinar mixing effects and lung mechanics. To achieve this in a computationally efficient way, we may need to employ new approaches to dimensionality reduction 49 and homogenisation 50 of acinar and airway transport simulations. The Bayesian framework laid out here will also aid in these developments, since Bayesian model selection 44 (which the algorithm 47 is programmed to perform) can be used to compare the ability of different models to adequately explain the data independently of external verification.
In summary, we have developed a tool that uses an individual's MBW test data to predict ventilation heterogeneity in their lungs and for the first time have been able to corroborate these distributions with regionally resolved ventilation imaging. This will enable clearer interpretation of clinical data, more direct comparison between ventilation imaging and MBW data, and help to enable reductions in test time that are required to improve clinical practicality. Furthermore, it is the first example of a physiological model fitted to patient washout data using Bayesian parameter estimation, which will provide clinicians with all important estimates of uncertainty in physiological inferences.
1. A volume element with ## = L is added to the top of the common dead-space, with ## (D) = ## (E) = 0.
2. Volume elements are removed from the common dead-space, starting with the distal-most element ILE up to, but not including, ′ such that ∑ 6. The new elements created in step 5 are sequentially added to the distal end of the common dead-space. 7. Elements with total volume − L are removed from the proximal end of the common dead-space using the same method as for the private dead-spaces in steps 3 and 4 and discarded.

Appendix B: ABC-SMC algorithm
We define as the current `generation' index, where a generation of simulations in one iteration of the simulation acceptance algorithm, the maximum number of generations before the algorithm terminates is a . Within a generation, the accepted simulations are indexed , and once X have been accepted, that generation of simulations is completed. We use the shorthand = { # , $ , ! } to represent a parameter set, with a (X) representing the th accepted parameter set in generation . The model prior probability distributions (henceforth "priors") are labelled ( ), while a ( | * ) is the perturbation kernel, ( , ) is the distance function, and a the distance threshold for generation .
The number of accepted simulations required per generation was set to X = 1120, and the number of iterative generations a was determined adaptively, such that the algorithm terminated when at least 56,000 simulations (i.e. 1 in 50 acceptance rate) were required in the last generation. This termination threshold was implemented after finding, in a test . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted October 1, 2021. ;  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

Figure Captions
The copyright holder for this this version posted October 1, 2021. ; measurement points in order to be fitted by the model. The model is fitted to the processed data by searching over 3 parameters for each individual. Initially, parameter values are drawn from the prior distributions (blue histograms in bottom-left); the outcome of the ABC algorithm is the posterior distribution of parameters shown in orange in the bottom-right. In both cases the solid lines show a kernel density estimation (KDE) of the distribution, and the most likely parameters (maximum a posteriori) were taken as the peaks of the KDE curves for the posterior distributions. The accepted simulations are also used to predict the probability distribution of ventilation MRI indices I %/' and I )* (middle-right), as well as the ventilation distribution (expressed as a probability density function in the top-right, black line is the median, the dark orange region is the interquartile range, and pale orange the central 95%). The outcomes are then compared to the 3He MRI data (top-centre) where the blue histogram (top-right) shows the normalised probability distribution function of masked image intensity, which in this example closely matches the model prediction. These are added to the proximal end of each dead-space and the same volume is removed from the distal end (dashed line). This removed gas volume (blue) is then added to the volume of gas in the lung units. (b) On exhalation, the reverse process happens, and there is a rediscretisation of the volumes that are extruded from the private dead-spaces at the mixing point where they are combined. The tidal volume used was generated uniformly at random within the range 0.72 to 1.08L for each breath, and with flow-rate in the form of a step function with period 5s (2.5s inhalation and 2.5s exhalation). Measurements (with simulated measurement error as described in the section 2.4) were generated for 10ms intervals and these were then outputted to match the format of the raw MBW measurements from experiments. These were then processed and fitted with the same method outlined in section 2.3. Recovered (a) V # , (b) V . , and (c) σ * are plotted (MAP ± 95% PI) against input σ * , with the different line colours representing the different dead-space volumes. The dashed lines indicate the input parameter values. The y-axis range was chosen to show the full extent of the prior distributions.  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted October 1, 2021. ; https://doi.org/10.1101/2021.10.01.21264402 doi: medRxiv preprint black line shows the median, the darker orange region the interquartile range, the light orange region the central 95%, and the red dotted line the mean of the MRI ventilation distributions predicted from the final set of accepted simulations from ABC-SMC for each histogram. In the inset of each graph is printed the Kolmogrov-Smirnov (K-S) statistics for both the median and mean predicted distributions (vs. the observed distribution). A bin-width of 0.16 was used to visualise these distributions. Each figure (a)-(y) shows the result for one of the 25 patients in this study. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint