## Abstract

Methicillin-resistant *Staphylococcus aureus* (MRSA) colonizes multiple body sites, and carriage is an important risk factor for MRSA infection. Successful decolonization reduces disease incidence; however, the optimal decolonization protocol is unclear. Decolonization protocols include diverse sets of treatments that target multiple body sites, and standard methods can not estimate the impact of site-specific components from such data. Here, we formulated a Bayesian coupled hidden Markov Model (CHMM) for MRSA colonization dynamics that estimates interactions between body sites and quantifies the contribution of each component on successful decolonization. Importantly, this now makes it possible 1) to predict the efficiency of decolonization for any combination of site-specific treatments, including combinations not included in the data, and 2) to assess the impact of enhancing site-specific clearance on the success of decolonization. We applied the model to longitudinal data from a randomized controlled trial (RCT) of MRSA decolonization, including chlorhexidine (CHG) body and mouthwash and nasal mupirocin. Our findings confirmed that nares are central to MRSA persistence and act as a predominant source of colonization for other body sites. We estimated the CHG mouthwash as the least effective component, such that the combined effect of clearance on other sites accounted for as much as 97% (90% credible interval 90%-99%) of the full decolonization. Hence, the omission of CHG mouthwash still yields a highly efficient decolonization regimen while facilitating adherence. Finally, we used our model to estimate the effectiveness of hypothetical treatment improvements *in silico* and showed that enhancing MRSA clearance at nares will achieve the largest gains. This study demonstrates the use of advanced probabilistic modeling to go beyond what is typically achieved by RCTs, enabling evidence-based decision-making to streamline clinical protocols.

## 1 Introduction

Methicillin-resistant *Staphylococcus aureus* is a common antimicrobial-resistant pathogen in community and healthcare settings[1, 2], causing an estimated 320,000 infections in hospitalized patients and over 10,000 deaths in the United States in 2017[3]. Progress in reducing invasive MRSA infections has slowed, underscoring the importance of continued innovation and effort to prevent disease[4]. As MRSA carriage is a major risk factor for invasive disease, efforts at prevention center on the promotion of decolonization and body hygiene as well as environmental cleaning[5]. The most common *S. aureus* carriage site is the anterior nares, but it can also colonize the perineum and groin, the axilla, the pharynx, as well as other body sites[6, 7]. While the anterior nares have been identified as a key reservoir for transmission and a risk factor for invasive disease[8], the extent of interaction among colonization sites and the importance of addressing each site in decolonization protocols remain unclear. Decolonization protocols primarily target the nares and can include other sites as well, but it would be ideal to simplify, where possible, without significantly impacting effectiveness. Achieving this goal requires a detailed understanding of the dynamic relationship among colonization sites and the extent to which colonization at one site affects colonization at other sites.

The CLEAR (Changing Lives by Eradicating Antibiotic Resistance) Trial demonstrated that the use of a post-discharge decolonization protocol in MRSA carriers reduces infection and hospitalization rates[9]. In the trial, 2,121 study participants were randomized into two groups to test the impact of the decolonization protocol: the *education* group (n=1,063) received an educational binder on hygiene, cleanliness, and MRSA transmission; the *decolonization* group (n=1,058) received this information and as well underwent decolonization for five days twice monthly for six months, with the protocol consisting of nasal mupirocin and chlorhexidine body and mouth wash. During these six months, swabs were collected from the participants at discharge from the hospitalization and three follow-up visits, which approximately took place at months 1, 3, and 6 after the discharge. Samples were taken from the nares, skin (axilla/groin), throat, and, if present, any wound. Participants had different numbers of observations because of trial exits or skipped visits. Overall, 20,506 samples were received in the first 6 months of follow-up. Additionally, the dataset includes reported adherence to the protocol, enabling the assessment of both real-world uptake and consideration of the ideal scenario of full compliance.

To model the interactions of MRSA colonization between body sites, we used an extension of a hidden Markov Model[10] (HMM, a well-known machine learning algorithm[11, 12]), a coupled HMM (CHMM)[13, 14, 15, 16, 17, 18, 19], where the probability of colonization at a given body site depends on the colonization of all body sites in the previous time step (Figure 1). Here, we developed two novel formulations: *Additive-CHMM* and *Or-CHMM*. In the former formulation, the probability of colonization at a given site is an additive function of colonization at the other sites. In the latter formulation, other body sites are considered jointly, and the probability of colonization in the body site of interest increases if *any* of the other sites is colonized. According to a predictive model selection criterion (leave-one-out cross-validation, LOO-CV)[20], the Additive-CHMM had superior accuracy compared to Or-CHMM or collection of site-specific standard HMMs (Supplementary Materials), and consequently, results of Additive-CHMM are presented and discussed in the main text. We provided an easy-to-use R-package that implements these models with an efficient Metropolis-within-Gibbs Markov Chain Monte Carlo (MCMC) algorithm, which yields Bayesian credible intervals (CI) for all model parameters (Methods).

## 2 Results

### Nasal carriage of MRSA is the most persistent and most affected by the decolonization protocol

For each body site, we estimated the persistence of colonization by calculating the probability that the site would be colonized in the next time step when it was colonized in the previous time step, assuming other sites were not colonized. We found that in the education group, the estimated persistence was highest in the nares (Figure 2a). In the decolonization group, application of the protocol decreased persistence at each site, with the largest decrease in the nares[21] (Figure 2a). We also estimated the relapse probability, i.e., the probability of a site getting colonized when it was not colonized in the previous time step, assuming other sites were not colonized. However, this did not seem to be significantly affected by the decolonization protocol (Figure 2b). The number of samples from wounds was relatively small in the analysis, which caused the estimated parameters for the wound to have considerably wider CIs than estimates for the other sites. Repeating the analysis only on patients who reported adherence to the protocol led to almost identical results since most (70%) of the data were collected from adherent patients (Supplementary Materials).

### The decolonization protocol reduces the transmission of MRSA between body sites

We quantified the strength of dependencies of MRSA colonization between body sites using the increase in the log-odds of colonization in the next time step caused by colonized sites in the previous time step (Figure 3). First, as expected, the most important determinant of colonization at any site was whether the site itself was colonized in the previous time step. Second, the nares was the only site linked to all the other sites in the education group, highlighting it as the likely source from which MRSA spreads to the other sites. Third, the decolonization protocol decreased the impact of the nares on the throat and removed the impact of the nares on wounds (Figure 3). Except for a minor increase in the impact of the nares on the skin, other dependencies between body sites were not affected by the treatment. These findings indicate two mechanisms responsible for the efficiency of the decolonization protocol: 1) the protocol decreased persistence in the nares, which is a central hub for colonization (Figure 2a), and 2) the protocol weakened links between body sites, reducing self-inoculation (Figure 3).

### The CHMM accurately predicts the decrease in MRSA carriage over the study period

The key metric of success for decolonization is the extent to which it reduces MRSA carriage. To examine the CHMM’s predictive accuracy, we simulated patient trajectories using parameters from the estimated models and compared the decrease in colonization in our model-based simulations to the observed decrease from the CLEAR trial data. The predicted reduction of the carriage was from 66% to 48% (90% CI is 45%-51%) in the education group and down to 26% (23%-28%) in the decolonization group (Figure 4). These predictions accurately match the reductions observed in the trial. However, a discrepancy between predictions and observations was seen in intermediate visits. Especially in the decolonization group, the observed carriage rate decreases more rapidly than predicted in the beginning and then more slowly. This outcome may reflect heterogeneity in treatment response among the study population: some patients responded very fast to the treatment, others more slowly.

### Treatments on nares, skin, and wound account for 97% of the effect of full decolonization

The importance of the nares has previously been noted in data from an intensive care unit[22]. However, the added value of decolonization efforts that focus on body sites other than the nares is not well understood. To quantify the impact of decolonization efforts at each individual site, we predicted the efficiency of hypothetical simplified treatments, consisting of only a subset of the decolonization protocol (Figure 4). Nasal mupirocin decreased the estimated carriage from 66% to 31% (29%-34%), compared to 26% (23%-28%) of the full decolonization (Figure 4). Other single treatments were far less effective, reducing the carriage to 47% (44%-50%) with chlorhexidine mouthwash (CHG Oral) or 44% (41%-47%) with chlorhexidine bodywash (CHG Skin), a minor improvement compared to the 47% (44%-50%) of education alone. The best combination of two treatment components includes mupirocin and CHG Skin, i.e., leaving out CHG Oral, decreased carriage to 27% (25%-30%), corresponding 97% (90%-99%) of the decrease caused by the full decolonization protocol. In contrast, the combination of CHG Skin + CHG Oral and mupirocin + CHG Oral decreased the carriage only to 43%(40%-46%) and 30%(27%-33%), respectively (Suplementary Materials).

### Enhancing clearance at the nares is predicted to achieve the largest gain in overall decolonization success

We used our model to predict the impact of future improvements to components of the decolonization protocol. We assumed optimal treatment components, defined as treatments that resulted in immediate clearance of a specific site, could be applied on top of the existing protocol. The results were consistent with the previous finding of the nares’ central role in colonization and its clearance. In particular, the estimated final carriage with an optimized clearance of MRSA nasal carriage was 14% (13%-16%), while optimizing the throat or skin/wound treatments decreased the carriage to 21% (19%-23%) or 20% (18%-22%), respectively (Figure 4). Optimizing treatments both on nares + throat has the potential to reduce carriage further down to 10% (9%-11%) compared to 11% (10%-12%) of the nares + skin/wound, and 16% (14%-18%) of the throat + skin/wound (Suplementary Materials).

## 3 Discussion

We presented a model for the dynamics and clearance of MRSA colonization. Our model estimated interactions between multiple body sites, disentangled the impact of individual treatment components on the overall efficiency of decolonization protocol, and accurately predicted the decrease in carriage percentage over the study period. We found that not all treatment components contribute equally to successful decolonization. Among the CLEAR Trial components, mupirocin was estimated as the most effective treatment component, and our results indicated that omission of chlorhexidine mouthwash still yields a highly effective decolonization regimen. More broadly, this analysis indicates that the impact of individual interacting components of complex clinical protocols and RCT data can be rigorously deconvoluted and assessed *in silico* by machine learning tools like those used here, thereby enabling the design of interventions that are more efficient, easier to adhere to, and thus more likely to be successful.

As with all modeling, our analysis made several simplifying assumptions. First, we only considered the presence or absence of MRSA, ignoring the type and the burden of MRSA[23]. However, heterogeneity in treatment responses among patients and strain type of MRSA might affect the colonization duration. Second, the analysis focused on individual MRSA clearance, but the decolonization protocol may have other benefits, i.e., reduction in transmission between patients[24, 25]. Third, our model assumed that patients had their follow-up visits at the same time intervals, while in practice, there was some variation. Fourth, we assumed that whether an observation was missing did not depend on the colonization status of the body site, which is likely not always correct; for example, if a previously colonized wound is healed and not colonized anymore, no additional samples would be taken from the wound. Finally, our analysis is based on a Bayesian approach and required a specification of prior probabilities on model parameters to express what is believed about their values before seeing the data. We used relatively uninformative priors, and a comparison included in the Suplementary Materials shows that the model fit was not sensitive to the exact values of the priors. Despite these limitations, the model successfully estimated interactions between body sites, quantified the impact of individual treatment components, and predicted the decrease in MRSA carriage over the study period. Additionally, it allowed us to estimate the sensitivity and specificity of the detection of MRSA colonization, as those are immediately available as the emission parameters of the CHMM (Suplementary Materials). This analytical approach thus demonstrates how data from randomized controlled trials can inform on biological processes as well as guide improvements in clinical protocols and decision-making.

## 4 Methods

### Isolate Collection

MRSA isolates were collected as part of the CLEAR (Changing Lives by Eradicating Antibiotic Resistance) Trial. The trial was designed to compare the impact of a repeated decolonization protocol plus education on general hygiene and environmental cleaning to education alone on MRSA infection and hospitalization [3]. Study subjects in the trial were recruited from hospitalized patients based on an MRSA positive culture or surveillance swabs. After recruitment, swabs were obtained from different body parts of subjects (nares, skin, throat, and wound) around the time of hospital discharge (ENRL) and at 1,3,6 and 9 months (V1-V4, respectively) following the initial visit. The decolonization lasted only for 6 months, and consequently, we only modeled visits until V3 in this study. We note that some enrolled study subjects, despite a history of MRSA colonization or infection, did not have swabs positive for MRSA at the first time point (ENRL). More details about sample collection and data preprocessing can be found in the [3].

### Hidden Markov Model

Markov models are a well-known tool for time series analysis. Hidden Markov Models (HMM) are a special case where Markov process is observed indirectly through noisy observations. Discrete-time discrete-state Hidden Markov model with a latent sequence ** π**,

*π*

_{t}∈ {1, …, K} and observations

**x**,

*x*

_{t}∈ {1, …, L} for

*t*∈ {1, …,

*τ*}, is defined as where K is number of latent states,

*τ*is the number of time steps, and

*L*the number of possible observed states. Parameters of the model are the initial state probability

*π*

_{0}, transition probability

*T*, and emission probability

*E*, defined as

Note that the rows of *T* and *E* are probability distributions by definition. We will jointly denote the model parameters as *θ* = {*π*_{0}, *T, E*}. If *θ* is known, then we can estimate the latent states *π* conditional on parameters *θ* and observations **x** using the forward-filtering backward-sampling algorithm.

### Forward-Filtering Backward-Sampling

Filtering is the estimation of the current hidden state by using all observations so far, *p* (*π*_{t}| *x*_{1:t}) [12]. This is known as *forward-filtering* and it can be represented as follows:

We define *α* (*π*_{t}) = *p* (*π*_{t} | **x**_{1:t}) and substitute that into Equation 6, which yields the following recursive equation known as *α-recursion* or *forward recursion*;

The recursion starts with *α* (*π*_{1}) = *p* (*π*_{1}| *x*_{1}) = *p* (*x*_{1}| *π*_{1}) *p* (*π*_{1}) */p* (*x*_{1}). This shows that the filtered distribution *α*(.) is propagated through to the next time-step, where it acts like a **‘prior’**. In other words, at each time step the calculated posterior becomes the new prior for the next time-step [26].

To estimate the posterior distribution of latent states, we sample from the joint distribution of the latent sequence, *p*(*π*_{1:τ} | **x**_{1:τ}):

According to Equation 8, to be able to compute the posterior distribution of latent states, we need **‘time-reversed’** transitions *p*(*π*_{t} |*π*_{t+1}, *x*_{1:t}), which are obtained using *α-recursion*s from the forward filtering. The simulation starts from the end of the sequence and proceeds backwards recursively. Starting point is

For each time step *t*, the unnormalized sampling probability can be calculated using the previously sampled latent state , and the can be sampled using following:

This procedure is known as forward-filtering backward-sampling[26].

### Learning Model Parameters

Conditionally on the latent states and observations the parameters *θ* = {*π*_{0}, *T, E*} of the model can be estimated. The posterior distribution is given by

Factorizing the likelihood term in Equation 14 as in Equation 1, we can get the following result:

Since, *π*_{0}, rows of the *T*, and rows of the *E* are probability distribution, [.]’s inside the Equation 16 corresponds to Multinomial distribution with parameters ,and where corresponds to the initial state counts, *T*^{counts} corresponds to the state to state transition counts, and *E*^{counts} corresponds to the state to observation emission counts. Since the Dirichlet is conjugate prior for the Multinomial distribution, we set a Dirichlet prior for *θ* = {*π*_{0}, *T, E*} with hyperparameters , and , respectively. Because of the conjugacy, the resulting posterior in Equation 14 is also a Dirichlet distribution:

### Inference of HMM with MCMC Sampling

There are several ways to estimate the parameters of HMM when both hidden states and model parameters *θ* are unknown, for example the expectation-maximization algorithm, variational Bayes, or MCMC sampling. In this paper, we will use MCMC. The goal of MCMC is to produce draws from the posterior *p*(** π**,

*θ*|

**x**). The Gibbs sampler is an MCMC algorithm suitable for high dimensional problems, such as the HMM, and it samples parameters one-by-one using their distributions conditional on the other parameters. In the HMM, it will alternate between sampling the model parameters

*θ*conditional on the latent sequence

**from**

*π**p*(

*θ*|

**), and the latent sequence**

*π*, x**conditional on the model parameters**

*π**θ*from

*p*(

**|**

*π**θ*,

**x**). The algorithm iterates the following steps

*N*times, which gives

*N*draws from the posterior:

Sample the model parameters from Equation 17, Equation 18, and Equation 19, respectively. (Note that they are independent of each other given latent sequence.)

Sample the latent sequence of HMM using the updated model parameters

*θ*using the Forward-Filtering Backward-Sampling algorithm.

Before calculating the posteriors, the first *N*_{0} samples are discarded as a convergence (warm-up) period of the Markov chain. The plate diagram of the HMM is given in the Figure 5a, and the algorithm is described in Algorithm 1.

### Coupled Hidden Markov Model

In the ordinary HMM, transition parameters do not change. However, in the CHMM, transitions in one chain are affected by other chains. In principle, it would be possible to define a single joint HMM where the latent state represents the latent states of all individual chains jointly and adapt the solution for the HMM described above. However, this is inefficient where there are many chains because the number of states would grow as *O*(*K*^{C}), where *K* is the number of hidden states in a chain and *C* is the number of chains. On the high level, our key idea is that the transition matrix of each chain is modeled conditionally on the states of the other chains (and not as a single large joint transition matrix). A similar formulation is considered by [15]; however, the details are different. Note that a transition matrix *T* for a specific chain will not be time-independent anymore but changes at each time step depending on the states of the other chains. The graphical representation of the model (showing just two chains) is given in Figure 1, and the plate diagram of the CHMM is given in Figure 5b. There are *O*(*KC*) parameters in this formulation, which is much more efficient for the increased number of chains.

Defining the transition matrix is a critical part of the algorithm, and it is essential to carry as much information about the other chains as possible. We model the dependencies between the chains with parameters *β*. Assume there are *C* chains and let ℂ denote the set of chains. Further, assume that each chain can be in one of 𝕂 possible states, and let K denote the set of states. Finally, assume that there is a baseline state such that if a chain is in that state, it does not affect other chains (in our application, this state corresponds to the absence of MRSA colonization in the respective body site). The transition matrix for the chain *ĉ*, at time *t*, denoted as , is defined by;

The transition matrix is obtained in Equation 21 by applying a row-wise softmax operator *σ*_{row} to the unnormalized transition matrix . Parameter corresponds to an intercept matrix and it specifies the transition matrix of the target chain *ĉ* when all other chains are in the baseline state . Parameter represents the impact of chain *c* on the target chain *ĉ* and it is added to whenever chain *c* was in state in the previous time step. We will denote all the parameters for target chain *ĉ* as *β*^{[ĉ]}.

In Equation 20, the unnormalized transition probability *U* is an additive function of the latent states of the other chains; hence we call it the *Additive-CHMM*. We also implemented a simple formulation than the Additive-CHMM, which we call *Or-CHMM*. In the Or-CHMM, the target chain is not affected by the other chains individually; instead, there the target chain is affected whenever *any* of the other chains is in some state different from the baseline state, and the unnormalized transition matrix is given by:

In our application we have *K* = 2, corresponding to the presence and absence of colonization in a given body site. In this case, Equation 22 has a simple form: If all other chains are in state , the output is only , and if any of the other chains is in a state other , then the output is the sum of and .

### Design and Interpretation of the *β* parameters

In the CHMM, the transition matrix *T* for each chain is modeled as a function of *β* parameters, where each *β*_{k} is a matrix of the same size as the transition matrix. Therefore, the *β* parameters are a list of matrices. Since the rows of a transition matrix *T* are assumed independent, we also model *β* parameters such that the rows of those matrices are independent. However, to avoid redundant parameters, the rows on each row of *β* are assumed to sum to zero. In practice, we sample the first *K −*1 parameter on a given row and set the *K*th element to equal the negative of the sum of all the other parameters. This constraint ensures that the mapping from *β* to *T* is one-to-one.

To give an interpretation to the parameters *β*, we start with Equation 21 and write the softmax for a single element of a transition matrix *T* (*i, j*):
from which it follows after straightforward algebra:

In our application *K* = 2 and sums of the rows of *β*_{k} are assumed equal to 0, so consequently also the rows in the *U*_{t} sum to zero, which gives us:
where *¬j* refers to the other element on the row that is not *j*. Therefore, in this 2-dimensional case, all *β*_{0} values correspond to half of the log-odds of the respective transition probability, and similarly, parameters *β*_{k} representing the interactions between the chains correspond to half of the change in the log-odds because of the presence of colonization in the other chain.

### Adaptive Metropolis-Hastings Within Gibbs Sampling

So far, we have described the design specifications of the CHMM. Since transitions are changing at each time step, it is not feasible to calculate their sufficient statistics as in Equation 16. To resolve this, we sample the *β*^{[ĉ]*} using a Metropolis-Hastings(MH) step within the Gibbs sampler because conditional on the latent states of all the chains, the transition parameters are fully determined by the *β*^{[ĉ]}. Parameters. To get a draw from , where states of the other chains, we use the following steps;

Make a proposal

*β*^{[ĉ]*}~*q β*^{[ĉ]}Transform

*β*^{[ĉ]*}samples to transition probabilities according to latent states of the other chains,*π′*Accept the proposal

*β*^{[ĉ]*}with probability

where

The first quantity on the right-hand side of the Equation 28 can be directly calculated with the transition parameters for corresponding chain. We used the Gaussian proposal in which the previously sampled *β*^{[ĉ]} is the mean such that *q* (*β*^{[ĉ]*} | *β*^{[ĉ]})= *N*(*β*^{[ĉ]*} | *β β*^{[ĉ]}, *c*^{2} Σ). As a variance parameter, we used a fixed diagonal covariance matrix during the warm-up period and then used the covariance matrix estimated from the previous samples. We initially set as metropolis jumping scaling factor since it is theoretically the most efficient scaling factor[27], where *d* is the dimension of the sampling. Then we adaptively scaled the scaling factor *c* as described in the [28]. Since this proposal distribution is symmetric (i.e., Normal), *q*(*β*^{[ĉ]} |44*β*^{[ĉ]*} *)* and *q*(*β*^{[ĉ]*} |*β*^{[ĉ]})in Equation 27, cancel out each other.

To sample the latent sequences we need to modify the Forward-Filtering Backward-Sampling algorithm[18], since we want samples from instead of . Because in our application the within chain dependencies are stronger than between chain dependencies, we will use the Forward-Filtering Backward-Sampling with a modified model where the outgoing edges from the current chain are neglected as the proposal distribution *q*, and adjust for the discrepancy between the proposal and the true conditional distribution by using the following MH acceptance probability[18]:
where *p*(.) distribution on the right-hand side of the Equation 29 is the joint posterior probability of the latent sequences, which is straightforward to calculate conditionally on the current CHMM parameters, and *q*(.) is the probability of samples for particular chain that can be calculated during the backward sampling process (for details, see [18]). Algorithm 2 summarizes the whole algorithm. To validate the implementation, we used the algorithm to estimate parameters in simulated data. The results in Supplementary Figure 1 show that the algorithm estimated the parameters correctly and yielded well-calibrated posterior distributions.

### Implementation details

We set the initial covariance matrix for the metropolis proposal as 0.01 *×I*, where *I* is the identity matrix, which corresponds to a step size giving the optimal acceptance rate of *≈*23% [29]. We set the prior of *β*_{0} as *N* (*β*_{0} | 0, 5), which is almost uninformative so that the estimates are not affected strongly by the prior. For the rest of the *β* parameters, denoted by *β*_{k}, we used sparsity encouraging Horseshoe prior with mean and scale parameters are 0 and 1, respectively (see Suplementary Materials for details). We used a uniform prior on the initial state probabilities *π*_{0}, and very weak Dirichlet priors for the rows of emission probabilities *E* such that we set the value to 0.9 if the latent state emits the same observation and set the value to 0.1*/*(*n −*1), *n* is the number of states, if the latent state emits a different observation. In such a formulation, except for the initialization, the prior has a negligible effect because it is summed with observation counts during inference. We drew 15,000 MCMC samples, and we set the warm-up length as 7,500. Posterior probabilities are calculated using the remaining MCMC samples. HMM and CHMM implicitly assume that time intervals between observations are the same, which is not the case in our data. Therefore, during the model training, we assumed that there are missing observations at 2, 4, and 5 months after enrollment.

## Data Availability

The CLEAR (Changing Lives by Eradicating Antibiotic Resistance) Trial demonstrated that the use of a post-discharge decolonization protocol in MRSA carriers reduces infection and hospitalization rates; ClinicalTrial.gov number NCT01209234, see reference https://doi.org/10.1056/NEJMoa1716771 for informed consent and institutional review board approvals. The data used in the present article is available on request from the corresponding author Susan Huang of the https://doi.org/10.1056/NEJMoa1716771, a co-author of this study.

## Data Availability

The CLEAR (Changing Lives by Eradicating Antibiotic Resistance) Trial demonstrated that the use of a post-discharge decolonization protocol in MRSA carriers reduces infection and hospitalization rates; ClinicalTrial.gov number NCT01209234, see reference [9] for informed consent and institutional review board approvals. The data set used in the present article is available on request from the corresponding author Susan Huang of the [9], a co-author of this study.

## Code Availability

The R package is available at https://github.com/onurpoyraz/chmmMCMC.

## Author Contributions

YHG and PM perceived the study. OP, PM, and YHG designed the model. OP implemented the model and run the analyses. OP, SSH, YHG, and PM interpreted the results. MRAS, LGM, JAM, and SSH prepared the data. OP, PM, YHG wrote the manuscript with comments from all authors.

## Supplementary Materials

### Model Comparison

We compared three models: the site-specific Standard-HMMs as a baseline (i.e., modeling each site independently with a different HMM), the Or-CHMM, and the Additive-CHMM model using six different initial parameterizations. First, we considered prior *N* (*β*_{k} | 0, 5), as in *β*_{0}, and prior *N* (*β*_{k} | 0, 1), to see how sensitive the performance is to the strength of the prior. In addition to these Gaussian priors, we also considered the Laplace and Horseshoe distributions, which are sparse priors that encourage values close to zero. Note that *β*_{0} always had the same uninformative Gaussian prior because the *β*_{0} values represent the impact of the history of a chain on its future values, which is expected to be non-sparse. On the other hand, *β*_{k} values correspond to the effects of the other chains, which may be zero in practice.

We compared the models with three different information criteria (IC): 1) The Akaike Information Criteria (AIC), which is the summation of the negative log-likelihood of the model and the number of parameters, 2) the Watanabe-Akaike Information Criteria (WAIC), also known as the Widely applicable information criteria, which is the generalized version of the AIC onto singular statistical models [30] and 3) the leave-one-out cross-validation (LOO-CV), which measures model performance by estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values[20]. WAIC and LOO-CV are more computationally intensive than AIC. Still, they are theoretically preferred over AIC for Bayesian models[20]. For all criteria, a lower score is better. Supplementary Table 1 presents all the scores used to evaluate model performance.

## Footnotes

onur.poyraz{at}aalto.fi, msater{at}hsph.harvard.edu, lmiller{at}lundquist.org,jmckinnell{at}lundquist.org, sshuang{at}hs.uci.edu, ygrad{at}hsph.harvard.edu, pekka.marttinen{at}aalto.fi

Title is changed The methods section is updated Results are revised Model convergence is shown