Abstract
A tacrolimus population pharmacokinetic (PK) study with 440 subjects was performed using real world data extracted from electronic heath records (EHRs) to identify patient factors affecting variability in tacrolimus PK parameters. Using last-dose times extracted by our own natural language processing system, medExtractR, we assessed the effects of incorporating last-dose times in the data on tacrolimus PK parameter estimates. The estimates of population level PK parameters were Cl = 24.9 L/h [23.5, 26.3] and V = 6310 L [5461, 7159]. There was no appreciable difference in parameters estimates with vs. without last-dose time incorporated in the data. We also investigated the effects of absorption rate constants that are often fixed at a published value in tacrolimus population PK analysis. Our sensitivity analysis revealed little difference between parameters estimated assuming a range of absorption rate constants.
Introduction
Tacrolimus is an immunosuppressive calcineurin inhibitor widely used to prevent rejection following organ transplantation. Tacrolimus has a narrow therapeutic window, yielding less efficacy (i.e., an increased probability of implant rejection)1 or toxicity (e.g., nephrotoxicity, neurotoxicity, diabetogenicity)2,3 if it falls outside of this window. The therapeutic window could be considered differently depending on timeline since transplantation (e.g., 10-20ng/ml in the first month, 5-15ng/ml in the second and third months, 5-10ng/ml in subsequent months post-transplantation).4 The challenges of maintaining this therapeutic range is exacerbated by individual variability in tacrolimus disposition. Several patient characteristics affecting clearance have been reported: examples include liver function,5 time since transplantation,6 and age.7 Especially, the effect of single nucleotide polymorphisms (SNPs), rs776746, on CYP3A5 gene coding the major metabolizing enzyme for tacrolimus, has been reported from multiple studies including our own.8 As patients carrying the CYP3A5∗3 SNP, a loss-of-function allele, reduce clearance, these patients require significantly smaller tacrolimus doses to maintain the same tacrolimus concentration level.9-13 This SNP alone was reported to be associated with 39% of the variability in dose requirement between patients.8
Tacrolimus population pharmacokinetic (PK) studies have traditionally relied on prospective observational studies performed during routine care. While such studies are not as costly or restrictive as randomized controlled trials, they still impose costs associated with data collection and restrictions on patient enrollment.14 Postmarketing drug studies have been increasingly performed using real-world data sources such as electronic health records (EHRs) which are particularly valuable as a source of longitudinal clinically relevant data. The adoption of EHRs has made large-scale retrospective studies feasible with rapid cohort generation from patients enrolled in routine care, and there has been increasing interest in using EHRs as real word data.
Due to the narrow therapeutic window with high variability in tacrolimus disposition, it is one of the drugs that require therapeutic drug monitoring (TDM).15 Tacrolimus blood concentrations are routinely collected as part of standard of care, making retrospective studies with EHR data more appealing. Several tacrolimus PK studies have been conducted using such data.16-18 As tacrolimus population PK studies are performed retrospectively using trough drug concentrations, they often assume the absorption rate constant, ka, at a fixed value in the PK models. Although some studies may perform a sensitivity analysis, the effect of this assumption has not been thoroughly investigated.
We recently developed a natural language processing system, medExtractR,19 to extract medication information from free-text clinical notes as part of a system to enable the use of EHRs in retrospective studies of drugs.20,21 The system, once finalized, should relieve the primary burden in data generation which often involves the manual extraction of medication data. In addition to drug dosing information, medExtractR is designed to extract explicit last dosing times if present in the notes. Tacrolimus population PK studies typically use trough concentrations that are assumed to be measured before taking a morning dose and assumed to be preceded by a previous dose at a regularly scheduled time (i.e., twice a day). Any deviation of this assumed schedule from the true dosing time may result in a different estimate of the PK parameters. To the best of our knowledge, no studies have investigated the sensitivity of tacrolimus PK parameter estimates to deviations in last-dosing time.
The goals of this study were threefold: (1) to perform tacrolimus population PK analysis using real world data generated with EHRs and identify the important factors affecting PK profile; (2) to investigate the effects of last dosing time on the tacrolimus PK parameter estimates using the data extracted from the EHRs; and (3) to investigate the effects of assuming the absorption rate constant, ka, at a fixed value in a model on the other PK parameter estimates.
Methods
Study Design and Data Source
This study was approved by the Vanderbilt Institutional Review Board. We used data obtained from the previous study, which was described in detail in Birdwell et al.8 The data included in the tacrolimus PK modeling were medication dose, drug concentration levels, demographics, laboratory, and genotype data. Additionally, we extracted the last dosing time from the same clinical notes that yielded the original dataset.
Times of last dose were extracted using medExtractR (see Weeks et al. 2020 for details).19 Extracted last dose times appeared in various formats, for example using am or pm (e.g., “10 am”, “9:30 pm”), military time (e.g., 2200, 20:30), or using a modifying word or phrase (e.g., “8 last night”, “yesterday morning at 7”). Figure 1 outlines the algorithm to process these extractions. All time expressions were initially converted into the same format of HH:MM:SS. For example, the phrase “8:30pm” would become “20:30:00”. Concentration measurements were generally assumed to be trough levels taken at a morning appointment. For this reason, PM last dose times were assumed to have occurred on the previous day and AM times were assumed to have occurred on the same day as the laboratory value. Conflicts could occur if different last dose times were extracted within the same note or on the same date. Assuming concentration measurements were intended to be trough values, time differences within 2 hours were considered close enough to be equivalent with respect to impact on the PK parameter estimates and the earlier extracted time was kept. Cases where discrepancies still existed after removing differences within 2 hours were manually reviewed with a clinical expert to determine which last dose time was correct. Ambiguous cases where the correct last dose time could not be determined were treated as missing.
Extraction and processing examples for time at which a patient took their last medication dose. In step 2, PM times are assumed to occur the day prior to the note date. AM times are assumed to occur on the same day at the note date. Final dataset includes one last dose time for each ID-date pair.
Data Process
We used the original data (which do not have NLP-extracted last-dose times) and the last-dose times we extracted from the clinical notes for the same cohort to build four datasets: two datasets for the entire cohort with and without last-dose time, and two for a reduced cohort with and without last-dose time. The reduced cohort was defined based on the number of extracted last-dose times for each subject. Subjects having at least 4 extracted last-dose times were included in the reduced cohort. Otherwise, the PK datasets for both cohorts were built with the same algorithm implemented as a function in the EHR R package. A brief description of the PK data building method with and without last-dose time is as follows.
When last-dose time is not available, a regular dosing interval is assumed. Specifically, we assume that a dose of tacrolimus is taken 30 minutes (a reasonable timeframe confirmed by a clinician) after blood is drawn for a concentration observation. We further assume that the drug is taken every 12 hours following that initial dose. As most drug level measurements are being taken in the morning to provide trough concentration level, subjects are assumed to take their morning dose 30 minutes after having the blood drawn and continue to take their doses every 12 hours from that point until their next concentration measurement, the timing of which will dictate their next proceeding dose timings. This fully describes the generation of the dataset without last-dose times.
When last-dose time is available, the dataset with last-dose times can be generated. For example, if a last-dose time is found matched with a concentration measurement, the preceding dose is set to be given at the time reflected in the clinical note. This dataset better represents the dosing times, among which the last-dose time is the most informative to estimating PK parameters. Note that not all extracted concentrations need to be associated with a last-dose time. For concentrations missing last-dose times, the record will be the same as the record without last-dose times.
Population PK Analysis
We performed population PK analysis of tacrolimus using a nonlinear mixed-effects model implemented by Monolix® (version 2020) with the stochastic approximation of expectation and maximization (SAEM) method. A one-compartment PK model was chosen as the base model, assuming a combined additive and proportional residual error model and lognormal distribution for the random effects PK parameters. A model with random effects with unstructured covariance for all main PK parameters except for the absorption rate constant, ka, was assumed in the final model. As ka cannot be reliably estimated without drug concentrations measured during the absorption phase, it was fixed at the previously published value of 4.5.22
Covariate model building was performed using individual specific PK parameters estimated from the base model. Both graphical and statistical methods were considered with the following candidate covariates, which we chose a priori based on previous research and biological plausibility: weight, age, sex, hemoglobin, albumin, race, and a CYP3A SNP (rs776746). Model selection was performed based on the objective function (−2 log likelihood) along with the number of parameters, which would approximately follow χ2 distribution. The χ2 statistics of 3.84 and 6.64 with 1 degree of freedom correspond to p values of 0.05 and 0.01, respectively. Thus, we considered the objective function value decrease of 6.64 to be significant model improvement. Variables were added to the base model one-by-one and evaluated for a decrease in the objective function value of > 6.64. Variable selection was performed only with the dataset of the entire cohort with last-dose times; selected variables were then used to build the same model from the remaining three datasets. The model was qualitatively assessed through visual examination using goodness-of-fit plots such as the observed vs. predicted concentrations, the individual weighted residuals, and the visual predictive check using Monolix® (version 2020).
Sensitivity Analysis
The ka in our final models was assumed to be 4.5 as reported previously reported.22 In order to assess whether our findings are sensitive to this selected value, we refit the model at another published ka value of 3.09.12 In addition, we refit the model with the published ka value for the extended-release formulation, 0.375.23
Results
Population Characteristics for the Two Cohorts
The study cohort was described previously by Birdwell et al.8 The previously described cohort of 446 subjects was reduced to 440 after removing participants with only a single concentration observation (N=4) and participants with no known SNP (N=2). The characteristics of study population for the two cohorts are described in Table 1. Each participant in the cohort has a maximum of 10 tacrolimus blood concentration measurements. These observations constitute either the first 10 concentrations or every concentration if less than 10 measurements for each patient beyond the 1-month post-transplant period. Of these concentration measurements, 48% were accompanied by an extracted last-dose time. Median tacrolimus dose across all subjects was 3 mg twice daily, and median blood concentration measurements was 7.0 mcg/L. The reduced cohort consists of 257 subjects. The percentage of concentration measurements associated with a last-dose time is increased to 70% as designed, while median tacrolimus dose and median blood concentration remain almost the same.
Population characteristics of the entire dataset (left) and the reduced dataset (right). Values are presented as count (proportion) for categorical variables and mean (S.E.) median[interquartile range] for continuous variable.
Population PK Model
The results of the primary analysis, including the base model and the final covariate model based on datasets from the entire cohort and the reduced cohort, are presented in Table 2. PK parameters such as clearance (CL, L/hr) and volume of distribution (V, L) were first estimated from the base model without covariates. Note that we denote CL/F by CL for simplicity, where F represents relative bioavailability and is omitted elsewhere. The PK parameters varied substantially among individual participants; the coefficients of variation (CV%) for CL and V were 42% and 192%, respectively.
PK parameter estimates for both the entire and reduced datasets either with or without last-dose time. Estimates are presented as mean (S.E.) [Wald 95% confidence interval]. CYP3A5∗3/∗1i and CYP3A5∗1/∗1i are indicators for genotype for ∗3/∗1 and ∗1/∗1, respectively, and wti represent weight for subject i.
CYP3A SNP, weight, and age all substantially improved the model fit. Their inclusion reduced the objective function value by 161, 7.28, and 18.3, respectively. Other a priori defined covariates (serum albumin, hemoglobin, and race) explained little of the inter-individual variability for PK parameters and hence minimally improved model fit (i.e., decreased the objective function by < 6.64) and were excluded from the final model. The final model is presented as follows:
and
where CLi and Vi are the individual-specific PK parameters for subject i, corresponding to CL and V, wgti is participant weight in kilograms (kg), agei is participant age at study entry in years, CYP3A5∗3/∗1i and CYP3A5∗1/∗1i are indicators for genotype for ∗3/∗1 and ∗1/∗1, respectively, where the reference is genotype for ∗3/∗3. ηiCL and
are random effects explaining inter-individual variability for CL and V, which follow a normal distribution with mean zero and variance of ω2CL and ω2V. The θs in the equations denote model parameters as typically used in statistical models. Age and weight are standardized by dividing by their population median.
In the final covariate model for the entire cohort dataset with last-dose times, the estimates of typical values of CL and V were 24.9 L/hr and 6310 L for a 70 kg, 46-year-old subject with CYP3A5∗3/∗3. Subjects with ∗1/∗3 and ∗1/∗1 increased clearance 1.77 (i.e., e0.57=1.77) and 2.25 times (i.e., e0.81=2.25) compared to clearance for those with ∗3∗3, respectively. The CV for CL and V were 18% and 369%, respectively, reduced from the base model.
PK parameters estimated in the without last-dose time dataset do not fall outside of the Wald confidence intervals constructed about the relevant parameters estimated from the dataset with last-dose times. This is true in both the entire cohort and the reduced cohort selected for greater prevalence of last-dose times.
Model diagnostics
Figure 2 shows model diagnostics in the model built from the entire cohort dataset with last-dose times. Overall, the goodness-of-fit plots present reasonable model fit although some deviation from normality was noticed in the observed vs. predicted concentrations plots. The weighted residuals also demonstrate slight deviation from normality, but 95% of the residuals fall within the range of 2 to −2. The visual predictive check shows that predicted median and lower quantile sometimes overestimate true values across follow-up time.
Model diagnostic plots for the model fit to the entire cohort dataset with last-dose times. Observed vs. predicted concentration plots are shown on the log scale for both population (left) and individual (right) level predictions. Individual weighted residuals (b) are plotted against time (left) and predicted concentration (right). The corrected visual predictive check (c) is shown up to the follow-up time encompassing 99% of observed concentrations.
Sensitivity Analysis of ka in Population PK Model
Table 3 shows parameter estimates from the entire cohort dataset with last-dose times when ka is either assumed to be 4.5, 3.09, or 0.375. The estimates (± S.E.) of typical values for Cl were 24.9 ± 0.69, 24.1 ± 0.70, and 25.5 ± 0.70, and those for V were 6310 ± 433, 6130 ± 722, and 6040 ± 719, for ka being 4.5, 3.09, and 0.375 models, respectively. None of the estimates for ka = 3.09 and ka = 0.375 models fall outside of the 95% Wald confidence intervals for the corresponding parameters of the ka = 4.5 model.
PK parameter estimates for models assuming ka to be 4.5, 3.09, 0.375 (left to right). Estimates are presented as mean (S.E.) [Wald 95% confidence interval]. CYP3A5∗3/∗1i and CYP3A5∗1/∗1i are indicators for genotype for ∗3/∗1 and ∗1/∗1, respectively, and wti represent weight for subject i.
Discussion
Our tacrolimus population PK study performed with real world data using solely EHR data reproduced the well-established relationship between CYP3A SNP and some of the PK characteristics of tacrolimus. Our base model estimate of CL of 30.8 L/h is large compared to published estimates (with 95% CI) of 22.1 [19.3, 24.0]11 and 22.7 [21.0, 24.4].12 Our base model estimate of V of 6550 L is also large compared to base published estimates of 653 [418, 888]11 and 1090 [911.8, 1268.2].12 However, the significant association of CYP3A5 SNP published previously is replicated. Hematocrit did not affect model performance enough to be included in our model yet was found to have a significant effect by Zuo et al.12 Future work will investigate the sensitivity of our estimates to model specification to attempt to account for these discrepancies.
The difference in last-dosing time did not meaningfully alter the estimation of PK parameters or covariate effects in the model in either the entire cohort dataset or the reduced dataset which oversampled subjects for which more last-dosing time information was available. The lack of a difference can be attributed to the following reasons. First, the estimated timing of the last dose may be close enough to the actual time, which would be a reasonable assumption for many patients as the compliance of organ transplant patient population is known to be high because of the serious consequence of organ rejection if they are not compliant to taking medication at scheduled intervals, or missing doses. Second, tacrolimus concentration levels are likely in the steady-state after 1-month, from which our data collection was started. In addition, as tacrolimus has a long half-life, trough concentrations would well approximate the average steady-state concentration,24 which would allow for the PK modeling to be less sensitive to deviation of true dosing time. Our findings support the general modeling approach used in several tacrolimus population PK studies,11,12 where ka was fixed at a published value. Our study results will be useful for PK studies performing using observational data such as EHRs for many medications having similar PK characteristics, such as a long half-life.20
We are interested in whether our estimates for clearance and covariate effects are sensitive to our selection for ka. The choice of ka in a range of 12-fold difference (from 0.375 to 4.5) had little impact on parameter or covariate estimates in the entire dataset; all parameters in the ka = 3.09 or 0.375 models were within the 95% confidence interval of the corresponding parameters in the ka = 4.5 model. This is likely due to the use of trough concentrations, which are taken well after the absorption phase and are therefore less impacted by the absorption process. Given the twice-daily dosing schedule and clinical goals of tacrolimus blood concentration maintenance via TDM, any reasonable selection for ka is not likely to impact PK modeling. These ka values were reported for immediate-release formulations for tacrolimus, which is the formulation used in our study. As the extended-release formulation for tacrolimus has been on the market for a while and will likely be increasingly used in the future, our findings would be also useful for tacrolimus population PK studies performed with the extended-release formulations.
Future work will be to replicate these results using our complete medication information pipeline and compare it to the results obtained through clinically validated data. Reproduction of PK parameters and covariate effects will indicate that our system is viable to replace costly manual information extraction to build PK data.
Data Availability
Data is not publicly available.
Funding
LC is supported by NIH/NIGMS (R01-GM124109).
Conflict of Interest / Disclosure
The authors have no conflicts of interest to disclose.
Author Contributions
All authors participated in critical review and revision of the final manuscript and approved the final manuscript draft.
MLW: Wrote Manuscript, Performed Research, and Analyzed Data.
HLW: Performed Research and Contributed Analytical Tools.
CB: Contributed Analytical Tools.
LC: Wrote Manuscript, Designed Research, Performed Research, and Analyzed Data.