Effects of Preanalytical Sample Collection and Handling on Comprehensive Metabolite Measurements in Human Urine Biospecimens

Epidemiology studies evaluate associations between the metabolome and disease risk. Urine is a common biospecimen used for such studies due to its wide availability and non-invasive collection. Evaluating the robustness of urinary metabolomic profiles under varying preanalytical conditions is thus of interest. Here we evaluate the impact of sample handling conditions on urine metabolome profiles relative to the gold standard condition (no preservative, no refrigeration storage, single freeze thaw). Conditions tested included the use of borate or chlorhexidine preservatives, various storage and freeze/thaw cycles. We demonstrate that sample handling conditions impact metabolite levels, with borate showing the largest impact with 125 of 1,048 altered metabolites (adjusted P < 0.05). When simulating a case-control study with expected inconsistencies in sample handling, we predicted the occurrence of false positive altered metabolites to be low (< 11). Predicted false positives increased substantially (³63) when cases were simulated to undergo alternate handling. Finally, we demonstrate that sample handling impacts on the urinary metabolome were markedly smaller than those in serum. While changes in urine metabolites incurred by sample handling are generally small, we recommend implementing consistent handling conditions and evaluating robustness of metabolite measurements for those showing significant associations with disease outcomes.

We note that sample handling conditions are specific to the type of biospecimen used in studies, whether it be plasma, serum, urine, cells, or tissues (6).Urine is a desirable medium for evaluating the metabolome as it is non-invasive, easy to collect, and provides a more global representation of metabolism.It is also worth noting that urine collection protocols tend to be more variable than those used for blood (7) and that issues in sample handling procedures account for the majority of preanalytical errors observed (8).With this in mind, we aimed to characterize sample handling effects on urine in a metabolomics analysis.We collected urine samples from 13 study participants, subjected the samples to various handling conditions commonly encountered in practice (borate, chlorhexidine, or no urine preservative, refrigeration time, and number/temperature of thaws) and examined how each condition affected observed circulating levels of over 1,000 metabolites.Similar to the serum study described above, we examined the impact of sample handling conditions on metabolite measurements using three key metrics, absolute percent, normalized difference, and metabolite abundance correlations.We further performed a simulated case-control study to estimate false positive rates of uncovering altered metabolites.

Study Population
The study enrolled 13 participants (6 men, 7 women) from the area surrounding the Beltsville Human Nutrition Research Center, US Department of Agriculture (Beltsville, Maryland) in 2016.The individuals were recruited from a database of interested volunteers maintained by the Center.
Mid-stream spot urine collections were obtained for each participant in the mornings into a sterile urine collection cup with a vacutainer port.Participants were not asked to fast.Aliquots were divided into three separate tubes corresponding to the following preservative treatments: 1) no preservative, 2) borate preservative (BD Vacutainer C&S Urine Tubes, borate 2.63 mg/mL, sodium formate 1.65 mg/mL), 3) chlorhexidine preservative (BD Vacutainer Urinalysis Plus, chlorhexidine 0.4%, sodium propionate 94%, ethyl paraben 5.6%).Tubes were centrifuged at 600xg for 5 min.One mL of urine, with no preservative, from each participant was pooled and mixed to be used as a QC sample, which was run repeatedly with other experimentally treated samples during metabolomics profiling (see section Laboratory assays).Ten aliquots of each of the 3 preservative samples (30 samples total per participant) were transferred into 2mL cryotubes to be subjected to various sample handling conditions, namely refrigeration and freethaw conditions, as outlined in Figure 1.Conditions tested included 3 preservative conditions (no preservative, chlorhexidine, and borate), 2 refrigeration conditions (24-hour refrigeration, no refrigeration/snap freezing), and 7 freeze-thaw conditions (no freeze-thaw, thaw on ice once or 4 times, thaw in refrigerator once or four times, and thaw at room temperature once or 4 times).
One sample of each preservative treatment was placed in the refrigerator for 24 h before freezing in liquid nitrogen and stored at -80 °C.Remaining samples were flash frozen in liquid nitrogen and stored at -80 °C.They were then subject to various thawing conditions and freezethaw cycles.Thawing was performed at either room temperature, in a refrigerator, or on wet ice.Freeze thaws consisted of either a single thaw or 4 consecutive freeze-thaw cycles.Samples undergoing freeze thaws were snap frozen in liquid nitrogen following their thaw.Samples undergoing freeze-thaw cycles had the thaw time standardized based on previous work (5).Thaws on ice were fixed at 50 min, room temperature thaws were fixed at 10 min, and refrigerator thaws were fixed at 16 h to simulate a 5 PM to 9 AM thaw often used in laboratories.

Laboratory assays
In total, 416 frozen samples were shipped to Metabolon, Inc. (Morrisville, NC) for extraction and metabolite profiling.These included 13 participants x 3 preservatives x 10 refrigeration/freeze-thaw conditions + 26 pooled QC samples.Samples were run in batches such that each participant's 30 samples were run sequentially on the same day, with a pooled QC sample run in positions 3 and 24 of each participant's sample batch.Pooled QC samples were created from 1 mL aliquots with no preservative from each participant.Urine osmolality was determined at pre-analysis thaw for each sample and reported values as mOsm/kg water.The Metabolon platform uses ultra-high-performance liquid chromatography coupled with mass spectrometry as previously published (9).Metabolon performed the peak picking, alignment, identification, and quality control as previously described (9).as high confidence identification without a chemical standard.We omitted 160 metabolites that had missing or below detection limit values in 80% or more of 390 study samples, leaving 1,048 metabolites for analysis.
All treatment comparisons were done pairwise within a participant's set of samples.Missing value imputation was thus performed by participant to eliminate biases introduced from using other individuals' data for imputation.The method for missing value imputation was dependent on metabolite data coverage over the individual's 30 samples.For metabolites having sparse data, likely to be missing due to low detection limit (missing in more than 30% of the samples), values were imputed with the half-minimum value for that metabolite in all of the participant's samples.For metabolites with more complete data coverage (missing in less than 30% of the samples), likely to be missing due to analytical or data preprocessing errors (e.g.co-eluting peaks and challenges in peak picking, etc.), metabolites were imputed using a singular value decomposition approach using the impute.svd()method from the R bcv package (v.1.0.1.4)(10).The median correlation between duplicate gold-standard condition samples was 0.94, similar to prior studies (11,12).The resulting data matrix was then centered and scaled for unsupervised clustering analysis (PCA) and natural log transformed for downstream statistical analysis.The preprocessing steps are shown in Supplementary Figure S1.

Statistical Analyses
Four previously described and distinct metrics were calculated to quantify the impact of sample handling conditions on metabolite abundances (5).The first metric, the absolute percent difference (APD), reflects the average difference in a metabolite's log abundance for a particular sample handing condition while keeping all other conditions constant ( e.g.all condition combinations of refrigerator storage and thaw conditions are considered when a particular preservative is evaluated).Second, the normalized difference (ND) estimates the mean difference, in log metabolite level, normalized by between-individual variability.Third, metabolite abundance correlations (Pearson) were calculated for each metabolite to compare a predefined condition (e.g., preservative) against others (e.g., no preservative) while keeping all other conditions constant.The fourth metric estimates false positive rates that result from a simulated case-control study where a given portion of the cases are simulated to have differing sample handling conditions.Supplemental File 1 provides details on the calculations that result in those metrics.

Comparison of Sample Handling Effects Across Different Biospecimen Types (Serum and Urine)
APDs from this study in urine were compared to serum stability data from McClain et al. (5) in which the same handling conditions were applied for refrigeration storage and freeze-thaw conditions.Eight of our study participants also participated in the serum analysis.A Wilcoxon Rank Sum test was conducted between the APD values for the 628 metabolites from serum vs. the APD values for the 1048 metabolites from this urine study.This test was run for each of the for use under a CC0 license.This article is a US Government work.It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted January 25, 2024.; https://doi.org/10.1101/2024.01.24.24301735 doi: medRxiv preprint 7 conditions in common between the two studies, including the 24h refrigeration storage and the six freeze-thaw conditions.The p-values were adjusted using a Bonferroni correction.

Data Quality Assessments
Unsupervised clustering of samples, based on 1,048 metabolites measured, confirmed that interindividual variability in metabolite levels dominates the observed variance, not sample handling effects (Supplementary Figure S2A).Moreover, participant samples were tightly clustered with no observed outliers, suggesting high quality data.
Missing value assessments demonstrated that sample dilution as indicated by osmolality was a strong determinant of the number of metabolites detected in samples (Supplementary Figure S2B.Participants of the study were not fasted, and urine sample osmolality ranged from 132 to 1265 mmol/kg (overall median 440 mmol/kg).The total number of detected metabolites decreased quickly for samples below 500 mmol/kg.Metabolites missing in >20% samples fell into specific class categories (Supplementary Table S1).Specifically, 20% or more metabolites falling under the classes of xenobiotics, cofactors and vitamins, peptides, and lipids were inconsistently represented across samples.Xenobiotics are the compound class with the highest percentage of inconsistently represented metabolites.

Global Impact of Preanalytical Sample Handling Conditions (APD and ND)
The median and inter-quantile ranges (IQRs) of the absolute percent difference (APD) and normalized difference (ND) for each metabolite provide a summary effect of a treatment across all metabolites.These metrics help discern the most labile metabolites under a given sample handling condition with respect to the gold standard condition (no preservative, no refrigeration storage, no freeze thaw).Median and IQR of APD and ND metrics for samples undergoing various sample handling conditions are shown in Table 2. Values are presented for the thaw conditions either with or without preservative treatment.Collectively, the effects of different handling conditions were rather small with median APDs £ 5.31 and NDs < 0.05.Also, the median APDs for thaw conditions were generally higher without preservatives.For example, the 4 freeze-thaws on ice median APD increased from 3.79 to 5.31 when computing APD on the subset of samples that lacked preservative treatment (adjusted P =1.15E-10, Cohen's d effect size =0.36).Median and IQR values for ND show a similar trend as observed in APD.It should be noted that gold standard treatment technical replicates, also shown in Table 2, have an APD of 3.57 and an ND of 0.039.Most treatments fall close to this APD and ND, suggesting that the treatments are close to or within technical variation when considering the median APD of a treatment.

Similarity of Metabolite Abundances Between Each Sample Handling Condition and the Gold Standard (Pearson Correlations)
Pearson correlations were calculated to evaluate the similarity between metabolite abundances in a specific sample handling condition compared to the gold standard.Median and IQRs of correlations between each sample handling condition and the gold standard across all metabolites are shown in Table 3.For all preservative, refrigeration, and freeze-thaw conditions, the median correlations were ≥0.98, demonstrating a high global concordance in the rankings of metabolite abundances when 100% of samples are switched from one handling condition to another.

Expected Number of False Positive Associations When Comparing Sample Handling Conditions to the Gold Standard (Case-Control Study)
We performed a case-control study simulation to predict the expected number of false positive associations due to sample handling conditions rather than sample group differences.This analysis helps clarify our ability to extract meaningful metabolite alterations sought after in epidemiological studies given variations in sample handling conditions which are oftentimes unavoidable.We find that sample handling conditions do not have a strong impact on the expected number of false positive associations predicted for realistic scenarios of different sample handling conditions (1, 5, and 10%) (Table 4).In fact, 0 false positive associations were predicted when 1% of the cases had an alternate sample handling condition.When 25% of the cases had alternate use of preservatives, 26 metabolites are expected to be false positives with chlorhexidine use and 45 with borate use.Alterations in refrigeration and thaw conditions were predicted to have < 5 false positive associations.When 100% of cases had alternate sample handling, the number of expected false positive associations were substantially increased (³63).Again, inconsistencies in use of preservatives, particularly borate, showed the was predicted to have the most impact on false positives (136 for chlorhexidine and 168 for borate).

Impact of Preanalytical Sample Handling Conditions at the Metabolite Level
While the global impact of sample handling conditions on metabolite intensities in urine is generally low, we found some metabolites are susceptible to changes with handling.The heatmap in Figure 2 generally shows that the use of preservatives (e.g., borate and chlorhexidine) and four freeze-thaw cycles on ice or refrigerator show the largest differences.Globally, the refrigeration and freeze-thaw cycle conditions resulted in very low numbers of metabolites with significant changes in PD using the established statistical cutoff used above.Further, 4 freeze-thaws tend to increase the number of altered amino acids and lipids for ice and refrigeration pre-treatment.This increase is not observed when evaluating thawing at room temperature or on ice.Compared to the gold-standard samples, borate treatment caused decreased abundance in 5 and increased abundance for 12 of the 28 carbohydrates quantified (based on one-sample t-test, Benjamini-Hochberg (BH) adjusted P <0.05).Glucose levels were on average 90% lower with borate treatment.Dehydroascorbic acid is similarly decreased by roughly 90% by borate treatment.Sucrose, lactose and glucuronate were also significantly decreased, consistently over all subjects.Other metabolites that increased with borate treatment included amino acids and derivatives such as glutamine (PD =0.12), glutamate (PD =0.7), cysteine s-sulfate (PD =0.6), 4-hydroxyphenylpyruvate (PD=0.56)and proline (PD =0.26).Chlorhexidine treatment altered PD for 74 metabolites (adjusted P <0.05).Nucleotides adenine (PD=-61%) and cytidine (PD=-28%) were decreased with chlorhexidine.Supplemental File 2 contains statistical results for all treatments, flagging metabolites having significant increased or decreased abundance.

Comparison of Pretreatment Effects Between Urine and Serum
When comparing metabolite APD values between urine and those from a previous serum study with identical 24h refrigeration treatment and six freeze-thaw treatments (3), the APD values were higher for serum than for urine for all shared treatments between the studies, except a single thaw performed at room temperature (Wilcoxon Rank Sum, Bonferroni adjusted P < 0.01) (Supplementary Table S2).We also note that globally, the number of predicted false positive metabolites in the simulated case-control study were higher in serum samples compared to urine (Supplementary Table S3).In the case of 5% of case samples having altered handling, the average number of false positives across the 7 common conditions across the two studies was 24 for serum samples and 0 for urine samples.When the percentage of case samples having altered handling increases to 25%, the average false positive count for serum was 187 compared to urine at 3 false positives.

Discussion
Results from this study highlight two key findings: 1) urine is strikingly less susceptible to sample handling effects than serum; and 2) preservatives have the largest impact compared to refrigeration and freeze-thaw conditions.Our first key finding is strongly supported by the fact that we followed a study design and analysis scheme used previously for studying serum metabolite stability (5), allowing direct comparison of sample handling effects on metabolite levels in both urine and serum.For the 24h refrigeration and 6 freeze-thaw conditions that were common between the studies, the statistical comparison of serum APD values to urine APD values showed that 6 of the 7 conditions had higher APD values for urine samples.Beyond this, the simulated case-control study had few false positive findings for urine compared to serum.Notably, this finding has practical implications given that many epidemiological studies have collected or plan to collect urine biospecimens.Furthermore, large initatives like the All of Us Research Program collect data and biospecimens, including urine, to study many different diseases and conditions.Importantly, while serum is collected more commonly than urine, urine does show important advantages.In addition to facile sample collection and, as we demonstrate here, decreased sample handling effects, metabolites measured in urine are impactful for biomarker detection and for improved detection of diet-related metabolites (13).
The second key finding that preservatives show largest differences in metabolite levels compared to other sample handling conditions is in line with previous observations that while preservatives do prevent bacterial growth, they do not avoid metabolite instability in and of themselves (14).Metabolites commonly evaluated for their effect on human health are impacted by these treatment conditions.For example, dehydroascorbic acid (oxidized form of vitamin C), glucose, glucuronate and glutamine are altered by borate.Similarly, nucleotides are impacted by chlorohexidine.Further, our results clearly show that consistency in sample handling is more important in reducing false positives than using ideal pretreatment conditions the majority (but not 100%) of the time.In fact, we found that ~4-15% of urinary metabolites had predicted false positive alterations when 25-100% cases had inconsistent handling conditions.Metabolites most prone to false positives are carbohydrates while those less prone are xenobiotics and amino acids.For more realistic scenarios (1,5, and 10% differeing sample handling conditions), we found that sample handling conditions did not have a strong impact on the expected number of false positive associations.We do recognize that in practice, consistency in sample handling when thousands of samples are being evaluated is extremely difficult to achieve.Nonetheless, this study highlights the importance of carefully understanding the preanalytical handling conditions to ensure that cases and controls have similar and consistent sample handling.
It is also important to note that all samples for each participant were run as a batch of consecutive injections on the mass spectrometer to minimize the introduction of batch differences between samples within a participant's sample set.All treatment effects were analyzed within an individual's sample set, in a pairwise manner to eliminate betweenindividual differences.This approach of reporting statistics within an individual's measurements ensured that sample treatment was the main variable being evaluated.
Other studies have evaluated pretreatment effects on the urine metabolome (8,6,15).Most studies, consistent with ours, report effects on the urinary metabolome due to refrigeration or freeze-thaw cycles (16,14) or preservatives (17).However, others report no effects on the urine metabolites due to freeze thaws.One such study was conducted in 6 females and evaluated 63 metabolites (16) and it is thus possible that results are specific to females and to the scope of the metabolome being measured.Another study evaluated the stability of adenosine in a case-control cohort of 40 diabetic patients and 40 healthy controls (18).While limited to a single metabolite, this study is a good example of the importance of evaluating the stability of potential biomarkers derived from epidemiological studies, since, as we observe as well, not all metabolites are affected similarly by pre-analytical conditions.Lastly, when evaluating effects of diet on the metabolome, one study reported that urine metabolites are more prone to variations than blood metabolites (19).While this observation contradicts our global finding that urine metabolites are more robust than blood metabolites in terms of preanalytical differences, we do report that xenobiotics show the highest percentage of for use under a CC0 license.This article is a US Government work.It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Despite our study approach and design, some limitations are worth noting.First, we recognize that urine samples can be collected through different procedures, from spot to 24-hour urine collection (20).Our study evaluated spot-collected samples, which is most common in epidemiology studies and a less resource-intensive collection method compared to 24-hour urine collections.Notably, our spot urine collections were performed at a similar time of day for each participant, reducing potential biases incurred due to circadian variation (21).Second, our sample size was relatively small and abundance levels between individuals could be affected by osmolality and could therefore impact our ability to observe preanalytical effects.Low osmolality samples tended to have higher APD than higher osmolality across all conditions (data not shown).While the lower salt (and other osmolyte) levels are not likely increasing the variance, the measurement error is likely to increase as concentrations go down, therefore causing an increase in variance.Third, this paper does not address analytical errors (sample preparations and run batches), although study design considerations and inclusion of appropriate QCs can mitigate those errors.In this study, we largely assumed that these analytical errors were consistent across samples.Lastly, all our samples are drawn from healthy volunteers.However, it is likely that metabolite instability due to preanalytical conditions could be different between diseased and healthy individuals due to sample matrix effects, namely due to different enzymatic activity and microbial contamination (6).
This study confirms the importance of implementing consistent handling conditions in epidemiological studies.While urine is suitable for evaluating the metabolome, and as shown here, is more robust to changes in preanalytical conditions than blood, some metabolites are altered due to preanalytical conditions.Robustness of metabolites that show significant associations with disease outcomes should thus be evaluated.This article is a US Government work.It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.Samples were then subjected to various experimental conditions including 24 hours of refrigeration storage, no freeze-thaw, thaw on ice, thaw in refrigerator, or thaw at room temperature 1 or 4 freeze-thaw cycles.In total, 390 samples (30 samples per participant) were sent for metabolomics profiling.
for use under a CC0 license.
This article is a US Government work.It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint this version posted January 25, 2024.; https://doi.org/10.1101/2024.01.24.24301735 doi: medRxiv preprint for use under a CC0 license.This article is a US Government work.It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Figure 1 .
Figure 1.Sample Collection and Preanalytical Sample Handling for Each Participant.Urine samples from each study participant were collected in sterile cups and subsequently divided into 3 preservative condition tubes containing no preservative, borate, or chlorhexidine.Samples were then subjected to various experimental conditions including 24 hours of refrigeration storage, no freeze-thaw, thaw on ice, thaw in refrigerator, or thaw at room temperature 1 or 4 freeze-thaw cycles.In total, 390 samples (30 samples per participant) were sent for metabolomics profiling.

Figure 2 :
Figure 2: Metabolites and their chemical class altered by preanalytical sample handling.Percentage of altered metabolites within each metabolite class.Metabolites are considered altered if they have a gain or loss of abundance by one-sample t-test (Benjamini-Hochberg adjusted p-value <0.05).Super Pathway annotations are those provided by Metabolon.Gold indicates increases in metabolite abundance while blue elements represent decreased abundance under a given condition.The color intensity indicates the percentage of metabolites having statistically significant abundance changes, with the most intense color indicating that more than 15% of the metabolites in that class have altered abundance, statistically significant percent change.

Table 1 .
Cohort description.This article is a US Government work.It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Table 2 .
Impact of preanalytical sample handling conditions on metabolite abundances.This article is a US Government work.It is not subject to copyright under 17 USC 105 and is also made available (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Table 3 :
Pearson correlation of relative metabolite abundances for a specific handling condition compared to the "gold standard" condition of no-preservative, no refrigeration storage, and no freezethaw cycles.

Table 4 .
Expected number of false positive associations found when comparing alternate sample handling conditions to the "gold standard" condition.