Abstract
Introduction Lung cancer is the leading cause of cancer-related death. Lung cancer rates show racial disparities, with African Americans (AAs) experiencing disproportionately higher incidence and mortality than other ethnic groups. Non-coding RNAs (ncRNAs) play crucial roles in lung tumorigenesis. Our objective was to identify ncRNA biomarkers associated with the racial disparity in lung cancer.
Methods Using droplet digital PCR, we examined 93 lung-cancer-associated ncRNAs in plasma and sputum samples from AA and White American (WA) participants, comprising 118 patients and 92 cancer-free smokers. We then validated the results in a separate cohort of 56 cases and 72 controls.
Results In the AA population, ten ncRNAs in plasma and four in sputum were differentially expressed between lung cancer patients and controls. In the WA population, eleven ncRNAs in plasma and five in sputum were differentially expressed between lung cancer patients and controls. For AAs, we identified a three-ncRNA panel (plasma miRs-147b, 324-3p, and 422a) that diagnosed lung cancer with 86% sensitivity and 89% specificity. For WAs, we developed a four-ncRNA panel (sputum miR-34a-5p and plasma miRs-103-3p, 126-3p, and 205-5p) that achieved 88% sensitivity and 87% specificity. These panels remained effective across different stages and histological types of lung tumors and were validated in the independent cohort.
Conclusions The ethnicity-related ncRNA signatures show promise as biomarkers for addressing the racial disparity in lung cancer.
Introduction
Lung cancer is the leading cause of cancer-related deaths in both men and women in the USA 1. Non-small cell lung cancer (NSCLC) accounts for 85% of all lung cancer cases and is mainly composed of two histological types: adenocarcinoma (AC) and squamous cell carcinoma (SCC) 1. Early detection and timely treatment can significantly reduce morbidity and mortality in NSCLC 1. However, existing diagnostic methods fall short in the early detection of NSCLC. Furthermore, there are notable disparities in NSCLC between different ethnicities, with African Americans (AAs) experiencing a higher prevalence and mortality rate from the disease 2. The annual incidence of lung cancer is highest among AAs at 76.1 per 100,000, followed by White Americans (WAs) at 69.7 per 100,000, American Indians/Alaska Natives at 48.4 per 100,000, and Asian/Pacific Islanders at 38.4 per 100,000 2. In addition to socioeconomic differences, biological factors such as tumor biology, genetics, and molecular alterations also contribute to the disparities in lung cancer 3. For instance, genome-wide association studies have identified lung cancer susceptibility loci on chromosomes 5p15 and 15q25 in an AA population 4.
Differences in the methylation levels of genes with functional relevance, such as the nuclear receptor subfamily 3, have been identified as potential contributors to racial disparities in NSCLC 5. The epidermal growth factor receptor (EGFR) mutation is more common in AA lung cancer patients compared to other populations 6. mRNA transcripts from AAs are less likely to undergo alternative polyadenylation in lung cancer compared to WAs 7. Furthermore, elevated levels of cytokines, including IL-1β, IL-10, and TNFα, are associated with an increased risk of lung cancer in the AA population 8. The molecular and genetic variations linked to lung cancer disparities offer potential as biomarkers for NSCLC in AAs, which could address the observed ethnic disparities in lung cancer treatment and outcomes 9.
Non-coding RNAs (ncRNAs) are RNA molecules that are not translated into proteins but are essential in regulating gene expression and cellular processes 10. ncRNAs mainly consist of microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and small nucleolar RNAs (snoRNAs), among others 11. Aberrant expression of certain ncRNAs has been closely linked to the onset and progression of cancer, highlighting their potential as both therapeutic targets and diagnostic markers. Furthermore, miRNAs are implicated in the disparities observed between AAs and other populations 10, 11. Numerous studies, ours included, have identified miRNAs, lncRNAs, and snoRNAs linked to lung cancer, underscoring their promise as potential biomarkers for this disease 12-37. We hypothesize that by characterizing and comparing ncRNA patterns in the plasma and sputum of NSCLC patients from both AA and WA cohorts, we can develop noninvasive lung cancer biomarkers tailored to individual ethnicities.
MATERIALS AND METHODS
Patients and research design
The study received approval from the Institutional Review Board of the University of Maryland Baltimore. Eligible participants were current or former smokers aged 50 to 80 years. We excluded those who were pregnant or lactating, those with current pulmonary infections, individuals who had undergone thoracic surgery within the last 6 months, those who had received chest radiotherapy in the preceding year, and patients with a life expectancy of under one year. Demographic and clinical data, including age, sex, race, and smoking history, were collected from medical records. To confirm malignancy, tissue samples obtained either surgically or via biopsy were subjected to pathologic examination. Surgical pathologic staging adhered to the TNM classification of the International Union Against Cancer, the American Joint Committee on Cancer, and the International Staging System for Lung Cancer. Histopathologic classification followed the guidelines of the World Health Organization. Radiographic characteristics of the pulmonary nodules (PNs) were derived from CT images. These included the maximum transverse size, the visually determined type (categorized as nonsolid, ground-glass opacity, part-solid, solid, perifissural, or spiculated), and the nodule's lung location. A benign diagnosis was confirmed either pathologically, with a specific benign cause identified, or through clinical and radiographic stability of the PNs across multiple check-ups over a 2-year follow-up period. A total of 340 participants were enrolled. Of these, 174 were diagnosed with NSCLC (87 AAs and 87 WAs). The other 166 had benign conditions: 99 had granulomatous inflammation, 38 exhibited nonspecific inflammatory changes, and 29 presented with lung infections. The cohort was randomly divided into a training set and a validation set, detailed in Tables 1 and 2.
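The random division of the cohort can be illustrated with a minimal sketch. The text states only that the split was random; stratifying by race and diagnosis, the `train_fraction` value, the function name `stratified_split`, and the toy cohort below are all assumptions made for illustration and are not taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(participants, train_fraction, seed=0):
    """Randomly split participants within each (race, diagnosis) stratum.

    Stratification and train_fraction are illustrative assumptions; the
    study states only that the cohort was divided at random.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = defaultdict(list)
    for person in participants:
        strata[(person["race"], person["diagnosis"])].append(person)

    training, validation = [], []
    for members in strata.values():
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        training.extend(members[:cut])
        validation.extend(members[cut:])
    return training, validation


# Toy cohort with hypothetical counts: 10 participants per stratum.
cohort = [
    {"id": f"{race}-{dx}-{i}", "race": race, "diagnosis": dx}
    for race in ("AA", "WA")
    for dx in ("NSCLC", "benign")
    for i in range(10)
]
training, validation = stratified_split(cohort, train_fraction=0.7)
print(len(training), len(validation))
```

Splitting within strata keeps the case/control and AA/WA proportions similar in both subsets, which is one common way to make training and validation performance comparable.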
Sputum and blood sample collection and preparation
Specimens were obtained from participants prior to any treatment initiation. For sputum collection, participants were instructed to blow their nose, rinse their mouth, and drink water to minimize contamination from oral epithelial cells. The sputum was then gathered in sterile containers and promptly processed on ice using 0.1% dithiothreitol and phosphate-buffered saline (Sigma Aldrich, St. Louis, MO) 12-33. Concurrently, blood samples were drawn, and within an hour of collection, plasma was separated following standard clinical protocols 12-33.
RNA isolation
RNA was extracted from the specimens using the miRNeasy Mini Kit spin column (QIAGEN, Germantown, MD), as previously described 12-33. The extracted RNA samples were promptly stored at -80°C in barcoded cryotubes.
Droplet digital PCR (ddPCR) analysis of miRNAs, lncRNAs, and snoRNAs
Numerous studies, including our own, have identified 93 miRNAs, lncRNAs, and snoRNAs in tissue specimens that are related to lung cancer, suggesting their potential as biomarkers for the disease 12-37 (Supplemental Table 1). In this study, we used ddPCR to analyze the 93 ncRNAs in both plasma and sputum, following previously developed methods 12-33. One μl of RNA from each sample was reverse transcribed using gene-specific primers for each target with the TaqMan miRNA RT Kit (Applied Biosystems, Foster City, CA). For the ddPCR reactions, a mixture containing 5 μl cDNA solution, 10 μl Supermix, and 1 μl TaqMan primer/probe mix was prepared in a 20 μl volume. This mixture was loaded into cartridges filled with droplet generation oil (Bio-Rad, Hercules, CA) and placed into the QX100 Droplet Generator (Bio-Rad). The droplets formed were then transferred to a 96-well PCR plate, followed by PCR amplification using a T100 thermal cycler (Bio-Rad). The ddPCR method generated over 10,000 droplets per well, which were subsequently analyzed using a fluorescence detector, supporting consistent detection of the ncRNAs in the clinical samples. We counted the positive droplets and used the Poisson distribution to determine the concentration of the target genes 12-33.
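The Poisson step can be sketched as follows. If a fraction p of droplets is positive, the mean number of target copies per droplet is λ = -ln(1 - p), and dividing by the droplet volume gives a concentration. The function name and the droplet volume of 0.85 nl are assumptions for illustration (a commonly quoted nominal value); neither is reported in this study.

```python
import math


def ddpcr_concentration(positive, total, droplet_volume_nl=0.85):
    """Estimate target copies per microliter of reaction from droplet counts.

    droplet_volume_nl is an assumed nominal droplet volume, not a value
    reported in this study.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    # Fraction of droplets that contain no target copy.
    negative_fraction = (total - positive) / total
    # Poisson: P(0 copies) = exp(-lambda), so lambda = -ln(negative fraction).
    copies_per_droplet = -math.log(negative_fraction)
    # Convert the droplet volume from nanoliters to microliters.
    return copies_per_droplet / (droplet_volume_nl * 1e-3)


# Hypothetical well: 1,200 positive droplets out of 12,000 accepted droplets.
conc = ddpcr_concentration(positive=1200, total=12000)
print(round(conc, 1), "copies per microliter")
```

The logarithm corrects for droplets that hold more than one copy, which is why the estimate is slightly higher than the raw positive fraction divided by the droplet volume.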
Statistical analysis
Statistical significance for biomarkers and clinical determinants was assessed using the Mann-Whitney U test or the Chi-Square test. Pearson's correlation was used to assess the relationship between miRNA expression and clinical and demographic data, including smoking history measured as pack-years. To construct a lung cancer biomarker panel, the cohort was divided into training and validation subsets, following the guidelines proposed by the National Cancer Institute's Early Detection Research Network. In the training subset, feature selection was performed using the LASSO (Least Absolute Shrinkage and Selection Operator) together with logistic regression. A 10-fold cross-validation reinforced with bootstrapping was used to mitigate outlier influence. The mean decrease in Gini impurity served as the metric for evaluating variable importance, and the False Discovery Rate (FDR) was used to correct for multiple testing. Discrimination was evaluated using ROC (receiver operating characteristic) curve analysis, reporting AUC (area under the curve) values with 95% confidence intervals. Confidence intervals for the performance metrics (AUC, sensitivity, and specificity) were computed using several statistical methods. After refining the diagnostic panels in the training set, we assessed their robustness in the validation subset using AUC, sensitivity, and specificity.
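The multiple-testing step can be sketched as below. The text names FDR correction but not the specific procedure, so the Benjamini-Hochberg method is assumed here; the function name, the ncRNA labels, and the raw p-values are hypothetical and serve only to show the calculation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from per-ncRNA Mann-Whitney U tests.
raw = {"ncRNA_A": 0.001, "ncRNA_B": 0.02, "ncRNA_C": 0.04, "ncRNA_D": 0.30}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Keep ncRNAs that pass both the raw and the FDR-adjusted threshold.
significant = [name for name in raw if raw[name] < 0.05 and adj[name] < 0.05]
print(significant)
```

In this toy example the third ncRNA has a raw p-value below 0.05 but an adjusted value just above it, which illustrates why both thresholds are reported in the Results.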
RESULTS
Differential expression of ncRNAs in plasma and sputum of NSCLC patients vs. cancer-free smokers
The expression levels of 93 lung cancer-associated ncRNAs 12-37, which included 67 miRNAs, 21 snoRNAs, and five lncRNAs (Supplemental Table 1), were quantified in plasma and sputum using a microplate-based ddPCR technique 12-35. This analysis was first conducted on the specimens obtained from 59 AA NSCLC patients, 46 cancer-free AA smokers, 59 WA NSCLC patients, and 46 cancer-free WA smokers (Table 1). In plasma, 25 ncRNAs, including 18 miRNAs, five snoRNAs, and two lncRNAs, were differentially expressed between cancer patients and cancer-free smokers across all ethnicities (Mann-Whitney U test: p < 0.05; FDR-adjusted p < 0.05) (Supplemental Table 2). Similarly, in sputum, eight ncRNAs, comprising three miRNAs, three snoRNAs, and two lncRNAs, were differentially expressed between NSCLC patients and cancer-free smokers (Mann-Whitney U test: p < 0.05; FDR-adjusted p < 0.05) (Supplemental Table 2).
Differential expression of ncRNAs in plasma and sputum among individuals from different ethnic populations
We further explored the differential expression of ncRNAs in plasma and sputum across individual ethnic groups. In plasma samples from AA participants, seven ncRNAs (miRs-31-5p, 147b, 16-5p, 375-3p, 422a, and 324-3p, and snoRA42) showed a significant increase in expression in lung cancer patients compared to cancer-free AA controls (Mann-Whitney U test: p < 0.05; FDR-adjusted p < 0.05) (Table 3, Figure 1, and Supplemental Figure 1). However, snoRA76 displayed a significant decrease in expression in AA lung cancer patients compared to controls (Mann-Whitney U test: p < 0.05; FDR-adjusted p < 0.05) (Table 3, Figure 1, and Supplemental Figure 1). In the sputum samples from AA participants, four ncRNAs – three microRNAs (miRs-16-5p, 210-3p, and 205-5p) and one snoRNA (snoRA116) – were found to be elevated in lung cancer patients as compared to cancer-free AA controls (Mann-Whitney U test: p < 0.05; FDR-adjusted p < 0.05) (Table 3 and Figure 1A) (Supplemental Figure 1).
In plasma samples from WA participants, 12 ncRNAs showed differential expression between lung cancer patients and cancer-free WA smokers. These ncRNAs include eight microRNAs (miRs-93-5p, 103a-3p, 126-3p, 146b-5p, 205-5p, 944, 4251, and 1285-3p) and three snoRNAs (snoRA3, snoRA21, and snoRA80) (Mann-Whitney U test: p < 0.05; FDR-adjusted p < 0.05) (Table 4 and Figure 1B) (Supplemental Figure 2). Ten of these ncRNAs (miRs-93-5p, 103a-3p, 126-3p, 146b-5p, 205-5p, 944, 1254, and 1285-3p, and snoRA21 and snoRA80) showed increased expression, whereas two ncRNAs (miR-4251 and snoRA3) had decreased expression in WA lung cancer patients compared with cancer-free WA smokers. In sputum samples from WAs, five ncRNAs showed elevated expression in lung cancer patients compared with cancer-free WA smokers. These ncRNAs include four miRNAs (miR-34a-5p, miR-652-5p, miR-375-3p, and let-7a) and one snoRNA (snoRA66) (Mann-Whitney U test: p < 0.05; FDR-adjusted p < 0.05) (Table 4, Figure 1, and Supplemental Figure 2).
The diagnostic utility of the plasma and sputum ncRNA biomarkers varies with ethnicity
We used logistic regression with backward elimination to identify specific ncRNA biomarker panels for lung cancer in different ethnicities. For AAs, the best prediction of lung cancer was achieved with a combination of three ncRNAs in plasma: miRs-147b, 324-3p, and 422a. This panel yielded an AUC of 0.90, distinguishing AA cancer patients from cancer-free AAs with a sensitivity of 86% and a specificity of 89% (Fig. 2) (Table 5). For WAs, the optimal prediction was obtained with a combination of four ncRNAs: sputum miR-34a-5p, plasma miR-103-3p, plasma miR-126-3p, and plasma miR-205-5p. This combination achieved an AUC of 0.91, diagnosing NSCLC with a sensitivity of 89% and a specificity of 87% (all p < 0.05) (Table 5). Additionally, for pan-ethnic diagnosis, a panel consisting of plasma miR-21-3p, plasma miR-210-3p, and sputum miR-126-3p demonstrated the best universal diagnostic ability. This combination achieved an AUC of 0.84, with a sensitivity of 71% and a specificity of 88% (Fig. 2) (Table 5). The pan-ethnic biomarker panel exhibited lower sensitivity than the population-specific panels (71% vs. 86% for AAs and 89% for WAs, p < 0.05), while maintaining similar specificity (Table 5). Among the ten ncRNA biomarkers, plasma miR-205-5p and sputum miR-126-3p were associated with age, whereas plasma miR-422a and plasma miR-324-3p were related to the patients' sex (all p-values < 0.05) (Supplemental Table 3). Plasma miR-422a was associated with the size of PNs, and plasma miR-147b correlated with tumor stage. The ncRNAs were not linked to smoking history (Supplemental Table 3). When these biomarkers were used in combination as panels, their diagnostic values did not show any association with the patients' age, sex, smoking history, size of PNs, tumor stages, or histological types of lung tumors.
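The performance measures reported for each panel can be illustrated with a minimal sketch. It assumes that a fitted logistic model returns a predicted probability of cancer for each participant; the scores, the 0.5 threshold, and the function names are hypothetical and are not taken from the study. The AUC is computed as the proportion of case-control pairs in which the case has the higher score, which is equivalent to the area under the ROC curve.

```python
def auc(case_scores, control_scores):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Call a sample positive when its score is at or above the threshold."""
    true_positives = sum(score >= threshold for score in case_scores)
    true_negatives = sum(score < threshold for score in control_scores)
    return (true_positives / len(case_scores),
            true_negatives / len(control_scores))


# Hypothetical predicted probabilities from a panel's logistic model.
cases = [0.91, 0.84, 0.77, 0.62, 0.35]
controls = [0.58, 0.41, 0.22, 0.15, 0.08]

panel_auc = auc(cases, controls)
sens, spec = sensitivity_specificity(cases, controls, threshold=0.5)
print(panel_auc, sens, spec)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes discrimination across all thresholds, so the two kinds of measure are complementary.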
Verifying the diagnostic potential of the biomarker panels for disparities
We validated the three distinct ncRNA biomarker panels for the diagnosis of lung cancer in the validation cohort. For AAs, the biomarker panel comprising plasma miR-147b, miR-324-3p, and miR-422a achieved a sensitivity of 86% and a specificity of 89% in detecting lung cancer (Supplementary Table 4). For WAs, the panel including sputum miR-34a-5p, plasma miR-103-3p, plasma miR-126-3p, and plasma miR-205-5p demonstrated a sensitivity of 89% and a specificity of 86% (Supplementary Table 4). The ethnicity-neutral biomarker panel, including plasma miR-21-3p, plasma miR-210-3p, and sputum miR-126-3p, achieved a sensitivity of 71% and a specificity of 89% in the diagnosis of lung cancer across all ethnic groups (Supplementary Table 4). These results in the validation set confirm the findings in the training set and thus support the potential of these biomarkers for early NSCLC detection in different racial populations.
DISCUSSION
The National Lung Screening Trial (NLST) established that low-dose computed tomography (LDCT) screening significantly reduces lung cancer-related mortality in high-risk populations, notably smokers 38. LDCT is currently used for lung cancer screening in smokers. However, it has substantially increased the detection of indeterminate pulmonary nodules (PNs) in asymptomatic individuals. Of the smokers screened, 24.2% were found to have indeterminate PNs on LDCT, yet 96.4% of these nodules were subsequently confirmed to be benign 38. Moreover, while LDCT screening has a sensitivity exceeding 90%, its specificity is only 61% 1, resulting in a substantial false-positive rate or overdiagnosis 38. Given the notably high incidence and mortality rates among AAs, there is an urgent need for non-invasive molecular biomarkers tailored to the AA population. Such biomarkers could facilitate early detection of NSCLC, either used independently or in conjunction with LDCT, with the aim of reducing the false-positive rate associated with LDCT. While prior investigations have revealed certain miRNA variations in surgically resected lung tumor tissues between AA and WA patients 11, 39, the field still lacks non-invasive molecular biomarkers tailored for early lung cancer detection in the AA population.
In this study, we systematically analyzed 93 lung cancer-related ncRNAs in plasma and sputum samples from both AA and WA lung cancer patients, as well as from cancer-free controls. Distinct ncRNA alterations associated with each population were identified, leading to specific diagnostic panels for each group. We also developed an ethnicity-neutral biomarker panel for diagnosing lung cancer. However, this pan-ethnic panel had lower diagnostic sensitivity than the population-specific panels. Furthermore, while some individual ncRNAs were associated with age, sex, size of PNs, or tumor stage, the performance of the combined biomarker panels was not influenced by these factors in either population. Notably, their diagnostic efficacy remained consistent across early and late stages of lung tumors, underscoring their potential for early NSCLC detection in clinical contexts. Additionally, the diagnostic values of the panels were not associated with the size of PNs. Thus, these biomarkers may help distinguish malignant from benign PNs identified by LDCT, potentially reducing its high false-positive rate. Nonetheless, a larger study with a broader cohort is essential to further validate this diagnostic potential.
The most discriminatory biomarkers for diagnosing lung cancer among AAs are miRs-147b, 324-3p, and 422a. miR-147b can promote lymph node metastasis and influence cancer prognosis through its regulation of PRPF4B, WDR82, and NR3C2 40. It also influences resistance to EGFR inhibitors by modulating the TCA cycle. miR-324-3p is highly expressed in lung cancer cells and promotes their growth and invasion 41. miR-422a can inhibit the TGF-β/SMAD pathway by downregulating sulfatase 2, and hence constrain NSCLC cell proliferation, migration, invasion, colony formation, EMT, and tumorigenesis 42. miR-34a-5p, miR-103-3p, miR-126-3p, and miR-205-5p stand out in the diagnosis of lung cancer among WAs. miR-34a-5p is implicated in the regulation of tumor growth through its role in the epithelial-mesenchymal transition (EMT) via EMT-transcription factors, p53, and other important signaling pathways 43. Dysfunction of miR-103-3p is pivotal in lung tumorigenesis, as it directly targets PDCD10, influencing lung cancer cell proliferation and metastasis 44. The miR-103/PDCD10 signaling pathway offers a potential novel therapeutic target for NSCLC treatment 44. miR-126-3p, an endothelial miRNA, is aberrantly expressed in specimens from patients with lung cancer 45. Its reintroduction curbs tumor growth by targeting EGFL7. Elevated expression of miR-205-5p is implicated in the initiation and progression of NSCLC 46. miR-205-5p is also associated with the modulation of EMT by targeting EMT-related genes, which in turn affects the invasive and metastatic capabilities of lung cancer cells 47. Furthermore, miR-205-5p is believed to contribute to the carcinogenesis and chemoresistance of NSCLC by influencing the PTEN signaling pathway 48. Nevertheless, further investigation is needed to fully understand the specific roles and implications of these ncRNAs in accounting for racial disparities in lung cancer incidence.
This study offers useful insights but also has limitations that need further exploration. First, larger sample sizes would help uncover markers with weaker discriminatory power. Second, although this study analyzed 93 ncRNAs, many other genes await systematic validation in future work. Third, a longitudinal study is warranted to investigate how these molecules relate to disease pathology and progression over time in the different racial populations.
By analyzing surgically resected tissue samples, Mitchell et al. identified seven miRNAs with differing expression levels between AA and WA lung cancer patients 39. These miRNAs have limited similarity to the ones we identified using plasma and sputum samples. Several reasons could account for these discrepancies: the tissues provide localized information, whereas plasma and sputum reflect systemic influences. Inherent tumor variability can affect miRNA expression. Tumors may release specific ncRNAs based on their characteristics, with some remaining localized. Furthermore, variations in laboratory procedures and ncRNA detection methods might yield different results. In addition, genetic and environmental differences within AA and WA groups can influence ncRNA profiles across studies. In response to this discrepancy, we are currently collecting tissue specimens, matched with plasma and sputum samples, from various ethnic populations. This will enable us to concurrently profile ncRNA changes and better understand the relationship between molecular aberrations across these different specimen types.
In sum, the distinctive ncRNA profiles linked to lung cancer in AAs vs. WAs may hold promise as biomarkers to address the observed racial disparity in lung cancer. Nonetheless, a large multi-center clinical trial is needed to prospectively validate the full utility of the biomarkers for early lung cancer detection in the different populations.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Authorship Contribution Statement
Lu Gao: Conceptualization; Data curation; Investigation; Formal analysis; Methodology.
Pushpa Dhilipkannah: Data curation; Investigation; Formal analysis; Methodology.
Van K Holden: Data curation; Investigation; Formal analysis; Methodology; Writing – review and editing.
Janaki Deepak: Data curation; Investigation; Formal analysis; Methodology; Writing – review and editing.
Ashutosh Sachdeva: Data curation; Investigation; Formal analysis; Methodology; Writing – review and editing.
Nevins W Todd: Data curation; Investigation; Formal analysis; Methodology; Writing – review and editing.
Sanford A Stass: Data curation; Investigation; Formal analysis; Methodology; Writing – review and editing.
Feng Jiang: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Resources; Supervision; Validation; Writing – original draft; Writing – review and editing.
Ethics Approval and Consent to Participate
Ethical permits are in place for all studies and all participants consented to partake.
Consent for Publication
All patients and authors consented to the publication of the results.
Availability of Data and Materials
Data are available from the corresponding author on reasonable request.
Funding
Supported by NCI grant number: UH3 CA251139 (Dr. Feng Jiang).
Disclosures
The authors declare no financial conflicts of interest or other competing interests related to this research.
Acknowledgment
We extend our sincere gratitude to the Biostatistics Shared Service at the University of Maryland Marlene and Stewart Greenebaum Cancer Center for their invaluable contribution in conducting the statistical analysis for this study.