Abstract
An early biomarker would transform our ability to screen and treat patients with cancer. The large amount of multi-scale molecular data from various cancers in public repositories provides unprecedented opportunities to find such a biomarker. However, despite the identification of numerous molecular biomarkers using these public data, fewer than 1% have proven robust enough to translate into clinical practice [1]. One of the most important factors limiting successful translation to clinical practice is the lack of real-world patient population heterogeneity in the discovery process. Almost all biomarker studies analyze only a single cohort of patients with the same cancer using a single modality. Recent studies in other diseases have demonstrated the advantage of leveraging biological and technical heterogeneity across multiple independent cohorts to identify robust disease biomarkers. Here, we analyzed 17149 samples from patients with one of 23 cancers, profiled by DNA methylation, bulk or single-cell gene expression, or protein expression in tumor and serum. First, we analyzed DNA methylation profiles of 9855 samples across 23 cancers from The Cancer Genome Atlas (TCGA). We then examined the gene expression profile of the most significantly hypomethylated gene, KRT8, in 6781 samples from 57 independent microarray datasets from NCBI GEO. KRT8 was significantly over-expressed across cancers, with the exception of colon cancer (summary effect size = 1.05; p < 0.0001). Further, single-cell RNA-seq analysis of 7447 single cells from lung tumors showed that genes whose expression correlated significantly with KRT8 (p < 0.05) were involved in p53-related pathways. Immunohistochemistry in tumor biopsies from 294 patients with lung cancer showed that high protein expression of KRT8 is a prognostic marker of poor survival (HR = 1.73, p = 0.01). Finally, detectable KRT8 in serum, as measured by ELISA, distinguished patients with pancreatic cancer from healthy controls with an AUROC of 0.94.
In summary, our analysis demonstrates that KRT8 (1) is differentially expressed in several cancers across all molecular modalities and (2) may be useful as a biomarker to identify patients who should be further tested for cancer.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported in part by the Bill and Melinda Gates Foundation and grants R01 AI125197-01, U19AI109662, and U19AI057229 from the National Institute of Allergy and Infectious Diseases to P.K., and by grant F30 HL149252-01A1 from the National Heart, Lung, and Blood Institute to M.K.D.S.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All aspects of this study were approved by the Stanford Institutional Review Board in accordance with the Declaration of Helsinki guidelines for the ethical conduct of research. The reference number for the approval is IRB-20170. A waiver of informed consent was obtained for the subjects in this study under Stanford's Institutional Review Board policy, because this was a retrospective study of both living and deceased patients, many of whom were lost to follow-up.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
Microarray data are available from NCBI GEO (https://www.ncbi.nlm.nih.gov/geo/). The accession numbers and corresponding links for the individual studies are listed in Supplemental Table 6.