Abstract
Pancreatic ductal adenocarcinoma (PDAC) is largely incurable due to late diagnosis and the absence of markers whose expression is concordant across sample sources (e.g., tissue, blood, plasma) and platforms (e.g., microarray, sequencing). We performed an optimized meta-analysis of 19 PDAC transcriptome studies (tissue and blood) from multiple platforms. Key biomarkers for PDAC diagnosis with secretory potential were identified and validated in different cohorts. A machine learning approach, a support vector machine evaluated with leave-one-out cross-validation, was used to build and test the classifier. We identified a 9-gene panel (IFI27, ITGB5, CTSD, EFNA4, GGH, PLBD1, HTATIP2, IL1R2, CTSA) that achieved an average sensitivity of ∼0.92 and an average specificity of ∼0.90 in discriminating PDAC from non-tumor samples across five training sets on cross-validation. This classifier accurately discriminated PDAC from chronic pancreatitis (AUC=0.95) and early stages of progression (Stage I and II, AUC=0.82; IPMA and IPMN, AUC=1; IPMC, AUC=0.81). The 9-gene panel outperformed previously known markers, particularly in blood studies (AUC=0.84). The discrimination of PDAC from early precursor lesions in non-malignant tissue (AUC>0.81) and peripheral blood (AUC>0.80) may facilitate early blood-based diagnosis and risk stratification upon validation in prospective clinical trials. Furthermore, the validation of these markers in proteomics and single-cell transcriptomics studies suggests their prognostic role in the diagnosis of PDAC.
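To make the evaluation terms used in the abstract concrete, the following is a minimal sketch of leave-one-out cross-validation followed by sensitivity, specificity, and AUC. It is not the authors' pipeline: the abstract specifies a support vector machine, whereas this sketch swaps in a simple nearest-centroid classifier so that it runs with the Python standard library alone. The toy data, the function names, and the decision threshold of 0 are all assumptions made for illustration.

```python
from statistics import mean

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loocv_scores(X, y):
    """Hold out each sample in turn, fit on the rest, score the held-out sample.

    Stand-in classifier (assumption): nearest centroid, not an SVM.
    A score above 0 means the sample lies closer to the tumor centroid.
    """
    scores = []
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        tumor = centroid([x for x, lab in train if lab == 1])
        normal = centroid([x for x, lab in train if lab == 0])
        scores.append(sq_dist(X[i], normal) - sq_dist(X[i], tumor))
    return scores

def sensitivity_specificity(y, scores, threshold=0.0):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for lab, s in zip(y, scores) if lab == 1 and s > threshold)
    fn = sum(1 for lab, s in zip(y, scores) if lab == 1 and s <= threshold)
    tn = sum(1 for lab, s in zip(y, scores) if lab == 0 and s <= threshold)
    fp = sum(1 for lab, s in zip(y, scores) if lab == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y, scores):
    """AUC as the fraction of (tumor, non-tumor) pairs ranked correctly; ties count 0.5."""
    pos = [s for lab, s in zip(y, scores) if lab == 1]
    neg = [s for lab, s in zip(y, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy expression values invented for illustration: 2 features per sample,
# 4 tumor samples (label 1) and 4 non-tumor samples (label 0).
X = [[2.0, 2.1], [2.2, 1.9], [1.8, 2.0], [2.1, 2.3],
     [0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

scores = loocv_scores(X, y)
sens, spec = sensitivity_specificity(y, scores)
roc_auc = auc(y, scores)
```

Because every sample is scored by a model that never saw it, the resulting sensitivity, specificity, and AUC are cross-validated estimates rather than training-set fits. In a real analysis the nearest-centroid step would be replaced by the SVM named in the abstract, and each row of X would hold the expression values of the panel genes.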
Competing Interest Statement
BIDMC will be filing a patent on behalf of MB and IK on the use of the biomarker panel for early PDAC diagnosis. MB is an equity holder at BiomaRx and Canomiks.
Funding Statement
This study was supported through a BIDMC CAO Innovation grant.
Author Declarations
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.
Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The datasets used and/or analysed during the current study are available in the public repositories GEO and ArrayExpress. The code and the differentially expressed (DE) genes per dataset will be available via GitHub (https://github.com/IKhatri-Git/Secretory-gene-classifier).
Abbreviations
- AUC: area under the curve
- CA 19-9: carbohydrate antigen 19-9
- CP: chronic pancreatitis
- GEO: Gene Expression Omnibus
- GGH: γ-glutamyl hydrolase
- HPA: Human Protein Atlas
- IPMA: intraductal papillary-mucinous adenoma
- IPMC: intraductal papillary-mucinous carcinoma
- IPMN: intraductal papillary mucinous neoplasm
- LOOCV: leave-one-out cross-validation
- noTM: no transmembrane segments
- PanIN: pancreatic intraepithelial neoplasia
- PC: pancreatic cancer
- PDAC: pancreatic ductal adenocarcinoma
- ROC: receiver operating characteristic
- SVM: support vector machine
- TCGA: The Cancer Genome Atlas