TY  - JOUR
T1  - Two-stage biologically interpretable neural-network models for liver cancer prognosis prediction using histopathology and transcriptomic data
JF  - medRxiv
DO  - 10.1101/2020.01.25.20016832
SP  - 2020.01.25.20016832
AU  - Zhucheng Zhan
AU  - Zheng Jing
AU  - Bing He
AU  - Noshad Hosseni
AU  - Maria Westerhoff
AU  - Eun-Young Choi
AU  - Lana X. Garmire
Y1  - 2020/01/01
UR  - http://medrxiv.org/content/early/2020/08/05/2020.01.25.20016832.abstract
N2  - Purpose: Pathological images are easily accessible data with potential as prognostic biomarkers. Moreover, integrating heterogeneous data types across modalities, such as pathological images and gene expression data, is invaluable for predicting cancer patient survival. However, the analytical challenges are significant. Experimental Design: Here we take hepatocellular carcinoma (HCC) pathological image features extracted by CellProfiler and apply them as the input for Cox-nnet, a neural network-based prognosis prediction model. We compare this model with conventional Cox-PH models, using the C-index and log-rank p-values on HCC testing samples. Further, to integrate pathological image and gene expression data from the same patients, we construct a novel two-stage Cox-nnet model and compare it with PAGE-Net, another complex neural network model. Results: Pathological image-based prognosis prediction using Cox-nnet (median C-index 0.74 and log-rank p-value 4e-6) is significantly more accurate than the Cox-PH model (median C-index 0.72 and log-rank p-value 3e-4). Moreover, the two-stage Cox-nnet complex model combining histopathology image and transcriptomic RNA-Seq data achieves better prognosis prediction, with a median C-index of 0.75 and log-rank p-value of 6e-7 in the testing datasets. These results are much more accurate than those of PAGE-Net, a CNN-based complex model (median C-index of 0.67 and log-rank p-value of 0.02).
Imaging features provide predictive information beyond that of gene expression features, as the combined model is much more accurate than the model with gene expression alone (median C-index 0.70). Pathological image features are modestly correlated with gene expression. Genes correlated with the top imaging features have known associations with HCC patient survival and morphogenesis of liver tissue. Conclusion: This work provides two-stage Cox-nnet, a new class of biologically relevant and relatively interpretable models, to integrate multi-modal and multiple types of data for survival prediction.
Competing Interest Statement: The authors have declared no competing interest.
Clinical Trial: This study is a secondary data analysis of public pathology and genomics datasets in TCGA.
Funding Statement: The work is supported by grants K01ES025434, awarded by NIEHS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov); R01 LM012373 and LM012907, awarded by NLM; and R01 HD084633, awarded by NICHD, to L.X. Garmire.
Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. All data used in this study is publicly available. gs://gdc-tcga-phs000178-open/
ER  - 