Abstract
Purpose Although statistical models have been employed to detect and classify lung nodules using deep learning-extracted and clinical features, validation of these models on independent, multinational datasets of computed tomography (CT) scans and patient clinical information is lacking. To this end, we developed a deep learning-based algorithm to predict the malignancy of pulmonary nodules and validated its performance in three independent datasets containing multiracial and multinational populations.
Methods A convolutional neural network-based algorithm to predict lung nodule malignancy was built from CT scans and patient-wise clinical features (i.e., sex, spiculation, and nodule location). The model consists of three steps: (1) a deep learning algorithm automatically extracts features from CT scans; (2) clinical features are concatenated with the nodule features after dimension reduction by principal component analysis (PCA); and (3) a multivariate logistic regression model classifies the malignancy of the lung nodules. The model was trained on a dataset containing 1,556 nodules from 813 patients from the National Lung Screening Trial (NLST). Model performance was evaluated on three independent, multi-institutional datasets; the LIDC dataset and the Infervision Multi-Center (IMC) dataset contain 562 nodules from 293 patients and 2,044 nodules from 589 patients, respectively. Model accuracy was measured by the area under the receiver operating characteristic (ROC) curve (AUC).
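The three-step pipeline described in Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the CNN feature extractor is replaced by random synthetic embeddings, the function names (`make_synthetic_cohort`, `build_features`), the embedding size of 64, the cohort sizes, and the binary encoding of the clinical features are all hypothetical, and PCA is assumed to be applied to the deep features only. The choice of 10 principal components is borrowed from the "10 nodule features" mentioned in Results and is likewise an assumption about where that number comes from.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_synthetic_cohort(n_nodules, n_deep=64):
    """Hypothetical stand-in for CNN embeddings plus three clinical features."""
    y = rng.integers(0, 2, size=n_nodules)            # 1 = malignant, 0 = benign
    deep = rng.normal(size=(n_nodules, n_deep))
    deep[:, :5] += 0.8 * y[:, None]                   # inject a weak class signal
    # Assumed binary encodings for sex, spiculation, and nodule location.
    clinical = rng.integers(0, 2, size=(n_nodules, 3)).astype(float)
    return deep, clinical, y


def build_features(deep, clinical, scaler, pca):
    """Step 2: reduce deep features with PCA, then append clinical features."""
    reduced = pca.transform(scaler.transform(deep))
    return np.hstack([reduced, clinical])


# Step 1 (assumed done elsewhere): a CNN yields one embedding per nodule.
deep_train, clin_train, y_train = make_synthetic_cohort(400)
deep_ext, clin_ext, y_ext = make_synthetic_cohort(200)   # "external" cohort

# Fit the scaler and PCA on the training cohort only, to avoid leakage.
scaler = StandardScaler().fit(deep_train)
pca = PCA(n_components=10, random_state=0).fit(scaler.transform(deep_train))

X_train = build_features(deep_train, clin_train, scaler, pca)
X_ext = build_features(deep_ext, clin_ext, scaler, pca)

# Step 3: multivariate logistic regression on the combined feature vector.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Evaluate with the area under the ROC curve on the held-out cohort.
auc = roc_auc_score(y_ext, clf.predict_proba(X_ext)[:, 1])
print(f"External AUC on synthetic data: {auc:.3f}")
```

Because the data here are random, the printed AUC says nothing about the reported results; the sketch only shows how the stages connect and why the dimension-reduction step is fitted on the training cohort alone before being applied to an independent one.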
Results The AUCs on the NLST, LIDC, and IMC datasets are 0.91, 0.86, and 0.95, respectively. The inclusion of clinical features does not significantly improve model performance. Quantitatively, the summed weight on prediction accuracy of the 10 nodule features extracted by the deep learning algorithm equals 0.091, while the weights of patient sex, nodule spiculation, and nodule location are 0.031, 0.052, and 0.008, respectively.
Conclusion The convolutional neural network-based model for lung nodule classification generalizes to multiple datasets containing diverse populations. Adding three patient clinical features to the nodule features extracted by deep learning does not boost the performance of the model.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No external funding was received for this study.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee. For the NLST and LIDC cohorts, ethics approval was obtained from each participating study center, and written consent was provided by all participants. Data collection and usage were approved by the National Cancer Institute. For the Infervision Multi-Center cohort, the original data collection and research studies were approved by the institutional review boards of Dalian Zhongshan Hospital, Affiliated Hospital of Shaanxi University of Traditional Chinese Medicine, Jiangsu University Affiliated Hospital, Fujian Medical University Affiliated Union Hospital, Wuhan Tongji Hospital, and Shengjing Hospital.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data will be made available upon request.