PT - JOURNAL ARTICLE
AU - Wenye Geng
AU - Xuanfeng Qin
AU - Zhuo Wang
AU - Qing Kong
AU - Zihui Tang
AU - Lin Jiang
TI - Model-based reasoning methods for diagnosis in integrative medicine based on electronic medical records and natural language processing
AID - 10.1101/2020.07.12.20151746
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.07.12.20151746
4099 - http://medrxiv.org/content/early/2020/07/18/2020.07.12.20151746.short
4100 - http://medrxiv.org/content/early/2020/07/18/2020.07.12.20151746.full
AB - Background: This study aimed to investigate model-based reasoning (MBR) algorithms for diagnosis in integrative medicine based on electronic medical records (EMRs) and natural language processing. Methods: A total of 14,075 medical records of clinical cases were extracted from the EMRs as the development dataset, and an external test dataset consisting of 1,000 medical records of clinical cases was extracted from independent EMRs. MBR methods based on word embedding, machine learning, and deep learning algorithms were developed for the automatic diagnosis of syndrome patterns in integrative medicine. MBR algorithms combined with rule-based reasoning (RBR) were also developed. Standard evaluation metrics (accuracy, precision, recall, and F1 score) were used to estimate the performance of the methods. Association analyses of sample size, number of syndrome pattern types, and diagnosis of lung diseases were conducted with the best-performing algorithms. Results: The Word2Vec CNN MBR algorithms showed high performance (accuracy of 0.9586 in the test dataset) in syndrome pattern diagnosis. The Word2Vec CNN MBR combined with RBR also showed high performance (accuracy of 0.9229 in the test dataset). The diagnosis of lung diseases could enhance the performance of the Word2Vec CNN MBR algorithms.
The sample size of each group and the syndrome pattern type affected the performance of these algorithms. Conclusion: The MBR methods based on Word2Vec and CNN showed high performance in syndrome pattern diagnosis in integrative medicine for lung diseases. The sample size of each group, the syndrome pattern type, and the diagnosis of lung diseases were associated with the performance of the methods.
Strengths and limitations of this study: A novel application of artificial intelligence – natural language processing approaches to diagnosis in integrative medicine. A study of medical artificial intelligence based on real-world data from electronic medical records. Multiple artificial intelligence approaches, including traditional machine learning algorithms, neural networks, and deep learning algorithms. Rule-based reasoning combined with model-based reasoning was explored in this dataset.
Competing Interest Statement: The authors have declared no competing interest.
Clinical Trial: NCT03274908
Funding Statement: Grants from the Institutes of Integrative Medicine of Fudan University. ClinicalTrials.gov Identifier: NCT03274908; and China Postdoctoral Science Foundation funded project (2017M611461).
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The study was approved by the Ethics Committee of Huashan Hospital (approval number: HIRB-2018-166) and performed in accordance with the Declaration of Helsinki. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
The datasets generated and/or analyzed during the current study are not publicly available due to private information but are available from the corresponding author on reasonable request. The datasets are from a study whose authors may be contacted at the Center of Bioinformatics and Biostatistics, Institutes of Integrative Medicine, Fudan University. The external test dataset and an example of the development dataset are available at https://github.com/zihuitang/clincial_decision_support_system_im
ANN: Artificial neural network; CI: Confidence interval; CNN: Convolutional neural network; EMRs: Electronic medical records; XGBoost: Extreme gradient boosting; KNN: K-nearest neighbor; MBR: Model-based reasoning; MLP: Multilayer perceptron; NLP: Natural language processing; RF: Random forest; RBR: Rule-based reasoning; SVM: Support vector machines; TCM: Traditional Chinese medicine