TY - JOUR
T1 - Graph Neural Network Modelling as a potentially effective Method for predicting and analyzing Procedures based on Patient Diagnoses
JF - medRxiv
DO - 10.1101/2021.11.25.21266465
SP - 2021.11.25.21266465
AU - Juan G. Diaz Ochoa
AU - Faizan Mustafa
Y1 - 2021/01/01
UR - http://medrxiv.org/content/early/2021/12/02/2021.11.25.21266465.abstract
N2 - Background: Currently, the healthcare sector strives to increase the quality of patient management and improve the economic performance of healthcare providers. The data contained in electronic health records (EHRs) offer the potential to discover patterns that relate diseases and therapies, and thus to help identify empirical medical guidelines that reflect best practices in the healthcare system. Based on this pattern identification, it is then possible to implement recommendation systems based on the idea that a higher volume of procedures is associated with high-quality models. Methods: Although there are several applications that use machine learning methods to identify these patterns, this identification is still a challenge, in part because these methods often ignore the basic structure of the population, that is, the similarity of diagnoses and patient typology. To this end, we have developed graph methods that aim to cluster similar patients. In such models, patients are linked when the same or similar patterns can be observed for them, a concept that enables the construction of a network-like structure. This structure can then be analyzed with Graph Neural Networks (GNNs) to identify relevant labels, in this case the appropriate medical procedures. Results: We report the construction of a patient graph structure based on basic patient information, such as age and gender, as well as diagnoses, and we trained GNN models to identify the corresponding patients’ therapies using a synthetic patient database.
We compared the performance of our GNN models against several baseline models (implemented with the scikit-learn library for Python). We found that the GNNs are superior, with an average improvement in the F1 score of 6.48% with respect to the baseline models. In addition, the GNNs are useful for performing additional clustering analyses that allow the identification of specific therapeutic clusters related to a particular combination of diagnoses. Conclusions: We found that GNNs are a promising way to model the distribution of diagnoses in a patient population and thus to better model how similar patients can be identified based on the combination of morbidities and comorbidities. Nevertheless, network building is still challenging and prone to bias, as it depends on how the ICD distribution affects the patient network embedding space. This network setup requires not only a high quality of the underlying diagnostic ecosystem, but also a good understanding of how to identify related patients by disease.
For this reason, additional work is needed to improve and better standardize patient embedding in graph structures for future investigations and applications of services based on this technology; this work is therefore not yet an interventional study. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This study did not receive any funding. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. Data availability: All data produced in the present study are available upon reasonable request to the authors. Abbreviations: GNN, Graph Neural Networks; EHR, Electronic Health Record; PHI, Patient Health Information; IQI, Inpatient Quality Indicators; PQI, Prevention Quality Indicators; PSI, Patient Safety Indicators; TK, Therapy Keys (equivalent to medical procedures); ICD, International Classification of Diseases; ID, Internal Patient identification.
ER -