Abstract
Thousands of scientific articles describing genes associated with human diseases are published every week. Computational methods such as text mining and machine learning algorithms can now detect these associations automatically. In this study, we used a cognitive computing text-mining application to construct a knowledge network comprising 3,723 genes and 99 diseases. We then tracked the yearly changes in this network to analyze how our knowledge has evolved over the past 30 years. Our approach helped to unravel the molecular bases of diseases over time and to detect shared mechanisms between clinically distinct diseases. It also revealed that multi-purpose therapeutic drugs target genes that are commonly associated with several psychiatric, inflammatory, or infectious disorders. By navigating this knowledge tsunami, we were able to extract relevant biological information and insights about human diseases.
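The idea of tracking a gene-disease network year by year can be illustrated with a minimal sketch. This is not the authors' pipeline (their code is in the repository listed under Data Availability); the record format, the placeholder gene and disease names, the years, and the function names `network_up_to` and `shared_genes` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records, not data from the study:
# (gene, disease, year the association was first reported).
ASSOCIATIONS = [
    ("GENE_A", "disease_1", 1991),
    ("GENE_A", "disease_2", 1995),
    ("GENE_B", "disease_1", 1993),
    ("GENE_B", "disease_3", 2004),
    ("GENE_A", "disease_3", 2006),
]


def network_up_to(records, year):
    """Return {disease: set of genes} for associations reported up to `year`."""
    network = defaultdict(set)
    for gene, disease, first_year in records:
        if first_year <= year:
            network[disease].add(gene)
    return dict(network)


def shared_genes(network):
    """Return {(disease_a, disease_b): common genes} for overlapping pairs."""
    shared = {}
    for a, b in combinations(sorted(network), 2):
        common = network[a] & network[b]
        if common:
            shared[(a, b)] = common
    return shared


if __name__ == "__main__":
    # Compare two cumulative snapshots to see how the overlap grows over time.
    for year in (1995, 2006):
        snapshot = network_up_to(ASSOCIATIONS, year)
        print(year, shared_genes(snapshot))
```

Comparing snapshots from different years shows which disease pairs gain shared genes as new associations accumulate, which is the kind of signal the abstract describes as shared mechanisms between clinically distinct diseases.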
Competing Interest Statement
The authors have declared no competing interest.
Clinical Trial
Not applicable
Funding Statement
This work was supported by the Brazilian National Council for Scientific and Technological Development (grant number 313662/2017-7) and the Sao Paulo Research Foundation (grant number 2018/14933-2).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Not applicable
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The data and code used to produce the analyses and figures in this study are available at https://github.com/csbl-usp/evolution_of_knowledge.