RT Journal Article
SR Electronic
T1 GPAS: an online AI system for rapid and accurate pathogen identification and LLM-based interpretation
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2026.02.18.26346517
DO 10.64898/2026.02.18.26346517
A1 Li, Tingting
A1 Hong, Hao
A1 Fan, Duchangjiang
A1 Li, Jin
A1 Li, Ting
A1 Wu, Jiaqi
A1 Jiang, Shuai
A1 Xie, Xianxing
A1 Zhang, Yawei
A1 Hu, ManDong
A1 Yin, Xiaoyao
A1 Zhang, Yizhe
A1 Ma, Heping
A1 Liu, Zhehan
A1 Su, Zhihui
A1 Yu, Xiping
A1 Liu, Yu
A1 Yuan, Hetian
A1 Zheng, Weifan
A1 Liu, Haoyuan
A1 Ma, Mingyue
A1 Li, Xingyue
A1 Shen, Yezhuang
A1 Zhang, Cheng
A1 Wang, Yuyi
A1 Zhao, Bing
A1 Sun, Liming
A1 Han, Qiuying
A1 Chen, Jing
A1 Zhang, Ke
A1 Chen, Liang
A1 Wang, Na
A1 Li, Weihua
A1 Man, Jianghong
A1 He, Kun
A1 Dong, Fangting
A1 Du, Fei
A1 Yi, Yan
A1 Li, Ailing
A1 Zhou, Tao
A1 Zhang, Xuemin
A1 Li, Tao
YR 2026
UL http://medrxiv.org/content/early/2026/02/20/2026.02.18.26346517.abstract
AB Accurate identification of unknown pathogens is critical for medicine and public health, yet current metagenomic workflows remain heavily dependent on specialized bioinformatics expertise and manual interpretation, creating substantial bottlenecks in time-sensitive diagnostic settings [1]. The key challenges lie in achieving precise species identification amidst high background noise and translating complex microbial data into clinically actionable insights [2,3]. Here we present the Global Pathogen Analysis System (GPAS), an integrated computational framework that combines rapid and accurate pathogen identification with large language model (LLM)-based semantic interpretation. Central to GPAS is a dynamic-library alignment mechanism informed by prior probabilities of inter-species misclassification.
By integrating a hybrid machine learning model that couples elastic neural networks with Bayesian inference, this approach substantially reduces both false positives and false negatives, achieving species-level accuracy superior to existing state-of-the-art tools. To enable clinical interpretation, we constructed a unified microbial knowledge graph integrating global metagenomic and metaviromic sample repositories, and trained a pathogen-specialized LLM agent. Through end-to-end reinforcement learning, the agent autonomously executes multi-step reasoning workflows, extracting pathogen-specific insights from complex data and generating human-readable, evidence-based reports. Application to throat swab samples demonstrates that GPAS not only accurately identifies pathogenic microorganisms but also reveals how systemic lupus erythematosus (SLE)-associated immune dysregulation reshapes the respiratory microbiome and promotes pathobiont overgrowth, providing clinically instructive interpretations. By substantially lowering technical barriers to pathogen identification, GPAS offers an accessible yet powerful platform for clinical diagnostics, public health surveillance, and microbiome research. The system is freely available at https://gpas.nh.ac.cn/. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported by the State Key Laboratory of Biomedical Analysis and grants from the China National Natural Science Foundation (No. 82550131, No. 81925017 and No. 82130052 to Tao Li, No.
62503496 to Hao Hong). Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Samples from participants were collected in accordance with medical ethics requirements, under the approval of the Ethics Committee of National Center of Biomedical Analysis (NCBA, Approval No. AF/SC-08/02.240N and AF/SC-08/02.453N). I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. All data produced in the present study are available upon reasonable request to the authors.