PT  - JOURNAL ARTICLE
AU  - Zong, Hui
AU  - Li, Jiakun
AU  - Wu, Erman
AU  - Wu, Rongrong
AU  - Shen, Bairong
TI  - Performance of ChatGPT on Chinese National Medical Licensing Examinations: A Five-Year Examination Evaluation Study for Physicians, Pharmacists and Nurses
AID  - 10.1101/2023.07.09.23292415
DP  - 2023 Jan 01
TA  - medRxiv
PG  - 2023.07.09.23292415
4099  - http://medrxiv.org/content/early/2023/07/09/2023.07.09.23292415.short
4100  - http://medrxiv.org/content/early/2023/07/09/2023.07.09.23292415.full
AB  - Background Large language models like ChatGPT have revolutionized the field of natural language processing with their capability to comprehend and generate textual content, showing great potential to play a role in medical education.Objective This study aimed to quantitatively evaluate the performance of ChatGPT on three types of national medical examinations in China, including National Medical Licensing Examination (NMLE), National Pharmacist Licensing Examination (NPLE), and National Nurse Licensing Examination (NNLE).Methods We collected questions from Chinese NLMLE, NPLE and NNLE from year 2017 to 2021. In NMLE and NPLE, each exam consists of 4 units, while in NNLE, each exam consists of 2 units. The questions with figures, tables or chemical structure were manually identified and excluded by clinician. We applied direct instruction strategy via multiple prompts to force ChatGPT to generate the clear answer with the capability to distinguish between single-choice and multiple-choice questions.Results ChatGPT failed to pass the threshold score (0.6) in any of the three types of examinations over the five years. Specifically, in the NMLE, the highest recorded score was 0.5467, which was attained in both 2018 and 2021. In the NPLE, the highest score was 0.5599 in 2017. In the NNLE, the most impressive result was shown in 2017, with a score of 0.5897, which is also the highest score in our entire evaluation. ChatGPT’s performance showed no significant difference in different units, but significant difference in different question types.Conclusions These results indicate ChatGPT failed the NMLE, NPLE and NNLE in China, spanning from year 2017 to 2021. but show great potential of large language models in medical education. In the future high-quality medical data will be required to improve the performance.Competing Interest StatementThe authors have declared no competing interest.Funding StatementNational Natural Science Foundation of ChinaAuthor DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesAll data produced in the present study are available upon reasonable request to the authors