{"markup":"\u003C?xml version=\u00221.0\u0022 encoding=\u0022UTF-8\u0022 ?\u003E\n    \u003Chtml version=\u0022HTML+RDFa+MathML 1.1\u0022\n    xmlns:content=\u0022http:\/\/purl.org\/rss\/1.0\/modules\/content\/\u0022\n    xmlns:dc=\u0022http:\/\/purl.org\/dc\/terms\/\u0022\n    xmlns:foaf=\u0022http:\/\/xmlns.com\/foaf\/0.1\/\u0022\n    xmlns:og=\u0022http:\/\/ogp.me\/ns#\u0022\n    xmlns:rdfs=\u0022http:\/\/www.w3.org\/2000\/01\/rdf-schema#\u0022\n    xmlns:sioc=\u0022http:\/\/rdfs.org\/sioc\/ns#\u0022\n    xmlns:sioct=\u0022http:\/\/rdfs.org\/sioc\/types#\u0022\n    xmlns:skos=\u0022http:\/\/www.w3.org\/2004\/02\/skos\/core#\u0022\n    xmlns:xsd=\u0022http:\/\/www.w3.org\/2001\/XMLSchema#\u0022\n    xmlns:mml=\u0022http:\/\/www.w3.org\/1998\/Math\/MathML\u0022\u003E\n  \u003Chead\u003E\u003Cscript type=\u0022text\/javascript\u0022 src=\u0022\/\/cdn.jsdelivr.net\/qtip2\/2.2.1\/jquery.qtip.min.js\u0022\u003E\u003C\/script\u003E\n\u003Cscript type=\u0022text\/javascript\u0022 src=\u0022https:\/\/www.medrxiv.org\/sites\/default\/files\/js\/js_YjAJQgxDlFX6S-O02jj9jCrVbrwlY3CGgCg1FzPlvBs.js\u0022\u003E\u003C\/script\u003E\n\u003Cscript type=\u0022text\/javascript\u0022\u003E\n\u003C!--\/\/--\u003E\u003C![CDATA[\/\/\u003E\u003C!--\nif(typeof window.MathJax === \u0022undefined\u0022) window.MathJax = { menuSettings: { zoom: \u0022Click\u0022 } };\n\/\/--\u003E\u003C!]]\u003E\n\u003C\/script\u003E\n\u003Cscript type=\u0022text\/javascript\u0022 src=\u0022https:\/\/www.medrxiv.org\/sites\/default\/files\/js\/js_waP91NpgGpectm_6Y2XDEauLJ8WCSCBKmmA87unpp2E.js\u0022\u003E\u003C\/script\u003E\n\u003Cscript type=\u0022text\/javascript\u0022 src=\u0022https:\/\/www.googletagmanager.com\/gtag\/js?id=G-0K57TCX5BY\u0022\u003E\u003C\/script\u003E\n\u003Cscript type=\u0022text\/javascript\u0022\u003E\n\u003C!--\/\/--\u003E\u003C![CDATA[\/\/\u003E\u003C!--\nwindow.dataLayer = window.dataLayer || [];function gtag(){dataLayer.push(arguments)};gtag(\u0022js\u0022, new 
Date());gtag(\u0022set\u0022, \u0022developer_id.dMDhkMT\u0022, true);gtag(\u0022config\u0022, \u0022G-0K57TCX5BY\u0022, {\u0022groups\u0022:\u0022default\u0022,\u0022anonymize_ip\u0022:true});\n\/\/--\u003E\u003C!]]\u003E\n\u003C\/script\u003E\n\u003Cscript type=\u0022text\/javascript\u0022\u003E\n\u003C!--\/\/--\u003E\u003C![CDATA[\/\/\u003E\u003C!--\njQuery.extend(Drupal.settings, {\u0022basePath\u0022:\u0022\\\/\u0022,\u0022pathPrefix\u0022:\u0022\u0022,\u0022highwire\u0022:{\u0022ac\u0022:{\u0022\\\/medrxiv\\\/early\\\/2025\\\/10\\\/17\\\/2025.10.14.25338040.atom\u0022:{\u0022access\u0022:{\u0022full\u0022:true},\u0022pisa_id\u0022:\u0022\u0022,\u0022apath\u0022:\u0022\\\/medrxiv\\\/early\\\/2025\\\/10\\\/17\\\/2025.10.14.25338040.atom\u0022,\u0022jcode\u0022:\u0022medrxiv\u0022},\u0022medrxiv;2025.10.14.25338040v1\u0022:{\u0022access\u0022:{\u0022full\u0022:true},\u0022pisa_id\u0022:\u0022medrxiv;2025.10.14.25338040v1\u0022,\u0022apath\u0022:\u0022\u0022,\u0022jcode\u0022:\u0022medrxiv\u0022}},\u0022processed\u0022:[\u0022highwire_math\u0022],\u0022markup\u0022:[{\u0022requested\u0022:\u0022abstract\u0022,\u0022variant\u0022:\u0022abstract\u0022,\u0022view\u0022:\u0022abstract\u0022,\u0022pisa\u0022:\u0022medrxiv;2025.10.14.25338040v1\u0022}]},\u0022instances\u0022:\u0022{\\u0022highwire_abstract_tooltip\\u0022:{\\u0022content\\u0022:{\\u0022text\\u0022:\\u0022\\u0022},\\u0022style\\u0022:{\\u0022tip\\u0022:{\\u0022width\\u0022:20,\\u0022height\\u0022:20,\\u0022border\\u0022:1,\\u0022offset\\u0022:0,\\u0022corner\\u0022:true},\\u0022classes\\u0022:\\u0022qtip-custom hw-tooltip hw-abstract-tooltip qtip-shadow qtip-rounded\\u0022,\\u0022classes_custom\\u0022:\\u0022hw-tooltip hw-abstract-tooltip\\u0022},\\u0022position\\u0022:{\\u0022at\\u0022:\\u0022right center\\u0022,\\u0022my\\u0022:\\u0022left center\\u0022,\\u0022viewport\\u0022:true,\\u0022adjust\\u0022:{\\u0022method\\u0022:\\u0022shift\\u0022}},\\u0022show\\u0022:{\\u0022event\\u0022:\\u0022mouseenter 
click \\u0022,\\u0022solo\\u0022:true},\\u0022hide\\u0022:{\\u0022event\\u0022:\\u0022mouseleave \\u0022,\\u0022fixed\\u0022:1,\\u0022delay\\u0022:\\u0022100\\u0022}},\\u0022highwire_author_tooltip\\u0022:{\\u0022content\\u0022:{\\u0022text\\u0022:\\u0022\\u0022},\\u0022style\\u0022:{\\u0022tip\\u0022:{\\u0022width\\u0022:15,\\u0022height\\u0022:15,\\u0022border\\u0022:1,\\u0022offset\\u0022:0,\\u0022corner\\u0022:true},\\u0022classes\\u0022:\\u0022qtip-custom hw-tooltip hw-author-tooltip qtip-shadow qtip-rounded\\u0022,\\u0022classes_custom\\u0022:\\u0022hw-tooltip hw-author-tooltip\\u0022},\\u0022position\\u0022:{\\u0022at\\u0022:\\u0022top center\\u0022,\\u0022my\\u0022:\\u0022bottom center\\u0022,\\u0022viewport\\u0022:true,\\u0022adjust\\u0022:{\\u0022method\\u0022:\\u0022\\u0022}},\\u0022show\\u0022:{\\u0022event\\u0022:\\u0022mouseenter \\u0022,\\u0022solo\\u0022:true},\\u0022hide\\u0022:{\\u0022event\\u0022:\\u0022mouseleave \\u0022,\\u0022fixed\\u0022:1,\\u0022delay\\u0022:\\u0022100\\u0022}},\\u0022highwire_reflinks_tooltip\\u0022:{\\u0022content\\u0022:{\\u0022text\\u0022:\\u0022\\u0022},\\u0022style\\u0022:{\\u0022tip\\u0022:{\\u0022width\\u0022:15,\\u0022height\\u0022:15,\\u0022border\\u0022:1,\\u0022mimic\\u0022:\\u0022top center\\u0022,\\u0022offset\\u0022:0,\\u0022corner\\u0022:true},\\u0022classes\\u0022:\\u0022qtip-custom hw-tooltip hw-ref-link-tooltip qtip-shadow qtip-rounded\\u0022,\\u0022classes_custom\\u0022:\\u0022hw-tooltip hw-ref-link-tooltip\\u0022},\\u0022position\\u0022:{\\u0022at\\u0022:\\u0022bottom left\\u0022,\\u0022my\\u0022:\\u0022top left\\u0022,\\u0022viewport\\u0022:true,\\u0022adjust\\u0022:{\\u0022method\\u0022:\\u0022flip\\u0022}},\\u0022show\\u0022:{\\u0022event\\u0022:\\u0022mouseenter \\u0022,\\u0022solo\\u0022:true},\\u0022hide\\u0022:{\\u0022event\\u0022:\\u0022mouseleave 
\\u0022,\\u0022fixed\\u0022:1,\\u0022delay\\u0022:\\u0022100\\u0022}}}\u0022,\u0022qtipDebug\u0022:\u0022{\\u0022leaveElement\\u0022:0}\u0022,\u0022googleanalytics\u0022:{\u0022account\u0022:[\u0022G-0K57TCX5BY\u0022],\u0022trackOutbound\u0022:1,\u0022trackMailto\u0022:1,\u0022trackDownload\u0022:1,\u0022trackDownloadExtensions\u0022:\u00227z|aac|arc|arj|asf|asx|avi|bin|csv|doc(x|m)?|dot(x|m)?|exe|flv|gif|gz|gzip|hqx|jar|jpe?g|js|mp(2|3|4|e?g)|mov(ie)?|msi|msp|pdf|phps|png|ppt(x|m)?|pot(x|m)?|pps(x|m)?|ppam|sld(x|m)?|thmx|qtm?|ra(m|r)?|sea|sit|tar|tgz|torrent|txt|wav|wma|wmv|wpd|xls(x|m|b)?|xlt(x|m)|xlam|xml|z|zip\u0022,\u0022trackColorbox\u0022:1},\u0022ajaxPageState\u0022:{\u0022js\u0022:{\u0022\\\/\\\/cdn.jsdelivr.net\\\/qtip2\\\/2.2.1\\\/jquery.qtip.min.js\u0022:1,\u0022sites\\\/all\\\/modules\\\/highwire\\\/highwire\\\/plugins\\\/highwire_markup_process\\\/js\\\/highwire_article_reference_popup.js\u0022:1,\u0022sites\\\/all\\\/modules\\\/highwire\\\/highwire\\\/plugins\\\/highwire_markup_process\\\/js\\\/highwire_at_symbol.js\u0022:1,\u00220\u0022:1,\u0022sites\\\/all\\\/modules\\\/contrib\\\/google_analytics\\\/googleanalytics.js\u0022:1,\u0022https:\\\/\\\/www.googletagmanager.com\\\/gtag\\\/js?id=G-0K57TCX5BY\u0022:1,\u00221\u0022:1}}});\n\/\/--\u003E\u003C!]]\u003E\n\u003C\/script\u003E\n\u003Clink type=\u0022text\/css\u0022 rel=\u0022stylesheet\u0022 href=\u0022https:\/\/www.medrxiv.org\/sites\/default\/files\/advagg_css\/css__uXgUByez87OKDsgffPHe7u5qNUzr7zOnqWrSJ87THKk__I8zmferlWQG1DHWX_fZmeyRd733gqStwZcOGe0mM0T4__QrrGUc7CpljPR5Aph-ukPbcwtK4AWrHGwCEXJ_k1V_c.css\u0022 media=\u0022all\u0022 \/\u003E\n\u003Clink type=\u0022text\/css\u0022 rel=\u0022stylesheet\u0022 href=\u0022\/\/cdn.jsdelivr.net\/qtip2\/2.2.1\/jquery.qtip.min.css\u0022 media=\u0022all\u0022 \/\u003E\n\u003Clink type=\u0022text\/css\u0022 rel=\u0022stylesheet\u0022 
href=\u0022https:\/\/www.medrxiv.org\/sites\/default\/files\/advagg_css\/css__HGACIFBlu2o05y3afvqlt5wrE_5Dn6MXsexfuEpeIwg__t4SOPxucAPoV3Os7g8dXqyMB1HRXQridRJ82X7nE33E__QrrGUc7CpljPR5Aph-ukPbcwtK4AWrHGwCEXJ_k1V_c.css\u0022 media=\u0022all\u0022 \/\u003E\n\u003Clink rel=\u0027stylesheet\u0027 type=\u0027text\/css\u0027 href=\u0027\/sites\/all\/modules\/contrib\/panels\/plugins\/layouts\/onecol\/onecol.css\u0027 \/\u003E\u003C\/head\u003E\u003Cbody\u003E\u003Cdiv class=\u0022panels-ajax-tab-panel panels-ajax-tab-panel-biorxiv-tab-art\u0022\u003E\u003Cdiv class=\u0022panel-display panel-1col clearfix\u0022 \u003E\n  \u003Cdiv class=\u0022panel-panel panel-col\u0022\u003E\n    \u003Cdiv\u003E\u003Cdiv class=\u0022panel-pane pane-highwire-markup\u0022 \u003E\n  \n      \n  \n  \u003Cdiv class=\u0022pane-content\u0022\u003E\n    \u003Cdiv class=\u0022highwire-markup\u0022\u003E\u003Cdiv xmlns=\u0022http:\/\/www.w3.org\/1999\/xhtml\u0022 data-highwire-cite-ref-tooltip-instance=\u0022highwire_reflinks_tooltip\u0022 class=\u0022content-block-markup\u0022 xmlns:xhtml=\u0022http:\/\/www.w3.org\/1999\/xhtml\u0022\u003E\u003Cdiv class=\u0022article abstract-view \u0022\u003E\u003Cspan class=\u0022highwire-journal-article-marker-start\u0022\u003E\u003C\/span\u003E\u003Cdiv class=\u0022section abstract\u0022 id=\u0022abstract-1\u0022\u003E\u003Ch2 class=\u0022\u0022\u003EAbstract\u003C\/h2\u003E\u003Cdiv id=\u0022sec-1\u0022 class=\u0022subsection\u0022\u003E\u003Cp id=\u0022p-2\u0022\u003E\u003Cstrong\u003EBackground\u003C\/strong\u003E Large language models (LLMs) have demonstrated rapid advancements in natural language understanding and generation, prompting their integration into biomedical research, clinical practice, and professional education. 
However, systematic evaluation of LLMs in specialty-specific domains such as dentistry and periodontology remains limited, particularly regarding multidimensional performance metrics.\u003C\/p\u003E\u003C\/div\u003E\u003Cdiv id=\u0022sec-2\u0022 class=\u0022subsection\u0022\u003E\u003Cp id=\u0022p-3\u0022\u003E\u003Cstrong\u003EObjective\u003C\/strong\u003E To conduct a comprehensive, multidimensional assessment of commercially available LLMs (GPT-4.0, GPT-5.0, and Claude SONNET 4.0) on the American Academy of Periodontology in-service examination, focusing on response accuracy, self-assessed confidence calibration, citation validity, and hallucination prevalence.\u003C\/p\u003E\u003C\/div\u003E\u003Cdiv id=\u0022sec-3\u0022 class=\u0022subsection\u0022\u003E\u003Cp id=\u0022p-4\u0022\u003E\u003Cstrong\u003EMethods\u003C\/strong\u003E Models were evaluated on the 2024 AAP In-Service Examination (331 questions) using two formats: Full Test (all questions at once) and Individual Question (one at a time). Prompts were standardized; models selected answers and, for GPT-5.0 and Claude SONNET 4.0, also provided confidence ratings and citations. Citation validity was assessed using a human-in-the-loop protocol with expert review. Statistical analyses included chi-square tests, McNemar\u2019s test, and logistic regression to assess accuracy, question fatigue, confidence calibration, and citation reliability.\u003C\/p\u003E\u003C\/div\u003E\u003Cdiv id=\u0022sec-4\u0022 class=\u0022subsection\u0022\u003E\u003Cp id=\u0022p-5\u0022\u003E\u003Cstrong\u003EResults\u003C\/strong\u003E LLMs achieved high overall accuracy (78\u201387%), with the Individual Question format consistently yielding higher scores than Full Test, though differences were not statistically significant.\u003C\/p\u003E\u003Cp id=\u0022p-6\u0022\u003EAccuracy was highest in fact-dense domains (biochemistry, physiology, microbiology) and lowest in integrative domains (diagnosis, therapy). 
Significant question fatigue was observed in GPT-5.0 Full Test mode (OR = 0.997, p = 0.035), but not in Individual Question mode.\u003C\/p\u003E\u003Cp id=\u0022p-7\u0022\u003EConfidence scores predicted accuracy, with the strongest calibration in Individual Question mode. Citation analysis revealed frequent hallucinations, mostly critically erroneous, and citation validity was independent of answer accuracy.\u003C\/p\u003E\u003C\/div\u003E\u003Cdiv id=\u0022sec-5\u0022 class=\u0022subsection\u0022\u003E\u003Cp id=\u0022p-8\u0022\u003E\u003Cstrong\u003EConclusions\u003C\/strong\u003E LLMs can answer a broad spectrum of periodontal specialty questions, but their reliability varies with context and information presentation. While promising as adjunctive tools, their outputs, especially for complex reasoning and citations, require rigorous human review in educational and research settings to ensure accuracy and safety.\u003C\/p\u003E\u003C\/div\u003E\u003Cdiv id=\u0022sec-6\u0022 class=\u0022subsection\u0022\u003E\u003Cp id=\u0022p-9\u0022\u003E\u003Cstrong\u003EAuthor Summary\u003C\/strong\u003E Artificial intelligence chatbots are rapidly entering medical education, yet we lack a comprehensive understanding of their reliability when students depend on them for learning. We developed a multidimensional evaluation framework to systematically assess AI performance beyond simple accuracy, examining how these systems behave across different medical topics, question types, and presentation formats.\u003C\/p\u003E\u003Cp id=\u0022p-10\u0022\u003EUsing 331 real dental examination questions, we tested three major AI systems, analyzing not only correctness but also confidence calibration (whether AI confidence levels match actual accuracy) and implementing human-in-the-loop verification to check whether cited sources actually exist.\u003C\/p\u003E\u003Cp id=\u0022p-11\u0022\u003EOur findings highlight critical vulnerabilities in current AI systems. 
Most alarmingly, these chatbots fabricated nearly half of their citations while maintaining unwavering confidence in both correct and incorrect responses. This combination of overconfidence and misinformation means students cannot distinguish reliable from unreliable AI responses. Additionally, we documented progressive performance decline during sequential questioning, similar to human cognitive fatigue.\u003C\/p\u003E\u003Cp id=\u0022p-12\u0022\u003EWhile we know AI systems generate rather than retrieve information, our research demonstrates the real-world consequences of this limitation. As artificial intelligence integrates into education, healthcare diagnostics, and insurance decisions, these findings underscore the urgent need for better evaluation frameworks and user education about AI limitations.\u003C\/p\u003E\u003C\/div\u003E\u003C\/div\u003E\u003Ch3\u003ECompeting Interest Statement\u003C\/h3\u003E\u003Cp id=\u0022p-13\u0022\u003EThe authors have declared no competing interest.\u003C\/p\u003E\u003Ch3\u003EFunding Statement\u003C\/h3\u003E\u003Cp id=\u0022p-14\u0022\u003EThe author(s) received no specific funding for this work.\u003C\/p\u003E\u003Ch3\u003EAuthor Declarations\u003C\/h3\u003E\u003Cp id=\u0022p-15\u0022\u003EI confirm all relevant ethical guidelines have been followed, and any necessary IRB and\/or ethics committee approvals have been obtained.\u003C\/p\u003E\u003Cp id=\u0022p-16\u0022\u003ENot Applicable\u003C\/p\u003E\u003Cp id=\u0022p-17\u0022\u003EThe details of the IRB\/oversight body that provided approval or exemption for the research described are given below:\u003C\/p\u003E\u003Cp id=\u0022p-18\u0022\u003EThis research does not involve human subjects. The study evaluated the performance of artificial intelligence language models (GPT-4o, GPT-4o mini, and Claude-3.5 Sonnet) on publicly available American Academy of Periodontology examination questions. 
No human participants were recruited, interviewed, surveyed, or involved in any aspect of data collection. All analyses were conducted on AI-generated text responses and publicly available examination materials. Therefore, Institutional Review Board (IRB) review was not required for this computational research study.\u003C\/p\u003E\u003Cp id=\u0022p-19\u0022\u003EI confirm that all necessary patient\/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient\/participant\/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.\u003C\/p\u003E\u003Cp id=\u0022p-20\u0022\u003ENot Applicable\u003C\/p\u003E\u003Cp id=\u0022p-21\u0022\u003EI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).\u003C\/p\u003E\u003Cp id=\u0022p-22\u0022\u003ENot Applicable\u003C\/p\u003E\u003Cp id=\u0022p-23\u0022\u003EI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.\u003C\/p\u003E\u003Cp id=\u0022p-24\u0022\u003ENot Applicable\u003C\/p\u003E\u003Cdiv class=\u0022section data-availability\u0022 id=\u0022sec-35\u0022\u003E\u003Ch2 class=\u0022\u0022\u003EData Availability\u003C\/h2\u003E\u003Cp id=\u0022p-90\u0022\u003EThe datasets generated and analyzed during this study are available from the corresponding author upon reasonable request. 
The complete dataset includes: (1) AI model responses to all 331 American Academy of Periodontology examination questions, including accuracy assessments, confidence scores, and generated citations for GPT-4o, GPT-4o mini, and Claude-3.5 Sonnet models; (2) human expert validation results for all AI-generated citations, including verification status and authenticity scores; and (3) statistical analysis outputs, including confidence calibration metrics, accuracy measurements across question categories, and question fatigue analysis. Data requests should be directed to the corresponding author.\u003C\/p\u003E\u003C\/div\u003E\u003Cspan class=\u0022highwire-journal-article-marker-end\u0022\u003E\u003C\/span\u003E\u003C\/div\u003E\u003Cspan class=\u0022related-urls\u0022\u003E\u003C\/span\u003E\u003C\/div\u003E\u003C\/div\u003E  \u003C\/div\u003E\n\n  \n  \u003C\/div\u003E\n\u003Cdiv class=\u0022panel-separator\u0022\u003E\u003C\/div\u003E\u003Cdiv class=\u0022panel-pane pane-biorxiv-copyright\u0022 \u003E\n  \n      \n  \n  \u003Cdiv class=\u0022pane-content\u0022\u003E\n    \u003Cdiv class=\u0022field field-name-field-highwire-copyright field-type-text field-label-inline clearfix\u0022\u003E\u003Cdiv class=\u0022field-label\u0022\u003ECopyright\u0026nbsp;\u003C\/div\u003E\u003Cdiv class=\u0022field-items\u0022\u003E\u003Cdiv class=\u0022field-item even\u0022\u003EThe copyright holder for this preprint is the author\/funder, who has granted medRxiv a license to display the preprint in perpetuity.\u003Cspan class=\u0022license-type\u0022\u003E It is made available under a \u003Ca href=\u0022http:\/\/creativecommons.org\/licenses\/by\/4.0\/\u0022 class=\u0022\u0022 data-icon-position=\u0022\u0022 data-hide-link-title=\u00220\u0022\u003ECC-BY 4.0 International license\u003C\/a\u003E.\u003C\/span\u003E\u003C\/div\u003E\u003C\/div\u003E\u003C\/div\u003E  \u003C\/div\u003E\n\n  \n  \u003C\/div\u003E\n\u003C\/div\u003E\n  \u003C\/div\u003E\n\u003C\/div\u003E\n\u003C\/div\u003E\u003Cscript 
type=\u0022text\/javascript\u0022 src=\u0022https:\/\/www.medrxiv.org\/sites\/default\/files\/js\/js_SXHPyYQMndPSjH0oAPTy1xd0XLtmYCIziRIiNb0RJd8.js\u0022\u003E\u003C\/script\u003E\n\u003C\/body\u003E\u003C\/html\u003E"}