Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools
Justin T Reese, Leonardo Chimirri, Yasemin Bridges, Daniel Danis, J Harry Caufield, Kyran Wissink, Julie A McMurry, Adam SL Graefe, Elena Casiraghi, Giorgio Valentini, Julius OB Jacobsen, Melissa A Haendel, Damian Smedley, Christopher J Mungall, Peter N Robinson
medRxiv 2024.07.22.24310816; doi: https://doi.org/10.1101/2024.07.22.24310816