PT  - JOURNAL ARTICLE
AU  - Ventura D’addario, Andrew Maranhão
TI  - Breaking the Cost Barrier: How Quantization Enables Efficient Development and Deployment of LLMs for Public Healthcare
AID  - 10.1101/2025.11.17.25340460
DP  - 2025 Jan 01
TA  - medRxiv
PG  - 2025.11.17.25340460
4099  - http://medrxiv.org/content/early/2025/11/19/2025.11.17.25340460.short
4100  - http://medrxiv.org/content/early/2025/11/19/2025.11.17.25340460.full
AB  - The clinical promise of Large Language Models (LLMs) is often unrealized due to prohibitive computational costs. These costs create barriers not only to deployment in patient care but also to the vital process of fine-tuning models for specialized medical tasks and local patient populations. This study investigates 4-bit quantization as a methodology to make the entire clinical AI lifecycle, from development to implementation, both financially and practically viable. We performed a cost-benefit analysis using the Gemma 3 model family on the HealthQA-BR medical benchmark. We compared the diagnostic accuracy and computational resource requirements of standard full-precision models against their 4-bit quantized counterparts during both inference (clinical use) and QLoRA-based fine-tuning (model development). Quantization enabled massive efficiency gains with a clinically negligible impact on performance. For the 12B-parameter model, we observed a mere 1.3% absolute drop in accuracy. In exchange, computational requirements were reduced by 80% for fine-tuning and 69% for inference. This translates to a more than three-fold improvement in performance per unit of computational cost, accelerating research and development cycles. 4-bit quantization is a pivotal enabling technology for clinical AI. By drastically lowering the resource barrier for model customization and deployment, it empowers medical institutions to rapidly develop and validate specialized AI tools on-site.
This approach holds particular promise for large-scale public health systems like Brazil’s SUS and provides a viable blueprint for similar health systems worldwide to transform AI from a theoretical possibility into a practical and equitable reality in patient care.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by the Brazilian Ministry of Health (MoH/DECIT) in partnership with the National Council for Scientific and Technological Development (CNPq) [grant number 400757/2024-9] and the Gates Foundation. The Author Accepted Manuscript version arising from this submission will be published under a Creative Commons Attribution 4.0 Generic License.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes
All data produced are available online at: https://huggingface.co/datasets/Larxel/healthqa-br