Prompt injection attacks on vision-language models for surgical decision support

Zheyuan Zhang; Muhammad Ibtsaam Qadir; Matthias Carstens; Evan Hongyang Zhang; Madison Sarah Loiselle; Farren Marc Martinus; Maksymilian Ksawier Mroczkowski; Jan Clusmann; Jakob Nikolas Kather; Fiona R. Kolbinger

doi:10.1101/2025.07.16.25331645

Abstract

Importance Artificial intelligence-driven analysis of laparoscopic video has the potential to increase the safety and precision of minimally invasive surgery. Vision-language models are particularly promising for video-based surgical decision support because they can interpret complex temporospatial (video) data. However, the same multimodal interfaces that enable these capabilities also expose the models to manipulation through embedded deceptive text or images (prompt injection attacks).

Objective To systematically evaluate how susceptible state-of-the-art video-capable vision-language models are to textual and visual prompt injection attacks in the context of clinically relevant surgical decision support tasks.

Design, Setting, and Participants In this observational study, we systematically evaluated four state-of-the-art vision-language models (Gemini 1.5 Pro, Gemini 2.5 Pro, GPT-o4-mini-high, and Qwen 2.5-VL) across eleven surgical decision support tasks covering detection of bleeding events, foreign objects, and image distortions; critical view of safety assessment; and surgical skill assessment. Prompt injection scenarios comprised misleading textual prompts and visual perturbations, rendered as white text overlays and applied for varying durations.

Main Outcomes and Measures The primary measure was model accuracy, contrasted between baseline performance and each prompt injection condition.
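The contrast described above can be illustrated with a minimal Python sketch. It assumes that each model response has already been mapped to a categorical answer and that a reference label exists for every clip; the function names (`accuracy`, `accuracy_change`) and the toy labels are hypothetical and are not taken from the study's code.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labeled example is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_change(
    baseline_preds: Sequence[str],
    injected_preds: Sequence[str],
    labels: Sequence[str],
) -> float:
    """Accuracy under injection minus baseline accuracy (negative = degradation)."""
    return accuracy(injected_preds, labels) - accuracy(baseline_preds, labels)


# Hypothetical toy data for a bleeding-detection task (not study data).
labels = ["bleeding", "no bleeding", "bleeding", "no bleeding"]
baseline = ["bleeding", "no bleeding", "bleeding", "bleeding"]
injected = ["no bleeding", "no bleeding", "no bleeding", "bleeding"]

print(accuracy(baseline, labels))                    # 0.75
print(accuracy(injected, labels))                    # 0.25
print(accuracy_change(baseline, injected, labels))   # -0.5
```

Scoring both conditions against the same reference labels keeps the comparison paired, so any difference reflects the injection rather than a change in the evaluated clips.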

Results All vision-language models demonstrated good baseline accuracy, with Gemini 2.5 Pro generally achieving the highest mean (standard deviation) accuracy across all tasks (0.82 [0.01]), compared with Gemini 1.5 Pro (0.70 [0.03]) and GPT-o4-mini-high (0.67 [0.06]). Across tasks, Qwen 2.5-VL censored most outputs and achieved an accuracy of 0.58 (0.03) on non-censored outputs. Textual and temporally varying visual prompt injections reduced accuracy for all models. Prolonged visual prompt injections were generally more harmful than single-frame injections. Gemini 2.5 Pro showed the greatest robustness and maintained stable performance for several tasks despite prompt injections, whereas GPT-o4-mini-high exhibited the highest vulnerability, with mean (standard deviation) accuracy across all tasks declining from 0.67 (0.06) at baseline to 0.24 (0.04) under full-duration visual prompt injection (P < .001).
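To put the reported decline for GPT-o4-mini-high in perspective, the following short Python sketch derives the absolute and relative accuracy drop from the two means stated above. Only the values 0.67 and 0.24 come from the text; the variable names are illustrative, and the derived percentages are simple arithmetic rather than results reported by the authors.

```python
# Mean accuracies reported in the abstract for GPT-o4-mini-high.
baseline_acc = 0.67   # baseline, across all tasks
injected_acc = 0.24   # full-duration visual prompt injection

# Absolute drop in accuracy and drop relative to the baseline level.
absolute_drop = baseline_acc - injected_acc
relative_drop = absolute_drop / baseline_acc

print(f"absolute drop: {absolute_drop:.2f}")   # absolute drop: 0.43
print(f"relative drop: {relative_drop:.1%}")   # relative drop: 64.2%
```

In other words, under this reading the model lost roughly two-thirds of its baseline accuracy when the injected text was shown for the full duration of the video.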

Conclusion and Relevance These findings indicate the critical need for robust temporal reasoning capabilities and specialized guardrails before vision-language models can be safely deployed for real-time surgical decision support.

Question Are video vision-language models (VLMs) susceptible to textual and visual prompt injection attacks when used for surgical decision support tasks?

Finding Textual and visual prompt injection attacks consistently degraded the performance of four state-of-the-art VLMs across eleven surgical tasks. Gemini 2.5 Pro was most robust to textual and visual prompt injection attacks, whereas GPT-o4-mini-high was most vulnerable. Prolonged visual injections had a greater negative impact than single-frame injections.

Meaning Present-generation video VLMs are highly vulnerable to textual and visual prompt injection attacks. This critical safety vulnerability must be addressed before their integration into surgical decision support systems.

Competing Interest Statement

JNK declares consulting services for Bioptimus, Panakeia, AstraZeneca, and MultiplexDx. Furthermore, he holds shares in StratifAI, Synagen, Tremont AI, and Ignition Labs; has received an institutional research grant from GSK; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. FRK declares advisory roles for Radical Healthcare, USA, and the Surgical Data Science Collective, USA. All other authors declare no competing interests.

Funding Statement

JC is supported by the Mildred-Scheel-Postdoktorandenprogramm of the German Cancer Aid (grant #70115730). JNK is supported by the German Cancer Aid DKH (DECADE, 70115166), the German Federal Ministry of Research, Technology and Space BMFTR (PEARL, 01KD2104C; CAMINO, 01EO2101; TRANSFORM LIVER, 031L0312A; TANGERINE, 01KT2302 through ERA-NET Transcan; Come2Data, 16DKZ2044A; DEEP-HCC, 031L0315A; DECIPHER-M, 01KD2420A; NextBIG, 01ZU2402A), the German Research Foundation DFG (CRC/TR 412, 535081457), the German Academic Exchange Service DAAD (SECAI, 57616814), the German Federal Joint Committee G-BA (TransplantKI, 01VSF21048), the European Union EU Horizon Europe research and innovation programme (ODELIA, 101057091; GENIAL, 101096312), the European Research Council ERC (NADIR, 101114631), the National Institutes of Health NIH (EPICO, R01 CA263318) and the National Institute for Health and Care Research NIHR (Leeds Biomedical Research Centre, NIHR203331). FRK receives support from the German Cancer Research Center (CoBot 2.0), the Joachim Herz Foundation (Add-On Fellowship for Interdisciplinary Life Science), the Central Indiana Corporate Partnership AnalytiXIN Initiative, the Evan and Sue Ann Werling Pancreatic Cancer Research Fund, and the Indiana Clinical and Translational Sciences Institute (EPAR4157) funded, in part, by Grant Number UM1TR004402 from the National Institutes of Health, National Center for Advancing Translational Sciences, Clinical and Translational Sciences Award. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, the NHS, the NIHR, or the Department of Health and Social Care. This work was funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. No identifiable patient data were used in this study; all clinical data used are publicly available. Therefore, no informed consent was required. The local Institutional Review Board at Purdue University reviewed and approved the overall analysis on February 7, 2024 (IRB-2023-1736). All prompt injection experiments were performed in controlled, simulated environments to prevent any risk of unintended harm. The disclosed attack strategies and prompts are intended solely for research purposes. The models evaluated in this study are research tools and are not approved for clinical use.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

All data produced in the present work are contained in the manuscript.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.