
Google Releases MedGemma: Open AI Models for Medical Text and Image Analysis

Google has released MedGemma, a pair of open generative AI models designed to support medical text and image understanding in healthcare applications. Built on the Gemma 3 architecture, the models come in two configurations: MedGemma 4B, a multimodal model that processes both images and text, and MedGemma 27B, a larger model that handles medical text only.

According to Google, the models are designed to assist in tasks such as radiology report generation, clinical summarization, patient triage, and general medical question answering. MedGemma 4B, in particular, has been pre-trained using a wide range of de-identified medical images, including chest X-rays, dermatology photos, histopathology slides, and ophthalmologic images. Both models are available under open licenses for research and development use, and come in pre-trained and instruction-tuned variants.
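For developers who want to experiment, the following is a minimal sketch of querying the multimodal model, assuming the instruction-tuned 4B variant is published on Hugging Face as google/medgemma-4b-it and works with the standard Transformers image-text-to-text pipeline; the image URL is a placeholder:

```python
# Minimal sketch: querying MedGemma 4B with an image plus a text prompt
# via the Hugging Face Transformers "image-text-to-text" pipeline.
# Assumptions: the instruction-tuned checkpoint is available as
# "google/medgemma-4b-it" and a recent transformers release is installed.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",  # assumed model ID
    device_map="auto",
)

messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/chest_xray.png"},  # placeholder image
        {"type": "text", "text": "Describe the key findings in this chest X-ray."},
    ]},
]

result = pipe(text=messages, max_new_tokens=256)
# With chat-style input, the pipeline returns the full conversation;
# the final message holds the model's generated answer.
print(result[0]["generated_text"][-1]["content"])
```

In line with Google's guidance, output from such a pipeline is a starting point for development, not a clinical read.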

Despite these capabilities, Google emphasizes that MedGemma is not intended for direct clinical use without further validation and adaptation. Instead, the models serve as a foundation that developers can adapt and fine-tune for specific medical use cases.

Some early testers have shared observations on the models' strengths and limitations. Vikas Gaur, a clinician and AI practitioner, tested the MedGemma 4B-it model using a chest X-ray from a patient with confirmed tuberculosis. He reported that the model generated a normal interpretation, missing clinically evident signs of the disease:

Despite clear TB findings in the actual case, MedGemma reported: "Normal chest X-ray. Heart size is within normal limits. Lungs well-expanded and clear."

Gaur suggested that additional training on high-quality annotated data might help align model outputs with clinical expectations.

Separately, Mohammad Zakaria Rajabi, a biomedical engineer, expressed interest in seeing the larger 27B model extended to support image processing:

We are eagerly looking forward to seeing MedGemma 27B support image analysis as well.

Technical documentation indicates that the models were evaluated on over 22 datasets spanning multiple medical tasks and imaging modalities. Public datasets used in training include MIMIC-CXR, Slake-VQA, PAD-UFES-20, and others. Several proprietary and internal datasets were also used under license or with participant consent.

The models can be adapted through techniques like prompt engineering, fine-tuning, and integration with agentic systems using other tools from the Gemini ecosystem. However, performance can vary depending on prompt structure, and the models have not been evaluated for multi-turn conversations or multi-image inputs.
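As an illustration of the fine-tuning path, the sketch below attaches LoRA adapters to the 4B model using the Hugging Face peft library; the model ID, target modules, and hyperparameters are illustrative assumptions, not values recommended by Google:

```python
# Minimal LoRA fine-tuning setup sketch using Hugging Face peft.
# The model ID, target modules, and hyperparameters are assumptions
# for demonstration only.
import torch
from transformers import AutoModelForImageTextToText
from peft import LoraConfig, get_peft_model

model = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-4b-it",  # assumed model ID
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                  # adapter rank
    lora_alpha=32,         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# From here, the wrapped model can be trained with a standard Trainer loop
# on de-identified, task-specific medical data, then evaluated before any
# downstream use.
```

Adapter-based fine-tuning like this keeps the base weights frozen, which makes domain adaptation feasible on modest hardware, though the resulting model still requires the validation Google calls for.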

MedGemma provides an accessible foundation for research and development in medical AI, but its practical effectiveness will depend on how well it is validated, fine-tuned, and integrated into specific clinical or operational contexts.
