Gemma 3 27B Instruct (google/gemma-3-27b-it) - AI Language Models Tool

Overview

Gemma 3 27B Instruct (google/gemma-3-27b-it) is an open-weight, instruction-tuned multimodal model released by Google DeepMind and published on Hugging Face. The model is part of the Gemma 3 family and provides a 27.4 billion-parameter instruction-tuned variant that accepts both text and image inputs and produces text outputs. It is optimized for interactive and instruction-driven tasks such as question answering, summarization, stepwise reasoning, and visual understanding. Designed for long-context scenarios, the Gemma 3 family supports context windows up to 128K tokens, enabling analysis of very large documents, multi-page PDFs, or long chat histories. The 27B Instruct variant (27B-IT) is offered alongside 12B and 4B sizes to balance capability and cost. The model is available as a Hugging Face model (image-text-to-text pipeline) and is intended for use via Transformers (>= 4.50), with support for GPU and multi-GPU inference and chat-style templates for conversational flows.

Model Statistics

  • Downloads: 1,400,389
  • Likes: 1798
  • Pipeline: image-text-to-text
  • Parameters: 27.4B

License: gemma

Model Details

Gemma 3 27B Instruct is derived from the Gemma 3 pretrained family (base model: google/gemma-3-27b-pt) and is an instruction-tuned variant optimized for multimodal image+text inputs. The model contains approximately 27.4 billion parameters and is published under the Gemma license on Hugging Face. It is packaged for the Hugging Face "image-text-to-text" pipeline, which lets callers supply images and prompt text together and receive a text answer. Capabilities: the model is tuned for instruction following (chat, summarization, QA), multi-step reasoning, and image understanding tasks such as captioning, visual question answering, and document image interpretation. The family supports an extended 128K token context window, making it suitable for processing long documents, multi-page transcripts, or concatenated multimodal content. Deployment: Gemma 3 27B-IT is distributed as open weights on Hugging Face and supports inference with Transformers >=4.50. For production or large-batch use, GPU or multi-GPU setups (device_map/accelerate) are recommended due to the model's memory and compute requirements. The model page on Hugging Face provides the weights, basic usage examples, and the model card for further details.

Key Features

  • 27.4B-parameter instruction-tuned model for strong language and multimodal performance
  • Multimodal image-text-to-text pipeline: accepts images + prompts, returns text responses
  • Up to 128K token context window for long documents, books, and extended conversations
  • Instruction-tuned for QA, summarization, step-by-step reasoning, and visual question answering
  • Open weights on Hugging Face; supported via Transformers (>=4.50) with GPU/multi-GPU inference

Example Usage

Example (python):

from transformers import pipeline
from PIL import Image

# Requires transformers >= 4.50 and sufficient GPU memory for the 27B model
pipe = pipeline(
    task="image-text-to-text",
    model="google/gemma-3-27b-it",
    device_map="auto"  # or device=0 for a single GPU
)

# Load an image and provide an instruction prompt
image = Image.open("/path/to/photo.jpg")
prompt = (
    "You are an assistant. Describe the image briefly and answer: What is the main object, and what action is occurring?"
)

# The pipeline accepts image and text together
result = pipe(image, prompt)
print(result[0]["generated_text"])

Benchmarks

Hugging Face downloads: 1,400,389 (Source: https://huggingface.co/google/gemma-3-27b-it)

Hugging Face likes: 1,798 (Source: https://huggingface.co/google/gemma-3-27b-it)

Parameters: 27.4B (Source: https://huggingface.co/google/gemma-3-27b-it)

Context window: Up to 128K tokens (Source: https://huggingface.co/google/gemma-3-27b-it)

Pipeline type: image-text-to-text (Source: https://huggingface.co/google/gemma-3-27b-it)

Last Refreshed: 2026-01-12

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool