Granite-3.1-2B-Instruct - AI Language Models Tool

Overview

Granite-3.1-2B-Instruct is a lightweight, open-source 2-billion-parameter instruction-following language model released by the Granite Team at IBM. It is a dense, decoder-only transformer fine-tuned from the Granite-3.1-2B-Base checkpoint on a mix of permissively licensed public instruction datasets plus internally generated synthetic data targeted at long-context problems. The model is optimized for practical developer use cases, including summarization, retrieval-augmented generation (RAG), function calling, long-document QA, and meeting summarization, while remaining small enough to run efficiently and to be quantized for CPU/edge inference. According to the project documentation, Granite 3.1 models extend the context length to 128K tokens and are released under the Apache 2.0 license, permitting developer and commercial use. ([replicate.com](https://replicate.com/ibm-granite/granite-3.1-2b-instruct?utm_source=openai))

Community feedback has been mixed: many users praise the 2B dense variant for its speed, long-context handling, and efficient quantized deployments, while some community posts report uneven performance on coding tasks and occasional generation artifacts in complex scenarios. The Granite family (dense and MoE variants) was trained with large-scale token budgets (the dense models are cited as trained on roughly 12T tokens) and aims to balance efficiency and capability for enterprise customization and RAG-centric workflows. For quick experimentation the model is available on platforms such as Replicate and Hugging Face, along with community quantized builds for local inference. ([huggingface.co](https://huggingface.co/lmstudio-community/granite-3.1-2b-instruct-GGUF?utm_source=openai))
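Since RAG is one of the headline workloads, a minimal sketch of how retrieved passages might be packed into a chat prompt may help. The `build_rag_messages` helper, the character-based budget, and the citation convention below are illustrative assumptions, not part of the model's API; a production pipeline would budget by tokens, not characters.

```python
def build_rag_messages(question, passages, max_chars=8000):
    """Assemble a chat message list that grounds the answer in retrieved
    passages. Uses a rough character cap as a stand-in for token counting."""
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        if used + len(passage) > max_chars:
            break  # stop once the context budget is exhausted
        context_parts.append(f"[{i}] {passage}")
        used += len(passage)
    context = "\n\n".join(context_parts)
    return [
        {
            "role": "user",
            "content": (
                "Answer the question using only the passages below. "
                "Cite passage numbers in brackets.\n\n"
                f"{context}\n\nQuestion: {question}"
            ),
        }
    ]

msgs = build_rag_messages(
    "When was Granite 3.1 released?",
    [
        "Granite 3.1 was announced on December 18, 2024.",
        "The dense 2B variant supports a 128K-token context window.",
    ],
)
```

The resulting `msgs` list can be passed directly to `tokenizer.apply_chat_template` as in the usage example below.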

Key Features

  • 128K token context window for long-document and meeting summarization.
  • Instruction-tuned for summarization, QA, extraction, and RAG workflows.
  • Function-calling and structured chat-format support for tool/agent integration.
  • Multilingual support across 12 languages (EN, DE, ES, FR, JA, PT, AR, CS, IT, KO, NL, ZH).
  • Small 2B dense footprint designed for efficient quantization and CPU/edge deployments.
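The quantization bullet can be made concrete with back-of-the-envelope weight-memory figures. The sketch below assumes a nominal 2 billion parameters (the actual checkpoint's parameter count may differ slightly) and covers only the weights, not the KV cache or runtime overhead:

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for model weights alone (excludes KV cache,
    activations, and framework overhead)."""
    return n_params * bits_per_param / 8 / 1024**3

N = 2_000_000_000  # nominal parameter count; the real checkpoint may differ
for name, bits in [("fp16/bf16", 16), ("int8 weights", 8), ("int4 weights", 4)]:
    print(f"{name}: ~{weight_memory_gb(N, bits):.1f} GB")
```

At 4-bit weight quantization the model fits comfortably in well under 2 GB, which is what makes CPU and edge deployments of the 2B variant practical.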

Example Usage

Example (python):

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "ibm-granite/granite-3.1-2b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the model across available accelerators
# (requires the `accelerate` package); remove it for CPU-only environments
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
model.eval()

chat = [
    {"role": "user", "content": "Summarize the following meeting transcript in three bullet points: <paste transcript here>"}
]
# apply the model's chat template if provided by the tokenizer
chat_prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(chat_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# decode only the newly generated tokens, skipping the echoed prompt
new_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))

# Example usage adapted from the Replicate / Hugging Face model pages. ([replicate.com](https://replicate.com/ibm-granite/granite-3.1-2b-instruct?utm_source=openai))
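The feature list mentions function-calling support. Recent versions of transformers accept a `tools` argument to `apply_chat_template`, which renders tool schemas into the model's chat template when the template defines a tool-use format. The tool schema and `RUN_MODEL` flag below are illustrative; whether and how the template emits a tool-call section depends on the model's own chat template, so treat this as a sketch rather than the documented Granite workflow.

```python
# Illustrative tool definition in the JSON-schema style that
# transformers' `apply_chat_template(..., tools=...)` accepts.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

RUN_MODEL = False  # set True to render the prompt (downloads the tokenizer)
if RUN_MODEL:
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("ibm-granite/granite-3.1-2b-instruct")
    prompt = tok.apply_chat_template(
        messages, tools=[get_weather], tokenize=False, add_generation_prompt=True
    )
    print(prompt)
```

The model's reply would then contain a structured tool call that your agent loop parses, executes, and feeds back as a tool-result message before generating the final answer.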

Benchmarks

Open LLM Leaderboard average (2B model, unquantized): 61.98 (Source: ([huggingface.co](https://huggingface.co/RedHatAI/granite-3.1-2b-instruct-quantized.w4a16?utm_source=openai)))

Context length (maximum): 128K tokens (Source: ([ibm.com](https://www.ibm.com/new/announcements/ibm-granite-3-1-powerful-performance-long-context-and-more?utm_source=openai)))

Release date: December 18, 2024 (Source: ([replicate.com](https://replicate.com/ibm-granite/granite-3.1-2b-instruct?utm_source=openai)))

Training tokens (dense models): ≈12 trillion (Source: ([github.com](https://github.com/ibm-granite/granite-3.1-language-models?utm_source=openai)))

License: Apache-2.0 (Source: ([replicate.com](https://replicate.com/ibm-granite/granite-3.1-2b-instruct?utm_source=openai)))

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool