Granite-3.1-2B-Instruct - AI Language Models Tool
Overview
Granite-3.1-2B-Instruct is an open-source, instruction-tuned language model from IBM's Granite team that targets long-context, multilingual, and code-related tasks. Although marketed as a "2B" model, the published checkpoint metadata lists roughly 2.5 billion parameters, and the model supports a 128K-token context window, making it well suited to long-document summarization, multi-document QA, RAG pipelines, code generation and debugging, and structured function calling. The model was released on December 18, 2024 and is distributed under the Apache 2.0 license. ([replicate.com](https://replicate.com/ibm-granite/granite-3.1-2b-instruct))

Architecturally, it is a dense decoder-only transformer using grouped-query attention (GQA), RoPE positional embeddings, SwiGLU MLPs, and RMSNorm. It was fine-tuned from Granite-3.1-2B-Base on a mix of permissively licensed public datasets, synthetic datasets created to improve long-context behavior, and small amounts of human-curated data.

IBM's Granite family continued to evolve after 3.1 (e.g., Granite 3.2 added toggleable reasoning and vision variants), so users who need experimental chain-of-thought reasoning or multimodal document understanding may evaluate later Granite releases; Granite-3.1 remains a lightweight, deployable instruct model for many enterprise RAG and assistant workflows.
Key Features
- 128K token context window for multi-document and long-form summarization.
- Instruction-tuned for dialog, summarization, translation, and code tasks.
- Function-calling and RAG-friendly formatting for structured outputs.
- Multilingual support for 12 languages (English, German, Spanish, etc.).
- Lightweight deployment profile (named 2B; checkpoint metadata lists ~2.5B).
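Even with a 128K-token window, long inputs need budgeting before submission. A minimal pre-flight length check, assuming a rough 4-characters-per-token ratio for English text (a generic heuristic, not a Granite-specific figure; use the model tokenizer for exact counts):

```python
# Rough check of whether a document fits the 128K-token context window.
# CHARS_PER_TOKEN = 4 is a common rule of thumb for English prose, not a
# Granite-specific number; tokenize with the model tokenizer for exact counts.
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4  # heuristic estimate

def fits_in_context(text: str, reserve_for_output: int = 1_024) -> bool:
    """Return True if the text likely fits, leaving room for the reply."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens <= CONTEXT_WINDOW - reserve_for_output

print(fits_in_context("word " * 1_000))    # short document: True
print(fits_in_context("word " * 200_000))  # ~200K estimated tokens: False
```

For documents that fail the check, split them into overlapping chunks and summarize hierarchically, or rely on a retrieval step to select the relevant passages.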
Example Usage
Example (python):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example adapted from the Replicate model README
model_path = "ibm-granite/granite-3.1-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# For GPU, set device_map="auto"; omit device_map to run on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

chat = [
    {"role": "user", "content": "Summarize the following meeting notes in three bullet points: ..."},
]
# Apply the chat template if the tokenizer exposes one (as in the model README)
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
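For the structured function-calling use case mentioned above, recent transformers versions accept a tools list in apply_chat_template. A minimal sketch; the get_stock_price tool and its schema are illustrative examples, not part of the Granite release:

```python
# Illustrative JSON-schema tool definition for function calling.
# get_stock_price is a made-up example, not an API shipped with the model.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Look up the latest trading price for a stock ticker.",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {"type": "string", "description": "Ticker symbol, e.g. IBM"},
                },
                "required": ["ticker"],
            },
        },
    }
]

chat = [{"role": "user", "content": "What is IBM trading at right now?"}]
# Pass the schema through the chat template (requires a recent transformers version):
# prompt = tokenizer.apply_chat_template(
#     chat, tools=tools, tokenize=False, add_generation_prompt=True)
# The model then emits a structured tool call for your code to parse and execute.
```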
Note: follow the model README for exact environment and accelerator setup on your host.
Pricing
Weights are released under an Apache 2.0 license (free to download). Commercial inference pricing depends on platform: IBM watsonx lists Granite 3 family pay-as-you-go rates (granite-3-2b-instruct shown ~USD 0.10 per 1M tokens on IBM’s table), while Replicate documents training GPU costs (H100 at $0.001525/sec for training jobs). Actual hosting or API charges vary by provider and region; consult the platform (IBM watsonx, Replicate, Hugging Face, etc.) for current run-time pricing. ([replicate.com](https://replicate.com/ibm-granite/granite-3.1-2b-instruct))
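As a back-of-envelope illustration of the watsonx rate quoted above (~USD 0.10 per 1M tokens; verify current pricing before budgeting):

```python
# Estimated token cost at the pay-as-you-go rate quoted above
# (USD 0.10 per 1M tokens on IBM's watsonx table; rates change by provider).
RATE_PER_MILLION_TOKENS = 0.10  # USD

def estimate_cost(total_tokens: int) -> float:
    """Estimated USD cost for a given token count at the quoted rate."""
    return total_tokens / 1_000_000 * RATE_PER_MILLION_TOKENS

# Summarizing 500 long documents at ~100K tokens each:
print(f"${estimate_cost(500 * 100_000):.2f}")  # → $5.00
```

At these rates, input token volume (long documents, retrieved context) dominates cost, so the context-budgeting and chunking practices above also serve as cost controls.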
Model Specifications
- Parameters (published checkpoint): ≈2.5B
- Context window: 128K tokens
- Training tokens (aggregate for the 3.1 family of dense models): 12T
- Release date: December 18, 2024
- License: Apache 2.0 (per model page)
(Source for all figures: [replicate.com](https://replicate.com/ibm-granite/granite-3.1-2b-instruct))
Key Information
- Category: Language Models
- Type: AI Language Models Tool