SmolLM - AI Language Models Tool

Overview

SmolLM is an open-source family of compact text and vision models from Hugging Face designed for efficient on-device inference and research. The family includes small language models at 135M, 360M, and 1.7B parameters (the SmolLM series) and the more recent 3B SmolLM3, which Hugging Face released with full training recipes, checkpoints, and evaluation results. According to the Hugging Face blog and the official repository, SmolLM3 was trained with an emphasis on high-quality data mixtures and efficiency techniques (Grouped-Query Attention, NoPE, YaRN) to deliver competitive performance at the 3B scale while supporting very long contexts and dual instruction/reasoning modes (think / no_think). (Sources: https://huggingface.co/blog/smollm3, https://github.com/huggingface/smollm)

The models are released under the Apache-2.0 license and ship with both base and instruction-tuned checkpoints, a multimodal SmolVLM line for vision+language tasks, and engineering artifacts that ease deployment (quantized checkpoints, GGUF/ONNX guidance, vLLM/llama.cpp examples). The project emphasizes reproducibility: Hugging Face published the pretraining mixtures, ablations, and model-merging recipes used to produce the released instruct checkpoints. The combination of compact parameter counts, long-context support (up to 128k tokens), and published recipes makes SmolLM well suited for edge agents, RAG pipelines, and research into efficient model training and inference. (Sources: https://huggingface.co/blog/smollm3, https://github.com/huggingface/smollm, https://huggingface.co/docs/transformers/model_doc/smollm3)
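
The dual instruction/reasoning modes can be exercised through the standard Transformers chat-template workflow. The sketch below is a minimal example of requesting a direct answer without a reasoning trace; it assumes the SmolLM3 chat template honors the /no_think system-prompt flag described in the Hugging Face blog, and the prompt and generation settings are illustrative.

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The blog documents /think and /no_think system-prompt flags for toggling
# the reasoning trace; here we ask for a direct answer without one.
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Explain Grouped-Query Attention in two sentences."},
]

# Build the prompt with the model's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))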

GitHub Statistics

  • Stars: 3,547
  • Forks: 254
  • Contributors: 20
  • License: Apache-2.0
  • Primary Language: Python
  • Last Updated: 2025-11-20T14:37:45Z

Key Features

  • Compact family: models at 135M, 360M, and 1.7B parameters, plus the 3B SmolLM3 variant.
  • Long context support: designed to handle up to 128k tokens using YaRN and NoPE (see the sketch after this list).
  • Efficiency tricks: Grouped-Query Attention (GQA) to shrink the KV cache and speed up inference.
  • Instruction/reasoning modes: dual-mode instruct models with think / no_think behavior.
  • Open training recipe: full training configs, data mixtures, and checkpoints released.
  • Multimodal option: the SmolVLM line supports image+text tasks for compact vision-language deployments.
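
For contexts beyond the native window, the blog describes extrapolating with YaRN. The sketch below shows one way to request that through the Transformers rope_scaling configuration; the key names, scaling factor, and context length are assumptions that may differ across Transformers versions, so the model card should be treated as authoritative.

Example (python):

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM3-3B"

# Assumed YaRN settings: override the RoPE scaling config before loading
# weights. The factor and target length below are illustrative, not official.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {"rope_type": "yarn", "factor": 2.0}
config.max_position_embeddings = 131072  # target ~128k tokens per the blog

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, config=config)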

Example Usage

Example (python):

from transformers import AutoTokenizer, AutoModelForCausalLM

def generate(prompt, model_name="HuggingFaceTB/SmolLM3-3B", max_new_tokens=128):
    # Download the tokenizer and model weights from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Tokenize the prompt, generate a continuation, and decode it back to text.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Write a short summary of the key features of SmolLM."))
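
The repository also points to serving paths such as vLLM and llama.cpp. The following is a minimal offline-inference sketch using vLLM's Python API; it assumes an installed vLLM version that supports the SmolLM3 architecture, and the sampling parameters are illustrative.

Example (python):

from vllm import LLM, SamplingParams

# Load SmolLM3 with vLLM's offline engine (assumes SmolLM3 support in the
# installed vLLM version) and sample a completion for a single prompt.
llm = LLM(model="HuggingFaceTB/SmolLM3-3B")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the SmolLM model family in one paragraph."], params)
print(outputs[0].outputs[0].text)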

Benchmarks

  • Parameters (SmolLM3): 3 billion (3B) (Source: https://huggingface.co/blog/smollm3)
  • Pretraining tokens (SmolLM3): ~11 trillion tokens in the reported training mixture (Source: https://huggingface.co/blog/smollm3)
  • Max context length: up to 128k tokens via the YaRN / NoPE configuration (Source: https://huggingface.co/blog/smollm3)
  • HellaSwag (zero-shot, SmolLM3-3B): 76.15 (Source: https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
  • MMLU-CF (average, SmolLM3-3B): 44.13, as reported in the HF evaluation tables (Source: https://huggingface.co/HuggingFaceTB/SmolLM3-3B)

Last Refreshed: 2026-01-09

Key Information

  • Category: Language Models
  • Type: AI Language Models Tool