SmolVLM

SmolVLM is a 2B-parameter vision-language model designed to be small, fast, and memory-efficient. It builds on the Idefics3 architecture, with modifications such as an improved visual compression strategy and optimized patch processing, which make it suitable for local deployment, including on laptops. All model checkpoints, training recipes, and tools are released as open source under the Apache 2.0 license.
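To put the memory-efficiency claim in rough numbers, the sketch below estimates how much memory the weights alone would occupy at several numeric precisions. It is a back-of-the-envelope calculation, not a measurement: the parameter count is assumed to be exactly 2e9, the helper name `weight_memory_gb` is hypothetical, and activations, image tokens, and the KV cache are not included, so real usage will be higher.

```python
# Rough weight-only memory estimate for a ~2B-parameter model.
# Assumption: exactly 2e9 parameters; the real checkpoint size may differ.
# Assumption: 1 GB = 10^9 bytes; activations and KV cache are ignored.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gb(2e9, dtype):.1f} GB")
```

Under these assumptions, half-precision weights come to about 4 GB, which is consistent with the claim that the model can run on a laptop.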

Key Information

  • Category: Vision Models
  • Source: Hugging Face
  • Last updated: January 09, 2026

Structured Metrics

No structured metrics captured yet.

Links

Canonical source: https://huggingface.co/blog/smolvlm