VLM-R1

VLM-R1 is a stable and generalizable R1-style large Vision-Language Model designed for visual understanding tasks such as Referring Expression Comprehension (REC), including evaluation on out-of-domain data. The repository provides training scripts, supports multi-node training and multi-image inputs, and demonstrates state-of-the-art performance with RL-based fine-tuning.
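REC asks a model to localize the image region described by a natural-language phrase, and predictions are commonly scored by Intersection-over-Union (IoU) against the ground-truth box, with a prediction counted correct when IoU reaches 0.5 (Acc@0.5). A minimal sketch of that scoring; the function names are illustrative and not taken from the repository:

```python
def box_iou(box_a, box_b):
    """Intersection-over-Union of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def rec_correct(pred_box, gt_box, threshold=0.5):
    """Standard Acc@0.5 criterion for REC predictions."""
    return box_iou(pred_box, gt_box) >= threshold
```

The same IoU can also serve as a reward signal in RL-based fine-tuning, which is one reason it appears throughout REC evaluation pipelines.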

Key Information

  • Category: Vision Models
  • Source: GitHub
  • Last updated: January 09, 2026

Structured Metrics

No structured metrics captured yet.

Links

Canonical source: https://github.com/om-ai-lab/VLM-R1