VLM-R1
VLM-R1 is a stable and generalizable R1-style large Vision-Language Model designed for visual understanding tasks such as Referring Expression Comprehension (REC), with evaluation on out-of-domain data. The repository provides training scripts, multi-node training, and multi-image input support, and reports state-of-the-art performance from RL-based fine-tuning.
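REC performance is conventionally scored by checking whether the predicted bounding box overlaps the ground-truth box with IoU at or above 0.5. Below is a minimal sketch of that standard metric; it illustrates the general formulation, not the repository's own evaluation code, and the function names are illustrative:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def rec_accuracy(predictions, ground_truths, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth meets the threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)
```

For example, a predicted box shifted halfway off its target typically falls below the 0.5 threshold and counts as a miss.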
Key Information
- Category: Vision Models
- Source: GitHub
- Last updated: January 09, 2026
Structured Metrics
No structured metrics captured yet.
Links
Canonical source: https://github.com/om-ai-lab/VLM-R1