olmOCR-7B-0225-preview - AI Vision Models Tool

Overview

olmOCR-7B-0225-preview is a preview release from AllenAI focused on document OCR and recognition. The model is a multimodal, image-to-text checkpoint fine-tuned from Qwen/Qwen2-VL-7B-Instruct using the olmOCR-mix-0225 dataset; it is designed to extract readable text and basic metadata from PDF images and scanned document pages. According to the model card on Hugging Face, the checkpoint is intended to be used together with the olmOCR toolkit for efficient, large-scale document processing workflows (e.g., batch PDF ingestion, page-level text extraction, and downstream indexing). This release is targeted at teams building research or prototype pipelines that need a compact, 7B-class multimodal model tailored toward document imagery rather than a generic OCR engine. As a preview, it provides a balance between model capability and resource footprint: the model operates as an image-to-text pipeline and is distributed under an Apache-2.0 license. Community engagement on the Hugging Face page (downloads and likes) indicates interest from practitioners evaluating multimodal OCR integrations. For production-grade, large-volume OCR deployments, AllenAI expects users to pair this model with the olmOCR toolkit and complementary layout/NER/post-processing tools to reach full pipeline robustness.

Model Statistics

  • Downloads: 3,185
  • Likes: 706
  • Pipeline: image-to-text
  • Parameters: 8.3B

License: apache-2.0

Model Details

Architecture and lineage: olmOCR-7B-0225-preview is an 8.3 billion-parameter multimodal image-to-text model fine-tuned from Qwen/Qwen2-VL-7B-Instruct. The checkpoint was adapted with the olmOCR-mix-0225 training mixture to specialize on document images (PDF pages and scanned documents). The model exposes an image-to-text pipeline interface and is provided under an Apache-2.0 license. Capabilities: Extracts textual content from document images and supports retrieval of basic page-level metadata when used together with the olmOCR toolkit. It is optimized for PDF-derived imagery and common document layouts seen in business and research documents. Because it is a preview release, the model emphasizes rapid iteration and integration rather than final production hardening. Intended integration: AllenAI recommends using this checkpoint within the olmOCR toolkit for batching, page segmentation, layout analysis, and post-processing (tokenization, normalization, and metadata extraction). As a fine-tuned Qwen2-VL checkpoint, it inherits multimodal reasoning and instruction-following traits from its base model, but exact performance will depend on downstream processing and pipeline design.

Key Features

  • Fine-tuned from Qwen/Qwen2-VL-7B-Instruct for document imagery
  • Specialized on the olmOCR-mix-0225 dataset for PDF and scanned pages
  • Image-to-text pipeline optimized for extracting page-level text
  • Designed to integrate with the olmOCR toolkit for large-scale workflows
  • Distributed under an Apache-2.0 license (preview release)

Example Usage

Example (python):

from transformers import pipeline
from PIL import Image

# Load image-to-text pipeline with the AllenAI olmOCR preview model
ocr = pipeline("image-to-text", model="allenai/olmOCR-7B-0225-preview")

# Open a PDF page exported as PNG/JPEG or a scanned image
img = Image.open("sample_page.png")

# Run the model to extract text
result = ocr(img)

# The pipeline typically returns a list of generated text objects
print("Raw model output:")
for item in result:
    print(item.get("generated_text", item))

# Note: For production use, combine this model with the olmOCR toolkit for
# batching, layout analysis, OCR post-processing, and metadata extraction.

Benchmarks

Parameters: 8.3B (Source: https://huggingface.co/allenai/olmOCR-7B-0225-preview)

Hugging Face downloads: 3,185 downloads (Source: https://huggingface.co/allenai/olmOCR-7B-0225-preview)

Hugging Face likes: 706 likes (Source: https://huggingface.co/allenai/olmOCR-7B-0225-preview)

Pipeline type: image-to-text (Source: https://huggingface.co/allenai/olmOCR-7B-0225-preview)

Last Refreshed: 2026-01-09

Key Information

  • Category: Vision Models
  • Type: AI Vision Models Tool