Home › Vision Models › olmOCR-7B-0225-preview

olmOCR-7B-0225-preview - AI Vision Models Tool

Overview

olmOCR-7B-0225-preview is a preview release from AllenAI focused on document OCR and recognition. The model is a multimodal, image-to-text checkpoint fine-tuned from Qwen/Qwen2-VL-7B-Instruct using the olmOCR-mix-0225 dataset; it is designed to extract readable text and basic metadata from PDF images and scanned document pages. According to the model card on Hugging Face, the checkpoint is intended to be used together with the olmOCR toolkit for efficient, large-scale document processing workflows (e.g., batch PDF ingestion, page-level text extraction, and downstream indexing). This release is targeted at teams building research or prototype pipelines that need a compact, 7B-class multimodal model tailored toward document imagery rather than a generic OCR engine. As a preview, it provides a balance between model capability and resource footprint: the model operates as an image-to-text pipeline and is distributed under an Apache-2.0 license. Community engagement on the Hugging Face page (downloads and likes) indicates interest from practitioners evaluating multimodal OCR integrations. For production-grade, large-volume OCR deployments, AllenAI expects users to pair this model with the olmOCR toolkit and complementary layout/NER/post-processing tools to reach full pipeline robustness.

Model Statistics

Downloads: 3,185
Likes: 706
Pipeline: image-to-text
Parameters: 8.3B

License: apache-2.0

Model Details

Architecture and lineage: olmOCR-7B-0225-preview is an 8.3 billion-parameter multimodal image-to-text model fine-tuned from Qwen/Qwen2-VL-7B-Instruct. The checkpoint was adapted with the olmOCR-mix-0225 training mixture to specialize on document images (PDF pages and scanned documents). The model exposes an image-to-text pipeline interface and is provided under an Apache-2.0 license. Capabilities: Extracts textual content from document images and supports retrieval of basic page-level metadata when used together with the olmOCR toolkit. It is optimized for PDF-derived imagery and common document layouts seen in business and research documents. Because it is a preview release, the model emphasizes rapid iteration and integration rather than final production hardening. Intended integration: AllenAI recommends using this checkpoint within the olmOCR toolkit for batching, page segmentation, layout analysis, and post-processing (tokenization, normalization, and metadata extraction). As a fine-tuned Qwen2-VL checkpoint, it inherits multimodal reasoning and instruction-following traits from its base model, but exact performance will depend on downstream processing and pipeline design.

Key Features

Fine-tuned from Qwen/Qwen2-VL-7B-Instruct for document imagery
Specialized on the olmOCR-mix-0225 dataset for PDF and scanned pages
Image-to-text pipeline optimized for extracting page-level text
Designed to integrate with the olmOCR toolkit for large-scale workflows
Distributed under an Apache-2.0 license (preview release)

Example Usage

Example (python):

from transformers import pipeline
from PIL import Image

# Load image-to-text pipeline with the AllenAI olmOCR preview model
ocr = pipeline("image-to-text", model="allenai/olmOCR-7B-0225-preview")

# Open a PDF page exported as PNG/JPEG or a scanned image
img = Image.open("sample_page.png")

# Run the model to extract text
result = ocr(img)

# The pipeline typically returns a list of generated text objects
print("Raw model output:")
for item in result:
    print(item.get("generated_text", item))

# Note: For production use, combine this model with the olmOCR toolkit for
# batching, layout analysis, OCR post-processing, and metadata extraction.