    HF · OCR · Apache 2.0

    granite-vision-4.1-4b

    by ibm-granite

    Specialized VLM for extracting structured data from charts, tables, and forms

    39K downloads/month
    4B parameters
    Identifiers
    Model ID
    ibm-granite/granite-vision-4.1-4b
    Feature URI
    mixpeek://image_extractor@v1/ibm_granite_vision_41_4b_v1
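
The Feature URI above appears to follow a `scheme://extractor@version/feature` layout. The sketch below splits a URI into those parts. Both the layout and the field names (`extractor`, `version`, `featureId`) are assumptions inferred from the single URI shown here, not a documented Mixpeek format.

```typescript
// Hypothetical shape for a parsed feature URI; field names are illustrative only.
interface FeatureUri {
  scheme: string;
  extractor: string;
  version: string;
  featureId: string;
}

// Assumes the layout scheme://extractor@version/featureId (inferred, not documented).
// Returns null when the input does not match that layout.
function parseFeatureUri(uri: string): FeatureUri | null {
  const match = /^([a-z][a-z0-9+.-]*):\/\/([^@/]+)@([^/]+)\/(.+)$/i.exec(uri);
  if (match === null) return null;
  const [, scheme, extractor, version, featureId] = match;
  return { scheme, extractor, version, featureId };
}

const parsedUri = parseFeatureUri("mixpeek://image_extractor@v1/ibm_granite_vision_41_4b_v1");
console.log(parsedUri);
```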

    Overview

    Granite Vision 4.1 is IBM's purpose-built document extraction model. It converts visual content (charts, tables, forms, key-value pairs) into structured, machine-readable formats (CSV, JSON, HTML). Unlike general-purpose VLMs, which describe what they see, Granite Vision extracts precise data values (94.2% zero-shot exact-match accuracy on VAREX key-value extraction), making it suitable for automated document processing pipelines.

    On Mixpeek, Granite Vision powers structured extraction from document pages: converting chart images to CSV data, table images to JSON records, and form images to key-value pairs. This structured output is directly indexable and filterable, unlike free-text captions.
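
To make the "filterable" point concrete, here is a minimal sketch of filtering extracted table rows by a numeric field, something a free-text caption cannot support directly. The `TableRow` shape and the sample rows are hypothetical and do not reflect Mixpeek's actual schema.

```typescript
// Hypothetical shape for one extracted table row; not Mixpeek's actual schema.
type TableRow = Record<string, string | number>;

// Keep rows whose numeric field is at least `min`.
// Rows where the field is missing or not a number are dropped.
function filterRows(rows: TableRow[], field: string, min: number): TableRow[] {
  return rows.filter((row) => {
    const value = row[field];
    return typeof value === "number" && value >= min;
  });
}

// Made-up rows standing in for a table extracted from a report page.
const extractedRows: TableRow[] = [
  { region: "EMEA", revenue: 120 },
  { region: "APAC", revenue: 80 },
  { region: "AMER", revenue: 200 },
];

// EMEA and AMER meet the threshold; APAC does not.
console.log(filterRows(extractedRows, "revenue", 100).map((row) => row.region));
```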

    Architecture

    Granite Vision 4.1 is a LoRA adapter on the Granite-4.1-3B vision-language model, with 4B total parameters (3.4B LLM + 0.6B vision encoder/projectors). It is trained specifically on document extraction tasks: chart-to-CSV, table-to-JSON/HTML, and key-value pair extraction. It integrates with IBM Docling for production pipelines.

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    // Replace "API_KEY" with your own key.
    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a PDF and run Granite Vision as the document extractor.
    await mx.collections.ingest({
      collection_id: "financial-docs",
      source: { url: "https://example.com/annual-report.pdf" },
      feature_extractors: [{
        feature: "document_extraction",
        model: "ibm-granite/granite-vision-4.1-4b"
      }]
    });
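
When several documents share the same extractor configuration, the request object can be built by a small helper. The sketch below only constructs plain objects with the field names used in the snippet above and makes no SDK calls; the `IngestRequest` type and `buildIngestRequests` function are hypothetical names, not SDK exports.

```typescript
// Request shape copied from the snippet above; the type name is hypothetical.
interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: { feature: string; model: string }[];
}

const GRANITE_MODEL_ID = "ibm-granite/granite-vision-4.1-4b";

// Build one request per document URL, all with the same extractor configuration.
function buildIngestRequests(collectionId: string, urls: string[]): IngestRequest[] {
  return urls.map((url) => ({
    collection_id: collectionId,
    source: { url },
    feature_extractors: [{ feature: "document_extraction", model: GRANITE_MODEL_ID }],
  }));
}

// The second URL is a made-up example.
const ingestRequests = buildIngestRequests("financial-docs", [
  "https://example.com/annual-report.pdf",
  "https://example.com/quarterly-report.pdf",
]);
console.log(JSON.stringify(ingestRequests[0], null, 2));
```

Each element could then be passed to `mx.collections.ingest(...)` in the same way as the single request above.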

    Capabilities

    • Chart to CSV extraction with high precision
    • Table to JSON/HTML structured output
    • Key-value pair extraction (94.2% exact-match on VAREX)
    • Apache 2.0 license for unrestricted commercial use
    • LoRA adapter — lightweight deployment on top of Granite-4.1-3B
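
Chart-to-CSV output is still text, so downstream code usually parses it into records before indexing. The sketch below is a minimal parser for simple CSV only (header row, no quoted fields, no embedded commas); a production pipeline would more likely use a dedicated CSV library. The sample string is made up and is not actual model output.

```typescript
type CsvRecord = Record<string, string | number>;

// Parse simple CSV (header row, comma-separated, no quoted fields) into records.
// Numeric-looking cells become numbers; everything else stays a string.
function parseSimpleCsv(csv: string): CsvRecord[] {
  const lines = csv.trim().split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const headers = lines[0].split(",").map((header) => header.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",").map((cell) => cell.trim());
    const record: CsvRecord = {};
    headers.forEach((header, i) => {
      const cell = cells[i] ?? "";
      const asNumber = Number(cell);
      // Number("") is 0, so empty cells are kept as empty strings explicitly.
      record[header] = cell !== "" && Number.isFinite(asNumber) ? asNumber : cell;
    });
    return record;
  });
}

// Made-up sample standing in for chart-to-CSV output.
const sampleCsv = "quarter,revenue\nQ1,12.5\nQ2,14.0\n";
console.log(parseSimpleCsv(sampleCsv));
```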

    Use Cases on Mixpeek

    • Financial document processing: extract data from charts and tables in reports
    • Invoice automation: extract line items, totals, and metadata into structured records
    • Research data extraction: convert published figures and tables into analyzable data
    • Form processing: extract key-value pairs from government and enterprise forms

    Benchmarks

    Dataset | Metric | Score | Source
    VAREX (key-value extraction) | Exact-match accuracy (zero-shot) | 94.2% | IBM Research, 2026 (Model Card)
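
In its usual form, exact-match accuracy counts a predicted value as correct only when it equals the reference value exactly. The sketch below scores one document at the field level. How VAREX aggregates scores (per field or per document) and what normalization it applies are not stated here, so the whitespace trimming and field-level averaging are assumptions, and the sample values are made up.

```typescript
type KeyValues = Record<string, string>;

// Fraction of reference fields whose predicted value matches exactly after trimming whitespace.
// Fields missing from the prediction count as wrong. Returns 0 for an empty reference.
function exactMatchAccuracy(predicted: KeyValues, reference: KeyValues): number {
  const keys = Object.keys(reference);
  if (keys.length === 0) return 0;
  const correct = keys.filter(
    (key) => key in predicted && predicted[key].trim() === reference[key].trim(),
  ).length;
  return correct / keys.length;
}

// Made-up example: "total" differs by a thousands separator, so it does not match.
const referenceFields: KeyValues = { invoice_no: "INV-001", total: "1,250.00", date: "2026-01-15", vendor: "Acme" };
const predictedFields: KeyValues = { invoice_no: "INV-001", total: "1250.00", date: "2026-01-15", vendor: "Acme" };
console.log(exactMatchAccuracy(predictedFields, referenceFields)); // 0.75
```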

    Performance

    Input Size: Document page image
    GPU Latency: ~40 ms / page (A100)
    GPU Throughput: ~25 pages/sec (A100)
    GPU Memory: ~8 GB
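
These figures can be sanity-checked with simple arithmetic. A latency of 40 ms per page implies 1000 / 40 = 25 pages per second when pages are processed one after another, and 4B parameters at 2 bytes each come to 8 GB of weights. The sketch below encodes both calculations. It assumes sequential processing and 16-bit weights, neither of which is stated in the table; real numbers will vary with batching, page resolution, and output length.

```typescript
// Latency figure from the Performance table above.
const LATENCY_MS_PER_PAGE = 40;

// Sequential throughput implied by a per-page latency.
function pagesPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

// Estimated wall-clock seconds for a batch, assuming pages are processed one after another.
function estimateSeconds(pageCount: number, latencyMs: number): number {
  return (pageCount * latencyMs) / 1000;
}

// Weight memory in GB (10^9 bytes), assuming every parameter uses bytesPerParam bytes.
// This ignores activations and other runtime overhead.
function weightMemoryGB(paramCount: number, bytesPerParam: number): number {
  return (paramCount * bytesPerParam) / 1e9;
}

console.log(pagesPerSecond(LATENCY_MS_PER_PAGE)); // 25
console.log(estimateSeconds(1000, LATENCY_MS_PER_PAGE)); // 40
console.log(weightMemoryGB(4e9, 2)); // 8
```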

    Specification

    Framework: HF
    Organization: ibm-granite
    Feature: OCR
    Output: text + bbox
    Modalities: video, image, document
    Retriever: Text-in-Image
    Parameters: 4B
    License: Apache 2.0
    Downloads/mo: 39K

    Research Paper

    Granite Vision 4.1 for Document Extraction (arxiv.org)

    Build a pipeline with granite-vision-4.1-4b

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
