    HF · Scene Captioning · Apache 2.0

    Holo-3.1-4B

    by Hcompany

    4B vision-language model for GUI agents and computer-use perception

    1.3K downloads/month
    55 likes
    4B params
    Identifiers
    Model ID: Hcompany/Holo-3.1-4B
    Feature URI: mixpeek://image_extractor@v1/hcompany_holo_31_4b_v1

    Overview

    Holo-3.1-4B is a compact vision-language model tagged on Hugging Face for action, agent, computer use, and GUI agents. It matters for multimodal search because many agent traces are not documents: they are screenshots, browser states, UI elements, and before-and-after visual states from tool calls.

    On Mixpeek, Holo can turn screenshots and UI recordings into searchable agent memory. That lets an agent retrieve prior visual states, inspect similar failures, and compare what the screen looked like before deciding whether to retry, stop, or ask for help.
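
The retry, stop, or ask-for-help decision described above can be sketched as a small policy over retrieved states. This is a minimal illustration, not Mixpeek's API: `PriorState`, `decide_next_step`, the `similarity` score, the outcome labels, and both thresholds are hypothetical, and the retrieved states are assumed to come from whatever search call returns them.

```python
from dataclasses import dataclass


@dataclass
class PriorState:
    """One retrieved screenshot caption with its recorded outcome (hypothetical schema)."""
    caption: str
    similarity: float  # assumed range [0, 1]; higher means closer to the current screen
    outcome: str       # assumed labels: "success" or "failure"


def decide_next_step(matches, min_similarity=0.8, max_failures=2):
    """Return "retry", "stop", or "ask_for_help" based on similar prior states."""
    # Only consider prior screens that closely resemble the current one.
    close = [m for m in matches if m.similarity >= min_similarity]
    if not close:
        # Nothing comparable in memory, so the agent has no evidence to act on.
        return "ask_for_help"
    failures = sum(1 for m in close if m.outcome == "failure")
    if failures >= max_failures:
        # The same visible state has already failed repeatedly.
        return "stop"
    return "retry"


matches = [
    PriorState("Login form with red 'Invalid password' banner", 0.91, "failure"),
    PriorState("Login form with red 'Invalid password' banner", 0.88, "failure"),
    PriorState("Dashboard after successful login", 0.42, "success"),
]
print(decide_next_step(matches))  # prints "stop"
```

The thresholds are placeholders; in practice they would be tuned against recorded traces.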

    Architecture

    Holo-3.1-4B is a Qwen-family image-text-to-text model. Its Hugging Face metadata tags it for action, agent, computer use, GUI agents, and conversational visual reasoning.

    Mixpeek SDK Integration

    from mixpeek import Mixpeek

    mixpeek = Mixpeek(api_key="YOUR_API_KEY")

    # Ingest agent screenshots from S3 into a collection.
    mixpeek.ingest.images(
        collection="computer_use_traces",
        source={"type": "s3", "bucket": "agent-screens"},
        pipeline={
            # Caption each image with Holo-3.1-4B via its feature URI.
            "captioning": {
                "model": "mixpeek://image_extractor@v1/hcompany_holo_31_4b_v1"
            }
        },
    )

    Capabilities

    • GUI and computer-use visual reasoning
    • Screenshot state description for agent memory
    • Compact 4B model size for high-volume UI traces
    • Apache 2.0 license, per the model card metadata on Hugging Face

    Use Cases on Mixpeek

    • Index browser-agent screenshots and action traces
    • Search UI failures by visible state instead of log text
    • Compare before-after screens in QA automation
    • Give support agents visual memory over prior workflows
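
As an illustration of the before-after comparison use case, the sketch below diffs two caption texts with Python's standard `difflib`. It assumes each screenshot has already been captioned as multi-line text, one UI observation per line. That format, the `caption_changes` helper, and the sample captions are hypothetical and are not output from Holo-3.1-4B.

```python
import difflib


def caption_changes(before, after):
    """Return (removed, added) caption lines between two screen states."""
    diff = list(difflib.ndiff(before.splitlines(), after.splitlines()))
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    removed = [line[2:] for line in diff if line.startswith("- ")]
    added = [line[2:] for line in diff if line.startswith("+ ")]
    return removed, added


before = "Checkout page\nButton: Place order (enabled)\nCart: 2 items"
after = "Checkout page\nButton: Place order (disabled)\nCart: 2 items"

removed, added = caption_changes(before, after)
print("removed:", removed)
print("added:", added)
```

A text diff only catches changes the caption mentions, so it complements rather than replaces pixel-level or DOM-level checks.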

    Performance

    Input Size: Variable
    GPU Latency: Input dependent
    GPU Throughput: Batch dependent
    GPU Memory: ~10 GB
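
One plausible reading of the memory figure is a back-of-the-envelope estimate of weight storage plus runtime headroom. The sketch below assumes 16-bit weights (2 bytes per parameter) and the nominal 4B parameter count. Neither the precision nor the breakdown is stated on this page, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params, bytes_per_param=2):
    """Approximate weight storage in decimal gigabytes (assumes 16-bit weights by default)."""
    return params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(4e9)  # nominal 4B parameters
headroom_gb = 10 - weights_gb       # left for activations, cache, and runtime overhead
print(weights_gb, headroom_gb)      # prints 8.0 2.0
```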

    Best used with screenshot downsampling and UI event metadata filters.
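
The two preprocessing steps named above can be sketched in plain Python: pick a target size that caps the longer side while keeping the aspect ratio, and keep only trace events whose metadata matches a filter. The helpers `downsample_size` and `matches_filters`, the 1280-pixel cap, and the event fields are hypothetical. The actual resize would be done by an image library of your choice.

```python
def downsample_size(width, height, max_side=1280):
    """Scale (width, height) so the longer side is at most max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave unchanged
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def matches_filters(event, filters):
    """True if every filter key is present in the event with an equal value."""
    return all(event.get(key) == value for key, value in filters.items())


events = [
    {"screenshot": "step_01.png", "size": (2560, 1440), "action": "click", "app": "browser"},
    {"screenshot": "step_02.png", "size": (2560, 1440), "action": "scroll", "app": "browser"},
    {"screenshot": "step_03.png", "size": (1024, 768), "action": "click", "app": "terminal"},
]

selected = [e for e in events if matches_filters(e, {"action": "click", "app": "browser"})]
targets = [(e["screenshot"], downsample_size(*e["size"])) for e in selected]
print(targets)  # prints [('step_01.png', (1280, 720))]
```

Filtering before captioning reduces how many frames reach the model, and downsampling reduces the cost of each frame that does.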

    Specification

    Framework: HF
    Organization: Hcompany
    Feature: Scene Captioning
    Output: text
    Modalities: video, image
    Retriever: Semantic Search
    Parameters: 4B
    License: Apache 2.0
    Downloads/mo: 1.3K
    Likes: 55

    Build a pipeline with Holo-3.1-4B

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
