Holo-3.1-4B

by Hcompany

4B vision-language model for GUI agents and computer-use perception

1.3K downloads/month

55 likes

4B params

Identifiers

Model ID

Hcompany/Holo-3.1-4B

Feature URI

mixpeek://image_extractor@v1/hcompany_holo_31_4b_v1

Overview

Holo-3.1-4B is a compact vision-language model tagged for action, agent, computer use, and GUI agents. It matters for multimodal search because many agent traces are not documents: they are screenshots, browser states, UI elements, and before-and-after visual states from tool calls.

On Mixpeek, Holo can turn screenshots and UI recordings into searchable agent memory. An agent can then retrieve prior visual states, inspect similar failures, and compare earlier screens with the current one before deciding whether to retry, stop, or ask for help.
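To make the idea of searchable agent memory concrete, here is a minimal in-memory sketch. It is not Mixpeek's retrieval API: the `ScreenState` record, its fields, the toy three-dimensional vectors, and the `most_similar` helper are all hypothetical, and a real system would use embeddings produced by a model rather than hand-written numbers. It only illustrates ranking prior screen states by cosine similarity to the current one.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ScreenState:
    """One step of an agent trace. All fields are hypothetical."""
    step_id: str
    caption: str             # text a vision-language model might produce
    embedding: List[float]   # toy vector standing in for a real embedding
    outcome: str             # "ok" or "failed"


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(memory: List[ScreenState], query: List[float], k: int = 2) -> List[ScreenState]:
    """Return the k stored states whose embeddings are closest to the query."""
    ranked = sorted(memory, key=lambda s: cosine(s.embedding, query), reverse=True)
    return ranked[:k]


# Hand-written example memory; real traces would come from ingested screenshots.
memory = [
    ScreenState("run1-step3", "Login form with red error banner", [0.9, 0.1, 0.0], "failed"),
    ScreenState("run1-step4", "Dashboard after successful login", [0.1, 0.9, 0.1], "ok"),
    ScreenState("run2-step3", "Login form, password field empty", [0.8, 0.2, 0.1], "failed"),
]

# Toy embedding of the screen the agent is looking at right now.
current = [0.85, 0.15, 0.05]

neighbors = most_similar(memory, current, k=2)
prior_failures = [s.step_id for s in neighbors if s.outcome == "failed"]
```

If the nearest prior states all ended in failure, as the two login screens do here, an agent could treat that as a signal to stop or ask for help rather than retry. The policy itself is a design choice outside this sketch.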

Architecture

Holo-3.1-4B is a Qwen-family image-text-to-text model. Its Hugging Face metadata tags it for action, agent, computer use, GUI agents, and conversational visual reasoning.

Mixpeek SDK Integration

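from mixpeek import Mixpeek

# Create a client; replace the placeholder with your own API key.
mixpeek = Mixpeek(api_key="YOUR_API_KEY")

# Ingest screenshots from an S3 bucket into a collection of agent traces.
mixpeek.ingest.images(
    collection="computer_use_traces",
    source={"type": "s3", "bucket": "agent-screens"},
    pipeline={
        # Run Holo-3.1-4B, referenced by its feature URI, as the captioning step.
        "captioning": {
            "model": "mixpeek://image_extractor@v1/hcompany_holo_31_4b_v1"
        }
    }
)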