
    SenseNova-U1-8B-MoT

    by sensenova

    8B any-to-any multimodal model for image understanding, generation, and editing

    32.7K downloads/month
    281 likes
    8B params
    Identifiers
    Model ID
    sensenova/SenseNova-U1-8B-MoT
    Feature URI
    mixpeek://image_extractor@v1/sensenova_u1_8b_mot_v1
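
As a rough illustration, the Feature URI can be split into its visible parts with the standard library. This is a sketch only: the field names `extractor`, `version`, and `feature` are assumed labels for the segments as they appear in the URI, not documented Mixpeek terminology, and `parse_feature_uri` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def parse_feature_uri(uri: str) -> dict:
    """Split a mixpeek:// feature URI into its visible segments.

    The returned key names are assumptions made for illustration.
    """
    parts = urlsplit(uri)
    if parts.scheme != "mixpeek":
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # The authority segment looks like "<extractor>@<version>".
    extractor, sep, version = parts.netloc.partition("@")
    if not sep:
        raise ValueError("expected '<extractor>@<version>' before the path")
    return {
        "extractor": extractor,
        "version": version,
        "feature": parts.path.lstrip("/"),
    }


info = parse_feature_uri("mixpeek://image_extractor@v1/sensenova_u1_8b_mot_v1")
print(info)
```

Keeping the URI as a single string in configuration and parsing it only where needed avoids the parts drifting out of sync.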

    Overview

    SenseNova-U1-8B-MoT is an any-to-any multimodal model tagged for feature extraction, image-to-text, text-to-image, image editing, and custom-code inference. That mix matters for agents because perception is often not a single captioning call: an agent may need to inspect an image, generate an explanation, propose an edit, and preserve evidence of what changed.

    On Mixpeek, SenseNova U1 fits pipelines that retrieve visual evidence first, then ask a multimodal model to explain or transform that evidence. It is especially relevant for creative QA, ad review, product imagery, and human-in-the-loop visual analysis.
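
The retrieve-then-explain pattern described above can be sketched in plain Python. The sketch is illustrative only: `Evidence`, `retrieve_evidence`, and `explain_evidence` are hypothetical stand-ins rather than Mixpeek SDK calls, retrieval is reduced to keyword overlap, and the model call is a stub so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Evidence:
    asset_id: str
    score: float


def retrieve_evidence(query: str, index: Dict[str, List[str]], top_k: int = 2) -> List[Evidence]:
    """Hypothetical retrieval stage: rank assets by keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = [
        Evidence(asset_id, len(terms & set(tags)) / max(len(terms), 1))
        for asset_id, tags in index.items()
    ]
    # Drop assets with no overlap, then keep the best top_k.
    scored = [e for e in scored if e.score > 0]
    return sorted(scored, key=lambda e: e.score, reverse=True)[:top_k]


def explain_evidence(query: str, evidence: List[Evidence],
                     model: Callable[[str, str], str]) -> List[dict]:
    """Second stage: ask the multimodal model about each retrieved asset."""
    return [
        {"asset_id": e.asset_id, "score": e.score,
         "explanation": model(e.asset_id, query)}
        for e in evidence
    ]


def stub_model(asset_id: str, query: str) -> str:
    # Placeholder for a real multimodal model call.
    return f"{asset_id} inspected for '{query}'"


index = {
    "ad_001": ["red", "shoe", "logo"],
    "ad_002": ["blue", "shoe"],
    "ad_003": ["car"],
}
results = explain_evidence("red shoe", retrieve_evidence("red shoe", index), stub_model)
for row in results:
    print(row)
```

Running retrieval first keeps the expensive model call limited to the few assets that are likely to matter, and the retrieval score travels with each explanation as evidence.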

    Architecture

    SenseNova-U1-8B-MoT is an 8B-class any-to-any multimodal model in the mixture-of-transformers (MoT) style. According to its Hugging Face model metadata, it supports image-to-text, text-to-image, image editing, and feature extraction paths.

    Mixpeek SDK Integration

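    from mixpeek import Mixpeek

    # Authenticate the client with your API key.
    mixpeek = Mixpeek(api_key="YOUR_API_KEY")

    # Ingest images from S3 and caption them with SenseNova-U1-8B-MoT.
    mixpeek.ingest.images(
        collection="creative_library",
        source={"type": "s3", "bucket": "creative-assets"},
        pipeline={
            "captioning": {
                # Feature URI of this model, as listed under Identifiers.
                "model": "mixpeek://image_extractor@v1/sensenova_u1_8b_mot_v1"
            }
        },
    )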

    Capabilities

    • Any-to-any multimodal interaction across image and text tasks
    • Image-to-text reasoning for visual evidence review
    • Text-to-image and image-editing paths for iterative agent workflows
    • Apache 2.0 licensed model card metadata on Hugging Face

    Use Cases on Mixpeek

    • Agent review of creative assets and visual ad variants
    • Multimodal retrieval followed by image-grounded explanation
    • Visual QA where an agent needs both inspection and transformation
    • Product imagery triage and annotation

    Performance

    Input Size: Variable
    GPU Latency: Input dependent
    GPU Throughput: Batch dependent
    GPU Memory: ~18 GB
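
The memory figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes 16-bit (fp16 or bf16) weights, which this page does not state; `estimate_weight_memory_gb` is a hypothetical helper, and attributing the remaining headroom to activations and runtime overhead is likewise an assumption.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in decimal gigabytes.

    bytes_per_param=2 assumes 16-bit weights (an assumption, not a stated fact).
    """
    return num_params * bytes_per_param / 1e9


weights_gb = estimate_weight_memory_gb(8e9)   # 8B parameters
headroom_gb = 18 - weights_gb                 # against the listed ~18 GB
print(f"weights: {weights_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```

Under that assumption the weights account for about 16 GB, leaving roughly 2 GB of the listed figure for everything else.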

    Any-to-any models should be routed to the narrowest task path needed for the agent step.
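
One way to read the routing advice above is as a small dispatch function that picks a single task path per agent step. This is a minimal sketch: the path names mirror the task tags listed on this page, but `TaskPath`, `route_step`, and the decision rules are hypothetical and not part of any Mixpeek or Hugging Face API.

```python
from enum import Enum


class TaskPath(Enum):
    IMAGE_TO_TEXT = "image-to-text"
    TEXT_TO_IMAGE = "text-to-image"
    IMAGE_EDITING = "image-editing"
    FEATURE_EXTRACTION = "feature-extraction"


def route_step(has_image_input: bool, wants_image_output: bool,
               wants_features: bool = False) -> TaskPath:
    """Pick the narrowest path that satisfies one agent step (illustrative rules)."""
    if wants_features:
        return TaskPath.FEATURE_EXTRACTION
    if has_image_input and wants_image_output:
        return TaskPath.IMAGE_EDITING
    if has_image_input:
        return TaskPath.IMAGE_TO_TEXT
    if wants_image_output:
        return TaskPath.TEXT_TO_IMAGE
    raise ValueError("step needs no image path; do not route it to this model")


# An inspection step only needs image-to-text, even though the model can also edit.
print(route_step(has_image_input=True, wants_image_output=False).value)
```

Making the path explicit per step keeps an inspection call from accidentally invoking generation or editing, which are the more expensive paths.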

    Specification

    Framework: HF
    Organization: sensenova
    Feature: Scene Captioning
    Output: text
    Modalities: video, image
    Retriever: Semantic Search
    Parameters: 8B
    License: Apache 2.0
    Downloads/mo: 32.7K
    Likes: 281

    Research Paper

    SenseNova-U1

    arxiv.org

    Build a pipeline with SenseNova-U1-8B-MoT

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
