    Image To Text, transformers.js

    vit-gpt2-image-captioning

    by Xenova

    Identifier
    Model ID
    Xenova/vit-gpt2-image-captioning

    Tags

    transformers.js, onnx, vision-encoder-decoder, image-text-to-text, image-captioning, image-to-text, base_model:nlpconnect/vit-gpt2-image-captioning, base_model:quantized:nlpconnect/vit-gpt2-image-captioning, region:us

    Use vit-gpt2-image-captioning on Mixpeek

    Build multimodal processing pipelines with this model and others. Extract features, run inference, and set up retrieval, all through the Mixpeek pipeline builder.
