
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7945-7968 of 9,588 models

    Text To Audio
    tharushaudana/mms-tts-sinhala-custom-vocab-v2
    98
    transformers

    Tabular Regression
    FreekyMeeky/autotrain-tm-pricepredictor-98386147082
    98
    transformers

    Object Detection
    DnaRnaProteins/qwen2.5-vl-3b-cells-det
    98

    Object Detection
    kilanisainikhil/AerialEye
    98
    1
    ultralytics

    Object Detection
    Charles95/detr-resnet-50-notimm
    98
    transformers

    Object Detection
    Francis51/detr-finetuned-VOC-v3
    98
    transformers

    Object Detection
    Francis51/detr-finetuned-VOC-v4
    98
    transformers

    Image Feature Extraction
    apple/aimv2-large-patch14-336-distilled
    98
    7
    transformers

    Image Feature Extraction
    timm/fastvit_mci2.apple_mclip2_dfndr2b
    98
    1
    timm

    Text To Video
    AaronHuangWei/Wan2.1-T2V-14B-NVFP4FakeQuant
    98
    diffusers

    Image Segmentation
    FriedParrot/fish-segmentation-simple
    98
    transformers

    Image Segmentation
    xuanwulab/HaS_Image_0209_FP32
    98
    1
    ultralytics

    Audio Classification
    FredDYyy/distilhubert-finetuned-gtzan
    98
    transformers

    Voice Activity Detection
    tensorlake/segmentation-3.0
    97
    1
    pyannote-audio

    Visual Question Answering
    Atul8827/vilt_finetuned_200
    97
    transformers

    Document Question Answering
    Mikhail1313/layoutlmv2-base-uncased_finetuned_docvqa
    97
    1
    transformers

    Object Detection
    Dilipan/detr-finetuned-invoice
    97
    2
    transformers

    Image Segmentation
    pirocheto/schp-pascal-7
    97
    2

    Image Segmentation
    apple/deeplabv3-mobilevit-x-small
    97
    4
    transformers

    Image Feature Extraction
    kittn/eupe_vitt16
    97
    transformers

    Image Feature Extraction
    hi-wesley/gemma3-vision-encoder
    97
    1
    transformers

    Video Classification
    matchacams/nystagmus_video_classification
    97
    transformers

    Text To Video
    OmniAvatar/OmniAvatar-14B
    97
    107

    Text To Video
    H-EmbodVis/HyDRA
    97
    pytorch
    Page 332 of 400