    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7,537-7,560 of 9,588 models

    Text To Video

    nesaorg/animatediff-base

    137
    diffusers
    Audio Classification

    ALM/hubert-base-audioset

    137
    3
    transformers
    Depth Estimation

    aarondevstack/DepthPro-1024x1024-coreml

    137
    1
    coreml
    Unconditional Image Generation

    achsaf/ddpm-pixelart-16x16

    137
    diffusers
    Audio To Audio

    lucadellalib/dycast

    136
    4
    torch
    Object Detection

    keremberke/yolov5s-garbage

    136
    2
    yolov5
    Image Segmentation

    Adriatogi/segformer-b0-finetuned-segments-graffiti

    136
    transformers
    Audio Classification

    olaolugbenle/african-lid

    136
    transformers
    Video Classification

    sano90/videomae-large-finetuned-kinetics-finetuned-ucf101-subset

    135
    transformers
    Audio To Audio

    JorisCos/ConvTasNet_Libri2Mix_sepnoisy_8k

    135
    1
    asteroid
    Visual Question Answering

    mPLUG/mPLUG-Owl3-2B-241014

    135
    6
    Object Detection

    nsugianto/detr-resnet50_finetuned_lstabledetv1s9_lsdocelementdetv1type3_session7

    135
    transformers
    Object Detection

    EFFGRP/yolov11s-warehouse-pallets-1280

    135
    ultralytics
    Image Segmentation

    Xenova/clipseg-rd16

    135
    transformers.js
    Image Segmentation

    gobeldan/RMBG-2.0-GGUF

    135
    Audio To Audio

    popcornell/FasNetTAC-paper

    135
    3
    asteroid
    Object Detection

    mradermacher/Polaris-VGA-4B-Post1.0e-i1-GGUF

    134
    transformers
    Object Detection

    Pravallika6/detr-finetuned-logo-detection_v2

    134
    transformers
    Zero Shot Image Classification

    laion/CLIP-ViT-B-32-CommonPool.M.clip-s128M-b4K

    134
    open_clip
    Image Feature Extraction

    facebook/hiera-small-224-hf

    134
    transformers
    Image Feature Extraction

    xycheni/facebook-dinov3-vitl16-pretrain-lvd1689m

    134
    1
    transformers
    Text To Video

    wsbagnsv1/MoviiGen1.1-GGUF

    134
    28
    gguf
    Audio Classification

    Adbhut/wav2vec2-base-finetuned-gtzan

    134
    transformers
    Zero Shot Image Classification

    Leonardo6/clip-imagenet-finetuned

    134
    transformers
    315 / 400