
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 7705–7728 of 9,588 models

    Depth Estimation

    DarthReca/depth-any-canopy-small

    118
    1
    transformers
    Text To Video

    Efficient-Large-Model/SANA-Video_2B_480p

    118
    13
    sana, sana-video
    Audio Classification

    tiantiaf/voxlect-indic-lid-mms-lid-256

    118
    1
    transformers
    Audio Classification

    helenai/MIT-ast-finetuned-speech-commands-v2-ov

    118
    transformers
    Audio Classification

    Adipiz99/LAVA-Framework

    118
    lava-framework
    Audio Classification

    AescF/distilhubert-finetuned-gtzan

    118
    transformers
    Audio Classification

    onnx-community/ast-finetuned-audioset-10-10-0.4593-ONNX

    118
    2
    transformers.js
    Unconditional Image Generation

    achsaf/ddpm-pixelart-16x16-v3

    118
    diffusers
    Video Classification

    Blessing988/videomae-base-finetuned-ucf101-subset_500_epochs

    117
    transformers
    Video Classification

    Joy28/videomae-base-finetuned-subset-100epochs

    117
    transformers
    Video Classification

    Afaan97/videomae-base-finetuned-myvideos-subset

    117
    transformers
    Video Classification

    Joy28/videomae-base-finetuned-ucf101-subset-finetuned-subset

    117
    transformers
    Video Classification

    Joy28/videomae-base-finetuned-subset-check100

    117
    transformers
    Audio To Audio

    iky1e/DeepFilterNet3-MLX

    117
    2
    mlx
    Object Detection

    Xenova/gelan-c

    117
    transformers.js
    Image Segmentation

    smp-hub/upernet-swin-tiny

    117
    segmentation-models-pytorch
    Image Segmentation

    smp-test-models/deeplabv3plus-tu-resnet18

    117
    segmentation-models-pytorch
    Image Feature Extraction

    timm/vit_base_patch16_siglip_gap_512.v2_webli

    117
    1
    timm
    Image Feature Extraction

    r3gm/controlnet-noobai-openpose-sdxl-fp16

    117
    1
    diffusers
    Text To Video

    ayoub1222/Wan2.1-T2V-14B

    117
    diffusers
    Audio Classification

    AescF/hubert-base-ls960-finetuned-common_language

    117
    1
    transformers
    Visual Question Answering

    DAMO-NLP-SG/VideoLLaMA2-7B-16F

    116
    14
    transformers
    Object Detection

    keremberke/yolov5m-construction-safety

    116
    5
    yolov5
    Object Detection

    ianpan/mammo-crop

    116
    transformers
    Page 322 of 400