
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,588 models available

    Showing 124 of 9,588 models

    Featured Models

    Benchmarked
    HF
    Visual Embeddings

    openai/clip-vit-large-patch14

    Contrastive Language-Image Pre-Training for zero-shot visual understanding

    28.6M
    3 benchmarks
    HF
    Visual Embeddings

    google/siglip-base-patch16-224

    Sigmoid Loss for Language Image Pre-Training, efficient contrastive learning

    1.2M
    3 benchmarks
    HF
    Visual Embeddings

    google/siglip2-giant-opt-patch16-384

    Multilingual vision-language encoder with dense features and localization

    1.2M
    2 benchmarks
    HF
    Visual Embeddings

    facebook/dinov2-large

    Self-supervised vision foundation model producing all-purpose visual features

    2.8M
    2 benchmarks
    PyTorch
    Visual Embeddings

    facebook/dinov3-large

    Next-generation self-supervised vision model with Gram anchoring and 6.7B scaling

    450K
    1 benchmark
    HF
    Visual Embeddings

    laion/CLIP-ViT-bigG-14-laion2B-39B-b160k

    Open-source CLIP trained on 2B image-text pairs at giant scale

    890K
    2 benchmarks
    Sentence Similarity

    sentence-transformers/all-MiniLM-L6-v2

    233.5M
    4,749
    sentence-transformers
    Image Text To Text

    Qwen/Qwen3-VL-2B-Instruct

    186.8M
    385
    transformers
    Fill Mask

    google-bert/bert-base-uncased

    59.2M
    2,640
    transformers
    Sentence Similarity

    sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

    43.9M
    1,214
    sentence-transformers
    Sentence Similarity

    sentence-transformers/all-mpnet-base-v2

    36.2M
    1,286
    sentence-transformers
    Feature Extraction

    BAAI/bge-small-en-v1.5

    32.5M
    451
    sentence-transformers
    Zero Shot Image Classification

    openai/clip-vit-large-patch14

    24.6M
    2,000
    transformers
    Image Classification

    Falconsai/nsfw_image_detection

    23.2M
    1,064
    transformers
    Image Classification

    timm/mobilenetv3_small_100.lamb_in1k

    22.8M
    65
    timm
    Zero Shot Image Classification

    openai/clip-vit-base-patch32

    21.2M
    928
    transformers
    Sentence Similarity

    BAAI/bge-m3

    20.5M
    2,972
    sentence-transformers
    Text Generation

    Qwen/Qwen3-0.6B

    19.4M
    1,221
    transformers
    Fill Mask

    FacebookAI/roberta-base

    19.0M
    595
    transformers
    Fill Mask

    FacebookAI/roberta-large

    18.3M
    283
    transformers
    Fill Mask

    FacebookAI/xlm-roberta-base

    18.2M
    820
    transformers
    Audio Classification

    laion/clap-htsat-fused

    17.2M
    82
    transformers
    Text Generation

    openai-community/gpt2

    16.0M
    3,226
    transformers
    Zero Shot Image Classification

    openai/clip-vit-large-patch14-336

    15.7M
    304
    transformers
    Sentence Similarity

    nomic-ai/nomic-embed-text-v1.5

    15.0M
    811
    sentence-transformers
    Feature Extraction

    BAAI/bge-large-en-v1.5

    14.6M
    657
    sentence-transformers
    Text Generation

    Qwen/Qwen2.5-7B-Instruct

    13.8M
    1,256
    transformers
    Fill Mask

    distilbert/distilbert-base-uncased

    13.4M
    872
    transformers
    Text Generation

    deepseek-ai/DeepSeek-V3.2

    11.3M
    1,426
    transformers
    Text Generation

    Qwen/Qwen3-4B-Instruct-2507

    10.7M
    829
    transformers