
    AI Model Hub

    Browse AI models for multimodal decomposition and recomposition pipelines — plug any model into your extractors.

    9,002 models available

    Showing 2953-2976 of 9,002 models

    Image To Image

    lllyasviel/control_v11f1p_sd15_depth

    16K
    63
    diffusers
    Fill Mask

    facebook/xlm-roberta-xxl

    16K
    17
    transformers
    Text To Video

    hpcai-tech/Open-Sora-v2

    16K
    174
    open-sora
    Automatic Speech Recognition

    FluidInference/parakeet-realtime-eou-120m-coreml

    16K
    4
    nemo
    Zero Shot Image Classification

    laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K

    16K
    8
    open_clip
    Sentence Similarity

    StyleDistance/styledistance

    16K
    14
    sentence-transformers
    Image Classification

    timm/tf_efficientnet_lite0.in1k

    16K
    timm
    Automatic Speech Recognition

    Harveenchadha/vakyansh-wav2vec2-tamil-tam-250

    16K
    4
    transformers
    Image Classification

    timm/efficientnet_b1.ra4_e3600_r240_in1k

    16K
    1
    timm
    Translation

    tencent/HY-MT1.5-1.8B

    16K
    1,133
    transformers
    Image To Image

    prithivMLmods/Qwen-Image-Edit-2511-Hyper-Realistic-Portrait

    16K
    18
    diffusers
    Feature Extraction

    second-state/All-MiniLM-L6-v2-Embedding-GGUF

    16K
    22
    sentence-transformers
    Automatic Speech Recognition

    fatymatariq/speaker-diarization-3.1

    16K
    1
    pyannote-audio
    Image To Image

    microsoft/renderformer-v1.1-swin-large

    16K
    28
    renderformer
    Image To Image

    optimum-intel-internal-testing/tiny-random-stable-diffusion-xl-refiner

    16K
    diffusers
    Summarization

    cnicu/t5-small-booksum

    16K
    9
    transformers
    Translation

    Helsinki-NLP/opus-mt-en-id

    16K
    20
    transformers
    Any To Any

    deepseek-ai/Janus-Pro-1B

    16K
    474
    transformers
    Text Classification

    savasy/bert-base-turkish-sentiment-cased

    16K
    56
    transformers
    Image To Image

    vafipas663/Qwen-Edit-2509-Upscale-LoRA

    16K
    226
    diffusers
    Image Classification

    timm/swin_base_patch4_window12_384.ms_in22k_ft_in1k

    16K
    timm
    Voice Activity Detection

    fatymatariq/segmentation-3.0

    16K
    pyannote-audio
    Zero Shot Image Classification

    timm/ViT-B-16-SigLIP2

    16K
    open_clip
    Audio Classification

    Dpngtm/wav2vec2-emotion-recognition

    16K
    6
    transformers
    Page 124 of 376