HF · Text Embeddings
sentence-transformers/all-MiniLM-L6-v2
384-dim vector · ↓ 195.7M
HF · Visual Embeddings
openai/clip-vit-large-patch14
768-dim vector · ↓ 28.6M
HF · Audio Embeddings
laion/clap-htsat-fused
512-dim vector · ↓ 20.7M
HF · Speaker Diarization
pyannote/speaker-diarization-3.1
speaker segments · ↓ 10.9M
HF · Text Embeddings
BAAI/bge-m3
1024-dim vector · ↓ 8.2M
HF · Text Embeddings
BAAI/bge-large-en-v1.5
1024-dim vector · ↓ 7.1M
HF · Scene Captioning
google/gemma-4-E4B-it
text · ↓ 5.7M
HF · Transcription
distil-whisper/distil-large-v3
text + timestamps · ↓ 4.8M
HF · Transcription
openai/whisper-large-v3
text + timestamps · ↓ 4.7M
HF · Segmentation
facebook/sam-vit-huge
mask + label · ↓ 3.2M
HF · Table Extraction
microsoft/table-transformer-detection
table JSON · ↓ 3.0M
HF · Visual Embeddings
facebook/dinov2-large
1024-dim vector · ↓ 2.8M
HF · Scene Captioning
Qwen/Qwen3-VL-8B-Instruct
text · ↓ 2.8M
HF · Scene Captioning
Qwen/Qwen3.6-27B
text · ↓ 2.4M
HF · Text Embeddings
Qwen/Qwen3-VL-Embedding-2B
1024-dim vector · ↓ 2.4M
HF · Text Embeddings
Qwen/Qwen3-Embedding-0.6B
1024-dim vector · ↓ 2.1M
HF · Text Embeddings
Qwen/Qwen3-Embedding-4B
1024-dim vector · ↓ 2.0M
HF · Scene Captioning
google/gemma-4-4b-it
text · ↓ 1.9M
HF · Scene Captioning
google/paligemma2-3b-mix-448
text · ↓ 1.8M
HF · Text Embeddings
Qwen/Qwen3-Embedding-8B
1024-dim vector · ↓ 1.8M
HF · Segmentation
facebook/sam2.1-hiera-large
mask + label · ↓ 1.8M
HF · Scene Captioning
OpenGVLab/InternVL3-8B
text · ↓ 1.6M
HF · OCR
deepseek-ai/DeepSeek-OCR-2
text + bbox · ↓ 1.6M
HF · Text Embeddings
Qwen/Qwen3-VL-Embedding-8B
1024-dim vector · ↓ 1.6M
HF · Visual Embeddings
jinaai/jina-embeddings-v4
768-dim vector · ↓ 1.5M
HF · Object Detection
IDEA-Research/grounding-dino-base
bbox + label · ↓ 1.5M
HF · Text Embeddings
google/embeddinggemma-300m
1024-dim vector · ↓ 1.5M
HF · Text Embeddings
nomic-ai/nomic-embed-text-v2-moe
1024-dim vector · ↓ 1.4M
HF · Depth Estimation
depth-anything/Depth-Anything-V2-Large
depth map · ↓ 1.4M
HF · Scene Captioning
microsoft/Florence-2-large
text · ↓ 1.3M
HF · Visual Embeddings
google/siglip-base-patch16-224
768-dim vector · ↓ 1.2M
HF · Visual Embeddings
google/siglip2-giant-opt-patch16-384
768-dim vector · ↓ 1.2M
HF · Visual Embeddings
Marqo/marqo-fashionSigLIP
768-dim vector · ↓ 965K
HF · Visual Embeddings
laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
768-dim vector · ↓ 890K
HF · OCR
lightonai/LightOnOCR-2-1B
text + bbox · ↓ 730K
NeMo · Transcription
nvidia/parakeet-ctc-1.1b
text + timestamps · ↓ 689K
HF · Visual Embeddings
BAAI/EVA02-CLIP-L-14-336
768-dim vector · ↓ 620K
HF · Object Detection
google/owlvit-large-patch14
bbox + label · ↓ 580K
HF · Scene Captioning
Qwen/Qwen3-VL-4B-Instruct
text · ↓ 580K
HF · Document Structure
microsoft/layoutlmv3-base
structure tokens · ↓ 565K
HF · OCR
microsoft/trocr-large-printed
text + bbox · ↓ 554K
HF · OCR
zai-org/GLM-OCR
text + bbox · ↓ 520K
HF · Depth Estimation
apple/DepthPro
depth map · ↓ 520K
HF · Scene Captioning
Salesforce/blip2-opt-2.7b
text · ↓ 516K
HF · OCR
baidu/Qianfan-OCR
text + bbox · ↓ 482K
PyTorch · Visual Embeddings
facebook/dinov3-large
768-dim vector · ↓ 450K
HF · Object Detection
roboflow/rf-detr-base
bbox + label · ↓ 420K
NeMo · Transcription
nvidia/parakeet-tdt-0.6b-v3
text + timestamps · ↓ 420K
HF · Text Embeddings
jinaai/jina-embeddings-v5-text-small
1024-dim vector · ↓ 420K
PyTorch · Segmentation
facebook/sam3
mask + label · ↓ 420K
HF · Scene Captioning
microsoft/Phi-4-multimodal-instruct
text · ↓ 391K
HF · Visual Embeddings
apple/AIMv2-large-patch14-native
768-dim vector · ↓ 380K
HF · Document Structure
fastino/gliner2-base-v1
structure tokens · ↓ 379K
PyTorch · Object Detection
AILab-CVC/YOLO-World-L
bbox + label · ↓ 320K
HF · Transcription
Qwen/Qwen3-ASR-1.7B
text + timestamps · ↓ 320K
HF · Scene Captioning
Qwen/Qwen3.6-35B-A3B
text · ↓ 310K
HF · Transcription
microsoft/VibeVoice-ASR-HF
text + timestamps · ↓ 295K
HF · OCR
opendatalab/MinerU2.5-Pro-2604-1.2B
text + bbox · ↓ 283K
HF · Segmentation
facebook/sam3.1
mask + label · ↓ 270K
HF · Code Extraction
microsoft/codebert-base
code + language · ↓ 261K
HF · Object Detection
facebook/detr-resnet-50
bbox + label · ↓ 246K
HF · Scene Captioning
HuggingFaceTB/SmolVLM2-2.2B-Instruct
text · ↓ 238K
HF · Document Structure
naver-clova-ix/donut-base
structure tokens · ↓ 216K
HF · OCR
tiiuae/Falcon-OCR
text + bbox · ↓ 195K
HF · Visual Embeddings
nomic-ai/colnomic-embed-multimodal-7b
768-dim vector · ↓ 180K
HF · Face Detection
isidentical/auraface-v1
face embedding · ↓ 180K
HF · Transcription
usefulsensors/moonshine-streaming-medium
text + timestamps · ↓ 180K
PyTorch · Anomaly Detection
amazon/patchcore-resnet50
anomaly score + map · ↓ 180K
HF · Depth Estimation
depth-anything/DA3-SMALL
depth map · ↓ 161K
HF · Code Extraction
Salesforce/codet5p-110m-embedding
code + language · ↓ 154K
HF · Document Structure
ibm-granite/granite-docling-258M
structure tokens · ↓ 150K
HF · Text Embeddings
perplexity-ai/pplx-embed-v1-0.6b
1024-dim vector · ↓ 120K
HF · Audio Embeddings
facebook/encodec_24khz
512-dim vector · ↓ 112K
HF · Object Detection
hustvl/yolos-tiny
bbox + label · ↓ 107K
HF · Text Embeddings
zeroentropy/zembed-1-embedding
1024-dim vector · ↓ 98K
HF · Transcription
facebook/seamless-m4t-v2-large
text + timestamps · ↓ 85K
HF · Segmentation
Roboflow/rf-detr-seg-large
mask + label · ↓ 85K
HF · Visual Embeddings
vidore/colpali-v1.3
768-dim vector · ↓ 51K
HF · Visual Embeddings
nvidia/llama-nemotron-embed-vl-1b-v2
768-dim vector · ↓ 42K
HF · Transcription
facebook/wav2vec2-large-960h
text + timestamps · ↓ 37K
HF · Visual Embeddings
vidore/colqwen2.5-v0.2
768-dim vector · ↓ 36K
HF · Depth Estimation
depth-anything/DA3-LARGE-1.1
depth map · ↓ 26K
HF · Scene Captioning
deepseek-ai/deepseek-vl2-small
text · ↓ 15K
HF · Visual Embeddings
nvidia/C-RADIOv4-H
768-dim vector · ↓ 7.7K
C++/Python · Vector Indexing
facebook/faiss
index + results · ↓ 39.5K★
PyTorch · Object Detection
ultralytics/yolov8n
bbox + label · ↓ -
PyTorch · Object Detection
ultralytics/yolo11n
bbox + label · ↓ -
PyTorch · Object Detection
ultralytics/yolo26n
bbox + label · ↓ -
HF · Face Detection
deepinsight/retinaface-r50
face embedding · ↓ -
HF · Face Detection
timesler/facenet-pytorch
face embedding · ↓ -
PyTorch · OCR
PaddlePaddle/paddleocr
text + bbox · ↓ -
PyTorch · Segmentation
netflix/void-model
mask + label · ↓ -
C++/Python · Vector Indexing
google/scann
index + results · ↓ -