HFText Embeddings
sentence-transformers/all-MiniLM-L6-v2
1024-dim vector↓ 224.8M
HFText Embeddings
BAAI/bge-m3
1024-dim vector↓ 28.9M
HFAudio Embeddings
laion/clap-htsat-fused
512-dim vector↓ 18.2M
HFAnomaly Detection
amazon/chronos-2
anomaly score + map↓ 14.8M
HFVisual Embeddings
openai/clip-vit-large-patch14
768-dim vector↓ 13.9M
HFText Embeddings
BAAI/bge-large-en-v1.5
1024-dim vector↓ 13.9M
HFDepth Estimation
lpiccinelli/unidepth-v2-vitl14
depth map↓ 10.5M
HFText Embeddings
Qwen/Qwen3-Embedding-0.6B
1024-dim vector↓ 8.6M
HFSpeaker Diarization
pyannote/speaker-diarization-3.1
speaker segments↓ 8.2M
HFTranscription
openai/whisper-large-v3-turbo
text + timestamps↓ 7.8M
HFScene Captioning
google/gemma-4-E4B-it
text↓ 5.7M
HFSpeaker Diarization
pyannote/wespeaker-voxceleb-resnet34-LM
speaker segments↓ 5.7M
HFTranscription
openai/whisper-large-v3
text + timestamps↓ 5.1M
HFTranscription
distil-whisper/distil-large-v3
text + timestamps↓ 4.8M
HFTable Extraction
docling-project/docling-models
table JSON↓ 3.8M
HFSpeaker Diarization
speechbrain/spkrec-ecapa-voxceleb
speaker segments↓ 3.4M
HFSegmentation
facebook/sam-vit-huge
mask + label↓ 3.2M
HFOCR
stepfun-ai/GOT-OCR-2.0-hf
text + bbox↓ 3.1M
HFVisual Embeddings
facebook/dinov2-large
768-dim vector↓ 2.8M
HFScene Captioning
Qwen/Qwen3-VL-8B-Instruct
text↓ 2.8M
HFText Embeddings
nomic-ai/modernbert-embed-base
1024-dim vector↓ 2.8M
HFScene Captioning
moonshotai/Kimi-K2.6
text↓ 2.7M
HFSpeaker Diarization
pyannote/speaker-diarization-community-1
speaker segments↓ 2.7M
HFScene Captioning
google/gemma-4-E2B-it
text↓ 2.4M
HFScene Captioning
Qwen/Qwen3.6-27B
text↓ 2.4M
HFText Embeddings
Qwen/Qwen3-Embedding-4B
1024-dim vector↓ 2.3M
HFObject Detection
IDEA-Research/grounding-dino-base
bbox + label↓ 2.2M
PyTorchFace Detection
deepinsight/insightface-retinaface-r50
face embedding↓ 2.2M
HFDocument Structure
microsoft/layoutlmv3-base
structure tokens↓ 2.0M
HFScene Captioning
google/paligemma2-3b-mix-448
text↓ 1.8M
HFText Embeddings
Qwen/Qwen3-Embedding-8B
1024-dim vector↓ 1.8M
PyTorchSegmentation
facebook/sam3
mask + label↓ 1.8M
HFAudio Embeddings
MIT/ast-finetuned-audioset-10-10-0.4593
512-dim vector↓ 1.7M
HFTable Extraction
microsoft/table-transformer-detection
table JSON↓ 1.7M
HFSegmentation
CIDAS/clipseg-rd64-refined
mask + label↓ 1.7M
HFScene Captioning
OpenGVLab/InternVL3-8B
text↓ 1.6M
HFOCR
deepseek-ai/DeepSeek-OCR-2
text + bbox↓ 1.6M
HFText Embeddings
Qwen/Qwen3-VL-Embedding-8B
1024-dim vector↓ 1.6M
NeMoSpeaker Diarization
nvidia/speakerverification_en_titanet_large
speaker segments↓ 1.5M
HFText Embeddings
google/embeddinggemma-300m
1024-dim vector↓ 1.5M
HFText Embeddings
nvidia/NV-Embed-v2
1024-dim vector↓ 1.5M
HFOCR
datalab-to/chandra-ocr-2
text + bbox↓ 1.45M
HFVisual Embeddings
Marqo/marqo-fashionSigLIP
768-dim vector↓ 1.4M
HFScene Captioning
Qwen/Qwen3-VL-30B-A3B-Instruct
text↓ 1.4M
HFAudio Embeddings
laion/larger_clap_general
512-dim vector↓ 1.4M
HFText Embeddings
nomic-ai/nomic-embed-text-v2-moe
1024-dim vector↓ 1.4M
HFVisual Embeddings
nomic-ai/nomic-embed-vision-v1.5
768-dim vector↓ 1.3M
HFAudio Embeddings
microsoft/wavlm-large
512-dim vector↓ 1.3M
HFAudio Embeddings
kyutai/mimi
512-dim vector↓ 1.3M
HFVisual Embeddings
google/siglip-base-patch16-224
768-dim vector↓ 1.2M
HFText Embeddings
Snowflake/snowflake-arctic-embed-m-v2.0
1024-dim vector↓ 1.2M
HFVisual Embeddings
google/siglip2-so400m-patch16-384
768-dim vector↓ 1.1M
PyTorchFace Detection
timesler/facenet-pytorch
face embedding↓ 1.1M
HFScene Captioning
google/gemma-4-26B-A4B-it
text↓ 1.1M
HFTranscription
mistralai/Voxtral-Mini-4B-Realtime-2602
text + timestamps↓ 1.1M
HFDocument Structure
docling-project/docling-layout-heron
structure tokens↓ 1.1M
HFText Embeddings
Snowflake/snowflake-arctic-embed-l-v2.0
1024-dim vector↓ 1.03M
HFText Embeddings
Qwen/Qwen3-VL-Embedding-2B
1024-dim vector↓ 983K
HFAnomaly Detection
amazon/chronos-bolt-base
anomaly score + map↓ 980K
HFScene Captioning
LGAI-EXAONE/EXAONE-4.5-33B
text↓ 976K
HFText Embeddings
nomic-ai/modernbert-embed-large
1024-dim vector↓ 890K
HFSegmentation
ZhengPeng7/BiRefNet
mask + label↓ 824K
HFScene Captioning
google/gemma-4-31B-it
text↓ 820K
HFTranscription
Qwen/Qwen3-ASR-0.6B
text + timestamps↓ 808K
NeMoTranscription
nvidia/parakeet-ctc-1.1b
text + timestamps↓ 807K
HFOCR
lightonai/LightOnOCR-2-1B
text + bbox↓ 730K
HFSegmentation
briaai/RMBG-2.0
mask + label↓ 717K
HFScene Captioning
Salesforce/blip2-opt-2.7b
text↓ 588K
HFScene Captioning
google/gemma-4-12B-it
text↓ 581K
HFScene Captioning
Qwen/Qwen3-VL-4B-Instruct
text↓ 580K
HFTranscription
facebook/hubert-large-ls960-ft
text + timestamps↓ 579K
HFTranscription
CohereLabs/cohere-transcribe-03-2026
text + timestamps↓ 552K
HFVisual Embeddings
facebook/dinov3-vitl16-pretrain-lvd1689m
768-dim vector↓ 546K
HFVisual Embeddings
jinaai/jina-embeddings-v4
768-dim vector↓ 541K
HFTranscription
mistralai/Voxtral-Mini-3B-2507
text + timestamps↓ 532K
HFScene Captioning
microsoft/Florence-2-large
text↓ 525K
HFText Embeddings
jinaai/jina-embeddings-v5-text-nano
1024-dim vector↓ 523K
HFOCR
zai-org/GLM-OCR
text + bbox↓ 520K
HFDocument Structure
docling-project/SmolDocling-256M-preview
structure tokens↓ 520K
HFDepth Estimation
apple/DepthPro
depth map↓ 520K
HFTranscription
ibm-granite/granite-speech-4.1-2b
text + timestamps↓ 518K
HFOCR
baidu/Qianfan-OCR
text + bbox↓ 482K
PyTorchVisual Embeddings
facebook/dinov3-large
768-dim vector↓ 450K
HFScene Captioning
OpenGVLab/InternVL3-78B
text↓ 450K
HFText Embeddings
ibm-granite/granite-embedding-english-r2
1024-dim vector↓ 420K
HFTranscription
Qwen/Qwen3-ForcedAligner-0.6B
text + timestamps↓ 404K
HFDocument Structure
PaddlePaddle/PP-DocLayoutV3
structure tokens↓ 400K
HFScene Captioning
microsoft/Phi-4-multimodal-instruct
text↓ 391K
HFAudio Embeddings
laion/clap-htsat-unfused
512-dim vector↓ 389K
HFDocument Structure
fastino/gliner2-base-v1
structure tokens↓ 379K
HFVisual Embeddings
facebook/vjepa2-vitg-fpc64-256
768-dim vector↓ 372K
HFCode Extraction
nomic-ai/nomic-embed-code
code + language↓ 361K
NeMoTranscription
nvidia/parakeet-tdt-1.1b
text + timestamps↓ 350K
HFOCR
nvidia/NVIDIA-Nemotron-Parse-v1.1
text + bbox↓ 343K
HFOCR
tencent/HunyuanOCR
text + bbox↓ 337K
HFSegmentation
briaai/RMBG-1.4
mask + label↓ 333K
HFAudio Embeddings
laion/larger_clap_music_and_speech
512-dim vector↓ 330K
HFAudio Embeddings
m-a-p/MERT-v1-330M
512-dim vector↓ 329K
HFText Embeddings
ibm-granite/granite-embedding-311m-multilingual-r2
1024-dim vector↓ 324K
PyTorchObject Detection
AILab-CVC/YOLO-World-L
bbox + label↓ 320K
HFScene Captioning
microsoft/Phi-4-reasoning-vision-15B
text↓ 320K
HFTranscription
Qwen/Qwen3-ASR-1.7B
text + timestamps↓ 320K
HFScene Captioning
Qwen/Qwen3.6-35B-A3B
text↓ 310K
HFVisual Embeddings
google/siglip2-giant-opt-patch16-384
768-dim vector↓ 309K
HFTranscription
facebook/mms-1b-all
text + timestamps↓ 298K
HFTranscription
microsoft/VibeVoice-ASR-HF
text + timestamps↓ 295K
HFOCR
opendatalab/MinerU2.5-Pro-2604-1.2B
text + bbox↓ 283K
HFOCR
rednote-hilab/dots.ocr
text + bbox↓ 281K
HFOCR
microsoft/trocr-large-handwritten
text + bbox↓ 280K
HFScene Captioning
moondream/moondream3-preview
text↓ 276K
HFObject Detection
facebook/detr-resnet-50
bbox + label↓ 273K
HFAudio Embeddings
laion/larger_clap_music
512-dim vector↓ 270K
HFText Embeddings
jinaai/jina-embeddings-v5-text-small
1024-dim vector↓ 269K
HFScene Captioning
nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16
text↓ 263K
HFText Embeddings
ibm-granite/granite-embedding-small-english-r2
1024-dim vector↓ 260K
HFCode Extraction
microsoft/codebert-base
code + language↓ 254K
HFScene Captioning
apple/FastVLM-0.5B
text↓ 240K
HFScene Captioning
HuggingFaceTB/SmolVLM2-2.2B-Instruct
text↓ 238K
HFScene Captioning
openbmb/MiniCPM-V-4.6
text↓ 222K
HFOCR
reducto/RolmOCR
text + bbox↓ 211K
HFObject Detection
ustc-community/dfine-xlarge-coco
bbox + label↓ 210K
HFOCR
ByteDance/Dolphin-v2
text + bbox↓ 210K
HFTranscription
kyutai/stt-2.6b-en
text + timestamps↓ 210K
HFAudio Embeddings
m-a-p/MERT-v1-95M
512-dim vector↓ 210K
HFText Embeddings
microsoft/harrier-oss-v1-0.6b
1024-dim vector↓ 203K
HFOCR
tiiuae/Falcon-OCR
text + bbox↓ 195K
HFSegmentation
ZhengPeng7/BiRefNet_HR
mask + label↓ 190K
HFVisual Embeddings
jinaai/jina-embeddings-v5-omni-small
768-dim vector↓ 182K
HFObject Detection
PekingU/rtdetr_v2_r101vd
bbox + label↓ 180K
HFFace Detection
fal/AuraFace-v1
face embedding↓ 180K
HFTranscription
usefulsensors/moonshine-streaming-medium
text + timestamps↓ 180K
HFCode Extraction
nomic-ai/CodeRankEmbed
code + language↓ 180K
PyTorchAnomaly Detection
amazon/patchcore-resnet50
anomaly score + map↓ 180K
HFDepth Estimation
facebook/VGGT-1B
depth map↓ 180K
HFScene Captioning
zai-org/GLM-4.5V
text↓ 177K
HFText Embeddings
Alibaba-NLP/gte-modernbert-base
1024-dim vector↓ 175K
HFScene Captioning
nvidia/Cosmos-Reason2-2B
text↓ 169K
HFText Embeddings
voyageai/voyage-4-nano
1024-dim vector↓ 168K
HFVisual Embeddings
facebook/vjepa2-vitl-fpc64-256
768-dim vector↓ 154K
HFDocument Structure
ibm-granite/granite-docling-258M
structure tokens↓ 150K
HFSegmentation
facebook/sam3.1
mask + label↓ 147K
HFScene Captioning
CohereLabs/command-a-plus-05-2026-bf16
text↓ 145K
HFTable Extraction
microsoft/table-transformer-structure-recognition-v1.1-all
table JSON↓ 138K
HFVisual Embeddings
nvidia/llama-nemotron-embed-vl-1b-v2
768-dim vector↓ 137K
HFDocument Structure
naver-clova-ix/donut-base
structure tokens↓ 137K
HFObject Detection
nvidia/LocateAnything-3B
bbox + label↓ 132K
HFVisual Embeddings
TomoroAI/tomoro-colqwen3-embed-4b
768-dim vector↓ 130K
HFText Embeddings
codefuse-ai/F2LLM-v2-14B
1024-dim vector↓ 126K
HFVisual Embeddings
facebook/dinov3-convnext-large-pretrain-lvd1689m
768-dim vector↓ 120K
HFTranscription
ibm-granite/granite-4.0-1b-speech
text + timestamps↓ 120K
HFText Embeddings
perplexity-ai/pplx-embed-v1-0.6b
1024-dim vector↓ 120K
HFAnomaly Detection
Salesforce/moirai-2.0-R-small
anomaly score + map↓ 120K
HFText Embeddings
lightonai/GTE-ModernColBERT-v1
1024-dim vector↓ 119K
HFOCR
microsoft/trocr-large-printed
text + bbox↓ 118K
HFScene Captioning
openbmb/MiniCPM-V-4_5
text↓ 116K
HFScene Captioning
zai-org/GLM-4.6V
text↓ 112K
HFScene Captioning
nvidia/Eagle2.5-8B
text↓ 110K
NeMoTranscription
nvidia/parakeet-tdt-0.6b-v3
text + timestamps↓ 106K
HFScene Captioning
openbmb/MiniCPM-o-4_5
text↓ 100K
HFOCR
nanonets/Nanonets-OCR2-3B
text + bbox↓ 100K
HFScene Captioning
apple/FastVLM-7B
text↓ 98K
HFText Embeddings
zeroentropy/zembed-1-embedding
1024-dim vector↓ 98K
HFVisual Embeddings
vidore/colqwen2.5-v0.2
768-dim vector↓ 96K
NeMoTranscription
nvidia/canary-1b-v2
text + timestamps↓ 96K
HFTranscription
ibm-granite/granite-speech-4.1-2b-plus
text + timestamps↓ 95K
HFObject Detection
hustvl/yolos-tiny
bbox + label↓ 93K
HFAnomaly Detection
google/timesfm-2.5-200m-transformers
anomaly score + map↓ 93K
HFOCR
allenai/olmOCR-2-7B-1025
text + bbox↓ 88K
HFVisual Embeddings
jinaai/jina-embeddings-v5-omni-nano
768-dim vector↓ 85K
HFScene Captioning
microsoft/OmniParser-v2.0
text↓ 85K
HFScene Captioning
allenai/Molmo2-8B
text↓ 85K
HFTranscription
facebook/seamless-m4t-v2-large
text + timestamps↓ 85K
HFAnomaly Detection
Maple728/TimeMoE-50M
anomaly score + map↓ 85K
HFVisual Embeddings
jinaai/jina-clip-v2
768-dim vector↓ 73K
HFObject Detection
google/owlv2-large-patch14-ensemble
bbox + label↓ 73K
HFDepth Estimation
depth-anything/Depth-Anything-V2-Large
depth map↓ 72K
HFSegmentation
facebook/sam2.1-hiera-large
mask + label↓ 70K
NeMoTranscription
nvidia/nemotron-speech-streaming-en-0.6b
text + timestamps↓ 65K
HFCode Extraction
Salesforce/SFR-Embedding-Code-400M_R
code + language↓ 65K
HFVisual Embeddings
laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
768-dim vector↓ 62K
HFTable Extraction
microsoft/table-transformer-structure-recognition-v1.1-pub
table JSON↓ 62K
HFVisual Embeddings
facebook/PE-Core-G14-448
768-dim vector↓ 58K
HFDepth Estimation
depth-anything/DA3NESTED-GIANT-LARGE-1.1
depth map↓ 58K
HFDepth Estimation
depth-anything/DA3-LARGE-1.1
depth map↓ 50K
HFScene Captioning
bytedance-research/Vidi-7B
text↓ 48K
HFSpeaker Diarization
Wespeaker/wespeaker-voxceleb-resnet293-LM
speaker segments↓ 48K
HFDepth Estimation
Intel/dpt-large
depth map↓ 46K
HFScene Captioning
ibm-granite/granite-4.0-3b-vision
text↓ 45K
HFScene Captioning
omni-research/Tarsier2-7b-0115
text↓ 45K
HFSegmentation
facebook/EdgeTAM
mask + label↓ 45K
HFVisual Embeddings
facebook/metaclip-2-worldwide-huge-quickgelu
768-dim vector↓ 44K
HFAudio Embeddings
facebook/encodec_24khz
512-dim vector↓ 43K
HFText Embeddings
nvidia/llama-embed-nemotron-8b
1024-dim vector↓ 40K
HFOCR
PaddlePaddle/PaddleOCR-VL-1.5
text + bbox↓ 39K
HFOCR
ibm-granite/granite-vision-4.1-4b
text + bbox↓ 39K
HFScene Captioning
Kwai-Keye/Keye-VL-8B-Preview
text↓ 38K
HFCode Extraction
Salesforce/codet5p-110m-embedding
code + language↓ 38K
CosmosScene Captioning
nvidia/Cosmos3-Nano
text↓ 36.7K
HFVisual Embeddings
vidore/colpali-v1.3
768-dim vector↓ 33K
HFScene Captioning
sensenova/SenseNova-U1-8B-MoT
text↓ 32.7K
HFScene Captioning
bytedance-research/Lance
text↓ 32K
HFVisual Embeddings
OpenGVLab/InternVideo2-Stage1-1B-224p-K700
768-dim vector↓ 31K
HFVisual Embeddings
facebook/PE-Spatial-G14-448
768-dim vector↓ 31K
HFVisual Embeddings
nomic-ai/colnomic-embed-multimodal-7b
768-dim vector↓ 30K
HFVisual Embeddings
nvidia/C-RADIOv4-H
768-dim vector↓ 30K
NeMoSpeaker Diarization
nvidia/diar_streaming_sortformer_4spk-v2
speaker segments↓ 30K
HFObject Detection
openmmlab-community/mm_grounding_dino_large_all
bbox + label↓ 27.7K
HFFace Detection
Idiap/EdgeFace-S-GAMMA
face embedding↓ 24K
HFTranscription
facebook/wav2vec2-large-960h
text + timestamps↓ 24K
HFText Embeddings
mixedbread-ai/mxbai-colbert-large-v1
1024-dim vector↓ 22K
HFDepth Estimation
depth-anything/DA3-SMALL
depth map↓ 22K
HFDepth Estimation
yyfz233/Pi3
depth map↓ 22K
HFAnomaly Detection
google/timesfm-2.0-500m-pytorch
anomaly score + map↓ 20.5K
HFScene Captioning
Vision-CAIR/Tempo-6B
text↓ 18K
HFVisual Embeddings
nvidia/nemotron-colembed-vl-8b-v2
768-dim vector↓ 16K
HFObject Detection
iSEE-Laboratory/llmdet_large
bbox + label↓ 16K
HFText Embeddings
ibm-granite/granite-embedding-97m-multilingual-r2
1024-dim vector↓ 15.6K
HFScene Captioning
deepseek-ai/deepseek-vl2-small
text↓ 15K
HFScene Captioning
microsoft/Fara-7B
text↓ 13.6K
HFVisual Embeddings
jinaai/jina-embeddings-v5-omni-small-retrieval
768-dim vector↓ 12K
HFVisual Embeddings
nvidia/omni-embed-nemotron-3b
768-dim vector↓ 12K
HFVisual Embeddings
google/videoprism-base-f16r288
768-dim vector↓ 12K
HFAudio Embeddings
mispeech/dasheng-1.2B
512-dim vector↓ 12K
HFObject Detection
omlab/omdet-turbo-swin-tiny-hf
bbox + label↓ 11.5K
HFVisual Embeddings
BAAI/EVA02-CLIP-L-14-336
768-dim vector↓ 11K
HFVisual Embeddings
facebook/dinov3-vit7b16-pretrain-lvd1689m
768-dim vector↓ 11K
HFOCR
PaddlePaddle/PaddleOCR-VL-1.6
text + bbox↓ 11K
HFScene Captioning
moonshotai/Kimi-VL-A3B-Thinking-2506
text↓ 10.3K
HFDocument Structure
numind/NuExtract3
structure tokens↓ 10K
HFVisual Embeddings
nvidia/C-RADIOv4-SO400M
768-dim vector↓ 9.4K
HFScene Captioning
stepfun-ai/Step-3.7-Flash
text↓ 9.3K
HFText Embeddings
lightonai/Reason-ModernColBERT
1024-dim vector↓ 9.1K
HFAnomaly Detection
nvidia/Cosmos-Embed1-448p-anomaly-detection
anomaly score + map↓ 9K
HFCode Extraction
jinaai/jina-code-embeddings-1.5b
code + language↓ 8.9K
HFVisual Embeddings
nomic-ai/nomic-embed-multimodal-3b
768-dim vector↓ 8K
HFVisual Embeddings
BidirLM/BidirLM-Omni-2.5B-Embedding
768-dim vector↓ 8K
HFObject Detection
google/owlvit-large-patch14
bbox + label↓ 8K
HFAudio Embeddings
facebook/encodec_48khz
512-dim vector↓ 7.8K
HFScene Captioning
AIDC-AI/Ovis2.6-30B-A3B
text↓ 7.4K
HFSegmentation
facebook/sam2.1-hiera-base-plus
mask + label↓ 7.3K
HFVisual Embeddings
vidore/colqwen-omni-v0.1
768-dim vector↓ 7K
PyTorchObject Detection
ultralytics/yolo26n
bbox + label↓ 7K
NeMoSpeaker Diarization
nvidia/diar_sortformer_4spk-v1
speaker segments↓ 6.1K
HFText Embeddings
perplexity-ai/pplx-embed-context-v1-4b
1024-dim vector↓ 5.2K
HFVisual Embeddings
VLM2Vec/VLM2Vec-V2.0
768-dim vector↓ 5K
HFText Embeddings
perplexity-ai/pplx-embed-v1-late-0.6b
1024-dim vector↓ 4.9K
HFAudio Embeddings
mtg-upf/discogs-maest-30s-pw-129e
512-dim vector↓ 4.5K
NeMoTranscription
nvidia/nemotron-3.5-asr-streaming-0.6b
text + timestamps↓ 4.2K
HFOCR
FireRedTeam/FireRed-OCR
text + bbox↓ 4.1K
HFVisual Embeddings
jinaai/jina-embeddings-v5-omni-nano-retrieval
768-dim vector↓ 4K
HFScene Captioning
NemoStation/Marlin-2B
text↓ 4K
HFScene Captioning
DAMO-NLP-SG/VideoLLaMA3-7B
text↓ 4K
HFTranscription
FunAudioLLM/SenseVoiceSmall
text + timestamps↓ 4K
HFAudio Embeddings
facebook/pe-av-large
512-dim vector↓ 4K
HFTable Extraction
foduucom/table-detection-and-extraction
table JSON↓ 3.4K
HFDocument Structure
google/pix2struct-base
structure tokens↓ 3.3K
HFVisual Embeddings
google/videoprism-large-f8r288
768-dim vector↓ 3K
PyTorchOCR
nvidia/nemotron-ocr-v2
text + bbox↓ 2.9K
HFFace Detection
minchul/cvlface_adaface_ir101_webface12m
face embedding↓ 2.7K
HFVisual Embeddings
LCO-Embedding/LCO-Embedding-Omni-7B
768-dim vector↓ 2.1K
HFVisual Embeddings
nomic-ai/nomic-embed-multimodal-7b
768-dim vector↓ 2K
HFVisual Embeddings
moonshotai/MoonViT-SO-400M
768-dim vector↓ 2K
HFTranscription
XiaomiMiMo/MiMo-V2.5-ASR
text + timestamps↓ 2K
HFText Embeddings
lightonai/Agent-ModernColBERT
1024-dim vector↓ 1.9K
HFCode Extraction
Salesforce/SFR-Embedding-Code-2B_R
code + language↓ 1.5K
NeMoTranscription
nvidia/parakeet-rnnt-1.1b
text + timestamps↓ 1.4K
HFScene Captioning
facebook/Perception-LM-3B
text↓ 1.3K
HFScene Captioning
Hcompany/Holo-3.1-4B
text↓ 1.3K
HFAnomaly Detection
Datadog/Toto-2.0-2.5B
anomaly score + map↓ 1.3K
HFFace Detection
minchul/cvlface_adaface_ir50_ms1mv2
face embedding↓ 1.2K
HFVisual Embeddings
BAAI/BGE-VL-base
768-dim vector↓ 1K
HFVisual Embeddings
webAI-Official/webAI-ColVec1-4b
768-dim vector↓ 1K
HFObject Detection
Roboflow/rf-detr-base
bbox + label↓ 994
HFScene Captioning
TencentARC/TimeLens-8B
text↓ 968
HFVisual Embeddings
LCO-Embedding/LCO-Embedding-Omni-3B
768-dim vector↓ 728
HFVisual Embeddings
Cognitive-Lab/ColNetraEmbed
768-dim vector↓ 685
HFObject Detection
Roboflow/rf-detr-medium
bbox + label↓ 627
HFScene Captioning
ParaVT/ParaVT-8B
text↓ 544
HFFace Detection
minchul/cvlface_adaface_vit_base_kprpe_webface12m
face embedding↓ 409
HFObject Detection
Roboflow/rf-detr-large
bbox + label↓ 376
HFVisual Embeddings
apple/aimv2-large-patch14-native
768-dim vector↓ 343
HFVisual Embeddings
NCSOFT/GME-VARCO-VISION-Embedding
768-dim vector↓ 312
HFScene Captioning
Kwai-Keye/Keye-VL-2.0-30B-A3B
text↓ 290
HFText Embeddings
Kingsoft-LLM/QZhou-Embedding
1024-dim vector↓ 273
HFVisual Embeddings
Haon-Chen/e5-omni-7B
768-dim vector↓ 261
HFAudio Embeddings
tsinghua-ee/WAVE-7B
512-dim vector↓ 230
HFDocument Structure
microsoft/dit-large
structure tokens↓ 217
HFObject Detection
fushh7/ObjEmbed-2B
bbox + label↓ 123
HFScene Captioning
nvidia/4D-RGPT-8B
text↓ 108
HFVisual Embeddings
apple/aimv2-3B-patch14-448
768-dim vector↓ 80
HFSegmentation
Roboflow/rf-detr-seg-large
mask + label↓ 78
HFFace Detection
minchul/cvlface_adaface_vit_base_webface4m
face embedding↓ 65
HFVisual Embeddings
Alibaba-NLP/GVE-3B
768-dim vector↓ 62
HFVisual Embeddings
BAAI/BGE-VL-v1.5-zs
768-dim vector↓ 41
C++/PythonVector Indexing
facebook/faiss
index + results↓ 39.5K★
HFVisual Embeddings
FireRedTeam/ReMatch-3B
768-dim vector↓ 30
HFScene Captioning
WorldSeek-AI/WorldSeek-Omni-2B-Preview
text↓ 21
HFVisual Embeddings
ahmed-masry/ColMate-3B
768-dim vector↓ N/A
HFVisual Embeddings
ModernVBERT/ColModernVBERT
768-dim vector↓ N/A
PyTorchObject Detection
ultralytics/yolov8n
bbox + label↓ N/A
PyTorchObject Detection
ultralytics/yolo11n
bbox + label↓ N/A
ONNXFace Detection
immich-app/buffalo_l
face embedding↓ N/A
HFScene Captioning
OpenGVLab/InternVL3_5-8B
text↓ N/A
PyTorchOCR
PaddlePaddle/paddleocr
text + bbox↓ N/A
NeMoTranscription
nvidia/canary-qwen-2.5b
text + timestamps↓ N/A
PyTorchAudio Embeddings
microsoft/BEATs
512-dim vector↓ NEW
HFText Embeddings
lightonai/LateOn-Code
1024-dim vector↓ N/A
PyTorchDocument Structure
nvidia/nemotron-page-elements-v3
structure tokens↓ N/A
PyTorchTable Extraction
poloclub/UniTable
table JSON↓ NEW
PyTorchSegmentation
netflix/void-model
mask + label↓ N/A
C++/PythonVector Indexing
google/scann
index + results↓ N/A