YOLO-World-L
by AILab-CVC
Real-time open-vocabulary object detection with text prompts
AILab-CVC/YOLO-World-L
mixpeek://image_extractor@v1/tencent_yoloworld_large_v1
Overview
YOLO-World extends the YOLO detector family with open-vocabulary detection via vision-language modeling. Users specify objects to detect with text prompts; the model finds them zero-shot at real-time speeds (52 FPS on V100).
On Mixpeek, YOLO-World enables detecting arbitrary objects in video and images using natural language, without retraining for each new category.
Architecture
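YOLO-World pairs a YOLO detector with a Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) that fuses text and image features. Training uses a region-text contrastive loss. Inference follows a prompt-then-detect paradigm: the user's vocabulary is encoded once and embedded in the model as parameters, so no per-image text encoding is needed and inference stays fast.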
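To make the prompt-then-detect idea concrete, here is a minimal TypeScript sketch of region-text scoring in the style of YOLO-World's contrastive head. It assumes region and text embeddings are already computed (the toy vectors are made up), and `alpha` and `beta` stand in for the learnable scale and shift. The function names and values are hypothetical; this illustrates the technique and is not the model's actual implementation.

```typescript
// Sketch of YOLO-World-style region-text scoring.
// Assumptions: embeddings are precomputed; alpha/beta stand in for the learnable
// scale and shift; all names and values here are hypothetical.

type Vec = number[];

// Scale a vector to unit L2 length (a zero vector is returned unchanged).
function l2Normalize(v: Vec): Vec {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Prompt-then-detect: normalize the text (vocabulary) embeddings once, offline,
// so per-image inference only needs dot products against a fixed matrix.
function buildVocabulary(textEmbeddings: Vec[]): Vec[] {
  return textEmbeddings.map(l2Normalize);
}

// s[k][j] = alpha * dot(normalize(region_k), vocab_j) + beta
function regionTextScores(regions: Vec[], vocab: Vec[], alpha: number, beta: number): number[][] {
  return regions.map((region) => {
    const e = l2Normalize(region);
    return vocab.map((w) => alpha * e.reduce((acc, x, i) => acc + x * w[i], 0) + beta);
  });
}

// Index of the best-matching vocabulary entry per region (-1 if the vocabulary is empty).
function bestMatch(scores: number[][]): number[] {
  return scores.map((row) =>
    row.length === 0 ? -1 : row.reduce((best, s, j) => (s > row[best] ? j : best), 0)
  );
}

// Toy example: two made-up 3-d "text embeddings" and two region embeddings.
const vocab = buildVocabulary([
  [1, 0, 0], // e.g. "cat"
  [0, 1, 0], // e.g. "dog"
]);
const scores = regionTextScores([[0.9, 0.1, 0], [0, 2, 0]], vocab, 10, 0);
console.log(bestMatch(scores)); // [ 0, 1 ]
```

Normalizing the vocabulary once, outside the per-image path, mirrors why prompt-then-detect is fast: changing the prompts means rebuilding `vocab`, not retraining the detector.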
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";
const mx = new Mixpeek({ apiKey: "API_KEY" });
// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
namespace_id: "my-namespace",
collection_name: "my-collection",
source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
feature_extractor: {
feature_extractor_name: "object_detection",
version: "v1",
parameters: { model_id: "AILab-CVC/YOLO-World-L" },
},
});
Capabilities
- Open-vocabulary detection with text prompts
- 52 FPS on V100 (real-time)
- 35.4 AP on LVIS zero-shot
- Supports image-prompted detection
- ONNX and TFLite INT8 export
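The 52 FPS figure translates into a rough per-frame and per-video time budget. The sketch below assumes the benchmark throughput carries over unchanged to your hardware and pipeline, which it likely will not: the figure was measured on a V100, and frame decoding, preprocessing, post-processing and network round-trips all add time. The helper names are hypothetical.

```typescript
// Back-of-the-envelope timing from a reported throughput figure.
// Assumption: the benchmark FPS holds end to end; decoding, preprocessing,
// post-processing and I/O are ignored, so real runs will take longer.

// Milliseconds of model time per frame at a given throughput.
function msPerFrame(modelFps: number): number {
  return 1000 / modelFps;
}

// Seconds of model time to run detection on every sampled frame of a video.
function detectionSeconds(videoSeconds: number, sampleFps: number, modelFps: number): number {
  const frames = Math.ceil(videoSeconds * sampleFps);
  return frames / modelFps;
}

const REPORTED_FPS = 52; // V100 figure quoted above

console.log(msPerFrame(REPORTED_FPS).toFixed(1)); // 19.2
// 10-minute clip sampled at 1 frame/s: 600 frames
console.log(detectionSeconds(600, 1, REPORTED_FPS).toFixed(1)); // 11.5
// Same clip at a full 30 frames/s: 18,000 frames
console.log(detectionSeconds(600, 30, REPORTED_FPS).toFixed(1)); // 346.2
```

Detection time scales linearly with the frame sampling rate, so for video that rate is usually the first knob worth tuning.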
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| LVIS (zero-shot) | AP | 35.4 | Cheng et al., 2024 — Table 1 |
| COCO val2017 | AP | 45.7 | Cheng et al., 2024 — Table 2 |
Specification
Research Paper
Cheng et al., 2024. YOLO-World: Real-Time Open-Vocabulary Object Detection.
arxiv.org