grounding-dino-base
by IDEA-Research
Open-set detection using natural language descriptions
IDEA-Research/grounding-dino-base
mixpeek://image_extractor@v1/idea_grounding_dino_base_v1
Overview
Grounding DINO combines a DINO-style detection transformer with grounded language understanding for open-set object detection. It achieves 52.5 AP on COCO in the zero-shot setting, that is, without any COCO training data, and 56.7 AP when fine-tuned.
On Mixpeek, Grounding DINO detects objects from a free-text description of what to look for. Combined with a segmentation model such as SAM, it forms a detect-then-segment pipeline: Grounding DINO proposes boxes, and the segmentation model turns each box into a mask.
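The hand-off between detection and segmentation usually comes down to a box-format conversion. The sketch below assumes detections arrive as normalized center-format boxes (cx, cy, w, h in the range 0 to 1), which is how the reference Grounding DINO model emits raw predictions, and converts them to pixel-space corner boxes of the kind SAM accepts as box prompts. The `NormalizedBox` and `PixelBox` types and the `toPixelXyxy` helper are hypothetical names for illustration, not part of the Mixpeek SDK, and the box format Mixpeek itself returns is not stated here.

```typescript
// Hypothetical types for illustration; not part of the Mixpeek SDK.
interface NormalizedBox { cx: number; cy: number; w: number; h: number; }
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Convert a normalized center-format box to a pixel-space corner box,
// clamping each coordinate to the image bounds.
function toPixelXyxy(box: NormalizedBox, imageWidth: number, imageHeight: number): PixelBox {
  const clamp = (value: number, max: number): number => Math.min(Math.max(value, 0), max);
  return {
    x0: clamp((box.cx - box.w / 2) * imageWidth, imageWidth),
    y0: clamp((box.cy - box.h / 2) * imageHeight, imageHeight),
    x1: clamp((box.cx + box.w / 2) * imageWidth, imageWidth),
    y1: clamp((box.cy + box.h / 2) * imageHeight, imageHeight),
  };
}

// Assumed example: a centered box covering half the width and a quarter of the height.
const boxPrompt = toPixelXyxy({ cx: 0.5, cy: 0.5, w: 0.5, h: 0.25 }, 640, 480);
console.log(boxPrompt); // { x0: 160, y0: 180, x1: 480, y1: 300 }
```

Clamping guards against boxes that extend slightly past the image edge, which would otherwise produce out-of-range prompts for the segmentation step.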
Architecture
A DINO-style detection transformer with a Swin backbone, extended with text-grounding modules for open-vocabulary detection. The Swin-B variant achieves 56.7 AP on COCO.
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "object_detection",
    version: "v1",
    parameters: { model_id: "IDEA-Research/grounding-dino-base" },
  },
});
Capabilities
- Zero-shot detection: 52.5 AP on COCO without COCO training data
- Natural language object descriptions as prompts
- Fine-tuned detection: 56.7 AP (Swin-B)
- Pairs with SAM for detect-then-segment pipelines
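Prompt formatting affects detection quality. The reference implementations of Grounding DINO expect lowercase text queries ending in a period, with multiple phrases separated by periods. A minimal sketch of a helper that normalizes labels into that form follows; `buildGroundingPrompt` is a hypothetical name, and whether the Mixpeek extractor applies this normalization itself is not stated here.

```typescript
// Hypothetical helper: join object descriptions into a single period-separated,
// lowercase prompt, following the convention of the reference implementations.
function buildGroundingPrompt(labels: string[]): string {
  const cleaned = labels
    .map((label) => label.trim().toLowerCase().replace(/\.+$/, "").trim())
    .filter((label) => label.length > 0); // drop empty entries
  return cleaned.length === 0 ? "" : cleaned.join(" . ") + " .";
}

const textPrompt = buildGroundingPrompt(["A Cat", "remote control.", "  "]);
console.log(textPrompt); // "a cat . remote control ."
```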
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| COCO val2017 (zero-shot) | AP | 48.4 | Liu et al., 2024 — Table 1 |
| RefCOCO (val) | Accuracy | 89.2% | Liu et al., 2024 — Table 3 |
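For context on the AP metric: COCO AP averages precision over intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05, so a predicted box only counts as correct at a given threshold if its IoU with a ground-truth box meets it. The sketch below shows the IoU computation alone, using a hypothetical `CornerBox` type with pixel-space corners; it is an illustration of the metric's building block, not the evaluation code behind the numbers above.

```typescript
// Hypothetical corner-format box type for illustration.
interface CornerBox { x0: number; y0: number; x1: number; y1: number; }

// Intersection over union of two axis-aligned boxes; 0 when they do not overlap.
function iou(a: CornerBox, b: CornerBox): number {
  const overlapW = Math.max(0, Math.min(a.x1, b.x1) - Math.max(a.x0, b.x0));
  const overlapH = Math.max(0, Math.min(a.y1, b.y1) - Math.max(a.y0, b.y0));
  const intersection = overlapW * overlapH;
  const areaA = (a.x1 - a.x0) * (a.y1 - a.y0);
  const areaB = (b.x1 - b.x0) * (b.y1 - b.y0);
  const union = areaA + areaB - intersection;
  return union > 0 ? intersection / union : 0;
}

// Assumed example: two 10x10 boxes offset by half a width overlap with IoU 1/3,
// which falls below even the loosest COCO threshold of 0.50.
const predicted: CornerBox = { x0: 0, y0: 0, x1: 10, y1: 10 };
const groundTruth: CornerBox = { x0: 5, y0: 0, x1: 15, y1: 10 };
console.log(iou(predicted, groundTruth) >= 0.5); // false
```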
Research Paper
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
arxiv.org
Build a pipeline with grounding-dino-base
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.