owlv2-large-patch14-ensemble
by google
Open-vocabulary OWLv2 detector for text-conditioned object search
google/owlv2-large-patch14-ensemble
mixpeek://image_extractor@v1/google_owlv2_large_ensemble_v1
Overview
OWLv2 Large Patch14 Ensemble is Google's open-vocabulary detector for zero-shot object localization. It lets a pipeline search for objects described in text instead of relying only on a fixed supervised label set.
On Mixpeek, OWLv2 is useful when an agent needs to find visual categories that change by task: a specific product shape, a UI control, damaged equipment, or a visual policy violation. The detector outputs boxes and labels that can be stored, filtered, and joined with embeddings or captions.
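As a rough illustration of the "stored, filtered, and joined" step, the sketch below filters a list of detections by label and confidence. The `Detection` shape, its field names, and the `filterDetections` helper are assumptions made for this example; they are not the documented Mixpeek output schema.

```typescript
// Hypothetical detection record. The field names are assumptions for
// illustration, not the documented Mixpeek output schema.
interface Detection {
  label: string; // text query that matched the region
  score: number; // confidence in [0, 1]
  box: [number, number, number, number]; // [xMin, yMin, xMax, yMax] in pixels
}

// Keep only detections for the wanted labels at or above a score threshold.
function filterDetections(
  detections: Detection[],
  wantedLabels: string[],
  minScore: number,
): Detection[] {
  const wanted = new Set(wantedLabels);
  return detections.filter((d) => wanted.has(d.label) && d.score >= minScore);
}

// Made-up sample values, used only to exercise the helper.
const sampleDetections: Detection[] = [
  { label: "damaged equipment", score: 0.62, box: [40, 30, 210, 180] },
  { label: "UI control", score: 0.18, box: [5, 5, 60, 25] },
  { label: "damaged equipment", score: 0.09, box: [300, 120, 340, 160] },
];

const keptDetections = filterDetections(
  sampleDetections,
  ["damaged equipment"],
  0.3,
);
console.log(keptDetections);
```

A threshold like `0.3` is arbitrary here; in practice it would likely be tuned per query, since open-vocabulary scores vary with how the text label is phrased.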
Architecture
OWLv2 is a Vision Transformer-based open-vocabulary object detector. It aligns text queries with image regions, so arbitrary text labels can guide detection at inference time.
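The alignment idea can be pictured with a toy example: each image region and each text query is represented as a vector, and a region is assigned the query it is most similar to. The sketch below uses plain cosine similarity on hand-written two-dimensional vectors. It is a simplified illustration of text-region matching in general, not OWLv2's actual scoring head, and every name and value in it is hypothetical.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 if either
// vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface RegionMatch {
  query: string;
  similarity: number;
}

// For each region embedding, pick the text query with the highest similarity.
function bestQueryPerRegion(
  regionEmbeddings: number[][],
  queryEmbeddings: number[][],
  queries: string[],
): RegionMatch[] {
  return regionEmbeddings.map((region) => {
    let best: RegionMatch = { query: "", similarity: -Infinity };
    queryEmbeddings.forEach((queryVec, i) => {
      const similarity = cosineSimilarity(region, queryVec);
      if (similarity > best.similarity) {
        best = { query: queries[i], similarity };
      }
    });
    return best;
  });
}

// Toy 2-D embeddings; a real model produces much higher-dimensional vectors.
const toyQueries = ["product shape", "UI control"];
const toyQueryEmbeddings = [
  [0.9, 0.1],
  [0.1, 0.9],
];
const toyRegionEmbeddings = [
  [1, 0],
  [0, 1],
];

const regionMatches = bestQueryPerRegion(
  toyRegionEmbeddings,
  toyQueryEmbeddings,
  toyQueries,
);
console.log(regionMatches.map((m) => m.query));
```

Because the queries are just vectors supplied at inference time, swapping in a new label means embedding a new piece of text rather than retraining a classifier, which is the property the architecture description refers to.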
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "product-images",
  source: { url: "s3://catalog/images/" },
  feature_extractors: [
    {
      feature: "object_detection",
      model: "google/owlv2-large-patch14-ensemble"
    }
  ]
});
Capabilities
- Zero-shot object detection
- Text-conditioned visual localization
- Strong fit for dynamic agent queries
- Apache 2.0 license
Specification
Research Paper
OWLv2 Large Patch14 Ensemble
arxiv.org
Build a pipeline with owlv2-large-patch14-ensemble
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.