clipseg-rd64-refined
by CIDAS
Text-prompted image segmentation for queryable masks and region crops
CIDAS/clipseg-rd64-refined
mixpeek://image_extractor@v1/cidas_clipseg_rd64_refined_v1
Overview
CLIPSeg RD64 Refined is a CLIP-conditioned segmentation model that produces a mask from an image plus a natural language prompt. Instead of requiring a fixed class label set, it lets a pipeline ask for regions like "red logo," "person holding a box," or "damaged corner" and turn those regions into indexed evidence.
On Mixpeek, CLIPSeg is useful before region embedding or visual QA. The segmenter isolates the queried foreground, Mixpeek stores the mask geometry and crop lineage, and an agent can search or inspect the precise region instead of the whole frame.
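The geometry step can be pictured with a small sketch. It derives a tight bounding box and an area fraction from a binary mask, then attaches the frame and prompt the mask came from. The `RegionRecord` shape, its field names, and the example frame path are hypothetical and chosen for illustration. They are not Mixpeek's stored schema.

```typescript
// Hypothetical region record: field names are illustrative, not Mixpeek's schema.
interface RegionRecord {
  sourceUrl: string; // frame the mask came from (crop lineage)
  prompt: string; // text query that produced the mask
  bbox: { x: number; y: number; width: number; height: number };
  areaFraction: number; // share of frame pixels inside the mask
}

// Derive a tight bounding box and area from a row-major binary mask (0 = background).
function toRegionRecord(
  mask: Uint8Array,
  width: number,
  height: number,
  sourceUrl: string,
  prompt: string,
): RegionRecord | null {
  let minX = width, minY = height, maxX = -1, maxY = -1, count = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (mask[y * width + x] === 0) continue;
      count++;
      if (x < minX) minX = x;
      if (x > maxX) maxX = x;
      if (y < minY) minY = y;
      if (y > maxY) maxY = y;
    }
  }
  if (count === 0) return null; // the prompt matched nothing above threshold
  return {
    sourceUrl,
    prompt,
    bbox: { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 },
    areaFraction: count / (width * height),
  };
}

// Usage: a 4x3 frame with a 2x2 foreground block (frame path is hypothetical).
const frameMask = new Uint8Array([
  0, 0, 0, 0,
  0, 1, 1, 0,
  0, 1, 1, 0,
]);
const record = toRegionRecord(frameMask, 4, 3, "s3://media/keyframes/frame-0001.jpg", "brand logo");
console.log(record?.bbox); // { x: 1, y: 1, width: 2, height: 2 }
```

Returning `null` for an empty mask keeps "prompt not found in this frame" distinct from a tiny region, which matters when the record is later cited as evidence.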
Architecture
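CLIPSeg combines a frozen CLIP backbone with a lightweight transformer decoder for dense prediction. The CLIP embedding of the text prompt conditions the decoder, which reads activations from the CLIP visual encoder and predicts a per-pixel logit map. In this checkpoint, "rd64" refers to the decoder's reduced dimension of 64 and "refined" refers to its more complex convolutional upsampling head. The output is produced at the model's working resolution (352×352 in the Hugging Face implementation) and is thresholded and resized to align with the input image.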
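CLIPSeg's raw output is a logit map, not a mask. The sketch below shows one common way to get a binary mask at the input image's resolution. It applies a sigmoid, thresholds the result, and resizes with nearest-neighbour sampling. The 0.5 threshold, the function names, and the row-major `Float32Array` layout are assumptions for illustration. Bilinear resizing of the logits before thresholding is a common alternative that gives smoother edges.

```typescript
// Sketch: turn a CLIPSeg-style logit map into a binary mask at the input image size.
// Assumptions: logits are row-major, srcW x srcH (e.g. 352 x 352); threshold is a tunable choice.
function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

function logitsToMask(
  logits: Float32Array,
  srcW: number,
  srcH: number,
  dstW: number,
  dstH: number,
  threshold = 0.5,
): Uint8Array {
  const mask = new Uint8Array(dstW * dstH);
  for (let y = 0; y < dstH; y++) {
    // Nearest-neighbour lookup: map the output pixel centre back into the logit grid.
    const sy = Math.min(srcH - 1, Math.floor(((y + 0.5) * srcH) / dstH));
    for (let x = 0; x < dstW; x++) {
      const sx = Math.min(srcW - 1, Math.floor(((x + 0.5) * srcW) / dstW));
      mask[y * dstW + x] = sigmoid(logits[sy * srcW + sx]) >= threshold ? 1 : 0;
    }
  }
  return mask;
}

// Usage: a 2x2 logit map upsampled to a 4x4 image; only the top-left logit is positive.
const demoLogits = new Float32Array([2.0, -3.0, -1.0, -4.0]);
const upsampled = logitsToMask(demoLogits, 2, 2, 4, 4);
console.log(Array.from(upsampled).join("")); // 1100110000000000
```

Thresholding the sigmoid at 0.5 is equivalent to keeping pixels whose logit is at least 0. Raising the threshold trades recall for tighter masks.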
Mixpeek SDK Integration
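```typescript
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Top-level await requires an ES module context.
await mx.collections.ingest({
  collection_id: "visual-evidence",
  // Keyframes to segment.
  source: { url: "s3://media/keyframes/" },
  feature_extractors: [
    {
      feature: "segmentation",
      model: "CIDAS/clipseg-rd64-refined",
      params: {
        // Natural language queries; each prompt yields its own mask.
        prompts: ["brand logo", "product in hand", "damaged surface"],
        // Request region crops in addition to masks.
        return_crops: true,
      },
    },
  ],
});
```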
Capabilities
- Text-guided segmentation without a closed class list
- Foreground mask generation for natural images and video keyframes
- Region crop extraction before visual embedding
- Spatial metadata for evidence citations
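Crop extraction before embedding can be sketched as cutting a padded box out of the frame, so the embedder sees a little context around the region. This assumes an interleaved 8-bit RGB buffer. `cropRegion` and the 10% default padding are hypothetical choices, not Mixpeek parameters. The returned box records where the crop sits in the source frame, which is the kind of spatial metadata an evidence citation can point to.

```typescript
// Sketch: cut a padded crop out of an interleaved RGB frame before embedding it.
// Assumptions: 3 bytes per pixel, row-major; padRatio is a hypothetical tuning knob.
interface Box { x: number; y: number; width: number; height: number }

function cropRegion(
  rgb: Uint8Array,
  imgW: number,
  imgH: number,
  box: Box,
  padRatio = 0.1,
): { pixels: Uint8Array; box: Box } {
  // Grow the box by padRatio on each side, then clamp it to the frame.
  const padX = Math.round(box.width * padRatio);
  const padY = Math.round(box.height * padRatio);
  const x0 = Math.max(0, box.x - padX);
  const y0 = Math.max(0, box.y - padY);
  const x1 = Math.min(imgW, box.x + box.width + padX);
  const y1 = Math.min(imgH, box.y + box.height + padY);
  const w = x1 - x0;
  const h = y1 - y0;
  const pixels = new Uint8Array(w * h * 3);
  for (let row = 0; row < h; row++) {
    // Each crop row is one contiguous run of w pixels in the source frame.
    const srcStart = ((y0 + row) * imgW + x0) * 3;
    pixels.set(rgb.subarray(srcStart, srcStart + w * 3), row * w * 3);
  }
  // The returned box is the crop's position in the source frame (lineage for citations).
  return { pixels, box: { x: x0, y: y0, width: w, height: h } };
}

// Usage: a 10x10 frame and a 4x4 region at (3, 3), padded by 25% per side.
const frame = new Uint8Array(10 * 10 * 3).map((_, i) => i % 256);
const crop = cropRegion(frame, 10, 10, { x: 3, y: 3, width: 4, height: 4 }, 0.25);
console.log(crop.box); // { x: 2, y: 2, width: 6, height: 6 }
```

Clamping rather than zero-padding keeps every crop pixel real image content, at the cost of off-centre crops for regions near the frame edge.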
Benchmarks
| Dataset | Task | Score | Source |
|---|---|---|---|
| PhraseCut | Segmentation | - | CLIPSeg paper |
| RefCOCO | Referring segmentation | - | CLIPSeg paper |
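For context on the rows above, text-prompted and referring segmentation results are commonly reported with intersection-over-union (IoU) between predicted and ground-truth masks. The sketch below computes per-sample IoU and its mean over a set of pairs. It illustrates the metric only. It is not the exact evaluation protocol of the CLIPSeg paper or of either benchmark, some of which also report a cumulative "overall" IoU, and it does not reproduce any scores. `maskIoU` and `meanIoU` are illustrative names.

```typescript
// Sketch: intersection-over-union between two binary masks of equal size.
function maskIoU(pred: Uint8Array, truth: Uint8Array): number {
  if (pred.length !== truth.length) throw new Error("mask sizes differ");
  let inter = 0;
  let union = 0;
  for (let i = 0; i < pred.length; i++) {
    const p = pred[i] !== 0;
    const t = truth[i] !== 0;
    if (p && t) inter++;
    if (p || t) union++;
  }
  // Convention assumed here: two empty masks count as a perfect match.
  return union === 0 ? 1 : inter / union;
}

// Averaging per-sample IoU gives a mean IoU over a set of (prediction, ground truth) pairs.
function meanIoU(pairs: Array<[Uint8Array, Uint8Array]>): number {
  if (pairs.length === 0) return 0;
  return pairs.reduce((sum, [p, t]) => sum + maskIoU(p, t), 0) / pairs.length;
}

const predMask = new Uint8Array([1, 1, 0, 0]);
const truthMask = new Uint8Array([0, 1, 1, 0]);
console.log(maskIoU(predMask, truthMask)); // 0.3333333333333333
```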
Performance
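Inference cost grows with the number of image-prompt pairs, so when the prompt count is high, run CLIPSeg on selected keyframes or first-stage retrieval candidates rather than on every frame.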
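One way to act on this is to rank candidates by a first-stage score and keep the top K. Only then fan out to one segmentation job per (frame, prompt) pair. This bounds work by K × prompt count rather than by total frames × prompt count. `planSegmentationJobs`, the `Candidate` and `SegJob` fields, and the value of K are hypothetical.

```typescript
// Sketch: cap CLIPSeg work by segmenting only the top-K first-stage candidates.
// Assumptions: `score` comes from an earlier retrieval stage; names and K are hypothetical.
interface Candidate { frameUrl: string; score: number }
interface SegJob { frameUrl: string; prompt: string }

function planSegmentationJobs(candidates: Candidate[], prompts: string[], topK: number): SegJob[] {
  // Highest first-stage scores first; copy before sorting so the input is untouched.
  const selected = [...candidates].sort((a, b) => b.score - a.score).slice(0, topK);
  // One job per (frame, prompt) pair: work grows as topK * prompts.length.
  return selected.flatMap((c) => prompts.map((prompt) => ({ frameUrl: c.frameUrl, prompt })));
}

const jobs = planSegmentationJobs(
  [
    { frameUrl: "frame-a.jpg", score: 0.42 },
    { frameUrl: "frame-b.jpg", score: 0.91 },
    { frameUrl: "frame-c.jpg", score: 0.77 },
  ],
  ["brand logo", "product in hand", "damaged surface"],
  2,
);
console.log(jobs.length); // 6
```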
Specification
Research Paper
Image Segmentation Using Text and Image Prompts
arxiv.org
Build a pipeline with clipseg-rd64-refined
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.