A computer vision task that assigns a label to every pixel in an image, delineating object boundaries precisely. Segmentation enables fine-grained visual understanding in multimodal systems beyond what bounding boxes provide.
Image segmentation models classify each pixel in an image into a category. Semantic segmentation assigns a class label to every pixel without separating objects of the same class, instance segmentation distinguishes individual object instances, and panoptic segmentation combines both. Many models use an encoder-decoder architecture in which the encoder extracts features and the decoder upsamples them to produce pixel-level predictions.
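To make the three task types concrete, here is a minimal NumPy sketch. It assumes the decoder has already produced per-class scores of shape `(num_classes, H, W)`, and it merges class and instance IDs with a `class_id * label_divisor + instance_id` scheme. That encoding is one common convention rather than the only one, and the function names and `label_divisor` value are illustrative.

```python
import numpy as np

def semantic_from_logits(logits):
    """Pick the highest-scoring class per pixel: (C, H, W) -> (H, W)."""
    return np.argmax(logits, axis=0)

def to_panoptic(semantic, instance, label_divisor=1000):
    """Combine class IDs and instance IDs into one ID per pixel.

    Assumed encoding: class_id * label_divisor + instance_id.
    """
    return semantic.astype(np.int64) * label_divisor + instance.astype(np.int64)

# Toy 2x3 image with two classes: 0 = background, 1 = object.
logits = np.array([
    [[2.0, 0.1, 0.1], [2.0, 0.1, 0.1]],  # class 0 scores
    [[0.5, 1.5, 1.5], [0.5, 1.5, 1.5]],  # class 1 scores
])

# Semantic map: both objects share class 1 and are indistinguishable.
semantic = semantic_from_logits(logits)

# Instance map (hypothetical): two separate objects in columns 1 and 2.
instance = np.array([[0, 1, 2], [0, 1, 2]])

# Panoptic map: every pixel carries both its class and its instance.
panoptic = to_panoptic(semantic, instance)
```

In the semantic map the two objects are indistinguishable, while the panoptic map keeps them apart and still records their shared class.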
Widely used architectures include Mask R-CNN for instance segmentation, Segment Anything (SAM) for promptable segmentation, and SegFormer for efficient semantic segmentation. SAM introduced a foundation-model approach in which a single model handles a wide range of segmentation tasks via point, box, or mask prompts; text prompting was explored in the original work only as a proof of concept. Output masks are often stored in run-length encoded (RLE) form, which records the lengths of alternating runs of 0s and 1s instead of every pixel.
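The sketch below illustrates run-length encoding of a binary mask. It is a simplified variant that scans pixels in row-major order and stores plain integer counts, starting with a run of zeros. Real dataset formats differ in detail (COCO, for example, scans column-major and can compress the counts into a string), so the `rle_encode` and `rle_decode` helpers and the dictionary layout here are assumptions for illustration, not any library's API.

```python
import numpy as np

def rle_encode(mask):
    """Encode a 2D binary mask as alternating run lengths (zeros first)."""
    arr = np.asarray(mask, dtype=np.uint8)
    flat = arr.ravel()  # row-major scan (an assumption; COCO uses column-major)
    # Indices where the value differs from the previous pixel.
    changes = np.flatnonzero(np.diff(flat)) + 1
    boundaries = np.concatenate(([0], changes, [flat.size]))
    counts = np.diff(boundaries).tolist()
    # Runs alternate starting with zeros, so a mask that begins with 1
    # needs an explicit zero-length run of zeros at the front.
    if flat.size and flat[0] == 1:
        counts = [0] + counts
    return {"size": list(arr.shape), "counts": counts}

def rle_decode(rle):
    """Rebuild the binary mask from the run lengths."""
    height, width = rle["size"]
    flat = np.zeros(height * width, dtype=np.uint8)
    pos, value = 0, 0
    for count in rle["counts"]:
        if value:
            flat[pos:pos + count] = 1
        pos += count
        value = 1 - value  # switch between runs of 0s and runs of 1s
    return flat.reshape(height, width)

mask = np.array([[0, 0, 1, 1],
                 [1, 1, 0, 0]], dtype=np.uint8)
encoded = rle_encode(mask)
decoded = rle_decode(encoded)
```

The savings grow with mask size: a large mask containing a few contiguous regions reduces to a short list of counts, whereas the dense array stores every pixel.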