What Gets Extracted
| Feature | Model | Dimensions | Extractor |
|---|---|---|---|
| Visual embeddings (image-only) | SigLIP | 768D | image_extractor |
| Visual embeddings (cross-modal) | Vertex AI multimodal | 1408D | multimodal_extractor |
| OCR text | Gemini | — | multimodal_extractor |
| Image descriptions | Gemini | — | multimodal_extractor |
| Face embeddings | ArcFace (SCRFD for detection) | 512D | face_identity_extractor |
| Thumbnails | FFmpeg | — | image_extractor, multimodal_extractor |
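The dimensions in the table also determine how much raw storage each embedding needs. Here is a back-of-envelope sketch. It assumes every dimension is stored as a 32-bit float (4 bytes), which this page does not state, and it ignores any index overhead.

```python
# Raw storage per embedding, derived from the dimensions in the table above.
# Assumption: each dimension is a 32-bit float (4 bytes); the storage format
# actually used by the platform is not specified on this page.
BYTES_PER_FLOAT32 = 4

EMBEDDING_DIMS = {
    "SigLIP (image_extractor)": 768,
    "Vertex AI multimodal (multimodal_extractor)": 1408,
    "ArcFace (face_identity_extractor)": 512,
}


def vector_bytes(dims: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size in bytes of one embedding vector, excluding index structures."""
    return dims * bytes_per_value


def catalog_bytes(dims: int, num_vectors: int) -> int:
    """Raw size in bytes of num_vectors embeddings of the given dimensionality."""
    return vector_bytes(dims) * num_vectors


if __name__ == "__main__":
    for name, dims in EMBEDDING_DIMS.items():
        per_million_mib = catalog_bytes(dims, 1_000_000) / (1024 ** 2)
        print(f"{name}: {vector_bytes(dims)} B/vector, {per_million_mib:.0f} MiB per 1M vectors")
```

Under the float32 assumption, one million SigLIP vectors take 768 × 4 × 1,000,000 = 3,072,000,000 bytes, or about 2.86 GiB. A vector index built on top of the raw vectors adds further overhead.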
Choosing an Extractor
| Goal | Extractor | Why |
|---|---|---|
| Visual similarity search (image-to-image) | image_extractor | SigLIP 768D embeddings, fast (~50ms/image), supports cross-modal text queries |
| Cross-modal search (text-to-image, image-to-video) | multimodal_extractor | Vertex AI 1408D unified embedding space across video, image, and text |
| OCR or image descriptions | multimodal_extractor | Gemini-based text extraction and description generation |
| Face detection and matching | face_identity_extractor | ArcFace 512D with 99.8% verification accuracy |
| Structured extraction (products, labels) | multimodal_extractor with response_shape | LLM extracts structured JSON from image content |
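If your pipeline configuration is generated in code, the table can be encoded as a simple lookup. In the minimal sketch below, the Goal labels are hypothetical names chosen for illustration, not identifiers from the platform. The extractor names are transcribed from the table.

```python
from enum import Enum


class Goal(Enum):
    """Hypothetical labels for the goals in the "Choosing an Extractor" table."""
    IMAGE_SIMILARITY = "image_similarity"
    CROSS_MODAL = "cross_modal"
    OCR_OR_DESCRIPTIONS = "ocr_or_descriptions"
    FACES = "faces"
    STRUCTURED = "structured"


# Goal -> extractor name, copied from the table rows.
EXTRACTOR_FOR_GOAL = {
    Goal.IMAGE_SIMILARITY: "image_extractor",
    Goal.CROSS_MODAL: "multimodal_extractor",
    Goal.OCR_OR_DESCRIPTIONS: "multimodal_extractor",
    Goal.FACES: "face_identity_extractor",
    Goal.STRUCTURED: "multimodal_extractor",  # used together with response_shape
}


def choose_extractors(goals: set[Goal]) -> set[str]:
    """Return the distinct extractors needed to cover every requested goal."""
    return {EXTRACTOR_FOR_GOAL[goal] for goal in goals}


if __name__ == "__main__":
    print(sorted(choose_extractors({Goal.FACES, Goal.OCR_OR_DESCRIPTIONS})))
```

Returning a set means that goals served by the same extractor, such as OCR and structured extraction, collapse to a single multimodal_extractor entry.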
Use image_extractor when you only need image search. Use multimodal_extractor when you need images searchable alongside video or text in the same embedding space.
Create a Collection for Images
This collection generates SigLIP embeddings and thumbnails for an image catalog.
Reverse Image Search
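Reverse image search compares embeddings. The query, either an image or a text string embedded into SigLIP's shared text-image space, becomes a 768D vector, and the stored image vectors are ranked by their similarity to it. The sketch below shows only that ranking step, using brute-force cosine similarity over toy 3D vectors. It assumes cosine similarity as the metric and is not the retriever's actual implementation, which this page does not describe.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], catalog: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Linear scan: score every catalog vector against the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3D stand-ins for 768D SigLIP vectors; the IDs are hypothetical.
    catalog = {
        "red_shoe": [0.9, 0.1, 0.0],
        "blue_shoe": [0.7, 0.0, 0.7],
        "cat": [0.0, 1.0, 0.0],
    }
    print(top_k([1.0, 0.0, 0.0], catalog, k=2))
```

A linear scan is fine for illustration. Large catalogs are typically served from a vector index instead, so the whole catalog does not have to be scanned for each query.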
Create a retriever and execute it with a text query. SigLIP’s shared text-image embedding space lets you search images with natural language.
Structured Extraction from Images
Use multimodal_extractor with response_shape to extract structured product metadata from images.
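Because an LLM produces the structured fields, it can help to check each result against the shape you requested before indexing or displaying it. The sketch below assumes a JSON-Schema-style response_shape. That format is an assumption, and the field names product_name, brand, colors, and price_visible are hypothetical. The code validates a raw JSON string in plain Python and does not call any platform API.

```python
import json
from typing import Any, Dict, List

# Hypothetical response_shape for product metadata, written JSON-Schema style.
# Whether response_shape accepts exactly this format is an assumption.
PRODUCT_SHAPE: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "brand": {"type": "string"},
        "colors": {"type": "array", "items": {"type": "string"}},
        "price_visible": {"type": "boolean"},
    },
    "required": ["product_name", "colors"],
}

# JSON Schema primitive type names mapped to Python types.
_PY_TYPES = {"string": str, "boolean": bool, "number": (int, float), "array": list, "object": dict}


def shape_errors(raw: str, shape: Dict[str, Any]) -> List[str]:
    """Parse LLM output and list mismatches against a flat object shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = [f"missing required field: {f}" for f in shape.get("required", []) if f not in data]
    for field, spec in shape["properties"].items():
        if field not in data:
            continue  # optional fields may be absent
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        if not isinstance(value, _PY_TYPES[spec["type"]]) or (
            spec["type"] == "number" and isinstance(value, bool)
        ):
            errors.append(f"{field}: expected {spec['type']}")
        elif spec["type"] == "array" and "items" in spec:
            item_type = _PY_TYPES[spec["items"]["type"]]
            if not all(isinstance(v, item_type) for v in value):
                errors.append(f"{field}: items must be {spec['items']['type']}")
    return errors
```

This only covers flat objects with primitive and array fields. A full JSON Schema validator library handles nesting, enums, and formats.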
Output Schema
Each image produces one document whose fields depend on the extractor used (image_extractor, or multimodal_extractor with descriptions and OCR enabled):
| Field | Type | Description |
|---|---|---|
| image_extractor_v1_embedding | float[768] | SigLIP visual embedding |
| multimodal_extractor_v1_multimodal_embedding | float[1408] | Vertex AI cross-modal embedding |
| description | string | Gemini-generated image description |
| ocr_text | string | Text extracted from the image |
| thumbnail_url | string | S3 URL of resized thumbnail (640px width) |
| response_shape fields | varies | Structured fields from LLM extraction |
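When consuming these documents downstream, a quick shape check catches mismatched fields early. One example is a 1408D vector stored where a 768D SigLIP vector is expected. The minimal sketch below assumes documents arrive as plain Python dicts keyed by the field names in the table above. It skips fields that are absent and does not cover response_shape fields, which vary.

```python
from typing import Any, Dict, List

# Expected vector lengths for the embedding fields in the output schema table.
EXPECTED_DIMS = {
    "image_extractor_v1_embedding": 768,
    "multimodal_extractor_v1_multimodal_embedding": 1408,
}
# String fields from the table; which ones appear depends on the extractor.
STRING_FIELDS = ("description", "ocr_text", "thumbnail_url")


def check_document(doc: Dict[str, Any]) -> List[str]:
    """Return problems found in a document's known fields; absent fields are skipped."""
    problems = []
    for field, dims in EXPECTED_DIMS.items():
        if field not in doc:
            continue
        vec = doc[field]
        if not isinstance(vec, list) or len(vec) != dims:
            problems.append(f"{field}: expected {dims} floats")
        elif not all(isinstance(x, (int, float)) for x in vec):
            problems.append(f"{field}: non-numeric values")
    for field in STRING_FIELDS:
        if field in doc and not isinstance(doc[field], str):
            problems.append(f"{field}: expected string")
    return problems
```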
Related
- Image Extractor — full parameter reference
- Multimodal Extractor — cross-modal embedding and OCR
- Face Identity Extractor — face detection in images
- Retrievers — build search pipelines over extracted features

