Mixpeek extracts visual embeddings, OCR text, descriptions, and structured metadata from images. Each image becomes a document with dense vector indexes for visual similarity search, text-to-image search, and filtered retrieval.
| Feature | Model | Dimensions | Extractor |
|---|---|---|---|
| Visual embeddings (image-only) | SigLIP | 768D | image_extractor |
| Visual embeddings (cross-modal) | Vertex AI multimodal | 1408D | multimodal_extractor |
| OCR text | Gemini | — | multimodal_extractor |
| Image descriptions | Gemini | — | multimodal_extractor |
| Face embeddings | ArcFace (SCRFD detect) | 512D | face_identity_extractor |
| Thumbnails | FFmpeg | — | image_extractor, multimodal_extractor |
Choosing an Extractor
| Goal | Extractor | Why |
|---|---|---|
| Visual similarity search (image-to-image) | image_extractor | SigLIP 768D embeddings, fast (~50ms/image), supports cross-modal text queries |
| Cross-modal search (text-to-image, image-to-video) | multimodal_extractor | Vertex AI 1408D unified embedding space across video, image, and text |
| OCR or image descriptions | multimodal_extractor | Gemini-based text extraction and description generation |
| Face detection and matching | face_identity_extractor | ArcFace 512D with 99.8% verification accuracy |
| Structured extraction (products, labels) | multimodal_extractor with response_shape | LLM extracts structured JSON from image content |
Use image_extractor when you only need image search. Use multimodal_extractor when you need images searchable alongside video or text in the same embedding space.
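The decision rule above can be sketched as a small helper. This is illustrative only, not part of any Mixpeek SDK; it simply mirrors the table:

```python
# Illustrative helper mirroring the extractor decision table (not a Mixpeek API):
# pick an extractor name from what the pipeline needs.
def choose_extractor(needs_cross_modal: bool = False,
                     needs_faces: bool = False) -> str:
    if needs_faces:
        return "face_identity_extractor"   # ArcFace 512D face embeddings
    if needs_cross_modal:
        return "multimodal_extractor"      # 1408D unified space, OCR, descriptions
    return "image_extractor"               # SigLIP 768D, fastest for image-only search

print(choose_extractor())
print(choose_extractor(needs_cross_modal=True))
```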
Create a Collection for Images
This collection generates SigLIP embeddings and thumbnails for an image catalog.
```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "product-images",
    "source": { "type": "bucket", "bucket_id": "bkt_products" },
    "feature_extractor": {
      "feature_extractor_name": "image_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "payload.image_url"
      },
      "field_passthrough": [
        { "source_path": "metadata.product_id" },
        { "source_path": "metadata.brand" },
        { "source_path": "metadata.category" }
      ],
      "parameters": {
        "enable_thumbnails": true
      }
    }
  }'
```
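The same request body can be built programmatically. This Python sketch only constructs and serializes the payload; sending it with an HTTP client and the same `Authorization` and `X-Namespace` headers is left out:

```python
import json

# Build the collection-creation payload from the curl example above.
# No network call is made here; this only shows the request body shape.
payload = {
    "collection_name": "product-images",
    "source": {"type": "bucket", "bucket_id": "bkt_products"},
    "feature_extractor": {
        "feature_extractor_name": "image_extractor",
        "version": "v1",
        "input_mappings": {"image": "payload.image_url"},
        "field_passthrough": [
            {"source_path": "metadata.product_id"},
            {"source_path": "metadata.brand"},
            {"source_path": "metadata.category"},
        ],
        "parameters": {"enable_thumbnails": True},
    },
}
body = json.dumps(payload)
```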
Text-to-Image Search
Create a retriever and execute it with a text query. SigLIP’s shared text-image embedding space lets you search images with natural language.
```bash
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "image-search",
    "collection_ids": ["col_product_images"],
    "input_schema": {
      "properties": {
        "query": { "type": "text", "required": true }
      }
    },
    "stages": [
      {
        "stage_name": "visual_search",
        "stage_type": "filter",
        "config": {
          "stage_id": "feature_search",
          "parameters": {
            "query": "{{INPUT.query}}",
            "top_k": 20
          }
        }
      }
    ]
  }'
```
Execute a text-to-image search:
```bash
curl -X POST https://api.mixpeek.com/v1/retrievers/ret_abc123/execute \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": { "query": "red leather handbag with gold buckle" },
    "limit": 10
  }'
```
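The response shape in this sketch is an assumption for illustration (`results` entries with `document_id` and `score` fields); consult the API reference for the exact schema. One way to post-process the hits:

```python
# Sketch of handling a retriever response. The "results" shape here is
# assumed for illustration, not taken from the Mixpeek API reference.
def top_matches(response: dict, min_score: float = 0.5) -> list[str]:
    """Return document IDs above a score threshold, highest score first."""
    hits = [r for r in response.get("results", []) if r.get("score", 0) >= min_score]
    hits.sort(key=lambda r: r["score"], reverse=True)
    return [r["document_id"] for r in hits]

sample = {"results": [
    {"document_id": "doc_img_456", "score": 0.91},
    {"document_id": "doc_img_789", "score": 0.42},
]}
print(top_matches(sample))  # only the high-confidence hit survives
```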
Structured Metadata Extraction
Use multimodal_extractor with response_shape to extract structured product metadata from images.
```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "product-catalog-enriched",
    "source": { "type": "bucket", "bucket_id": "bkt_products" },
    "feature_extractor": {
      "feature_extractor_name": "multimodal_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "payload.image_url"
      },
      "parameters": {
        "run_multimodal_embedding": true,
        "run_ocr": true,
        "run_video_description": true,
        "description_prompt": "Describe the product in this image including color, material, and style.",
        "response_shape": {
          "type": "object",
          "properties": {
            "product_type": { "type": "string" },
            "color": { "type": "string" },
            "material": { "type": "string" },
            "brand_visible": { "type": "boolean" },
            "text_on_product": { "type": "string" }
          }
        }
      }
    }
  }'
```
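Because the LLM output should follow response_shape, a quick client-side check can catch missing or mistyped fields before the document is used downstream. A minimal sketch, with field names taken from the schema above:

```python
# Minimal client-side check (illustration only) that a returned document
# carries the fields declared in response_shape, with matching Python types.
RESPONSE_SHAPE = {
    "product_type": str,
    "color": str,
    "material": str,
    "brand_visible": bool,
    "text_on_product": str,
}

def matches_shape(doc: dict) -> bool:
    """True if every declared field is present with the expected type."""
    return all(isinstance(doc.get(k), t) for k, t in RESPONSE_SHAPE.items())

doc = {"product_type": "handbag", "color": "red", "material": "leather",
       "brand_visible": True, "text_on_product": "ACME LEATHER CO."}
print(matches_shape(doc))
```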
Output Schema
Each image produces a document like this:
```json
{
  "document_id": "doc_img_456",
  "thumbnail_url": "s3://mixpeek-storage/ns_123/thumbnails/product_001.jpg",
  "metadata": {
    "product_id": "SKU-12345",
    "brand": "Acme",
    "category": "accessories"
  },
  "image_extractor_v1_embedding": [0.045, -0.012, "...768 floats"]
}
```
When using multimodal_extractor with descriptions and OCR:
```json
{
  "document_id": "doc_img_789",
  "description": "Red leather handbag with gold buckle closure, front pocket with magnetic snap",
  "ocr_text": "ACME LEATHER CO.",
  "product_type": "handbag",
  "color": "red",
  "material": "leather",
  "brand_visible": true,
  "text_on_product": "ACME LEATHER CO.",
  "multimodal_extractor_v1_multimodal_embedding": [0.023, -0.041, "...1408 floats"]
}
```
| Field | Type | Description |
|---|---|---|
| image_extractor_v1_embedding | float[768] | SigLIP visual embedding |
| multimodal_extractor_v1_multimodal_embedding | float[1408] | Vertex AI cross-modal embedding |
| description | string | Gemini-generated image description |
| ocr_text | string | Text extracted from the image |
| thumbnail_url | string | S3 URL of the resized thumbnail (640px width) |
| response_shape fields | varies | Structured fields from LLM extraction |
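Under the hood, visual similarity between documents reduces to comparing their embedding vectors, typically by cosine similarity. A toy sketch with 3-float vectors standing in for the 768-dimensional SigLIP embeddings:

```python
import math

# Cosine similarity between two embedding vectors: the core operation
# behind image-to-image and text-to-image search over SigLIP embeddings.
# Toy 3-float vectors stand in for the real 768D arrays.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.045, -0.012, 0.031]
doc_vec = [0.044, -0.010, 0.030]
print(cosine(query_vec, doc_vec))  # close to 1.0 for near-identical images
```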