
Figure: an image file flows through a feature extractor, producing visual embeddings, OCR text, descriptions, face embeddings, and structured JSON.
Mixpeek extracts visual embeddings, OCR text, descriptions, and structured metadata from images. Each image becomes a document with dense vector indexes for visual similarity search, text-to-image search, and filtered retrieval.

What Gets Extracted

| Feature | Model | Dimensions | Extractor |
| --- | --- | --- | --- |
| Visual embeddings (image-only) | SigLIP | 768D | `image_extractor` |
| Visual embeddings (cross-modal) | Vertex AI multimodal | 1408D | `multimodal_extractor` |
| OCR text | Gemini | | `multimodal_extractor` |
| Image descriptions | Gemini | | `multimodal_extractor` |
| Face embeddings | ArcFace (SCRFD detect) | 512D | `face_identity_extractor` |
| Thumbnails | FFmpeg | | `image_extractor`, `multimodal_extractor` |

Choosing an Extractor

| Goal | Extractor | Why |
| --- | --- | --- |
| Visual similarity search (image-to-image) | `image_extractor` | SigLIP 768D embeddings, fast (~50ms/image), supports cross-modal text queries |
| Cross-modal search (text-to-image, image-to-video) | `multimodal_extractor` | Vertex AI 1408D unified embedding space across video, image, and text |
| OCR or image descriptions | `multimodal_extractor` | Gemini-based text extraction and description generation |
| Face detection and matching | `face_identity_extractor` | ArcFace 512D with 99.8% verification accuracy |
| Structured extraction (products, labels) | `multimodal_extractor` with `response_shape` | LLM extracts structured JSON from image content |
Use `image_extractor` when you only need image search. Use `multimodal_extractor` when you need images searchable alongside video or text in the same embedding space.
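The shared text-image embedding space works by embedding the query and every image with the same model, then ranking images by cosine similarity to the query vector. A minimal sketch, using toy 4-dimensional vectors in place of real SigLIP 768D embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dim stand-ins for SigLIP's 768-dim vectors; in practice both the
# text query and each image are embedded by the same model.
text_query = [0.9, 0.1, 0.0, 0.1]                # e.g. "red leather handbag"
image_embeddings = {
    "img_red_bag":   [0.8, 0.2, 0.1, 0.0],
    "img_blue_shoe": [0.1, 0.9, 0.2, 0.1],
}

# Rank images by similarity to the text query, best match first.
ranked = sorted(image_embeddings,
                key=lambda name: cosine(text_query, image_embeddings[name]),
                reverse=True)
print(ranked)  # ['img_red_bag', 'img_blue_shoe']
```

This is the ranking a dense vector index performs at scale; the retriever below delegates it to the service.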

Create a Collection for Images

This collection generates SigLIP embeddings and thumbnails for an image catalog.
```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "product-images",
    "source": { "type": "bucket", "bucket_id": "bkt_products" },
    "feature_extractor": {
      "feature_extractor_name": "image_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "payload.image_url"
      },
      "field_passthrough": [
        { "source_path": "metadata.product_id" },
        { "source_path": "metadata.brand" },
        { "source_path": "metadata.category" }
      ],
      "parameters": {
        "enable_thumbnails": true
      }
    }
  }'
```
Create a retriever and execute it with a text query. SigLIP’s shared text-image embedding space lets you search images with natural language.
```bash
curl -X POST https://api.mixpeek.com/v1/retrievers \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "retriever_name": "image-search",
    "collection_ids": ["col_product_images"],
    "input_schema": {
      "properties": {
        "query": { "type": "text", "required": true }
      }
    },
    "stages": [
      {
        "stage_name": "visual_search",
        "stage_type": "filter",
        "config": {
          "stage_id": "feature_search",
          "parameters": {
            "query": "{{INPUT.query}}",
            "top_k": 20
          }
        }
      }
    ]
  }'
```
Execute a text-to-image search:
```bash
curl -X POST https://api.mixpeek.com/v1/retrievers/ret_abc123/execute \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": { "query": "red leather handbag with gold buckle" },
    "limit": 10
  }'
```
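A client then reads document IDs and scores out of the execute response. A minimal parsing sketch, assuming a hypothetical response body with `results`, `score`, and `metadata` fields (the exact field names are illustrative, not confirmed Mixpeek output):

```python
import json

# Hypothetical response body -- the field names ("results", "score") are
# assumptions for illustration, not confirmed Mixpeek output.
raw = """{
  "results": [
    {"document_id": "doc_img_456", "score": 0.91,
     "metadata": {"product_id": "SKU-12345", "category": "accessories"}},
    {"document_id": "doc_img_789", "score": 0.84,
     "metadata": {"product_id": "SKU-67890", "category": "footwear"}}
  ]
}"""

response = json.loads(raw)
for hit in response["results"]:
    # Passthrough metadata (product_id, brand, category) rides along with
    # each hit, so results can be joined back to the source catalog.
    print(hit["document_id"], hit["score"], hit["metadata"]["product_id"])
```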

Structured Extraction from Images

Use `multimodal_extractor` with `response_shape` to extract structured product metadata from images.
```bash
curl -X POST https://api.mixpeek.com/v1/collections \
  -H "Authorization: Bearer $MIXPEEK_API_KEY" \
  -H "X-Namespace: $NAMESPACE" \
  -H "Content-Type: application/json" \
  -d '{
    "collection_name": "product-catalog-enriched",
    "source": { "type": "bucket", "bucket_id": "bkt_products" },
    "feature_extractor": {
      "feature_extractor_name": "multimodal_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "payload.image_url"
      },
      "parameters": {
        "run_multimodal_embedding": true,
        "run_ocr": true,
        "run_video_description": true,
        "description_prompt": "Describe the product in this image including color, material, and style.",
        "response_shape": {
          "type": "object",
          "properties": {
            "product_type": { "type": "string" },
            "color": { "type": "string" },
            "material": { "type": "string" },
            "brand_visible": { "type": "boolean" },
            "text_on_product": { "type": "string" }
          }
        }
      }
    }
  }'
```
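Conceptually, `response_shape` is a JSON-Schema-style contract for the fields the LLM returns. A minimal sketch of checking an extracted record against the flat shape declared above (not a full JSON Schema validator, only the flat object-of-primitives case used here):

```python
# Map response_shape type names to Python types for a flat object check.
TYPE_MAP = {"string": str, "boolean": bool, "number": float, "integer": int}

def conforms(record, shape):
    """Return True if every declared property exists with the right type."""
    if shape.get("type") != "object":
        raise ValueError("only flat object shapes handled in this sketch")
    for name, spec in shape["properties"].items():
        if name not in record:
            return False
        if not isinstance(record[name], TYPE_MAP[spec["type"]]):
            return False
    return True

# The same shape declared in the collection above.
shape = {
    "type": "object",
    "properties": {
        "product_type": {"type": "string"},
        "color": {"type": "string"},
        "material": {"type": "string"},
        "brand_visible": {"type": "boolean"},
        "text_on_product": {"type": "string"},
    },
}

extracted = {
    "product_type": "handbag",
    "color": "red",
    "material": "leather",
    "brand_visible": True,
    "text_on_product": "ACME LEATHER CO.",
}
print(conforms(extracted, shape))  # True
```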

Output Schema

Each image produces a document like this:
```json
{
  "document_id": "doc_img_456",
  "thumbnail_url": "s3://mixpeek-storage/ns_123/thumbnails/product_001.jpg",
  "metadata": {
    "product_id": "SKU-12345",
    "brand": "Acme",
    "category": "accessories"
  },
  "image_extractor_v1_embedding": [0.045, -0.012, "...768 floats"]
}
```
When using multimodal_extractor with descriptions and OCR:
```json
{
  "document_id": "doc_img_789",
  "description": "Red leather handbag with gold buckle closure, front pocket with magnetic snap",
  "ocr_text": "ACME LEATHER CO.",
  "product_type": "handbag",
  "color": "red",
  "material": "leather",
  "brand_visible": true,
  "text_on_product": "ACME LEATHER CO.",
  "multimodal_extractor_v1_multimodal_embedding": [0.023, -0.041, "...1408 floats"]
}
```
| Field | Type | Description |
| --- | --- | --- |
| `image_extractor_v1_embedding` | float[768] | SigLIP visual embedding |
| `multimodal_extractor_v1_multimodal_embedding` | float[1408] | Vertex AI cross-modal embedding |
| `description` | string | Gemini-generated image description |
| `ocr_text` | string | Text extracted from the image |
| `thumbnail_url` | string | S3 URL of resized thumbnail (640px width) |
| `response_shape` fields | varies | Structured fields from LLM extraction |
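Because `response_shape` fields land as top-level document properties, a client can post-filter results with ordinary field comparisons. A toy sketch over hand-written stand-ins for retriever output:

```python
# Toy stand-ins for output documents from the enriched collection above.
docs = [
    {"document_id": "doc_img_789", "color": "red", "material": "leather",
     "brand_visible": True},
    {"document_id": "doc_img_790", "color": "blue", "material": "canvas",
     "brand_visible": False},
]

# Ordinary field comparisons work because the structured fields are
# top-level document properties, not nested under metadata.
red_leather = [d["document_id"] for d in docs
               if d["color"] == "red" and d["material"] == "leather"]
print(red_leather)  # ['doc_img_789']
```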