
# Image Extractor

> Dense vector embeddings for images using Google SigLIP (768D) for visual similarity search

<Card title="View on GitHub" icon="github" href="https://github.com/mixpeek/mixpeek-extractors/blob/main/extractors/image_extractor/README.md" horizontal>
  Runnable reference for this extractor — inputs, parameters, output fields, embedding models, and copy-paste examples. Auto-generated from the live registry.
</Card>

<Frame>
  <img src="https://mintcdn.com/mixpeek/TwtTrae3Fi3EFJ72/assets/extractors/image.svg?fit=max&auto=format&n=TwtTrae3Fi3EFJ72&q=85&s=f6bf9fe32cfa5a823a28cb86321fee06" alt="Image extractor pipeline showing SigLIP processing and embedding generation" width="900" height="380" data-path="assets/extractors/image.svg" />
</Frame>

The image extractor generates dense 768-dimensional vector embeddings from images using Google's SigLIP model. It is optimized for visual similarity search, product matching, and cross-modal search with text queries. Typical processing time is \~50-100ms per image, at a cost of 2 credits per image.

<Note>
  View extractor details at [api.mixpeek.com/v1/collections/features/extractors/image\_extractor\_v1](https://api.mixpeek.com/v1/collections/features/extractors/image_extractor_v1) or fetch programmatically with `GET /v1/collections/features/extractors/{feature_extractor_id}`.
</Note>
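
As a rough illustration of the programmatic fetch mentioned in the note, the sketch below builds the `GET` request with Python's standard library. The path comes from this page; the `Authorization: Bearer` header and the `build_extractor_request` helper are assumptions for illustration, so check the API reference for the actual authentication scheme.

```python
import urllib.request

BASE_URL = "https://api.mixpeek.com"


def build_extractor_request(feature_extractor_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an extractor's details."""
    url = f"{BASE_URL}/v1/collections/features/extractors/{feature_extractor_id}"
    # Assumption: bearer-token auth. This page does not specify the auth header.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="GET", headers=headers)


req = build_extractor_request("image_extractor_v1", "YOUR_API_KEY")
print(req.get_method(), req.full_url)

# To actually send the request:
#   with urllib.request.urlopen(req) as resp:
#       details = resp.read().decode("utf-8")
```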

## Pipeline Steps

1. **Filter Dataset** (if `collection_id` is provided)
   * Filter to specified collection
2. **Detect Content Types**
   * Sample 100 rows to identify images vs PDFs
3. **PDF Page Expansion** (conditional: if PDF content detected)
   * Render each PDF page at 72 DPI using PyMuPDF
   * Create a separate image for each page
4. **SigLIP Image Embedding Generation**
   * Resize to 224×224 internally
   * GPU-accelerated inference
   * Generate 768D visual embeddings
5. **Thumbnail Generation** (conditional: if `enable_thumbnails=true`)
   * Resize to 640px width at 85% quality
   * Upload to S3 with optional CDN
6. **Output**
   * Image/page documents with embeddings
   * Optional thumbnail URLs
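
The pixel sizes implied by steps 3 and 5 can be worked out with simple arithmetic. The sketch below is illustrative only: both helper names are hypothetical, and it assumes thumbnails preserve the aspect ratio, which this page does not state. A PDF point is 1/72 inch, so rendering at 72 DPI yields one pixel per point.

```python
def pdf_page_pixels(width_pt: float, height_pt: float, dpi: int = 72) -> tuple[int, int]:
    """Pixel size of a rendered PDF page (points are 1/72 inch)."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


def thumbnail_size(width: int, height: int, target_width: int = 640) -> tuple[int, int]:
    """Thumbnail dimensions for a fixed target width.

    Assumption: the aspect ratio is preserved when resizing.
    """
    return target_width, max(1, round(height * target_width / width))


# A US Letter page is 612 x 792 points.
print(pdf_page_pixels(612, 792))   # (612, 792)
print(thumbnail_size(1920, 1080))  # (640, 360)
```

Rendering at 72 DPI keeps page images small, which suits a model that resizes everything to 224×224 anyway.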

## When to Use

| Use Case               | Description                                              |
| ---------------------- | -------------------------------------------------------- |
| **Image search**       | Find visually similar images in large collections        |
| **Visual similarity**  | Match products, artwork, or content by appearance        |
| **Content discovery**  | Recommend similar visual content                         |
| **Cross-modal search** | Find images using text queries (via SigLIP text encoder) |
| **E-commerce**         | Product image search and visual recommendations          |
| **Stock photo search** | Media library search by visual content                   |

## When NOT to Use

| Scenario                        | Recommended Alternative                 |
| ------------------------------- | --------------------------------------- |
| Face recognition                | `face_identity_extractor`               |
| Video content                   | `multimodal_extractor`                  |
| Text-heavy images requiring OCR | `multimodal_extractor` with OCR enabled |
| Audio content                   | `audio_extractor`                       |

## Input Schema

| Field   | Type   | Required | Description                                                                                                  |
| ------- | ------ | -------- | ------------------------------------------------------------------------------------------------------------ |
| `image` | string | **Yes**  | URL or S3 path to image file. Formats: JPEG, PNG, WebP, BMP, GIF (static). Any resolution (resized to 224×224 internally). |

```json theme={null}
{
  "image": "s3://my-bucket/products/laptop-pro.jpg"
}
```

**Input Examples:**

| Type          | Example                                           |
| ------------- | ------------------------------------------------- |
| Product image | `s3://my-bucket/products/laptop-pro.jpg`          |
| Stock photo   | `https://cdn.example.com/photos/sunset-beach.jpg` |
| Catalog image | `s3://catalog/items/SKU-12345.png`                |

* **Supported Formats**: JPEG, PNG, WebP, BMP, GIF (static)
* **Recommended Resolution**: 224×224 or larger (automatically resized)
* **Max File Size**: 10MB recommended
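
A client-side pre-check can catch obviously unsupported inputs before ingestion. The sketch below is a minimal example, not part of the Mixpeek API: `check_image_reference` is a hypothetical helper, and judging the format by file extension is an assumption (the service may inspect file content instead).

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Extensions for the supported formats listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif"}
# Assumption: "URL or S3 path" means http(s):// or s3:// references.
ALLOWED_SCHEMES = {"s3", "http", "https"}


def check_image_reference(ref: str) -> list[str]:
    """Return a list of problems found; an empty list means the reference looks fine."""
    problems = []
    parts = urlsplit(ref)
    if parts.scheme not in ALLOWED_SCHEMES:
        problems.append(f"unsupported scheme: {parts.scheme!r}")
    extension = PurePosixPath(parts.path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension!r}")
    return problems


print(check_image_reference("s3://my-bucket/products/laptop-pro.jpg"))  # []
print(check_image_reference("ftp://host/scan.tiff"))
```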

## Output Schema

| Field                          | Type        | Description                                                              |
| ------------------------------ | ----------- | ------------------------------------------------------------------------ |
| `image_extractor_v1_embedding` | float\[768] | SigLIP image embedding, L2 normalized                                    |
| `processing_time_ms`           | number      | Processing time in milliseconds                                          |
| `thumbnail_url`                | string      | S3 URL of the thumbnail image (only when `enable_thumbnails=true`)       |

```json theme={null}
{
  "image_extractor_v1_embedding": [0.023, -0.041, 0.018, ...],
  "processing_time_ms": 85.2,
  "thumbnail_url": "s3://mixpeek-storage/ns_123/thumbnails/thumb_001.jpg"
}
```
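
Because the embedding is documented as a 768-dimensional, L2-normalized vector, a quick sanity check on returned documents is straightforward. The helper below is a hypothetical sketch using only the standard library; the tolerance is an arbitrary choice.

```python
import math

EXPECTED_DIM = 768


def check_embedding(vec: list[float], tol: float = 1e-3) -> float:
    """Validate dimension and unit length; return the L2 norm."""
    if len(vec) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(vec)}")
    norm = math.sqrt(sum(x * x for x in vec))
    if not math.isclose(norm, 1.0, abs_tol=tol):
        raise ValueError(f"embedding is not L2 normalized (norm={norm:.4f})")
    return norm


# Synthetic unit vector standing in for a real embedding.
unit = [1 / math.sqrt(EXPECTED_DIM)] * EXPECTED_DIM
print(round(check_embedding(unit), 6))
```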

## Parameters

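The image extractor uses sensible defaults and requires no parameters for basic usage; pass an empty `parameters` object. Thumbnail generation (step 5 of the pipeline) is the only optional behavior described on this page and is enabled with `enable_thumbnails=true`.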

| Parameter       | Type | Default | Description                           |
| --------------- | ---- | ------- | ------------------------------------- |
| *None required* | -    | -       | All parameters use optimized defaults |

## Configuration Examples

<CodeGroup>
  ```json Basic Image Embedding theme={null}
  {
    "feature_extractor": {
      "feature_extractor_name": "image_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "image_url"
      },
      "field_passthrough": [
        { "source_path": "metadata.product_id" }
      ],
      "parameters": {}
    }
  }
  ```

  ```json E-commerce Product Images theme={null}
  {
    "feature_extractor": {
      "feature_extractor_name": "image_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "product_image"
      },
      "field_passthrough": [
        { "source_path": "metadata.sku" },
        { "source_path": "metadata.category" },
        { "source_path": "metadata.brand" }
      ],
      "parameters": {}
    }
  }
  ```

  ```json Stock Photo Library theme={null}
  {
    "feature_extractor": {
      "feature_extractor_name": "image_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "photo_url"
      },
      "field_passthrough": [
        { "source_path": "metadata.photographer" },
        { "source_path": "metadata.tags" },
        { "source_path": "metadata.license" }
      ],
      "parameters": {}
    }
  }
  ```

  ```json Art Collection theme={null}
  {
    "feature_extractor": {
      "feature_extractor_name": "image_extractor",
      "version": "v1",
      "input_mappings": {
        "image": "artwork_url"
      },
      "field_passthrough": [
        { "source_path": "metadata.artist" },
        { "source_path": "metadata.title" },
        { "source_path": "metadata.year" },
        { "source_path": "metadata.medium" }
      ],
      "parameters": {}
    }
  }
  ```
</CodeGroup>
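
The four configurations above differ only in the mapped input field and the passthrough paths, so they can be generated from a small helper. This is a sketch: `build_extractor_config` is a hypothetical function, not part of any Mixpeek SDK, and it simply reproduces the JSON shape shown in the examples.

```python
import json


def build_extractor_config(image_field: str, passthrough_paths: tuple[str, ...] = ()) -> dict:
    """Build a feature_extractor config matching the JSON examples on this page."""
    return {
        "feature_extractor": {
            "feature_extractor_name": "image_extractor",
            "version": "v1",
            "input_mappings": {"image": image_field},
            "field_passthrough": [{"source_path": path} for path in passthrough_paths],
            "parameters": {},
        }
    }


config = build_extractor_config(
    "product_image",
    ("metadata.sku", "metadata.category", "metadata.brand"),
)
print(json.dumps(config, indent=2))
```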

## Performance & Costs

| Metric               | Value                          |
| -------------------- | ------------------------------ |
| **Processing speed** | \~50-100ms per image           |
| **Batch processing** | Up to 16 images per batch      |
| **GPU acceleration** | Supported for faster inference |
| **Cost**             | 2 credits per image            |

## Vector Index

| Property            | Value                                                |
| ------------------- | ---------------------------------------------------- |
| **Feature URI**     | `mixpeek://image_extractor@v1/google_siglip_base_v1` |
| **Index name**      | `image_extractor_v1_embedding`                       |
| **Dimensions**      | 768                                                  |
| **Type**            | Dense                                                |
| **Distance metric** | Cosine                                               |
| **Datatype**        | float32                                              |
| **Inference model** | `google_siglip_base_v1`                              |

<Note>In retrievers, reference this feature by its **Feature URI** above (the output name is `google_siglip_base_v1`, **not** the index name `image_extractor_v1_embedding`).</Note>
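
To make the distinction between the output name and the index name concrete, the sketch below splits a Feature URI into its parts. The `mixpeek://<extractor>@<version>/<output>` layout is inferred from the single example in the table above, so treat the parser as an assumption rather than a specification; `parse_feature_uri` is a hypothetical helper.

```python
from typing import NamedTuple


class FeatureURI(NamedTuple):
    extractor: str
    version: str
    output: str


def parse_feature_uri(uri: str) -> FeatureURI:
    """Split a URI of the assumed form mixpeek://<extractor>@<version>/<output>."""
    prefix = "mixpeek://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a mixpeek feature URI: {uri!r}")
    head, slash, output = uri[len(prefix):].partition("/")
    extractor, at, version = head.partition("@")
    if not (slash and at and extractor and version and output):
        raise ValueError(f"malformed feature URI: {uri!r}")
    return FeatureURI(extractor, version, output)


parsed = parse_feature_uri("mixpeek://image_extractor@v1/google_siglip_base_v1")
print(parsed.output)  # google_siglip_base_v1
```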

### Cross-Modal Search

Image embeddings from this extractor share a vector space with embeddings from the SigLIP text encoder, which enables cross-modal search. You can:

* Find images using natural language text queries
* Match images to text descriptions
* Build hybrid search combining visual and textual similarity
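
Since the embeddings are L2 normalized and the index uses cosine distance, cosine similarity reduces to a plain dot product. The sketch below ranks candidates against a query vector using toy 3-dimensional unit vectors in place of real 768-dimensional embeddings. In practice the query vector would come from the SigLIP text encoder and ranking would be done by the vector index; `rank_by_similarity` is a hypothetical helper for illustration.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query: list[float], candidates: dict[str, list[float]], top_k: int = 3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [(name, dot(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy unit vectors; real embeddings have 768 dimensions.
text_query = [1.0, 0.0, 0.0]
images = {
    "beach.jpg": [0.6, 0.8, 0.0],
    "sunset.jpg": [1.0, 0.0, 0.0],
    "laptop.jpg": [0.0, 1.0, 0.0],
}
print(rank_by_similarity(text_query, images))
# [('sunset.jpg', 1.0), ('beach.jpg', 0.6), ('laptop.jpg', 0.0)]
```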

## Comparison with Other Image Extractors

| Feature         | image\_extractor    | multimodal\_extractor           |
| --------------- | ------------------- | ------------------------------- |
| **Dimensions**  | 768                 | 1408                            |
| **Model**       | SigLIP              | Vertex AI Multimodal            |
| **Processing**  | Image only          | Video, Image, Text, GIF         |
| **Cross-modal** | SigLIP text encoder | Vertex text encoder             |
| **Best For**    | Fast image search   | Unified multimodal search       |
| **Cost**        | 2 credits/image     | Higher (includes more features) |

## Limitations

* **Image only**: Does not process video, audio, or text content
* **No OCR**: Cannot extract text from images; use `multimodal_extractor` with OCR
* **No face recognition**: For face matching, use `face_identity_extractor`
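* **Single image per input**: Each input references one image; process multiple images by batching via the API (up to 16 images per batch)
* **Resolution**: Input is resized to 224×224 internally, so fine detail in high-resolution images does not reach the model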

## Related

* [Feature Extractors Overview](/processing/feature-extractors)
* [Multimodal Extractor](/processing/extractors/multimodal)
* [Face Identity Extractor](/processing/extractors/face-identity)
* [Text Extractor](/processing/extractors/text)
