Image Search Capabilities
From visual similarity to natural language queries, build any image search experience with a single API.
Visual Similarity Search
Find visually similar images by analyzing content, color, composition, and semantic meaning.
- Content-based image retrieval using deep embeddings
- Configurable similarity thresholds and ranking
- Works across different resolutions and aspect ratios
Reverse Image Search
Upload an image and find exact matches, near-duplicates, or derivative versions across your entire corpus.
- Detect duplicates and near-duplicates at scale
- Find cropped, resized, or modified versions
- Source attribution and provenance tracking
Object & Scene Recognition
Detect and classify objects, scenes, and actions within images for intelligent categorization.
- Identify thousands of object categories
- Scene classification and context understanding
- Action and activity recognition in images
Text-to-Image Search
Describe what you are looking for in natural language and retrieve matching images instantly.
- Natural language queries mapped to visual content
- Semantic understanding beyond keyword matching
- Multi-language query support via CLIP models
How Image Search Works
From raw images to instant retrieval in five steps. Mixpeek handles the entire pipeline so you can focus on building your application.
Upload Images
Ingest images from object storage (S3, GCS, Azure Blob), URLs, or direct upload via the API.
Feature Extraction
Extract visual features using CLIP, SigLIP, or custom models. Generate dense vector embeddings that capture semantic meaning.
Vector Embedding
Transform extracted features into high-dimensional vectors optimized for similarity search.
Indexing
Index embeddings in Qdrant for sub-millisecond approximate nearest neighbor retrieval at any scale.
Retrieval
Query with images, text, or both. Combine semantic search with metadata filtering for precise results.
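Conceptually, the five steps above reduce to embed → index → scan. The toy sketch below shows that flow in plain Python with a brute-force in-memory index; the filenames and 3-dimensional vectors are illustrative placeholders rather than real CLIP embeddings, and a production system replaces the linear scan with an approximate nearest neighbor index such as Qdrant's.

```python
import math

# Stand-in for steps 2-3: an extractor like CLIP would produce these
# embeddings; here each image is pre-assigned a toy 3-dimensional vector.
index = {
    "sunset.jpg":  [0.9, 0.1, 0.0],
    "beach.jpg":   [0.8, 0.3, 0.1],
    "invoice.png": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, top_k=2):
    # Stand-in for step 5: an exact scan over every vector. Real systems
    # use ANN structures (e.g. HNSW) so this stays fast at scale.
    scored = [(cosine(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(reverse=True)
    return scored[:top_k]

results = search([0.85, 0.2, 0.05])
```

The same shape holds whether the query vector comes from an image or from a text encoder that shares the embedding space, which is what makes text-to-image search possible.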
Search Types
Multiple ways to query your image corpus, each optimized for different use cases and input modalities.
Image-to-Image
Find visually similar images by providing a reference image as the query input.
Text-to-Image
Search your image corpus using natural language descriptions of what you need.
Reverse Image Search
Upload an image to find its source, duplicates, or derivative versions in your collection.
Object Detection Search
Find images containing specific objects, regardless of position, scale, or background.
Combined Search
Combine image queries with text descriptions and metadata filters for multi-signal retrieval.
Batch Processing
Process and index millions of images with distributed pipelines on Ray GPU clusters.
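The combined-search idea above — narrow by metadata first, rank by visual similarity second — can be illustrated with a toy two-stage function. Everything here (the corpus, vectors, and field names) is made up for the example; it is not the Mixpeek retriever API.

```python
import math

# Toy corpus: each entry pairs a placeholder embedding with metadata.
corpus = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"category": "product"}},
    {"id": "b", "vec": [0.9, 0.4], "meta": {"category": "product"}},
    {"id": "c", "vec": [1.0, 0.1], "meta": {"category": "lifestyle"}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def combined_search(query_vec, where, top_k=5):
    # Stage 1: the metadata filter prunes the candidate set.
    candidates = [d for d in corpus
                  if all(d["meta"].get(k) == v for k, v in where.items())]
    # Stage 2: vector similarity ranks whatever survives the filter.
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in candidates[:top_k]]

hits = combined_search([1.0, 0.05], {"category": "product"})
```

Filtering before ranking is the usual design choice: it keeps the similarity computation confined to documents that can actually appear in the results.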
Industry Applications
Image search powers critical workflows across industries, from retail product discovery to industrial quality assurance.
E-commerce Product Discovery
Let shoppers search by photo, find similar products, and discover visual alternatives.
Digital Asset Management
Organize, deduplicate, and search large image libraries with visual intelligence.
Content Moderation
Detect duplicate uploads, flag inappropriate content, and enforce brand guidelines at scale.
Manufacturing Quality Control
Identify visual defects, compare against reference images, and automate inspection workflows.
Any Model, Any Architecture
Use Mixpeek's built-in image extractors or bring your own models. With 50+ extractors and a flexible plugin system, you have full control over how images are processed and embedded.

- CLIP, SigLIP, and ResNet extractors included out of the box
- Deploy custom PyTorch or ONNX models via the plugin system
- Chain multiple extractors for multi-feature embeddings
- Fine-tune models on your domain data for higher accuracy
- GPU-accelerated inference on Ray clusters
Simple to Integrate
A few lines of code to search your image corpus. Use the Python SDK, JavaScript SDK, or REST API directly.
from mixpeek import Mixpeek

client = Mixpeek(api_key="YOUR_API_KEY")

# Search by image URL
results = client.retrievers.search(
    retriever_id="img-search-retriever",
    queries=[
        {
            "type": "image",
            "value": "https://example.com/query-image.jpg",
            "embedding_model": "mixpeek/clip-base"
        }
    ],
    filters={
        "AND": [
            {"key": "category", "value": "product", "operator": "eq"}
        ]
    },
    top_k=20
)

for result in results:
    print(f"Score: {result.score}, ID: {result.document_id}")

Frequently Asked Questions
What is an image search API?
An image search API allows developers to build applications that search, compare, and retrieve images programmatically. Instead of relying on filenames or manual tags, it uses computer vision and vector embeddings to understand visual content, enabling searches by image similarity, natural language descriptions, or a combination of both.
How does visual similarity search work?
Visual similarity search works by converting images into high-dimensional vector embeddings using deep learning models like CLIP or SigLIP. These embeddings capture semantic features such as objects, colors, textures, and composition. When you query with an image, the system finds vectors closest to it in the embedding space using approximate nearest neighbor search, returning the most visually similar results.
What is reverse image search?
Reverse image search takes an image as input and finds matching or similar images in a database. Unlike text-based search, it analyzes the visual content directly to find exact duplicates, near-duplicates (cropped, resized, or filtered versions), and semantically similar images. Common use cases include source attribution, duplicate detection, and copyright enforcement.
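One common way to catch the near-duplicates described above is perceptual hashing. The sketch below implements a difference hash (dHash), a standard technique: each bit records whether brightness rises between adjacent pixels, so a resized or re-encoded copy yields a hash within a small Hamming distance of the original. The grids stand in for images already decoded and downscaled to grayscale; a real pipeline does that resizing from the actual files, and this is an illustration rather than Mixpeek's internal method.

```python
def dhash(pixels):
    # pixels: rows of grayscale values, already downscaled. Each bit
    # records whether brightness increases from one pixel to the next.
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left < right else 0)
    return bits

def hamming(a, b):
    # Number of differing bits; small distance suggests a near-duplicate.
    return sum(x != y for x, y in zip(a, b))

# Two tiny grayscale grids: the second is the first with brightness
# noise, as a re-encoded or lightly edited copy would show.
original = [[10, 20, 30, 40, 50, 60, 70, 80, 90],
            [90, 80, 70, 60, 50, 40, 30, 20, 10]]
near_dup = [[12, 21, 19, 41, 52, 61, 72, 81, 92],
            [91, 82, 69, 62, 49, 42, 29, 22, 11]]

distance = hamming(dhash(original), dhash(near_dup))
```

A typical policy treats distances below a small threshold (often a handful of bits on a 64-bit hash) as duplicates, while semantic similarity search handles the looser "visually related" cases.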
What image formats does Mixpeek support?
Mixpeek supports all major image formats including JPEG, PNG, WebP, TIFF, BMP, and GIF (first frame). Images are automatically preprocessed and normalized during ingestion, so you do not need to convert formats before uploading. The API also handles varying resolutions and aspect ratios without manual resizing.
Can I use custom embedding models for image search?
Yes. Mixpeek supports bringing your own models via the plugin system. You can deploy custom PyTorch or ONNX models alongside built-in extractors like CLIP, SigLIP, and ResNet. Custom models run on the same GPU infrastructure and integrate directly into the feature extraction pipeline with no additional setup.
How does image search scale to millions of images?
Mixpeek uses distributed processing on Ray GPU clusters for feature extraction and Qdrant for vector indexing. Qdrant provides sub-millisecond approximate nearest neighbor search even with billions of vectors. Ingestion pipelines automatically scale horizontally, and storage tiering moves cold data to S3 while keeping hot vectors in memory for fast retrieval.
What is the difference between image search and image recognition?
Image recognition identifies and classifies objects, scenes, or attributes within a single image (e.g., 'this image contains a cat'). Image search uses those recognized features to find similar or matching images across a corpus. Mixpeek combines both: recognition happens during feature extraction, and the resulting embeddings power search and retrieval.
Can I combine image search with text and metadata filters?
Yes. Mixpeek retrievers support multi-stage pipelines that combine vector similarity search with metadata filtering. You can search by image similarity and then filter results by date, category, tags, or any custom metadata field. You can also combine image and text queries in a single search for more precise results.
