    Video Intelligence

    Video Search API

    Part of the multimodal data warehouse -- decompose video into scenes, faces, speech, and visual elements, then retrieve with precision. Find exact moments, detect objects, recognize faces, and extract intelligence from video at scale using natural language.

    Video Search Capabilities

    Go beyond metadata. Search the actual content of your videos with multi-modal AI that understands visual, audio, and textual information.

    Semantic Video Search

    Search by meaning, not keywords. Describe what you're looking for in natural language and find the exact moments that match, even without metadata or tags.

    Visual Object Detection

    Detect and search for objects, people, scenes, and visual elements across your entire video library. Identify brands, products, and specific visual patterns.

    Speech & Audio Search

    Search spoken words, dialogue, and audio events within videos. Transcribe and index audio tracks for full-text search across all spoken content.

    Scene-Level Understanding

    Go beyond individual frames. Understand context, actions, and relationships within scenes to find complex moments like 'a person opening a package' or 'a crowd cheering'.

    How Video Search Works

    From raw video to searchable intelligence in five steps. Mixpeek handles the entire pipeline so you can focus on building your application.

    1

    Upload Video

    Ingest video files in any format via API, SDK, or direct storage connection. Supports MP4, MOV, AVI, WebM, and more.

    2

    Frame Extraction

    Automatically extract frames at configurable FPS intervals. Intelligent keyframe detection skips redundant frames to reduce processing cost.

    3

    Feature Extraction

    Extract visual embeddings, audio transcriptions, text overlays (OCR), and object detections from each frame and audio segment.

    4

    Multi-Vector Indexing

    Index all extracted features as multi-vector representations, enabling cross-modal search across visual, audio, and text dimensions simultaneously.

    5

    Semantic Retrieval

    Query your indexed video content with natural language, images, or audio clips. Get ranked results with precise timestamps and confidence scores.
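    The five steps above can be sketched as a toy pipeline. Everything here is a stand-in, not the Mixpeek SDK: the "embedding" is a hash-based placeholder for a real ML model, and segments stand in for extracted frames and audio chunks.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Segment:
        start: float   # segment start time in seconds
        end: float     # segment end time in seconds
        vector: list   # toy multi-vector representation

    def embed(text: str) -> float:
        """Step 3 stand-in: map content to a number.
        A real system uses learned visual/audio/text embeddings."""
        return (sum(ord(c) for c in text) % 1000) / 1000

    def index_video(transcript_per_segment):
        """Steps 2-4: build a searchable index of timestamped vectors."""
        return [
            Segment(start=s, end=e, vector=[embed(text)])
            for (s, e), text in transcript_per_segment
        ]

    def search(index, query: str, top_k: int = 3):
        """Step 5: rank segments by closeness to the query vector."""
        q = embed(query)
        return sorted(index, key=lambda seg: abs(seg.vector[0] - q))[:top_k]

    # Index two toy segments, then retrieve the closest one
    index = index_video([
        ((0.0, 10.0), "intro welcome"),
        ((10.0, 20.0), "checkout demo"),
    ])
    hits = search(index, "checkout demo", top_k=1)
    ```

    The essential idea survives the simplification: ingestion produces timestamped vectors, and retrieval ranks those vectors against an embedded query rather than scanning raw video.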

    What You Can Search

    Multiple search modalities let you find exactly what you need, whether you're searching with text, images, audio, or a combination.

    Text-to-Video

    Search with natural language queries like 'find product demos showing the checkout flow' and get timestamped results.

    Image-to-Video

    Upload a reference image to find visually similar scenes, objects, or people across your video library.

    Audio-to-Video

    Find moments matching a specific sound, voice, or audio pattern. Search by audio clip or spoken phrase.

    Metadata Search

    Filter by duration, format, resolution, tags, and custom metadata. Combine with semantic search for precision.

    Temporal Search

    Find specific timestamps or time ranges within videos. Search for events that occur at particular moments or in sequence.

    Combined Queries

    Combine text, image, audio, and metadata filters in a single multi-modal query for the most accurate results.
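    A combined query pairs several query objects with metadata filters in one request. The shape below mirrors the SDK snippet later on this page; the specific field values (the image URI, tag, and limits) are illustrative, not a documented schema.

    ```python
    # Hypothetical combined multi-modal query payload. Field names follow the
    # SDK example on this page; the values are made up for illustration.
    combined_query = {
        "queries": [
            # text query against video content
            {"type": "text", "value": "crowd cheering", "modality": "video"},
            # image query: find scenes visually similar to a reference image
            {"type": "image", "value": "reference-logo.png", "modality": "video"},
        ],
        "filters": {
            # metadata filters narrow the candidate set before ranking
            "metadata.duration_seconds": {"$lte": 600},
            "metadata.format": "mp4",
        },
        "top_k": 5,
    }
    ```

    Filters prune by exact metadata first, and the semantic queries then rank whatever survives, which is why combining the two tends to be more precise than either alone.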

    Mixpeek vs. Twelve Labs vs. Google Video Intelligence

    See how Mixpeek compares to other video search and intelligence platforms across key capabilities.

    Feature | Mixpeek | Twelve Labs | Google Video Intelligence
    Search Modalities | Text, image, audio, video, combined | Text, image | Text, labels
    Custom Models | Bring your own models | Limited fine-tuning | Pre-trained only
    Self-Hosted Option | Yes (BYO Cloud) | No | No
    Batch Processing | Async batches with webhooks | API only | API only
    Real-Time Search | Sub-second retrieval | Standard latency | Standard latency
    Open Source Components | Yes (extractors, SDKs) | No | No
    Pricing Model | Usage-based, transparent | Per-minute pricing | Per-minute + per-feature

    Search Video in a Few Lines of Code

    Use the Mixpeek Python SDK to search your video content with natural language. Filter by metadata, specify modalities, and get timestamped results.

    • Natural language queries
    • Timestamped results with preview URLs
    • Metadata filtering and faceted search
    • Multi-modal query support
    • Confidence scoring and ranking
    video_search.py
    from mixpeek import Mixpeek
    
    client = Mixpeek(api_key="YOUR_API_KEY")
    
    # Search video content with natural language
    results = client.retrievers.search(
        retriever_id="video-search-retriever",
        queries=[
            {
                "type": "text",
                "value": "person demonstrating the product features",
                "modality": "video"
            }
        ],
        filters={
            "metadata.duration_seconds": {"$gte": 30},
            "metadata.format": "mp4"
        },
        top_k=10
    )
    
    for result in results:
        print(f"Video: {result.document_id}")
        print(f"Timestamp: {result.start_time}s - {result.end_time}s")
        print(f"Score: {result.score}")
        print(f"Preview: {result.preview_url}")

    Frequently Asked Questions

    What is video search?

    Video search is the ability to find specific moments, objects, scenes, or spoken content within video files using queries. Unlike traditional video search that relies on titles and metadata, semantic video search understands the actual content of the video -- visual elements, audio, text overlays, and context -- enabling natural language queries like 'find the scene where someone demonstrates the product'.

    How does semantic video search work?

    Semantic video search works by extracting multi-modal features from video content: visual embeddings from frames, transcriptions from audio, OCR from text overlays, and object detections from scenes. These features are indexed as multi-dimensional vectors. When you search, your query is converted to the same vector space, and the system finds the closest matching moments using approximate nearest neighbor algorithms.
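    The retrieval step described above can be reduced to a few lines: the query and every indexed segment live in the same vector space, and the closest segments win. A production system uses learned embeddings and an approximate nearest-neighbor index (e.g. HNSW) rather than the brute-force scan and hand-written toy vectors shown here.

    ```python
    from math import sqrt

    def cosine_similarity(a, b):
        """Cosine of the angle between two vectors: 1.0 = same direction."""
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

    # (start_s, end_s) -> toy embedding for that video segment
    segment_vectors = {
        (0.0, 12.5): [0.9, 0.1, 0.0],
        (12.5, 31.0): [0.1, 0.8, 0.3],
        (31.0, 47.2): [0.0, 0.2, 0.9],
    }

    query_vector = [0.1, 0.9, 0.2]  # toy embedding of the text query

    # Rank every segment by similarity to the query (brute force)
    ranked = sorted(
        segment_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    best_span, _ = ranked[0]  # the timestamped segment closest to the query
    ```

    This is also why results come back with timestamps: each vector is tied to a segment's start and end time, so the nearest neighbor directly identifies a moment, not just a file.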

    What video formats does Mixpeek support?

    Mixpeek supports all major video formats including MP4, MOV, AVI, WebM, MKV, FLV, and WMV. Videos are automatically transcoded during ingestion, so you can upload in any format without preprocessing. We support resolutions up to 4K and videos of any duration.

    Can I search for specific moments in a video?

    Yes. Mixpeek returns timestamped results with precise start and end times for every match. You can search for specific visual moments ('the red car turning left'), spoken phrases ('when they mention pricing'), or combinations of both. Results include confidence scores and frame-level previews.

    How does video search handle long-form content?

    Long-form content is processed using configurable frame extraction intervals and intelligent scene detection. Rather than analyzing every frame, Mixpeek identifies keyframes and scene transitions to create an efficient index. This means a 2-hour video can be fully indexed and searchable without processing millions of redundant frames.
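    The keyframe idea can be illustrated with a toy filter: keep a frame only when it differs enough from the last kept frame. Real scene detection compares color histograms or embeddings; the flat pixel lists and the 0.2 threshold below are arbitrary stand-ins.

    ```python
    def select_keyframes(frames, threshold=0.2):
        """Keep frame indices whose content changed enough since the last keeper.

        `frames` is a list of flat grayscale pixel lists (0-255); a real
        implementation would operate on decoded video frames.
        """
        kept = [0]          # always keep the first frame
        last = frames[0]
        for i in range(1, len(frames)):
            frame = frames[i]
            # mean absolute per-pixel change, normalized to 0..1
            diff = sum(abs(a - b) for a, b in zip(frame, last)) / (255.0 * len(frame))
            if diff > threshold:
                kept.append(i)
                last = frame
        return kept

    # Three near-identical dark frames, then a bright "scene cut"
    frames = [[10] * 16, [12] * 16, [11] * 16, [200] * 16]
    keyframes = select_keyframes(frames)
    ```

    Near-duplicate frames are skipped, so the number of frames sent to feature extraction grows with the number of distinct scenes rather than with raw duration, which is what keeps a 2-hour video tractable.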

    What is the difference between video search and video intelligence?

    Video search focuses on finding and retrieving specific content within videos based on queries. Video intelligence is broader -- it includes search but also encompasses content understanding, automated tagging, anomaly detection, content moderation, and analytics. Mixpeek provides both capabilities through its feature extraction and retrieval pipeline.

    Can I use custom models for video feature extraction?

    Yes. Mixpeek supports bring-your-own-model (BYOM) for feature extraction. You can deploy custom visual models, audio models, or embedding models alongside Mixpeek's default extractors. This is useful for domain-specific recognition tasks like medical imaging, manufacturing inspection, or branded content detection.

    Is video search available for on-premise deployment?

    Yes. Mixpeek offers BYO Cloud deployment where the entire video search pipeline runs within your own infrastructure (AWS, GCP, or Azure VPC). This ensures your video data never leaves your environment, meeting strict compliance and data residency requirements. See our deployment options page for details.

    Start Searching Video Content Today

    Build powerful video search and intelligence applications with Mixpeek's API. Get started with our free tier or talk to our team about enterprise needs.