The universal extractor is an all-in-one feature extractor that handles images, video, audio, and documents through Google’s Gemini APIs. It produces a single 3072-dimensional embedding (Gemini Embedding 2) per object, or per segment/page for chunked content, alongside rich text extraction: AI-generated descriptions, OCR for images and documents, and transcription for audio and video. It runs on Celery (not Ray), so there is no cluster-startup latency, which makes it a fast path for mixed-modality corpora.
View extractor details at api.mixpeek.com/v1/collections/features/extractors/universal_extractor_v1 or fetch programmatically with GET /v1/collections/features/extractors/{feature_extractor_id}.
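As a minimal sketch of the programmatic fetch, the snippet below builds the GET request with Python's standard library. The `https://` scheme, the `Authorization: Bearer` header, and the helper name `build_extractor_request` are assumptions for illustration and are not confirmed by this page; check your account's authentication scheme before sending.

```python
import urllib.request

# Assumption: the API is served over HTTPS on the host named above.
BASE_URL = "https://api.mixpeek.com"


def build_extractor_request(feature_extractor_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one extractor's details."""
    url = f"{BASE_URL}/v1/collections/features/extractors/{feature_extractor_id}"
    # Assumption: bearer-token authentication; the header name is hypothetical here.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, method="GET", headers=headers)


req = build_extractor_request("universal_extractor_v1", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

To actually send it, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.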

Pipeline Steps

  1. Resolve input — apply input_mappings to get the file URL/path from the source object (content field).
  2. Detect modality — classify the object as image, video, audio, or document.
  3. Segment (if needed) — video is split into 30-second segments, of which at most max_video_segments are processed; documents are processed page by page, up to max_document_pages pages.
  4. Gemini embedding — generate a 3072-d Gemini Embedding 2 vector (output_dimensionality configurable 256–3072).
  5. Text extraction (if extract_text) — OCR for images/documents, transcription for audio/video.
  6. Description (if generate_description) — Gemini vision/understanding produces a natural-language description.
  7. Output — one document per object (or per segment/page for chunked content).
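To make the segmentation step concrete, here is a rough sketch of how the 30-second windows and the max_video_segments cap interact. It assumes segments are contiguous from the start of the file and that the final segment may be shorter than 30 seconds; the extractor's real boundary handling is not documented here, and `plan_video_segments` is a hypothetical helper.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 30.0  # segment length stated in the pipeline steps


def plan_video_segments(duration_s: float, max_video_segments: int = 10) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds, capped at max_video_segments."""
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s and len(windows) < max_video_segments:
        # Assumption: the last window is clipped to the file duration.
        end = min(start + SEGMENT_SECONDS, duration_s)
        windows.append((start, end))
        start = end
    return windows


windows = plan_video_segments(duration_s=400.0)
covered_s = windows[-1][1] if windows else 0.0
print(len(windows), covered_s)  # 10 300.0
```

With the default cap of 10, a 400-second video is covered only up to the 300-second mark; raising max_video_segments to 14 would cover the whole file under these assumptions.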

When to Use

Use Case | Description
Mixed-modality corpora | A single bucket containing images, video, audio, and PDFs you want searchable with one extractor
Fast onboarding | Celery fast-path avoids Ray cluster startup, so small batches return quickly
Cross-modal search | One shared 3072-d embedding space across all four modalities
Rich metadata | Need descriptions, OCR text, and transcription alongside the vector

When NOT to Use

Scenario | Recommended Alternative
High-volume single-modality at lowest cost | Modality-specific extractor (text_extractor, image_extractor)
Audio fingerprinting / sound-mark matching | audio_fingerprint_extractor
Spatial/layout document analysis | document_graph_extractor
Self-hosted, no external API calls | text_extractor / image_extractor

Input Schema

Field | Type | Required | Description
content | string | Yes | URL or path to the file to process. Populated from input_mappings.
{
  "content": "s3://my-bucket/assets/report.pdf"
}
Supported input types: IMAGE, VIDEO, AUDIO, PDF, TEXT, STRING.

Output Schema

Field | Type | Description
universal_extractor_v1_embedding | float[3072] | Gemini Embedding 2 vector for the content
modality | string | Detected modality: image, video, audio, or document
text | string or null | Extracted text (OCR, transcription, or document text)
description | string or null | AI-generated description of the content
segment_index | integer or null | Segment index (chunked video/audio/documents)
segment_total | integer or null | Total segments for this source object
page_number | integer or null | Page number (documents only)
start_time_s / end_time_s | float or null | Segment start/end time in seconds (video/audio)
duration_s | float or null | Total file duration in seconds (video/audio)
{
  "universal_extractor_v1_embedding": [0.012, -0.034, 0.008, ...],
  "modality": "document",
  "text": "Quarterly revenue grew 12% year over year...",
  "description": "A financial report page with a revenue bar chart",
  "page_number": 1,
  "segment_total": 12
}
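Because chunked content yields one output document per segment or page, downstream code often needs to stitch the text back together. The sketch below does that for the documents of a single source object, using only field names from the output schema. How documents are grouped per source object is left out, and `merge_segment_text` is a hypothetical helper.

```python
from typing import Any, Dict, List


def merge_segment_text(docs: List[Dict[str, Any]]) -> str:
    """Join the text of one source object's output documents in reading order."""

    def order_key(doc: Dict[str, Any]) -> int:
        # Prefer segment_index, fall back to page_number, else keep at the front.
        for field in ("segment_index", "page_number"):
            if doc.get(field) is not None:
                return doc[field]
        return 0

    ordered = sorted(docs, key=order_key)
    # text is nullable in the schema, so skip missing or empty values.
    return "\n".join(doc["text"] for doc in ordered if doc.get("text"))


pages = [
    {"modality": "document", "page_number": 2, "text": "Operating costs fell."},
    {"modality": "document", "page_number": 3, "text": None},
    {"modality": "document", "page_number": 1, "text": "Quarterly revenue grew."},
]
merged = merge_segment_text(pages)
print(merged)
```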

Parameters

Parameter | Type | Default | Range | Description
output_dimensionality | integer | 3072 | 256–3072 | Output embedding dimensions (Gemini Embedding 2 supports 256–3072)
task_type | string | "RETRIEVAL_DOCUMENT" | - | Embedding intent for Gemini Embedding 2. Common values: RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY, SEMANTIC_SIMILARITY
generate_description | boolean | true | - | Generate a text description via Gemini vision/understanding
extract_text | boolean | true | - | Extract text (OCR for images/docs, transcription for audio/video)
max_video_segments | integer | 10 | 1–50 | Maximum number of 30s segments to process for video files
max_document_pages | integer | 50 | 1–200 | Maximum number of pages to process for document files
max_file_download_mb | integer | 500 | 1–1024 | Maximum file download size (MB) for Celery fast-path processing
max_concurrency | integer | 4 | 1–32 | Maximum per-task object concurrency for Celery fast-path processing
Dimensions are locked at namespace creation. Switching output_dimensionality on an existing namespace requires a migration since the vector index dimensionality is fixed.
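A client-side check of the documented ranges can catch out-of-range values before a request is sent. The following is an illustrative sketch only: the ranges are copied from the parameters table, while `validate_parameters` is a hypothetical helper and not part of any SDK. Parameters without a documented range (task_type and the boolean flags) are passed through unchecked.

```python
from typing import Any, Dict, List, Tuple

# Inclusive ranges taken from the parameters table.
RANGES: Dict[str, Tuple[int, int]] = {
    "output_dimensionality": (256, 3072),
    "max_video_segments": (1, 50),
    "max_document_pages": (1, 200),
    "max_file_download_mb": (1, 1024),
    "max_concurrency": (1, 32),
}


def validate_parameters(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable errors; an empty list means no range violations."""
    errors: List[str] = []
    for name, value in params.items():
        if name not in RANGES:
            continue  # no documented range for this parameter
        low, high = RANGES[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not (low <= value <= high):
            errors.append(f"{name} must be an integer in [{low}, {high}], got {value!r}")
    return errors


print(validate_parameters({"output_dimensionality": 768, "max_video_segments": 51}))
```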

Configuration Examples

{
  "feature_extractor": {
    "feature_extractor_name": "universal_extractor",
    "version": "v1",
    "input_mappings": {
      "content": "file_url"
    },
    "parameters": {}
  }
}
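The example above leaves parameters empty, which presumably means every default applies. As a hedged sketch, the helper below builds the same structure with a few overrides; the chosen values are arbitrary illustrations within the documented ranges, and `build_extractor_config` is a hypothetical name.

```python
import json
from typing import Any, Dict


def build_extractor_config(source_field: str, **parameters: Any) -> Dict[str, Any]:
    """Build a feature_extractor config; omitted parameters are left to the service."""
    return {
        "feature_extractor": {
            "feature_extractor_name": "universal_extractor",
            "version": "v1",
            # Map the extractor's content input to a field on the source object.
            "input_mappings": {"content": source_field},
            "parameters": parameters,
        }
    }


config = build_extractor_config(
    "file_url",
    output_dimensionality=768,   # smaller vectors; fixed once the namespace exists
    generate_description=False,  # skip the description step
    max_document_pages=20,
)
print(json.dumps(config, indent=2))
```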

Performance & Costs

Metric | Value
Compute | Celery fast-path (no Ray cluster startup)
Cost | 15 credits per object (covers all Gemini API calls: embedding, description, OCR/transcription)
External API | Google Gemini (embedding + vision/understanding)
Max download | 500 MB per object (configurable to 1024 MB)
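For budgeting, the flat per-object rate reduces to simple arithmetic. This sketch assumes the 15 credits apply once per source object regardless of how many segments or pages it produces, which this page implies but does not state outright; `estimate_credits` is a hypothetical helper.

```python
CREDITS_PER_OBJECT = 15  # rate from the cost table


def estimate_credits(object_count: int) -> int:
    """Estimate total credits for a batch, assuming a flat per-object rate."""
    if object_count < 0:
        raise ValueError("object_count must be non-negative")
    return object_count * CREDITS_PER_OBJECT


print(estimate_credits(2_000))  # 30000
```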

Vector Index

Property | Value
Index name | universal_extractor_v1_embedding
Dimensions | 3072 (configurable 256–3072)
Type | Dense
Distance metric | Cosine
Inference model | google/gemini-embedding-2
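Since the index uses cosine distance, two embeddings are compared by the angle between them rather than their magnitude. The pure-Python sketch below shows the standard cosine similarity formula on toy vectors; it illustrates the metric only and is not how the service computes it internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); cosine distance is 1 minus this."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-d vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]
orthogonal = [0.0, 1.0, 0.0]
print(cosine_similarity(query, same_direction), cosine_similarity(query, orthogonal))
```

The length check also shows why vectors produced with different output_dimensionality values cannot be compared directly.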

Limitations

  • External dependency: Requires Google Gemini API availability; subject to its rate limits.
  • Per-object cost: Higher per-object cost than self-hosted single-modality extractors.
  • Segment/page caps: Video beyond max_video_segments and documents beyond max_document_pages are truncated.
  • Download ceiling: Files larger than max_file_download_mb are skipped on the Celery fast-path.
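The caps above can be checked before ingestion so that skipped or truncated files do not come as a surprise. This is a rough sketch under stated assumptions: 1 MB is treated as 1,000,000 bytes (the page does not say whether MB means 10^6 or 2^20 bytes), segments are 30 seconds long, and `preflight_warnings` is a hypothetical helper with defaults copied from the parameters table.

```python
import math
from typing import List, Optional


def preflight_warnings(
    size_bytes: int,
    *,
    page_count: Optional[int] = None,
    duration_s: Optional[float] = None,
    max_file_download_mb: int = 500,
    max_document_pages: int = 50,
    max_video_segments: int = 10,
) -> List[str]:
    """List the documented limits a file would hit with the given settings."""
    warnings: List[str] = []
    # Assumption: decimal megabytes.
    if size_bytes > max_file_download_mb * 1_000_000:
        warnings.append("file exceeds max_file_download_mb and would be skipped")
    if page_count is not None and page_count > max_document_pages:
        dropped = page_count - max_document_pages
        warnings.append(f"document would be truncated: {dropped} pages not processed")
    if duration_s is not None:
        needed = math.ceil(duration_s / 30.0)  # 30-second segments
        if needed > max_video_segments:
            dropped = needed - max_video_segments
            warnings.append(f"video would be truncated: {dropped} segments not processed")
    return warnings


print(preflight_warnings(10_000_000, page_count=60, duration_s=400.0))
```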