All-in-one multimodal extractor — image, video, audio, and documents — producing 3072-d Gemini embeddings plus text descriptions, OCR, and transcription
The universal extractor is an all-in-one feature extractor that handles image, video, audio, and documents through Google’s Gemini APIs. It produces a single 3072-dimensional embedding (Gemini Embedding 2) per object alongside rich text extraction — AI-generated descriptions, OCR for images and documents, and transcription for audio and video. It runs on Celery (not Ray) for zero cluster-startup latency, making it a fast path for mixed-modality corpora.
| Field | Type | Description |
| --- | --- | --- |
| modality | string | Detected modality: image, video, audio, or document |
| text | string \| null | Extracted text (OCR, transcription, or document text) |
| description | string \| null | AI-generated description of the content |
| segment_index | integer \| null | Segment index (chunked video/audio/documents) |
| segment_total | integer \| null | Total segments for this source object |
| page_number | integer \| null | Page number (documents only) |
| start_time_s / end_time_s | float \| null | Segment start/end time in seconds (video/audio) |
| duration_s | float \| null | Total file duration in seconds (video/audio) |
{
  "universal_extractor_v1_embedding": [0.012, -0.034, 0.008, ...],
  "modality": "document",
  "text": "Quarterly revenue grew 12% year over year...",
  "description": "A financial report page with a revenue bar chart",
  "page_number": 1,
  "segment_total": 12
}
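As a rough illustration of consuming these fields, the sketch below parses an output document shaped like the example (with the embedding omitted for brevity) and derives a display label and a preferred text value. The helper names `best_text` and `locator` are hypothetical and not part of any SDK; only the field names come from this page, and the fallback order (extracted text first, then description) is an assumption.

```python
import json
from typing import Any, Optional


def best_text(doc: dict[str, Any]) -> Optional[str]:
    """Prefer extracted text, fall back to the description; both may be null."""
    return doc.get("text") or doc.get("description")


def locator(doc: dict[str, Any]) -> str:
    """Build a short label from whichever positional fields are present."""
    parts = [doc.get("modality") or "unknown"]
    if doc.get("page_number") is not None:
        parts.append(f"page {doc['page_number']}")
    start, end = doc.get("start_time_s"), doc.get("end_time_s")
    if start is not None and end is not None:
        parts.append(f"{start:.1f}s-{end:.1f}s")
    return ", ".join(parts)


# Example payload mirroring the sample output, without the embedding vector.
raw = """{
  "modality": "document",
  "text": "Quarterly revenue grew 12% year over year...",
  "description": "A financial report page with a revenue bar chart",
  "page_number": 1,
  "segment_total": 12
}"""
doc = json.loads(raw)
print(locator(doc))    # document, page 1
print(best_text(doc))
```

Because every field except the modality is nullable, the helpers check for `None` rather than assuming a field is present for a given modality.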
Embedding intent for Gemini Embedding 2. Common values: RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY, SEMANTIC_SIMILARITY
| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| generate_description | boolean | true | - | Generate a text description via Gemini vision/understanding |
| extract_text | boolean | true | - | Extract text (OCR for images/docs, transcription for audio/video) |
| max_video_segments | integer | 10 | 1–50 | Maximum number of 30s segments to process for video files |
| max_document_pages | integer | 50 | 1–200 | Maximum number of pages to process for document files |
| max_file_download_mb | integer | 500 | 1–1024 | Maximum file download size (MB) for Celery fast-path processing |
| max_concurrency | integer | 4 | 1–32 | Maximum per-task object concurrency for Celery fast-path processing |
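The defaults and ranges above can be checked client-side before a request is sent. The following is a minimal sketch, not the service's own validation: the class name `UniversalExtractorParams` and the method `max_video_seconds` are hypothetical, while the parameter names, defaults, ranges, and the 30-second segment length are taken from the parameter descriptions.

```python
from dataclasses import dataclass

# (min, max) bounds copied from the parameter ranges; booleans have no range.
_BOUNDS = {
    "max_video_segments": (1, 50),
    "max_document_pages": (1, 200),
    "max_file_download_mb": (1, 1024),
    "max_concurrency": (1, 32),
}


@dataclass(frozen=True)
class UniversalExtractorParams:  # hypothetical name, for illustration only
    generate_description: bool = True
    extract_text: bool = True
    max_video_segments: int = 10
    max_document_pages: int = 50
    max_file_download_mb: int = 500
    max_concurrency: int = 4

    def __post_init__(self) -> None:
        # Reject any integer parameter that falls outside its documented range.
        for name, (low, high) in _BOUNDS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")

    def max_video_seconds(self, segment_s: int = 30) -> int:
        """Upper bound on video time processed, given 30s segments."""
        return self.max_video_segments * segment_s


params = UniversalExtractorParams(max_video_segments=20)
print(params.max_video_seconds())  # 600
```

With the default of 10 segments, at most 300 seconds of a video are processed; raising `max_video_segments` to its maximum of 50 extends that to 1500 seconds.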
Dimensions are locked at namespace creation. Switching output_dimensionality on an existing namespace requires a migration since the vector index dimensionality is fixed.
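Since the index dimensionality is fixed, a length check before writing vectors can surface a mismatch early. This is only a sketch: `check_dimension` is a hypothetical helper, and 3072 is used as the default because it is the embedding size stated for this extractor.

```python
from typing import Sequence

EXPECTED_DIM = 3072  # embedding size stated for this extractor


def check_dimension(vector: Sequence[float], index_dim: int = EXPECTED_DIM) -> None:
    """Raise if a vector's length differs from the index's fixed dimensionality."""
    if len(vector) != index_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, but the index expects {index_dim}"
        )


check_dimension([0.0] * 3072)  # passes silently
```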