Feature Extractors
After your data is connected, extractors run in parallel to pull out structured features, embeddings, entities, transcripts, and more.
102 extractors available
Facial Recognition
Detect and identify faces in images
Video Embedding
Generate vector embeddings for video content
Web Scraper
Extract structured data from webpages while maintaining semantic context and relationships
Text Embedding
Extract semantic embeddings from documents, transcripts, and other text content
Image Embedding
Generate visual embeddings for similarity search and clustering
Audio Embedding
Extract semantic embeddings from audio content for similarity search
Multimodal Extractor
Unified embeddings for video, audio, image, and text — scene/silence chunking, Whisper transcription, thumbnails, and Gemini vision.
Universal Extractor
All-in-one extractor for image, video, audio, and documents — auto-detects modality and applies the right pipeline.
Gemini Multifile Extractor
Embed all files of an object (images, PDFs, video, audio, text) into a single 3072-D Gemini vector.
Document Graph Extractor
Decompose PDFs into spatial blocks — paragraphs, tables, forms, headers — with layout classification and E5 text embeddings.
Passthrough Extractor
Store and canonicalize objects with zero ML — metadata-only ingestion for bucket/object modeling without embeddings.
Scrolling Text Extractor
Read scrolling/marquee video text via phase-correlation band detection, panoramic stitching, and VLM OCR.
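Phase correlation, named in the Scrolling Text Extractor description, is a general technique for estimating how far a signal has shifted between two frames. The sketch below illustrates the idea on a single row of pixel intensities using only the Python standard library. It is not the extractor's implementation: the function names `dft` and `phase_correlation_shift` and the sample data are hypothetical, and a real pipeline would likely work on 2-D frames with an FFT library.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short rows."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def phase_correlation_shift(a, b):
    """Estimate the circular shift d such that b[t] == a[(t - d) % n]."""
    if len(a) != len(b):
        raise ValueError("both rows must have the same length")
    fa, fb = dft(a), dft(b)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = []
    for x, y in zip(fa, fb):
        c = y * x.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)
    # The inverse transform peaks at the shift between the two rows.
    corr = [v.real for v in dft(cross, inverse=True)]
    n = len(a)
    peak = max(range(n), key=corr.__getitem__)
    # Report shifts past the midpoint as negative (leftward) shifts.
    return peak if peak <= n // 2 else peak - n


# Hypothetical example: one row of pixel intensities from two consecutive frames,
# where the text has scrolled 3 pixels to the left.
row = [0, 0, 3, 7, 2, 0, 0, 1, 5, 0, 0, 0, 4, 0, 0, 0]
scrolled = row[3:] + row[:3]
shift = phase_correlation_shift(row, scrolled)
```

Here `shift` is the estimated per-frame displacement in pixels, with negative values meaning leftward motion. Knowing that displacement is what would let successive frames be aligned and stitched into one panoramic strip before OCR.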