Feature Extractors
Configurable ETL pipelines that extract structured data from multimodal content, tailored to your use case. Pair them with retrievers to build multimodal search pipelines.
Activity Grouping
Detect, categorize, and group activities in video content
Face Grouping
Detect, track, and group faces across video frames
Facial Recognition
Detect and identify faces in images with high accuracy
Late Interaction Ranker
Ranks a list of documents against a query using late interaction models (e.g., ColBERT). Produces relevance scores.
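As a rough illustration of what late interaction scoring means, the sketch below implements the MaxSim rule used by ColBERT-style models: each query token embedding is matched to its most similar document token embedding, and those maxima are summed into one relevance score per document. The function names (`maxsim_score`, `rank`) and the tiny hand-written embeddings are hypothetical and are not this extractor's API. In practice the token embeddings would come from a trained encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best cosine match among document tokens."""
    if not doc_tokens:
        return 0.0
    q = [l2_normalize(v) for v in query_tokens]
    d = [l2_normalize(v) for v in doc_tokens]
    score = 0.0
    for qv in q:
        score += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return score


def rank(query_tokens, docs):
    """Return (doc_id, score) pairs sorted from most to least relevant."""
    scored = [(doc_id, maxsim_score(query_tokens, toks)) for doc_id, toks in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional token embeddings (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "b": [[1.0, 0.0]],              # matches only the first query token
}
ranking = rank(query, docs)
print(ranking)
```

Because the query and document are encoded separately and only compared token by token at scoring time, document embeddings can be computed ahead of time, which is the usual motivation for late interaction.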
Object Detection
Identify and locate objects within images with bounding boxes
Object Grouping
Segment and group objects across video frames
PII Redactor
Detect and redact personally identifiable information from text, transcripts, and OCR output
Seamless Expressive Translation
Translate speech across languages while preserving emotional tone, pauses, and vocal style
Video Embedding
Generate vector embeddings for video content
XceptionNet Deepfake Detector
Detects manipulated facial regions using a CNN trained on the FaceForensics++ dataset.
Accent & Dialect Identification
Identify accents and regional speech patterns
Acoustic Scene Classification
Identify the environment where audio was recorded