void-model
by netflix
Video object removal that preserves the physical interactions the object caused
netflix/void-model
mixpeek://image_extractor@v1/netflix_void_v1
Overview
VOID (Video Object and Interaction Deletion) is Netflix's video inpainting model. It removes objects from video while preserving the physical interactions those objects caused in the surrounding scene. Conventional inpainting only erases pixels; VOID also handles second-order effects such as falling objects, displaced items, shadows, reflections, and contact responses, so the edited shot looks physically coherent.
VOID is fine-tuned from CogVideoX-Fun-V1.5-5B using a quadmask conditioning scheme that distinguishes the primary object, overlap regions, affected regions, and background. A two-pass pipeline (base inpainting + warped-noise refinement) keeps results temporally stable across long shots.
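The control flow of the two-pass pipeline can be sketched as a simple composition of two stages. This is a minimal illustration only: the `Video`, `Quadmask`, and `Stage` types and the `runTwoPass` function are hypothetical names, not part of VOID or the Mixpeek SDK, and the stage bodies are left to the caller rather than implementing real inpainting or warped-noise refinement.

```typescript
// Hypothetical, simplified types: a video is an array of frames,
// each frame a flat array of pixel values. Not VOID's real data model.
type Video = number[][];
type Quadmask = number[][];
type Stage = (video: Video, quadmask: Quadmask) => Video;

// Runs the two passes in order: base inpainting first, then refinement
// applied to the base result. Both stages receive the same quadmask.
function runTwoPass(
  video: Video,
  quadmask: Quadmask,
  basePass: Stage,
  refinePass: Stage
): Video {
  if (video.length !== quadmask.length) {
    throw new Error("video and quadmask must have the same number of frames");
  }
  const baseResult = basePass(video, quadmask);
  return refinePass(baseResult, quadmask);
}
```

Passing the stages in as functions keeps the ordering logic separate from the model calls, so either pass can be swapped for a stub while testing.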
Architecture
VOID is fine-tuned from CogVideoX-Fun-V1.5-5B, a 5B-parameter 3D transformer. Quadmask conditioning encodes four region classes: remove, overlap, affected, and keep. The model runs in BF16 precision with FP8 quantization and uses a DDIM scheduler. The default resolution is 384x672, with up to 197 frames. Inference is two-pass: base inpainting followed by warped-noise refinement for temporal consistency.
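To make the four region classes concrete, here is a minimal sketch that derives a per-pixel quadmask from two binary masks: one for the object to remove and one for the area it physically affects. The `Region` enum, its numeric values, and `buildQuadmask` are assumptions for illustration; the source does not specify how VOID actually encodes the classes.

```typescript
// Hypothetical labels for the four quadmask region classes.
// The numeric values are illustrative, not VOID's actual encoding.
enum Region {
  Keep = 0,     // background, left untouched
  Affected = 1, // influenced by the object but not covered by it
  Overlap = 2,  // covered by the object and also in the affected area
  Remove = 3,   // covered by the object only
}

// Combines two flat binary masks (one entry per pixel) into a quadmask.
function buildQuadmask(objectMask: boolean[], affectedMask: boolean[]): Region[] {
  if (objectMask.length !== affectedMask.length) {
    throw new Error("masks must have the same length");
  }
  return objectMask.map((isObject, i) => {
    const isAffected = affectedMask[i];
    if (isObject && isAffected) return Region.Overlap;
    if (isObject) return Region.Remove;
    if (isAffected) return Region.Affected;
    return Region.Keep;
  });
}
```

For example, `buildQuadmask([true, true, false, false], [false, true, true, false])` yields remove, overlap, affected, keep for the four pixels in order.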
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/video.mp4" },
  feature_extractors: [
    {
      name: "segmentation",
      version: "v1",
      params: { model_id: "netflix/void-model" }
    }
  ]
});
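When the same request is issued for many videos, it may help to build the payload in one place. The sketch below only constructs the plain object passed to `mx.collections.ingest` in the snippet above; it makes no SDK calls. The `IngestRequest` interface and `buildVoidIngestRequest` helper are hypothetical names that mirror the fields shown in the snippet, not types exported by the Mixpeek SDK.

```typescript
// Hypothetical shapes mirroring the fields used in the snippet above.
interface FeatureExtractor {
  name: string;
  version: string;
  params: { model_id: string };
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

// Builds the ingest payload for void-model, rejecting obviously bad input
// before any network call is made.
function buildVoidIngestRequest(collectionId: string, videoUrl: string): IngestRequest {
  if (collectionId.trim() === "") {
    throw new Error("collectionId must not be empty");
  }
  if (!/^https?:\/\//.test(videoUrl)) {
    throw new Error("videoUrl must start with http:// or https://");
  }
  return {
    collection_id: collectionId,
    source: { url: videoUrl },
    feature_extractors: [
      {
        name: "segmentation",
        version: "v1",
        params: { model_id: "netflix/void-model" },
      },
    ],
  };
}
```

The result can then be passed along as `await mx.collections.ingest(buildVoidIngestRequest("my-collection", "https://example.com/video.mp4"))`, assuming the SDK accepts the same object shape as in the snippet.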
Capabilities
- Removes objects from video while preserving induced physical interactions
- Handles falling, sliding, and contact-driven secondary motion
- Quadmask-aware conditioning isolates affected regions from background
- Two-pass refinement for temporally consistent long shots
- Pairs with a SAM2 + Gemini mask-generation pipeline for end-to-end editing
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Internal Netflix eval | Scene boundary F1 | 94.8% | Netflix Tech Blog, 2024 |
Research Paper
VOID: Video Object and Interaction Deletion
arxiv.org
Build a pipeline with void-model
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.