
    void-model

    by netflix

    Video object removal that preserves the physical interactions the object caused

    647 likes · 5B params
    Identifiers
    Model ID
    netflix/void-model
    Feature URI
    mixpeek://image_extractor@v1/netflix_void_v1

    Overview

    VOID (Video Object and Interaction Deletion) is Netflix's video inpainting model that removes objects from video while preserving every physical interaction the object caused on the surrounding scene. Unlike conventional inpainting, which only erases pixels, VOID handles second-order effects — falling objects, displaced items, shadows, reflections, and contact responses — so the edited shot looks physically coherent.

    VOID is fine-tuned from CogVideoX-Fun-V1.5-5B using a quadmask conditioning scheme that distinguishes the primary object, overlap regions, affected regions, and background. A two-pass pipeline (base inpainting + warped-noise refinement) keeps results temporally stable across long shots.

    Architecture

    • Base model: CogVideoX-Fun-V1.5-5B (5B-parameter 3D transformer)
    • Quadmask conditioning: 4 region classes (remove / overlap / affected / keep)
    • Precision: BF16 with FP8 quantization; DDIM scheduler
    • Defaults: 384x672 resolution, up to 197 frames
    • Two-pass inference: base inpainting followed by warped-noise refinement for temporal consistency
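    To make the quadmask conditioning concrete, here is an illustrative sketch of how the four region classes could be encoded as an integer label map. The class names come from the model card; the numeric label values and function names are assumptions for illustration, not the model's actual preprocessing.

```typescript
// Illustrative encoding of the four quadmask region classes.
// The numeric values are assumed; only the class names come from the card.
type QuadmaskClass = "remove" | "overlap" | "affected" | "keep";

const QUADMASK_LABELS: Record<QuadmaskClass, number> = {
  keep: 0,     // untouched background
  remove: 1,   // the primary object to delete
  overlap: 2,  // object pixels overlapping other scene elements
  affected: 3, // regions the object physically influenced (shadows, contacts)
};

// Build a flat H*W label map from per-class boolean masks.
// Later classes overwrite earlier ones where masks overlap.
function buildQuadmask(
  height: number,
  width: number,
  masks: Partial<Record<QuadmaskClass, boolean[]>>
): Uint8Array {
  const out = new Uint8Array(height * width); // initialized to 0 = keep
  for (const cls of ["remove", "overlap", "affected"] as QuadmaskClass[]) {
    const m = masks[cls];
    if (!m) continue;
    for (let i = 0; i < out.length; i++) {
      if (m[i]) out[i] = QUADMASK_LABELS[cls];
    }
  }
  return out;
}
```

    In practice the remove/overlap/affected masks would come from an upstream segmentation step (e.g. the SAM2 + Gemini pipeline mentioned below), with "affected" covering second-order effects such as shadows and displaced items.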

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a video and run VOID segmentation on it
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [
        {
          name: "segmentation",
          version: "v1",
          params: { model_id: "netflix/void-model" },
        },
      ],
    });
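    If you ingest many videos with the same extractor configuration, the payload above can be factored into a small pure helper. This is a sketch that reuses only the fields shown in the example call; the helper name and interface are illustrative, not part of the Mixpeek SDK.

```typescript
// Build the ingest payload from the example above, parameterized by URL.
// Only fields that appear in the example call are used here.
interface VoidIngestPayload {
  collection_id: string;
  source: { url: string };
  feature_extractors: {
    name: string;
    version: string;
    params: { model_id: string };
  }[];
}

function voidIngestPayload(
  collectionId: string,
  videoUrl: string
): VoidIngestPayload {
  return {
    collection_id: collectionId,
    source: { url: videoUrl },
    feature_extractors: [
      {
        name: "segmentation",
        version: "v1",
        params: { model_id: "netflix/void-model" },
      },
    ],
  };
}

// Usage: await mx.collections.ingest(voidIngestPayload("my-collection", url));
```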

    Capabilities

    • Removes objects from video while preserving induced physical interactions
    • Handles falling, sliding, and contact-driven secondary motion
    • Quadmask-aware conditioning isolates affected regions from background
    • Two-pass refinement for temporally consistent long shots
    • Pairs with SAM2 + Gemini mask generation pipeline for end-to-end editing

    Use Cases on Mixpeek

    • Removing background actors and crew from production footage
    • Cleaning up rights-restricted props, logos, or vehicles in archived video
    • Generating clean plates for VFX compositing
    • Privacy redaction that preserves scene physics, not just pixels

    Benchmarks

    Dataset: Internal Netflix eval
    Metric: Scene boundary F1
    Score: 94.8%
    Source: Netflix Tech Blog, 2024

    Performance

    Input Size: variable video
    GPU Latency: ~5 ms/frame (A100)
    GPU Throughput: ~200 frames/sec (A100)
    GPU Memory: ~0.4 GB
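    At the quoted ~5 ms/frame, a maximum-length 197-frame shot works out to roughly one second of A100 GPU time per pass. A quick estimator (the 5 ms figure comes from the table above; note the two-pass pipeline means wall-clock time may be roughly double):

```typescript
// Estimate GPU time for a clip from the per-frame latency quoted above.
const MS_PER_FRAME = 5; // ~5 ms/frame on A100, from the performance table

function estimateGpuSeconds(frames: number, msPerFrame = MS_PER_FRAME): number {
  return (frames * msPerFrame) / 1000;
}

// estimateGpuSeconds(197) -> 0.985 s for a maximum-length shot
```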

    Specification

    Framework: PyTorch
    Organization: netflix
    Feature: Segmentation
    Output: mask + label
    Modalities: video, image
    Retriever: Mask Filter
    Parameters: 5B
    License: apache-2.0
    Downloads/mo:
    Likes: 647

    Research Paper

    VOID: Video Object and Interaction Deletion

    arxiv.org

    Build a pipeline with void-model

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
