
    void-model

    by netflix

    Video object removal that preserves the physical interactions the object caused

    647 likes · 5B params
    Identifiers
    Model ID
    netflix/void-model
    Feature URI
    mixpeek://image_extractor@v1/netflix_void_v1

    Overview

    VOID (Video Object and Interaction Deletion) is Netflix's video inpainting model that removes objects from video while preserving every physical interaction the object caused on the surrounding scene. Unlike conventional inpainting, which only erases pixels, VOID handles second-order effects — falling objects, displaced items, shadows, reflections, and contact responses — so the edited shot looks physically coherent.

    VOID is fine-tuned from CogVideoX-Fun-V1.5-5B using a quadmask conditioning scheme that distinguishes the primary object, overlap regions, affected regions, and background. A two-pass pipeline (base inpainting + warped-noise refinement) keeps results temporally stable across long shots.

    Architecture

    • Base model: CogVideoX-Fun-V1.5-5B (5B-parameter 3D transformer)
    • Quadmask conditioning: 4 region classes (remove / overlap / affected / keep)
    • Precision: BF16 with FP8 quantization; DDIM scheduler
    • Defaults: 384x672 resolution, up to 197 frames
    • Two-pass inference: base inpainting followed by warped-noise refinement for temporal consistency
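    To make the quadmask conditioning concrete, here is an illustrative sketch of how the four region classes could be encoded as an integer label map. The class names come from the model card; the numeric label values and function names are assumptions for illustration, not the model's actual preprocessing.

```typescript
// Illustrative encoding of the four quadmask region classes.
// The numeric values are assumed; only the class names come from the card.
type QuadmaskClass = "remove" | "overlap" | "affected" | "keep";

const QUADMASK_LABELS: Record<QuadmaskClass, number> = {
  keep: 0,     // untouched background
  remove: 1,   // the primary object to delete
  overlap: 2,  // object pixels overlapping other scene elements
  affected: 3, // regions the object physically influenced (shadows, contacts)
};

// Build a flat H*W label map from per-class boolean masks.
// Later classes overwrite earlier ones where masks overlap.
function buildQuadmask(
  height: number,
  width: number,
  masks: Partial<Record<QuadmaskClass, boolean[]>>
): Uint8Array {
  const out = new Uint8Array(height * width); // initialized to 0 = keep
  for (const cls of ["remove", "overlap", "affected"] as QuadmaskClass[]) {
    const m = masks[cls];
    if (!m) continue;
    for (let i = 0; i < out.length; i++) {
      if (m[i]) out[i] = QUADMASK_LABELS[cls];
    }
  }
  return out;
}
```

    In practice the remove/overlap/affected masks would come from an upstream segmentation step (e.g. the SAM2 + Gemini pipeline mentioned below), with "affected" covering second-order effects such as shadows and displaced items.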

    Mixpeek SDK Integration

    import { Mixpeek } from "mixpeek";

    const mx = new Mixpeek({ apiKey: "API_KEY" });

    // Ingest a video and run VOID segmentation on it
    await mx.collections.ingest({
      collection_id: "my-collection",
      source: { url: "https://example.com/video.mp4" },
      feature_extractors: [
        {
          name: "segmentation",
          version: "v1",
          params: { model_id: "netflix/void-model" },
        },
      ],
    });
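    If you ingest many videos with the same extractor configuration, the payload above can be factored into a small pure helper. This is a sketch that reuses only the fields shown in the example call; the helper name and interface are illustrative, not part of the Mixpeek SDK.

```typescript
// Build the ingest payload from the example above, parameterized by URL.
// Only fields that appear in the example call are used here.
interface VoidIngestPayload {
  collection_id: string;
  source: { url: string };
  feature_extractors: {
    name: string;
    version: string;
    params: { model_id: string };
  }[];
}

function voidIngestPayload(
  collectionId: string,
  videoUrl: string
): VoidIngestPayload {
  return {
    collection_id: collectionId,
    source: { url: videoUrl },
    feature_extractors: [
      {
        name: "segmentation",
        version: "v1",
        params: { model_id: "netflix/void-model" },
      },
    ],
  };
}

// Usage: await mx.collections.ingest(voidIngestPayload("my-collection", url));
```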

    Capabilities

    • Removes objects from video while preserving induced physical interactions
    • Handles falling, sliding, and contact-driven secondary motion
    • Quadmask-aware conditioning isolates affected regions from background
    • Two-pass refinement for temporally consistent long shots
    • Pairs with SAM2 + Gemini mask generation pipeline for end-to-end editing

    Use Cases on Mixpeek

    • Removing background actors and crew from production footage
    • Cleaning up rights-restricted props, logos, or vehicles in archived video
    • Generating clean plates for VFX compositing
    • Privacy redaction that preserves scene physics, not just pixels

    Benchmarks

    Dataset: Internal Netflix eval
    Metric: Scene boundary F1
    Score: 94.8%
    Source: Netflix Tech Blog, 2024

    Performance

    Input Size: variable video
    GPU Latency: ~5 ms/frame (A100)
    GPU Throughput: ~200 frames/sec (A100)
    GPU Memory: ~0.4 GB
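    At the quoted ~5 ms/frame, a maximum-length 197-frame shot works out to roughly one second of A100 GPU time per pass. A quick estimator (the 5 ms figure comes from the table above; note the two-pass pipeline means wall-clock time may be roughly double):

```typescript
// Estimate GPU time for a clip from the per-frame latency quoted above.
const MS_PER_FRAME = 5; // ~5 ms/frame on A100, from the performance table

function estimateGpuSeconds(frames: number, msPerFrame = MS_PER_FRAME): number {
  return (frames * msPerFrame) / 1000;
}

// estimateGpuSeconds(197) -> 0.985 s for a maximum-length shot
```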

    Specification

    Framework: PyTorch
    Organization: netflix
    Feature: Segmentation
    Output: mask + label
    Modalities: video, image
    Retriever: Mask Filter
    Parameters: 5B
    License: apache-2.0
    Downloads/mo:
    Likes: 647

    Research Paper

    VOID: Video Object and Interaction Deletion

    arxiv.org

    Build a pipeline with void-model

    Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
