vjepa2-vitg-fpc64-256

by facebook

Highest-capacity V-JEPA 2 video encoder — self-supervised temporal representations

229K downloads/month

57 likes

1.0B params


Identifiers

Model ID

facebook/vjepa2-vitg-fpc64-256

Feature URI

mixpeek://video_extractor@v1/facebook_vjepa2_vitg_fpc64_256_v1

Overview

V-JEPA 2 (ViT-g) is the largest checkpoint of Meta FAIR's video representation model. What makes the JEPA (Joint-Embedding Predictive Architecture) family different from a masked autoencoder is *where* it predicts: it masks spacetime regions of a clip and predicts the missing regions' **representations in latent space**, not their raw pixels. Skipping pixel reconstruction means the model never spends capacity on texture and lighting detail it doesn't need, so it learns the semantic and dynamic structure of a scene — what moves, how, and in what order — rather than how to repaint it.
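To make the pixel-versus-latent distinction concrete, here is a toy sketch that scores the same guess under both kinds of objective. Everything in it is an assumption for illustration: `encode` is a hypothetical hand-written stand-in for a learned target encoder, the patches are made-up numbers, and mean absolute error is used only as a simple distance. It is not V-JEPA 2's training code.

```typescript
// Toy sketch only: contrasts a pixel-space target with a latent-space target.
type Vec = number[];

// Mean absolute error between two equal-length vectors.
function meanAbsError(a: Vec, b: Vec): number {
  if (a.length !== b.length) throw new Error("length mismatch");
  const total = a.reduce((sum, x, i) => sum + Math.abs(x - (b[i] ?? 0)), 0);
  return total / a.length;
}

// Hypothetical encoder (an assumption, not the real model): keeps a patch's
// mean and value range and discards the exact per-pixel texture.
function encode(patch: Vec): Vec {
  const mean = patch.reduce((sum, v) => sum + v, 0) / patch.length;
  const range = Math.max(...patch) - Math.min(...patch);
  return [mean, range];
}

// A masked patch and a guess with the same coarse statistics but shifted texture.
const targetPatch: Vec = [0.2, 0.8, 0.2, 0.8];
const guessedPatch: Vec = [0.8, 0.2, 0.8, 0.2];

// Pixel-reconstruction objective: the texture mismatch is penalized.
const pixelLoss = meanAbsError(guessedPatch, targetPatch);

// Latent-prediction objective: only the representations are compared.
// (In a real JEPA the predictor emits the latent directly; encoding the guess
// here is a shortcut so both losses score the same input.)
const latentLoss = meanAbsError(encode(guessedPatch), encode(targetPatch));

console.log({ pixelLoss, latentLoss });
```

In this toy setup the pixel loss is large while the latent loss is close to zero, which is the intuition behind not spending capacity on detail the representation discards.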

The ViT-g variant trades latency for quality: it is the strongest V-JEPA 2 encoder, worth it when representation quality drives your retrieval or classification accuracy more than throughput does. On Mixpeek it serves as a motion-aware video embedding stage — giving an agent a compact vector of what *happens* over a clip, complementary to keyframe/caption features that describe what merely *appears*.
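As a rough picture of how a compact clip vector feeds retrieval, the sketch below ranks clips by cosine similarity to a query vector. The three-dimensional vectors, clip ids, and the `rankBySimilarity` helper are all hypothetical; real embeddings from this model are far higher-dimensional, and a managed service would normally handle this ranking for you.

```typescript
// Toy sketch: nearest-neighbor ranking over made-up clip embeddings.
type EmbeddedClip = { id: string; vector: number[] };
type ScoredClip = { id: string; score: number };

// Cosine similarity of two equal-length vectors; 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical helper: score every clip against the query, best match first.
function rankBySimilarity(queryVector: number[], clips: EmbeddedClip[]): ScoredClip[] {
  return clips
    .map((clip) => ({ id: clip.id, score: cosineSimilarity(queryVector, clip.vector) }))
    .sort((left, right) => right.score - left.score);
}

// Made-up embeddings standing in for clips with different motion content.
const toyClips: EmbeddedClip[] = [
  { id: "static-kitchen", vector: [0.0, 0.1, 0.9] },
  { id: "pour-water", vector: [0.9, 0.1, 0.0] },
  { id: "stir-pot", vector: [0.1, 0.9, 0.1] },
];

const rankedClips = rankBySimilarity([0.8, 0.2, 0.1], toyClips);
console.log(rankedClips.map((clip) => clip.id));
```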

Architecture

The model is a giant Vision Transformer (ViT-g) video encoder, the largest V-JEPA 2 checkpoint. The FPC64 variant samples 64 frames per clip and exposes get_vision_features via Transformers; it can also encode a still image by repeating it across the frame dimension. It is trained self-supervised by predicting masked spacetime representations in latent space, with no pixel decoder, which is the core JEPA distinction from pixel-reconstruction MAEs.
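The 64-frame clip and the still-image case can be sketched with two small helpers. Both `sampleFrameIndices` and `repeatImageAsClip` are hypothetical names, and evenly spaced sampling is an assumption made for illustration; the model's actual preprocessing may select frames differently.

```typescript
// Hypothetical helper: pick `numFrames` evenly spaced frame indices from a
// video with `totalFrames` frames. Indices repeat when the video is shorter
// than the clip. Even spacing is an assumption, not the documented sampler.
function sampleFrameIndices(totalFrames: number, numFrames: number = 64): number[] {
  if (!Number.isInteger(totalFrames) || totalFrames < 1) {
    throw new RangeError("totalFrames must be a positive integer");
  }
  if (!Number.isInteger(numFrames) || numFrames < 1) {
    throw new RangeError("numFrames must be a positive integer");
  }
  return Array.from({ length: numFrames }, (_, i) =>
    Math.floor((i * totalFrames) / numFrames),
  );
}

// Hypothetical helper: turn one still image into a clip by repeating it along
// the frame dimension. This is the one-frame case of the sampler above.
function repeatImageAsClip<T>(image: T, numFrames: number = 64): T[] {
  return sampleFrameIndices(1, numFrames).map(() => image);
}

// A 128-frame video reduced to 64 indices, and a still image expanded to 64 frames.
const sampledIndices = sampleFrameIndices(128);
const stillImageClip = repeatImageAsClip("still-image-placeholder");
console.log(sampledIndices.length, stillImageClip.length);
```

Treating the still image as a one-frame video keeps a single code path: every sampled index resolves to frame 0, so the clip is the same image repeated.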

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "video_embedding",
    version: "v1",
    parameters: { model_id: "facebook/vjepa2-vitg-fpc64-256" },
  },
});