Video Understanding: From Frames to Contextual Search
Summary
Learn what video understanding involves and how it differs from basic image understanding. This video covers frame extraction techniques (sampling, keyframe detection, and scene-based extraction), video embedding models that capture temporal context, and how to build semantic video search applications.
About this video
What you'll learn:
⚡ Video vs image understanding: temporal context matters
⚡ Frame extraction techniques: sampling, keyframe, scene-based
⚡ Frame-level vs video-level embeddings
⚡ How video embeddings capture motion and actions
⚡ Scene detection with AutoShot and semantic deduplication
⚡ Vertex AI multimodal embeddings for video
⚡ Building scene-based video search pipelines
⚡ Real demo: Contextual video retrieval in Mixpeek Studio
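To make two of the topics above concrete, here is a minimal sketch of fixed-interval frame sampling followed by semantic deduplication of the sampled frames. It is an illustration under stated assumptions, not the pipeline shown in the video: the function names, the 0.95 similarity threshold, and the toy two-dimensional embeddings are all hypothetical, and it does not use AutoShot, Vertex AI, or Mixpeek. In practice the embeddings would come from an image or video embedding model.

```python
import math


def sample_timestamps(duration_s, every_s):
    """Fixed-interval sampling: timestamps 0, every_s, 2*every_s, ... below duration_s."""
    if duration_s < 0 or every_s <= 0:
        raise ValueError("duration_s must be >= 0 and every_s must be > 0")
    count = math.ceil(duration_s / every_s)
    return [i * every_s for i in range(count)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedupe_frames(embeddings, threshold=0.95):
    """Return indices of frames to keep.

    A frame is dropped when its embedding is at least `threshold` similar
    to the most recently kept frame (hypothetical default of 0.95).
    """
    kept = []
    for i, emb in enumerate(embeddings):
        if not kept or cosine_similarity(embeddings[kept[-1]], emb) < threshold:
            kept.append(i)
    return kept


# Toy example: a 10-second clip sampled every 2 seconds.
timestamps = sample_timestamps(10, 2)

# Made-up 2-D embeddings standing in for real model output.
frame_embeddings = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.0, 1.0]]
kept_indices = dedupe_frames(frame_embeddings)
```

Comparing each frame only against the last kept frame keeps the pass linear in the number of frames and suits near-identical consecutive frames. It will not catch a shot that reappears later in the video, which would require comparing against every kept frame.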
