Name: Mixpeek for Law Enforcement
Brand: Mixpeek
Availability: InStock

Question 1

How does Mixpeek maintain CJIS compliance?

Accepted Answer

All processing runs entirely within your AWS VPC. Custom extractor containers bundle the model weights, so the GPU cluster needs no internet access. No evidence data is sent to third-party APIs (no Vertex AI, OpenAI, or Anthropic). Built-in text embeddings use E5-Large, which runs locally on Ray. Retriever LLM stages route to a self-hosted Qwen2.5-VL instance via a local vLLM endpoint. Full lineage tracking provides chain-of-custody audit trails for every extraction and retrieval.
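
One way to back up the "no third-party APIs" guarantee in application code is to refuse any LLM endpoint whose host is not a loopback or private address. The following is a minimal sketch, not Mixpeek's implementation: the function name and the endpoint URL are hypothetical, and it assumes the vLLM server is addressed by IP literal (vLLM does expose an OpenAI-compatible `/v1/chat/completions` route).

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_endpoint(url: str) -> bool:
    """Return True only if the URL's host is localhost or a private/loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Other hostnames are rejected: resolving them would need DNS,
        # which this sketch deliberately avoids.
        return False
    return addr.is_loopback or addr.is_private


# Hypothetical address of the self-hosted vLLM server inside the VPC.
VLLM_ENDPOINT = "http://10.0.12.34:8000/v1/chat/completions"

if not is_internal_endpoint(VLLM_ENDPOINT):
    raise RuntimeError("LLM endpoint is not inside the private network")
```

A check like this complements, but does not replace, network-level controls such as a VPC without an internet gateway.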

Question 2

What models are used for transcription?

Accepted Answer

The primary ASR model is NVIDIA Parakeet TDT v3 (600M params, CC-BY-4.0 license) with a 6.34% word error rate, lower than Whisper's 7.44%. SpeechBrain SepFormer handles noise enhancement for body-worn camera (BWC) audio with wind, sirens, and radio interference. Speaker diarization uses pyannote 3.1 (MIT license), which supports unlimited speakers and handles overlapping speech. Optional forced alignment uses Qwen3-ForcedAligner for legal-grade word timestamps.
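
For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal sketch follows; the lowercase-and-split normalization is an assumption, and published benchmarks typically apply more elaborate text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("the" -> "a") and one deletion ("now") over 4 reference words.
print(word_error_rate("drop the weapon now", "drop a weapon"))  # 0.5
```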

Question 3

How does cross-camera face clustering work?

Accepted Answer

SCRFD detects faces at 2 FPS sampling. AdaFace IR101 (MIT license) generates 512-dimensional embeddings optimized for degraded image quality. It outperforms ArcFace on surveillance benchmarks specifically because it down-weights unrecognizable faces during training. BoT-SORT groups faces into per-video tracks with camera motion compensation. HDBSCAN then clusters track-level embeddings across all cameras with no manual threshold tuning, followed by an agglomerative merge of cluster centroids to catch same-person splits across lighting changes.
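
The final centroid-merge step can be sketched in a few lines. This is an illustrative version under stated assumptions, not the production code: it takes cluster centroids as plain lists, uses single-linkage merging with a hypothetical cosine-similarity threshold of 0.6, and omits the detection, embedding, tracking, and HDBSCAN stages entirely.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_clusters(centroids, threshold=0.6):
    """Single-linkage merge: clusters whose centroids meet the cosine
    threshold (directly or through a chain) get the same merged label."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if cosine(centroids[i], centroids[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel union-find roots as consecutive integers 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(centroids))]


# Clusters 0 and 1 point in nearly the same direction and are merged.
print(merge_clusters([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]))  # [0, 0, 1]
```

Merging on centroids rather than on individual track embeddings keeps this pass cheap, since the number of clusters is far smaller than the number of tracks.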

Question 4

What is the firearms detection pipeline?

Accepted Answer

The pipeline has four tiers: (1) YOLO-World zero-shot screening at 1-5 FPS with open-vocabulary prompts for handgun, pistol, rifle, shotgun, firearm, weapon, and gun. (2) BoT-SORT temporal tracking with camera motion compensation, which requires 3 detections within 5 frames to trigger and so suppresses single-frame false positives from radios or dark phones. (3) Grounding DINO 1.5 verification on tracked detections only (~1% of frames). (4) Optional SAM 2 segmentation for forensic weapon masks. A YOLOv11 model fine-tuned on firearms datasets can replace Tier 1 for higher accuracy.
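
The "3 detections in 5 frames" rule in Tier 2 amounts to a sliding-window vote per track. The sketch below illustrates that rule only; the class name is hypothetical, and it assumes the tracker has already associated detections with a single track.

```python
from collections import deque


class TemporalVote:
    """Fire only when at least `min_hits` of the last `window` frames
    contained a detection for the same track."""

    def __init__(self, min_hits: int = 3, window: int = 5):
        self.min_hits = min_hits
        self.history = deque(maxlen=window)  # oldest frame drops off automatically

    def update(self, detected: bool) -> bool:
        self.history.append(detected)
        return sum(self.history) >= self.min_hits


# A single-frame flash (e.g. a dark phone) never triggers.
flash = TemporalVote()
print([flash.update(d) for d in [False, True, False, False, False]])
# [False, False, False, False, False]

# A sustained detection triggers on its third hit.
sustained = TemporalVote()
print([sustained.update(d) for d in [True, False, True, True]])
# [False, False, False, True]
```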

Question 5

How does semantic chapterization differ from scene detection?

Accepted Answer

Traditional scene detection (PySceneDetect) finds visual cuts in edited video. That is the wrong tool for continuous BWC footage, where it mostly triggers on camera motion and lighting changes. Our chapterization runs PELT change-point detection (via the ruptures library) on 4 combined signals: SigLIP visual embeddings, audio energy, transcript topic similarity, and optical flow motion classification. This finds semantic event boundaries (foot pursuit begins, confrontation starts, suspect detained) rather than visual cuts. Each chapter gets a forensic summary from a local Qwen2.5-VL-7B instance.
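
To make the change-point idea concrete, here is a dependency-free sketch of penalized optimal partitioning with an L2 (mean-shift) cost. This is the exact dynamic program that PELT speeds up by pruning candidate split points; it is a stand-in written for illustration, not the ruptures implementation. How the four signals are combined is an assumption here: each time step is taken to be one feature vector of normalized signal values, and the penalty value is hypothetical.

```python
def detect_boundaries(signal, penalty):
    """Penalized optimal partitioning with an L2 (mean-shift) cost.

    `signal` is a list of equal-length feature vectors, one per time step.
    Returns the sorted indices at which a new segment starts.
    """
    n, dims = len(signal), len(signal[0])
    # Prefix sums let us compute any segment's cost in O(dims).
    s1 = [[0.0] * dims]
    s2 = [0.0]
    for row in signal:
        s1.append([a + x for a, x in zip(s1[-1], row)])
        s2.append(s2[-1] + sum(x * x for x in row))

    def cost(start, end):
        # Sum of squared deviations from the segment mean, over all dims.
        length = end - start
        sq = s2[end] - s2[start]
        mean_term = sum((s1[end][d] - s1[start][d]) ** 2 for d in range(dims)) / length
        return sq - mean_term

    best = [-penalty] + [0.0] * n  # best[t]: optimal penalized cost of signal[:t]
    last = [0] * (n + 1)           # last[t]: start of the final segment in that optimum
    for t in range(1, n + 1):
        best[t], last[t] = min((best[s] + cost(s, t) + penalty, s) for s in range(t))

    # Walk back through the stored segment starts to recover the boundaries.
    boundaries, t = [], n
    while last[t] > 0:
        t = last[t]
        boundaries.append(t)
    return sorted(boundaries)


# Toy 1-D signal: calm, then a burst of activity, then calm again.
toy = [[0.0]] * 10 + [[5.0]] * 10 + [[0.0]] * 10
print(detect_boundaries(toy, penalty=1.0))  # [10, 20]
```

The penalty controls granularity: a larger value yields fewer, longer chapters. This version is quadratic in the number of time steps, which is the cost PELT's pruning is designed to avoid on long recordings.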

Question 6

What hardware is required for deployment?

Accepted Answer

Minimum: 1x A100 80GB (runs all models sequentially). Recommended: 2x A100 80GB for parallel ASR + face + weapons pipelines. The vLLM server for retriever LLM stages (Qwen2.5-72B-Instruct) needs 1x A100 80GB. Supporting infrastructure: 4-core/16GB for API + Celery, 8-core/32GB for Qdrant vector storage, managed DocumentDB for metadata, S3 with VPC endpoint for evidence files, and ElastiCache Redis for queuing.

Question 7

How fast does the pipeline process video?

Accepted Answer

On an A100 GPU, the full pipeline (all 5 extraction stages) processes 1 hour of BWC footage in approximately 10-15 minutes. In our benchmarks on 35 minutes of footage across 4 cameras, the SOTA v2 pipeline completed in 46 minutes on CPU (Apple M3 Ultra), 2.1x faster than v1 and with substantially better quality. The projected GPU time for the same footage is under 15 minutes.
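
These figures translate into a real-time factor (minutes of footage processed per minute of wall-clock time), which is useful for capacity planning. The short calculation below uses only the numbers quoted above; the daily-capacity figure assumes a single GPU running continuously with no queueing overhead, which is an idealization.

```python
def realtime_factor(footage_minutes: float, processing_minutes: float) -> float:
    """Minutes of footage processed per minute of wall-clock time."""
    return footage_minutes / processing_minutes


# 1 hour of footage in 10-15 minutes on an A100.
slow = realtime_factor(60, 15)
fast = realtime_factor(60, 10)
print(slow, fast)  # 4.0 6.0

# Idealized footage hours one GPU could process in 24 hours.
print(24 * slow, 24 * fast)  # 96.0 144.0
```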

Question 8

Can prosecutors search evidence in natural language?

Accepted Answer

Yes. The evidence-search retriever accepts natural language queries like 'Show me everywhere the suspect appears' or 'When were weapons drawn during the foot pursuit?' The retriever runs a multi-stage pipeline: semantic search over chapter embeddings, incident phase classification via taxonomy, document enrichment with face IDs and firearms events from other collections, temporal sorting, and a final LLM synthesis stage that produces a forensic timeline citing camera IDs and timestamps.
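
The enrichment and temporal-sorting stages can be pictured as a join on camera and time followed by a sort. The sketch below is illustrative only: the `Chapter` record, the tuple layout for face and firearm records, and the function name are all hypothetical and do not reflect Mixpeek's actual schema or retriever API.

```python
from dataclasses import dataclass, field


@dataclass
class Chapter:
    camera_id: str
    start_s: float
    end_s: float
    summary: str
    face_ids: list = field(default_factory=list)
    firearm_events: list = field(default_factory=list)


def enrich_and_sort(chapters, faces, firearms):
    """Attach face and firearm records that fall inside each chapter's time
    span on the same camera, then order chapters by start time.

    `faces` and `firearms` are lists of (camera_id, timestamp_s, label).
    Chapters are updated in place.
    """
    for ch in chapters:
        for cam, ts, label in faces:
            inside = cam == ch.camera_id and ch.start_s <= ts < ch.end_s
            if inside and label not in ch.face_ids:
                ch.face_ids.append(label)
        for cam, ts, label in firearms:
            if cam == ch.camera_id and ch.start_s <= ts < ch.end_s:
                ch.firearm_events.append((ts, label))
    return sorted(chapters, key=lambda ch: (ch.start_s, ch.camera_id))


chapters = [
    Chapter("cam2", 120, 180, "confrontation"),
    Chapter("cam1", 0, 120, "foot pursuit"),
]
faces = [("cam1", 30, "person_A"), ("cam2", 130, "person_A"), ("cam2", 140, "person_A")]
firearms = [("cam2", 150, "handgun")]

for ch in enrich_and_sort(chapters, faces, firearms):
    print(ch.camera_id, ch.start_s, ch.summary, ch.face_ids, ch.firearm_events)
```

The sorted, enriched records are the kind of input a final synthesis stage would need in order to cite camera IDs and timestamps.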

Question 9

What about existing evidence management systems?

Accepted Answer

Mixpeek exposes a REST API that integrates with any evidence management system. Upload BWC files to a Mixpeek bucket with officer, camera, and incident metadata. Processing triggers automatically. Results are queryable via retrievers or accessible via the collections API. Webhooks notify your systems when processing completes. Alerts can fire on specific conditions, such as firearms detected in new footage.

Question 10

How does this compare to Twelve Labs or other video AI platforms?

Accepted Answer

Twelve Labs and similar platforms are cloud-only: they require sending video to external servers, which violates CJIS requirements for criminal justice evidence. They also lack specialized pipelines for face clustering across cameras, temporal firearms tracking, and incident phase classification. Mixpeek runs entirely self-hosted with custom extractor containers and is purpose-built for the BWC evidence analysis workflow rather than general-purpose video understanding.

Task	Model	License	Minimum Hardware
ASR	NVIDIA Parakeet TDT v3	CC-BY-4.0	T4+
Speaker Diarization	pyannote 3.1	MIT	T4+
Face Detection	SCRFD	Apache-2.0	CPU
Face Embeddings	AdaFace IR101	MIT	T4+
Firearms Detection	YOLO-World	GPL-3.0	A10+
Firearms Verification	Grounding DINO 1.5	Apache-2.0	A10+
Weapon Segmentation	SAM 2	Apache-2.0	A10+
VLM (Chapters)	Qwen2.5-VL-7B	Apache-2.0	A100
Chapter Boundaries	ruptures PELT	BSD	CPU
Text Embeddings	E5-Large	MIT	T4+

AI-Powered Body-Worn Camera Video Analysis

From Raw BWC Footage to Structured Evidence

Transcription & Speaker Diarization

Face Detection & Cross-Camera Clustering

Firearms Detection & Tracking

How It Works

Ingest BWC Footage

5-Stage Parallel Extraction

Cross-Video Synthesis

Prosecutor Retrieval

Benchmark Results

CJIS Compliant by Design

Zero External API Calls

Air-Gapped Inference

Full Chain of Custody

Open-Source, Licensed Models

Model Stack

Built For

County Prosecutors

Internal Affairs & Use-of-Force Review

Evidence Management Teams

Police Department Leadership

Frequently Asked Questions