The scrolling text extractor recovers scrolling or marquee text from video — tickers, lower-third banners, end credits, and legal disclaimers — that no single frame ever shows in full. It detects scrolling bands via phase correlation, stitches frames panorama-style to reconstruct the complete text, then OCRs the panorama with a vision language model (Gemini). Output is payload-only (no vector); pair it with text_extractor if you need semantic search over the recovered text.
View extractor details at api.mixpeek.com/v1/collections/features/extractors/scrolling_text_extractor_v1 or fetch programmatically with GET /v1/collections/features/extractors/{feature_extractor_id}.
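
A minimal sketch of the programmatic lookup in Python, using only the standard library. It assumes the https:// scheme for the host shown above and a Bearer-token Authorization header. Both are assumptions, so check Mixpeek's authentication docs before relying on them. The function names are hypothetical.

```python
import json
import urllib.request

# Host and path from the extractor-details link above; the https scheme is assumed.
BASE_URL = "https://api.mixpeek.com/v1/collections/features/extractors"


def extractor_details_url(feature_extractor_id: str) -> str:
    """Build the GET URL for one feature extractor's details."""
    return f"{BASE_URL}/{feature_extractor_id}"


def fetch_extractor_details(feature_extractor_id: str, api_key: str) -> dict:
    """Fetch extractor details as JSON.

    The Bearer-token Authorization header is an assumption; verify the
    exact auth scheme in the Mixpeek API docs.
    """
    request = urllib.request.Request(
        extractor_details_url(feature_extractor_id),
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(extractor_details_url("scrolling_text_extractor_v1"))
```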

Pipeline Steps

  1. Sample frames: extract frames from the video at the rate set by fps (frames per second).
  2. Phase correlation: divide frames into strips strip_height pixels tall and use phase correlation between consecutive frames to measure each strip's per-frame pixel shift, which reveals motion.
  3. Classify bands: a band counts as scrolling when its per-frame shift exceeds min_shift_px and at least consistency_ratio of frame pairs agree.
  4. Crop: crop each detected band, adding pad pixels of padding above and below.
  5. Stitch: align the cropped frames panorama-style to reconstruct the full scrolling content as one image per band.
  6. VLM OCR: read each panorama with a vision language model (Gemini).
  7. Output: return the combined text plus per-band metadata (axis, direction, shift_per_frame, text).
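
The detection, classification, and stitching steps can be approximated with NumPy. This is a sketch under stated assumptions, not the extractor's implementation. It uses textbook FFT phase correlation on grayscale strips, treats a frame pair as "agreeing" when it moves at least min_shift_px in the dominant (median) direction, and stitches only horizontal right-to-left scrolls with integer shifts. All function names are hypothetical.

```python
import numpy as np


def phase_correlation_shift(prev_strip: np.ndarray, next_strip: np.ndarray) -> tuple[int, int]:
    """Estimate the integer (dy, dx) translation from prev_strip to next_strip."""
    f_prev = np.fft.fft2(prev_strip)
    f_next = np.fft.fft2(next_strip)
    # Normalized cross-power spectrum: keep only the phase; eps avoids division by zero.
    cross_power = f_next * np.conj(f_prev)
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.abs(np.fft.ifft2(cross_power))
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    h, w = correlation.shape
    # Peaks past the midpoint wrap around and represent negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def is_scrolling(shifts: list[float], min_shift_px: float = 2.0,
                 consistency_ratio: float = 0.6) -> bool:
    """Assumed rule: enough pairs move at least min_shift_px in the dominant direction."""
    if not shifts:
        return False
    dominant = np.sign(np.median(shifts))
    if dominant == 0:
        return False
    agreeing = sum(1 for s in shifts if np.sign(s) == dominant and abs(s) >= min_shift_px)
    return agreeing / len(shifts) >= consistency_ratio


def stitch_right_to_left(crops: list[np.ndarray], shifts: list[int]) -> np.ndarray:
    """Append the columns each new crop reveals at its right edge.

    shifts[i] is the leftward motion (px, positive) between crops[i] and crops[i + 1].
    """
    panorama = crops[0]
    for crop, shift in zip(crops[1:], shifts):
        if shift > 0:  # crop[:, -0:] would append the whole crop, so skip zero shifts
            panorama = np.hstack([panorama, crop[:, -shift:]])
    return panorama


rng = np.random.default_rng(0)
strip = rng.random((40, 320))        # one 40-px-tall strip (strip_height=40)
moved = np.roll(strip, -6, axis=1)   # content moves 6 px right-to-left
print(phase_correlation_shift(strip, moved))  # (0, -6)
print(is_scrolling([-6, -6, -5, 0, -6]))      # True: 4 of 5 pairs agree (0.8 >= 0.6)

# Simulate a ticker: 80-px-wide windows over a wider "text" image, stepping 6 px per frame.
ticker = rng.random((40, 200))
frames = [ticker[:, 6 * k: 6 * k + 80] for k in range(10)]
panorama = stitch_right_to_left(frames, [6] * 9)
print(panorama.shape, np.array_equal(panorama, ticker[:, :134]))  # (40, 134) True
```

A negative dx here means content moved right-to-left. Phase correlation is a natural fit because a pure translation shows up as a single sharp peak in the inverse FFT of the normalized cross-power spectrum.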

When to Use

| Use Case | Description |
| --- | --- |
| News tickers | Recover the full crawl text from a horizontally scrolling ticker |
| End credits | Capture vertically scrolling credit rolls |
| Compliance disclaimers | Extract fast-scrolling legal/disclaimer banners for audit |
| Sports/finance banners | Read scrolling score or price strips |

When NOT to Use

| Scenario | Recommended Alternative |
| --- | --- |
| Static on-screen text / captions | text_extractor on transcription, or a frame OCR extractor |
| Spoken-word transcription | A transcription/audio extractor |
| Semantic search over recovered text | Chain text_extractor on the scrolling_text field |
| Non-video inputs | This extractor is video-only |

Input Schema

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| video | string | Yes | URL or path to the video file. Populated from input_mappings. |
```json
{
  "video": "s3://my-bucket/clips/newscast.mp4"
}
```
Supported input types: VIDEO.

Output Schema

| Field | Type | Description |
| --- | --- | --- |
| scrolling_text | string or null | Combined, deduplicated text from all detected scrolling bands |
| scroll_bands | object[] or null | Per-band details: axis, direction, shift_per_frame, text |
| bands_detected | integer or null | Number of scrolling text bands detected in the video |
```json
{
  "scrolling_text": "BREAKING: Markets rally as inflation cools ...",
  "bands_detected": 1,
  "scroll_bands": [
    {
      "axis": "horizontal",
      "direction": "right_to_left",
      "shift_per_frame": 6.4,
      "text": "BREAKING: Markets rally as inflation cools ..."
    }
  ]
}
```
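
A consumer of this payload should tolerate the nullable fields. The Python sketch below (a hypothetical helper, standard library only) treats a null or missing scroll_bands as an empty list and produces one summary line per band. It assumes each band object carries the four fields listed in the table above.

```python
import json


def summarize_bands(payload: dict) -> list[str]:
    """One readable line per band; null or missing scroll_bands yields an empty list."""
    bands = payload.get("scroll_bands") or []
    return [
        f"{b['axis']} {b['direction']} at {b['shift_per_frame']} px/frame: {b['text']}"
        for b in bands
    ]


example = json.loads("""
{
  "scrolling_text": "BREAKING: Markets rally as inflation cools ...",
  "bands_detected": 1,
  "scroll_bands": [
    {"axis": "horizontal", "direction": "right_to_left",
     "shift_per_frame": 6.4, "text": "BREAKING: Markets rally as inflation cools ..."}
  ]
}
""")

for line in summarize_bands(example):
    print(line)
# horizontal right_to_left at 6.4 px/frame: BREAKING: Markets rally as inflation cools ...

print(summarize_bands({"scrolling_text": None, "scroll_bands": None, "bands_detected": None}))  # []
```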

Parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| fps | float | 5.0 | 1.0–30.0 | Frame sampling rate. Higher values improve detection for fast-scrolling text but increase processing time. |
| strip_height | integer | 40 | 10–200 | Height (px) of each scanning strip used for phase correlation. Should roughly match the scrolling text band height. |
| min_shift_px | float | 2.0 | 0.5–20.0 | Minimum per-frame pixel shift to consider a strip "scrolling". Lower values detect slower text; higher values filter noise. |
| consistency_ratio | float | 0.6 | 0.3–1.0 | Fraction of frame pairs that must show consistent shift for a band to count as scrolling (0.6 = 60%). |
| pad | integer | 8 | 0–50 | Pixel padding above and below the detected band when cropping for stitching. |
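
Because out-of-range values are easy to send by accident, a client can check parameters against the table before submitting a job. This is a hypothetical client-side helper in Python. It assumes the ranges are inclusive and that strip_height and pad must be integers, which the table implies but does not state. The service's own validation remains authoritative.

```python
# (min, max, default) per parameter, copied from the table above; bounds assumed inclusive.
PARAMETER_RANGES = {
    "fps": (1.0, 30.0, 5.0),
    "strip_height": (10, 200, 40),
    "min_shift_px": (0.5, 20.0, 2.0),
    "consistency_ratio": (0.3, 1.0, 0.6),
    "pad": (0, 50, 8),
}
INTEGER_PARAMETERS = {"strip_height", "pad"}


def resolve_parameters(overrides: dict) -> dict:
    """Merge overrides onto defaults, rejecting unknown names and out-of-range values."""
    unknown = set(overrides) - set(PARAMETER_RANGES)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, (low, high, default) in PARAMETER_RANGES.items():
        value = overrides.get(name, default)
        if name in INTEGER_PARAMETERS and not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside [{low}, {high}]")
        resolved[name] = value
    return resolved


print(resolve_parameters({"fps": 10.0, "strip_height": 60}))
# {'fps': 10.0, 'strip_height': 60, 'min_shift_px': 2.0, 'consistency_ratio': 0.6, 'pad': 8}
```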

Configuration Examples

```json
{
  "feature_extractor": {
    "feature_extractor_name": "scrolling_text_extractor",
    "version": "v1",
    "input_mappings": {
      "video": "video_url"
    },
    "parameters": {}
  }
}
```
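
The same configuration can be generated in code when you need non-default parameters. In the Python sketch below, the helper name is hypothetical, and the ticker values are illustrative starting points derived from the parameter descriptions (higher fps for fast text, strip_height near the band's pixel height, a higher min_shift_px to filter noise), not tuned recommendations.

```python
import json


def scrolling_text_config(video_field: str = "video_url", **parameters) -> dict:
    """Build the feature_extractor block; omitted parameters keep their documented defaults."""
    return {
        "feature_extractor": {
            "feature_extractor_name": "scrolling_text_extractor",
            "version": "v1",
            "input_mappings": {"video": video_field},
            "parameters": parameters,
        }
    }


# Fast news ticker roughly 50 px tall (measure the band in your own footage).
ticker_config = scrolling_text_config(fps=10.0, strip_height=50, min_shift_px=3.0)
print(json.dumps(ticker_config, indent=2))
```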

Performance & Costs

| Metric | Value |
| --- | --- |
| Cost | 30 credits per minute of video (frame extraction + stitching + VLM OCR) |
| External API | Google Gemini (VLM OCR) |
| Tradeoff | Higher fps improves fast-scroll accuracy at the cost of processing time |
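
A back-of-the-envelope estimate helps when budgeting a batch. The Python sketch below applies the 30 credits per minute rate and counts sampled frames. Whether billing is pro-rated by the second or rounded up to whole minutes is not stated here, so it reports both as assumed bounds. The table quotes a flat per-minute rate, while fps drives the frame count and therefore processing time.

```python
import math

CREDITS_PER_MINUTE = 30


def credit_bounds(duration_seconds: float) -> tuple[float, int]:
    """(pro-rated, rounded-up-to-whole-minutes) credit estimates; billing granularity is assumed."""
    minutes = duration_seconds / 60
    return minutes * CREDITS_PER_MINUTE, math.ceil(minutes) * CREDITS_PER_MINUTE


def sampled_frames(duration_seconds: float, fps: float = 5.0) -> int:
    """Approximate number of frames sampled at the given fps."""
    return int(duration_seconds * fps)


print(credit_bounds(90))                                  # (45.0, 60)
print(sampled_frames(90), sampled_frames(90, fps=10.0))   # 450 900
```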

Vector Index

This extractor produces payload-only output — no vector index. The recovered text lives in the scrolling_text field. To make it semantically searchable, run text_extractor against scrolling_text.

Limitations

  • Video only: Accepts video inputs exclusively.
  • No embedding: Output is payload-only; semantic search requires chaining a text extractor.
  • Band-height sensitivity: strip_height should approximate the actual band height for reliable detection.
  • VLM dependency: OCR quality depends on Gemini VLM availability and panorama clarity.