PDF

Header & Footer Extraction

Extract recurring header and footer content from PDFs

Note: This playground provides simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the feature extractor's capabilities before integrating it into your application.

Input

Input Type

File URL string

Enter a URL to a PDF file

Upload PDF

# extract_watermarks boolean

Whether to extract watermarks. Default: true

# min_repetition number

Minimum number of pages on which content must appear to be considered a header or footer. Default: 2
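
To make the min_repetition threshold concrete, here is a minimal Python sketch of the idea: text counts as recurring only if it shows up on at least min_repetition pages. This is an illustration only, not the extractor's actual algorithm. The function name find_repeated_lines and the list-of-lines page representation are hypothetical.

```python
from collections import defaultdict


def find_repeated_lines(pages, min_repetition=2):
    """Return {text: [page numbers]} for text seen on at least min_repetition pages.

    `pages` is a hypothetical representation: one list of text lines per page.
    """
    seen = defaultdict(list)
    for page_number, lines in enumerate(pages, start=1):
        for text in set(lines):  # count each text at most once per page
            seen[text].append(page_number)
    return {text: nums for text, nums in seen.items() if len(nums) >= min_repetition}


pages = [
    ["Company XYZ - Confidential", "Introduction"],
    ["Company XYZ - Confidential", "Methods"],
    ["Company XYZ - Confidential", "Results"],
]
repeated = find_repeated_lines(pages, min_repetition=2)
print(repeated)  # {'Company XYZ - Confidential': [1, 2, 3]}
```

Raising min_repetition above the page count (here, 4) returns nothing, which matches the parameter's description: content that appears on too few pages is not treated as a header or footer.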

Output

{
  "headers": [
    {
      "text": "Company XYZ - Confidential",
      "repeats_on_pages": [
        1,
        2,
        3,
        4,
        5
      ],
      "bbox": [
        250,
        30,
        450,
        50
      ],
      "contains_logo": true
    }
  ],
  "footers": [
    {
      "text": "Page {page_number} of {total_pages}",
      "repeats_on_pages": [
        1,
        2,
        3,
        4,
        5
      ],
      "bbox": [
        280,
        780,
        380,
        800
      ],
      "contains_date": true
    }
  ],
  "watermarks": [
    {
      "text": "DRAFT",
      "angle": 45,
      "opacity": 0.3
    }
  ]
}
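
A response shaped like the sample above could be consumed as in the following Python sketch. It assumes bbox is ordered [x0, y0, x1, y1]; the sample does not state the coordinate order or units, so treat that as an assumption. The summarize helper is hypothetical, and the JSON here is trimmed to the fields the sketch uses.

```python
import json

# Trimmed copy of the simulated output shown above.
sample = """
{
  "headers": [
    {"text": "Company XYZ - Confidential",
     "repeats_on_pages": [1, 2, 3, 4, 5],
     "bbox": [250, 30, 450, 50]}
  ],
  "footers": [
    {"text": "Page {page_number} of {total_pages}",
     "repeats_on_pages": [1, 2, 3, 4, 5],
     "bbox": [280, 780, 380, 800]}
  ],
  "watermarks": [
    {"text": "DRAFT", "angle": 45, "opacity": 0.3}
  ]
}
"""


def summarize(result):
    """Flatten headers and footers into rows with page counts and box sizes."""
    rows = []
    for kind in ("headers", "footers"):
        for item in result.get(kind, []):
            # Assumption: bbox is [x0, y0, x1, y1].
            x0, y0, x1, y1 = item["bbox"]
            rows.append({
                "kind": kind,
                "text": item["text"],
                "page_count": len(item["repeats_on_pages"]),
                "width": x1 - x0,
                "height": y1 - y0,
            })
    return rows


result = json.loads(sample)
rows = summarize(result)
for row in rows:
    print(row)

# Watermarks carry no bbox in the sample, so they are read separately.
watermark_texts = [w["text"] for w in result.get("watermarks", [])]
print(watermark_texts)
```

Using result.get(kind, []) keeps the sketch tolerant of a response that omits a key, for example when extract_watermarks is set to false and no watermarks list is returned (whether the key is omitted or returned empty in that case is not documented here).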
