
PDF Text Extraction

Extract structured text and layout information from PDFs

645K runs

Note: This playground provides simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the feature extractor's capabilities before integrating it into your application.

Input

Input Type

File URL string

Enter the URL of a PDF file

Upload PDF


language string

Language of the PDF text. Default: en

extract_images boolean

Whether to extract and process images in the PDF. Default: false
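
As a rough illustration of how these inputs fit together, the sketch below assembles them into a request payload in Python. The key name `file_url` and the helper `build_input` are assumptions made for this example; only `language` and `extract_images`, with their defaults, come from the parameter list above. The page does not document an endpoint, so no request is sent.

```python
import json


def build_input(file_url, language="en", extract_images=False):
    """Assemble the extractor inputs, applying the documented defaults.

    `file_url` is a hypothetical key name; the page only calls it "File URL".
    """
    if not file_url.lower().startswith(("http://", "https://")):
        raise ValueError("file_url must be an http(s) URL pointing to a PDF")
    return {
        "file_url": file_url,
        "language": language,              # documented default: en
        "extract_images": extract_images,  # documented default: false
    }


payload = build_input("https://example.com/report.pdf")
print(json.dumps(payload, indent=2))
```

Leaving `language` and `extract_images` unset reproduces the documented defaults, so a caller only needs to supply the file location.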

Output

{
  "pages": [
    {
      "number": 1,
      "text": "Annual Financial Report 2023\n\nCompany XYZ\n\nExecutive Summary",
      "paragraphs": [
        {
          "text": "Annual Financial Report 2023",
          "bbox": [
            100,
            50,
            500,
            80
          ]
        },
        {
          "text": "Company XYZ",
          "bbox": [
            200,
            120,
            400,
            140
          ]
        },
        {
          "text": "Executive Summary",
          "bbox": [
            150,
            200,
            450,
            220
          ]
        }
      ]
    }
  ],
  "metadata": {
    "title": "Annual Financial Report 2023",
    "author": "Finance Department",
    "creation_date": "2023-12-15"
  },
  "total_pages": 24
}
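
One way to consume a response shaped like the sample above is sketched here in Python. It assumes each `bbox` is `[x0, y0, x1, y1]` in page units; the page does not state the coordinate convention, so treat that as an assumption. The helper names `bbox_size` and `summarize` are hypothetical, and the `response` dictionary is a trimmed copy of the simulated output.

```python
# Trimmed copy of the simulated output shown above.
response = {
    "pages": [
        {
            "number": 1,
            "paragraphs": [
                {"text": "Annual Financial Report 2023", "bbox": [100, 50, 500, 80]},
                {"text": "Company XYZ", "bbox": [200, 120, 400, 140]},
                {"text": "Executive Summary", "bbox": [150, 200, 450, 220]},
            ],
        }
    ],
    "metadata": {"title": "Annual Financial Report 2023"},
    "total_pages": 24,
}


def bbox_size(bbox):
    """Return (width, height), assuming bbox is [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = bbox
    return x1 - x0, y1 - y0


def summarize(response):
    """Flatten the response into (page number, text, width, height) rows."""
    rows = []
    for page in response["pages"]:
        for para in page["paragraphs"]:
            width, height = bbox_size(para["bbox"])
            rows.append((page["number"], para["text"], width, height))
    return rows


rows = summarize(response)
for number, text, width, height in rows:
    print(f"page {number}: {text!r} ({width} x {height})")
```

The sample lists a single entry under `pages` while `total_pages` is 24, so the loop iterates over whatever pages are present rather than assuming the two counts match.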