Are images in the PDF preserved?

Yes. Embedded images are extracted and referenced as Markdown image links. The images are saved alongside the Markdown or uploaded to your bucket, depending on your configuration.
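To check which images a conversion referenced, you can scan the returned Markdown for image links. The sketch below uses only the standard library; the sample text and file path are made up for illustration, and the actual paths depend on your configuration.

```python
import re

# Matches Markdown image syntax: ![alt text](target "optional title")
IMAGE_LINK = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def find_image_links(markdown: str) -> list[tuple[str, str]]:
    """Return (alt_text, target) pairs for every image link in the text."""
    return IMAGE_LINK.findall(markdown)


# Hypothetical converter output, used only to exercise the function.
sample = (
    "# Report\n\n"
    "![Figure 1](images/page1_fig1.png)\n\n"
    "See [the docs](https://example.com).\n"
)
print(find_image_links(sample))
```

Ordinary links are skipped because the pattern requires the leading exclamation mark.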

How are complex tables handled?

Simple tables map directly to Markdown table syntax. Complex tables with merged cells or nested headers are simplified to the closest valid Markdown table representation, with notes on any lost structure.
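As an illustration of what "simplified" can mean, the sketch below flattens a grid with horizontally merged cells into a plain Markdown table by repeating the merged value. This is one possible strategy under stated assumptions, not necessarily what the converter does; the `None` marker for a merged-over cell and both function names are hypothetical.

```python
from typing import Optional


def fill_merged(rows: list[list[Optional[str]]]) -> list[list[str]]:
    """Replace None (a cell covered by a horizontal merge) with the value to its left."""
    filled = []
    for row in rows:
        out: list[str] = []
        for cell in row:
            if cell is None:
                out.append(out[-1] if out else "")
            else:
                # Escape pipes so cell text cannot break the table layout.
                out.append(cell.replace("|", "\\|"))
        filled.append(out)
    return filled


def to_markdown_table(rows: list[list[Optional[str]]]) -> str:
    """Render the first row as the header and the rest as body rows."""
    header, *body = fill_merged(rows)
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# "10" spans the Q1 and Q2 columns in the source table.
grid = [["Region", "Q1", "Q2"], ["EMEA", "10", None], ["APAC", "7", "9"]]
print(to_markdown_table(grid))
```

Repeating the value keeps every row the same width, which Markdown tables require, at the cost of losing the visual merge.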

Does it preserve hyperlinks?

Yes. Internal and external hyperlinks in the PDF are converted to Markdown link syntax. Cross-references within the document become anchor links.
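Anchor links point at a slug derived from the target heading. The exact slug scheme the converter emits is not documented here, so the sketch below assumes a GitHub-style scheme (lowercase, punctuation dropped, whitespace turned into hyphens); the function names are illustrative.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate a GitHub-style anchor for a heading."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation
    slug = re.sub(r"\s", "-", slug)       # each whitespace character becomes a hyphen
    return "#" + slug


def cross_reference(label: str, heading: str) -> str:
    """Build a Markdown link that jumps to the given heading."""
    return f"[{label}]({heading_anchor(heading)})"


print(cross_reference("see Section 2.1", "2.1 Installation & Setup"))
```

Other renderers generate anchors differently, so verify the links in the system you are migrating to.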

document

PDF to Markdown Converter

Convert PDF documents to clean Markdown format, preserving headings, lists, tables, links, and emphasis. Ideal for migrating content into wikis, CMS platforms, and documentation systems.

Max file size: 200 MB

Estimated: 2-10 sec per page

1 input format

How It Works

Upload a PDF or provide a URL.

Layout analysis identifies headings, paragraphs, lists, and tables.

Structural elements are mapped to Markdown syntax.

Tables are converted to Markdown table format.

Clean Markdown output is returned.
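The mapping step above can be pictured with a toy example. The sketch below assumes layout analysis has already produced a list of typed blocks; the `Block` class and its `kind` labels are hypothetical and do not reflect the converter's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str       # "heading", "paragraph", or "list_item" (illustrative labels)
    text: str
    level: int = 1  # heading depth; ignored for other kinds


def block_to_markdown(block: Block) -> str:
    """Map one structural element to its Markdown syntax."""
    if block.kind == "heading":
        return "#" * block.level + " " + block.text
    if block.kind == "list_item":
        return "- " + block.text
    return block.text


def render(blocks: list[Block]) -> str:
    """Join blocks, keeping consecutive list items tight and separating the rest."""
    lines: list[str] = []
    previous = None
    for block in blocks:
        same_list = previous is not None and previous.kind == block.kind == "list_item"
        if previous is not None and not same_list:
            lines.append("")
        lines.append(block_to_markdown(block))
        previous = block
    return "\n".join(lines) + "\n"


blocks = [
    Block("heading", "Setup", level=2),
    Block("paragraph", "Install the tool."),
    Block("list_item", "Step one"),
    Block("list_item", "Step two"),
]
print(render(blocks))
```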

Code Examples

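from mixpeek import Mixpeek

# Authenticate with your Mixpeek API key.
client = Mixpeek(api_key="YOUR_API_KEY")

# Convert a PDF at a public URL to Markdown.
result = client.convert(
    source="https://example.com/documentation.pdf",
    from_format="pdf",
    to_format="markdown",
    options={
        "extract_images": True,     # extract embedded images
        "heading_detection": True   # detect headings in the layout
    }
)

# Print the converted Markdown text.
print(result.markdown)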

Use Cases

Migrate documentation from PDF to GitBook or Notion

Convert research papers to blog-ready Markdown

Import PDF content into static site generators (Hugo, Jekyll)

Create editable drafts from finalized PDF reports
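For the static site generator use case, converted Markdown usually needs front matter before Hugo or Jekyll will treat it as a page. A minimal sketch, assuming YAML front matter with only a `title` field; the function name is illustrative and your site may require more fields.

```python
import json


def with_front_matter(markdown: str, title: str) -> str:
    """Prepend YAML front matter containing the page title."""
    # json.dumps produces a double-quoted, escaped string, which is also valid YAML.
    header = f"---\ntitle: {json.dumps(title)}\n---\n\n"
    return header + markdown.lstrip("\n")


page = with_front_matter("# Getting Started\n\nWelcome.\n", "Getting Started")
print(page)
```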

Supported Input Formats

PDF

Quick Info

Category: document

Max File Size: 200 MB

Est. Time: 2-10 sec per page

Extractor: document-descriptor

Try This Conversion

Get started with the Mixpeek API and convert your first file in minutes.

Frequently Asked Questions

Related Converters

PDF to Text

Extract clean, structured text from PDF documents including scanned pages, multi-column layouts, headers/footers, and tables. Combines traditional parsing with OCR and layout analysis for maximum accuracy.

PDF to Structured Data

Extract structured key-value pairs, tables, and form fields from PDF documents. Uses layout analysis and LLM extraction to produce clean JSON output, even from complex forms and invoices.

HTML