What types of entities are extracted for the knowledge graph?

Default entity types include Person, Organization, Location, Date, Product, Event, Concept, and Regulation. You can customize entity types via the `entity_types` parameter to add domain-specific categories like 'Drug', 'Gene', 'Legal Clause', or 'Financial Instrument'.
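
As a rough sketch of how a custom list might be assembled, the helper below extends the default types named above with domain-specific ones. The helper `build_entity_types` and the constant `DEFAULT_ENTITY_TYPES` are hypothetical names used for illustration; only the `entity_types` parameter comes from this page. Whether the API treats the list as a replacement for the defaults or as an addition to them is not stated here, so the sketch passes the full list explicitly.

```python
# Default entity types as listed in the answer above.
DEFAULT_ENTITY_TYPES = [
    "Person", "Organization", "Location", "Date",
    "Product", "Event", "Concept", "Regulation",
]


def build_entity_types(extra_types, include_defaults=True):
    """Return a de-duplicated, order-preserving list of entity types.

    Hypothetical helper for illustration; not part of the Mixpeek SDK.
    """
    base = DEFAULT_ENTITY_TYPES if include_defaults else []
    seen = set()
    merged = []
    for name in list(base) + list(extra_types):
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


# Options dict in the shape used by the code example on this page.
# "Person" is already a default, so it is not added a second time.
options = {"entity_types": build_entity_types(["Drug", "Gene", "Person"])}
```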

What graph output formats are supported?

Output formats include JSON-LD (default), RDF/Turtle, Neo4j-compatible Cypher statements, and a flat JSON nodes/edges format. Set the `output_format` parameter to 'json-ld', 'rdf', 'cypher', or 'nodes_edges' depending on your target graph database.

Can I merge knowledge graphs from multiple documents?

Yes. Pass multiple documents in the `sources` array and set `merge_graphs` to true. Entity resolution automatically deduplicates nodes that refer to the same real-world entity across documents, creating a unified graph.
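
The service's entity resolution is not described in detail on this page. As a simplified illustration of the idea, the sketch below merges per-document graphs by treating nodes with the same type and the same case- and whitespace-normalized label as one entity. Real entity resolution usually goes further (aliases, abbreviations, context). The payload shape and the function names are assumptions.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def merge_graphs(graphs):
    """Merge graphs; nodes with the same (type, normalized label) collapse."""
    canonical = {}  # (type, normalized label) -> merged node id
    nodes = []
    edges = set()
    for graph in graphs:
        local_to_merged = {}  # this document's node id -> merged node id
        for node in graph["nodes"]:
            key = (node["type"], normalize(node["label"]))
            if key not in canonical:
                canonical[key] = "m%d" % len(canonical)
                nodes.append(
                    {"id": canonical[key], "type": node["type"], "label": node["label"]}
                )
            local_to_merged[node["id"]] = canonical[key]
        for edge in graph["edges"]:
            # A set drops edges that become identical after merging.
            edges.add(
                (
                    local_to_merged[edge["source"]],
                    edge["relation"],
                    local_to_merged[edge["target"]],
                )
            )
    return {
        "nodes": nodes,
        "edges": [
            {"source": s, "relation": r, "target": t} for s, r, t in sorted(edges)
        ],
    }


# Illustrative data only: both documents mention the same person.
doc_a = {
    "nodes": [
        {"id": "a1", "type": "Person", "label": "Marie Curie"},
        {"id": "a2", "type": "Organization", "label": "Sorbonne"},
    ],
    "edges": [{"source": "a1", "relation": "works at", "target": "a2"}],
}
doc_b = {
    "nodes": [
        {"id": "b1", "type": "Person", "label": "marie  curie"},
        {"id": "b2", "type": "Concept", "label": "Radioactivity"},
    ],
    "edges": [{"source": "b1", "relation": "studies", "target": "b2"}],
}
merged = merge_graphs([doc_a, doc_b])
```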

How accurate is the relationship extraction between entities?

Relationship extraction accuracy is typically 80-90% for common relationship types on well-written English text. Each edge includes a confidence score. You can filter low-confidence edges with the `min_confidence` parameter. Domain-specific fine-tuning improves accuracy for specialized vocabularies.
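
The `min_confidence` parameter filters on the server. If a graph has already been stored, the same thresholding can be applied afterwards. The following is a minimal sketch that assumes each edge is a dict with a `confidence` key between 0 and 1; that field name and both helper names are assumptions, not documented API.

```python
def filter_edges(edges, min_confidence):
    """Keep edges whose confidence meets the threshold.

    Edges without a confidence value are treated as 0.0 and dropped
    for any positive threshold.
    """
    return [e for e in edges if e.get("confidence", 0.0) >= min_confidence]


def prune_isolated(nodes, edges):
    """Drop nodes that no remaining edge refers to."""
    used = {e["source"] for e in edges} | {e["target"] for e in edges}
    return [n for n in nodes if n["id"] in used]


# Illustrative data only; not real API output.
nodes = [{"id": "n1"}, {"id": "n2"}, {"id": "n3"}]
edges = [
    {"source": "n1", "target": "n2", "relation": "works at", "confidence": 0.92},
    {"source": "n2", "target": "n3", "relation": "located in", "confidence": 0.41},
]
kept_edges = filter_edges(edges, min_confidence=0.6)
kept_nodes = prune_isolated(nodes, kept_edges)
```

Raising the threshold trades recall for precision, so it is worth checking how many edges survive at a few different values before settling on one.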

Category: data

Document to Knowledge Graph Converter

Transform documents into structured knowledge graphs by extracting entities, relationships, and concepts. Produces nodes and edges suitable for graph databases, enabling complex queries, reasoning, and visualization over document content.

Max file size: 200 MB

Estimated: 5-30 sec per page

5 input formats

How It Works

Upload a document or provide a URL to the Mixpeek API.

Text is extracted and segmented into paragraphs and sections.

Named entity recognition identifies people, organizations, locations, concepts, and domain-specific terms.

Relationship extraction identifies connections between entities (e.g., 'works at', 'located in', 'causes').

A knowledge graph is returned as nodes and edges in JSON-LD, RDF/Turtle, Neo4j-compatible Cypher statements, or a flat JSON nodes/edges format.

Code Examples

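from mixpeek import Mixpeek

# Authenticate with your Mixpeek API key.
client = Mixpeek(api_key="YOUR_API_KEY")

# Convert a remote document into a knowledge graph.
result = client.convert(
    source="https://example.com/research-paper.pdf",
    from_format="document",
    to_format="knowledge-graph",
    options={
        # Entity types to extract ("Method" is a custom, domain-specific type).
        "entity_types": ["Person", "Organization", "Concept", "Method"],
        # Return the flat JSON nodes/edges format.
        "output_format": "nodes_edges",
        # Filter out edges with a confidence score below 0.6.
        "min_confidence": 0.6,
        "include_context": True
    }
)

# Print the graph size, then a sample of the first five nodes and edges.
print(f"Nodes: {len(result.nodes)}, Edges: {len(result.edges)}")
for node in result.nodes[:5]:
    print(f"  [{node.type}] {node.label}")
for edge in result.edges[:5]:
    print(f"  {edge.source} --{edge.relation}--> {edge.target}")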
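
Once a graph is available in the flat nodes/edges form, multi-hop questions ("how is A connected to C?") can be answered with an ordinary traversal. The sketch below runs a breadth-first search over directed edges. It assumes the edges have been loaded as plain dicts with `source`, `relation`, and `target` keys (for example, from a saved `nodes_edges` JSON payload) and not as SDK objects; `find_path` is a hypothetical helper, not part of the Mixpeek SDK.

```python
from collections import deque


def find_path(edges, start, goal):
    """Breadth-first search over directed edges.

    Returns the shortest chain of (source, relation, target) hops from
    start to goal, an empty list if start == goal, or None if unreachable.
    """
    # Index outgoing edges by source node for fast lookup.
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge["source"], []).append(edge)

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for edge in outgoing.get(current, []):
            nxt = edge["target"]
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(edge["source"], edge["relation"], nxt)]))
    return None


# Illustrative data only; not real API output.
graph_edges = [
    {"source": "A", "relation": "works at", "target": "B"},
    {"source": "B", "relation": "located in", "target": "C"},
    {"source": "A", "relation": "authored", "target": "D"},
]
path = find_path(graph_edges, "A", "C")
```

For larger graphs, loading the output into a graph database and querying it there will usually scale better than an in-memory search.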

Use Cases

Build knowledge bases from legal contracts and regulatory documents

Map relationships between entities in research paper collections

Create interactive knowledge graphs for corporate intelligence platforms

Power question-answering systems that reason over document relationships

Supported Input Formats

PDF

DOCX

TXT

HTML

Markdown

Quick Info

Category: data

Max File Size: 200 MB

Est. Time: 5-30 sec per page

Extractor: text-extractor

Related Converters

PDF to Text

Extract clean, structured text from PDF documents including scanned pages, multi-column layouts, headers/footers, and tables. Combines traditional parsing with OCR and layout analysis for maximum accuracy.

PDF to Structured Data

Extract structured key-value pairs, tables, and form fields from PDF documents. Uses layout analysis and LLM extraction to produce clean JSON output, even from complex forms and invoices.

HTML to Structured Data

Extract structured data from web pages using a combination of CSS/XPath selectors and LLM-based extraction. Captures product details, article metadata, contact information, and custom schemas from any website.

Text to Embeddings

Convert text strings, paragraphs, or documents into dense vector embeddings using state-of-the-art language models. Supports batching, chunking, and multiple model options for optimal retrieval performance.

PDF to JSON

Convert PDF documents into clean, structured JSON output. Extracts text, tables, form fields, metadata, and document structure into a machine-readable JSON format suitable for API ingestion, database storage, and programmatic processing.

Ready to convert documents to knowledge graphs?

Start using the Mixpeek Document to Knowledge Graph converter in minutes. Sign up for a free API key and follow the documentation to get started.
