
    Web Scraper

    Extract structured data from web pages while preserving semantic context and the relationships between elements.

    Note: This playground provides simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the feature extractor's capabilities before integrating it into your application.

    Input

    Enter the text you want to process

    How many levels of nested content to extract. Default: 3

    Minimum confidence score (0.0 to 1.0) a field must reach to be included. Default: 0.7

    How strictly to enforce the target schema. Default: flexible

    Whether to preserve HTML structure in the output. Default: false

    Whether to extract page metadata (title, description, etc.). Default: true
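    The five settings above can be collected into a single options object. The sketch below is a minimal illustration only: the playground shows descriptions and defaults but not the real parameter names, so `WebScraperOptions` and every field name in it are hypothetical, and the range checks are assumptions based on the stated value ranges.

```python
from dataclasses import dataclass


# Hypothetical options object: the field names are illustrative, not
# Mixpeek's actual API identifiers. Defaults match the playground.
@dataclass(frozen=True)
class WebScraperOptions:
    max_depth: int = 3             # levels of nested content to extract
    min_confidence: float = 0.7    # fields scoring below this are dropped
    schema_mode: str = "flexible"  # how strictly to enforce the target schema
    preserve_html: bool = False    # keep HTML structure in the output
    extract_metadata: bool = True  # extract title, description, etc.

    def __post_init__(self) -> None:
        # Assumed validation: depth cannot be negative, and confidence
        # must fall within the documented 0.0 to 1.0 range.
        if self.max_depth < 0:
            raise ValueError("max_depth must be non-negative")
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0.0 and 1.0")


# Usage: override only what differs from the defaults.
opts = WebScraperOptions(min_confidence=0.85, preserve_html=True)
print(opts)
```

    Making the dataclass frozen keeps a configuration from being mutated after validation has run.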

    Output

    {
      "url": "https://example.com/article",
      "extracted_at": "2024-01-20T10:30:00Z",
      "metadata": {
        "title": "Sample Article Title",
        "description": "Article description from meta tags",
        "author": "John Doe",
        "published_date": "2024-01-15",
        "language": "en"
      },
      "content": {
        "main_heading": "Article Main Heading",
        "body_text": "Full article text content...",
        "sections": [
          {
            "heading": "Section 1",
            "content": "Section 1 content...",
            "confidence": 0.95
          }
        ]
      },
      "structured_data": {
        "products": [],
        "prices": [],
        "ratings": [],
        "reviews": []
      },
      "semantic_relationships": [
        {
          "type": "parent-child",
          "from": "main_heading",
          "to": "sections",
          "confidence": 0.98
        }
      ],
      "confidence_scores": {
        "overall": 0.92,
        "field_scores": {
          "title": 0.98,
          "body_text": 0.95,
          "metadata": 0.88
        }
      }
    }
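    A client consuming this output might filter sections by confidence and walk the parent-child relationships. The sketch below assumes that a real response has the same shape as the simulated sample above (the `content.sections` and `semantic_relationships` keys), and that the confidence threshold behaves as the input description states. The helper names `confident_sections` and `children_of` are hypothetical.

```python
import json


def confident_sections(result: dict, min_confidence: float = 0.7) -> list[dict]:
    """Return the sections whose confidence meets or exceeds the threshold."""
    sections = result.get("content", {}).get("sections", [])
    return [s for s in sections if s.get("confidence", 0.0) >= min_confidence]


def children_of(result: dict, parent: str) -> list[str]:
    """List the targets of parent-child relationships that start at `parent`."""
    return [
        rel["to"]
        for rel in result.get("semantic_relationships", [])
        if rel.get("type") == "parent-child" and rel.get("from") == parent
    ]


# Trimmed copy of the sample output above.
sample = json.loads("""
{
  "content": {
    "sections": [
      {"heading": "Section 1", "content": "Section 1 content...", "confidence": 0.95}
    ]
  },
  "semantic_relationships": [
    {"type": "parent-child", "from": "main_heading", "to": "sections", "confidence": 0.98}
  ]
}
""")

print([s["heading"] for s in confident_sections(sample)])  # ['Section 1']
print(children_of(sample, "main_heading"))                  # ['sections']
```

    Both helpers use `.get()` with defaults, so a response that lacks sections or relationships returns an empty list instead of raising `KeyError`.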