
Web Scraper

Extract structured data from webpages while maintaining semantic context and relationships

156K runs

Note: This playground returns simulated output to showcase functionality. No input data is processed or stored on our servers. Use this demo to explore the scraper's capabilities before integrating it into your application.

Input

Text Input string

Enter the text you want to process

# extraction_depth integer

How many levels of nested content to extract. Default: 3

# confidence_threshold number

Minimum confidence score to include a field (0.0-1.0). Default: 0.7

# schema_strictness string

How strictly to enforce the target schema. Default: flexible

# preserve_html boolean

Whether to preserve HTML structure in output. Default: false

# extract_metadata boolean

Whether to extract page metadata (title, description, etc.). Default: true
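
As a rough illustration of how the parameters above fit together, the sketch below collects them into one request payload with the documented defaults and checks the documented 0.0-1.0 range for confidence_threshold. The class name `ScraperInput` and the `text` key for the text input are assumptions made for this example, not names confirmed by the service. Only `flexible` is documented for schema_strictness, so no other values are validated here.

```python
from dataclasses import dataclass, asdict


@dataclass
class ScraperInput:
    """Hypothetical container for the documented input parameters."""

    text: str                           # assumed key name for the text input
    extraction_depth: int = 3           # levels of nested content to extract
    confidence_threshold: float = 0.7   # minimum score to include a field
    schema_strictness: str = "flexible"
    preserve_html: bool = False
    extract_metadata: bool = True

    def __post_init__(self) -> None:
        # The documentation gives 0.0-1.0 as the valid range.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")


# Override one default; every other parameter keeps its documented default.
payload = asdict(ScraperInput(text="<html>...</html>", confidence_threshold=0.8))
```

The resulting `payload` is a plain dictionary, so it can be serialized to JSON with whatever client the integration uses.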

Output

{
  "url": "https://example.com/article",
  "extracted_at": "2024-01-20T10:30:00Z",
  "metadata": {
    "title": "Sample Article Title",
    "description": "Article description from meta tags",
    "author": "John Doe",
    "published_date": "2024-01-15",
    "language": "en"
  },
  "content": {
    "main_heading": "Article Main Heading",
    "body_text": "Full article text content...",
    "sections": [
      {
        "heading": "Section 1",
        "content": "Section 1 content...",
        "confidence": 0.95
      }
    ]
  },
  "structured_data": {
    "products": [],
    "prices": [],
    "ratings": [],
    "reviews": []
  },
  "semantic_relationships": [
    {
      "type": "parent-child",
      "from": "main_heading",
      "to": "sections",
      "confidence": 0.98
    }
  ],
  "confidence_scores": {
    "overall": 0.92,
    "field_scores": {
      "title": 0.98,
      "body_text": 0.95,
      "metadata": 0.88
    }
  }
}
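
The confidence values in the output above can be used for client-side filtering. The following is a minimal sketch, assuming the response has the same shape as the sample: it keeps sections whose confidence meets a threshold and lists fields whose score falls below one. The helper names `sections_above` and `low_confidence_fields` are hypothetical, and the embedded JSON is a trimmed copy of the sample output.

```python
import json

# Trimmed copy of the sample output shown above.
sample = json.loads("""
{
  "content": {
    "sections": [
      {"heading": "Section 1", "content": "Section 1 content...", "confidence": 0.95}
    ]
  },
  "confidence_scores": {
    "overall": 0.92,
    "field_scores": {"title": 0.98, "body_text": 0.95, "metadata": 0.88}
  }
}
""")


def sections_above(result: dict, threshold: float = 0.7) -> list:
    """Return sections whose confidence is at least `threshold`."""
    sections = result.get("content", {}).get("sections", [])
    return [s for s in sections if s.get("confidence", 0.0) >= threshold]


def low_confidence_fields(result: dict, threshold: float) -> list:
    """Return the names of fields scoring below `threshold`, sorted by name."""
    scores = result.get("confidence_scores", {}).get("field_scores", {})
    return sorted(name for name, score in scores.items() if score < threshold)


kept = sections_above(sample, threshold=0.7)
weak = low_confidence_fields(sample, threshold=0.9)
```

With the sample data, `kept` holds the single section and `weak` contains only `metadata`, the one field scoring below 0.9.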