BrightData is a web data platform with pre-built datasets (LinkedIn, Amazon, Google Maps, etc.) and custom web scrapers. Each sync triggers a new dataset snapshot, waits for it to be ready, and ingests every row as a bucket object.
Overview
The BrightData integration connects Mixpeek to BrightData’s Datasets API. When a sync runs, Mixpeek:
- Triggers a new dataset snapshot for the configured dataset ID.
- Polls the snapshot until its status is `ready`.
- Downloads the JSONL results.
- Creates one bucket object per row, with the full JSON record as the object blob.
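The four steps above can be sketched as a small Python loop. This is an illustrative sketch, not Mixpeek's actual implementation: the `trigger`, `get_status`, `download`, and `create_object` callables are hypothetical stand-ins for the BrightData API calls and the Mixpeek bucket write, and the poll interval and timeout defaults are assumed values.

```python
import json
import time
from typing import Callable, Dict


def run_sync(
    dataset_id: str,
    trigger: Callable[[str], str],          # hypothetical: starts a snapshot, returns its ID
    get_status: Callable[[str], str],       # hypothetical: returns e.g. "running" or "ready"
    download: Callable[[str], str],         # hypothetical: returns the JSONL body as text
    create_object: Callable[[Dict], None],  # hypothetical: writes one bucket object
    poll_seconds: float = 5.0,              # assumed value
    timeout_seconds: float = 3600.0,        # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Trigger a snapshot, wait until it is ready, and ingest one object per row."""
    snapshot_id = trigger(dataset_id)

    # Poll until the snapshot reports "ready" or the deadline passes.
    deadline = time.monotonic() + timeout_seconds
    while get_status(snapshot_id) != "ready":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"snapshot {snapshot_id} was not ready in time")
        sleep(poll_seconds)

    # JSONL: one JSON record per non-empty line.
    count = 0
    for line in download(snapshot_id).splitlines():
        if not line.strip():
            continue  # skip blank lines in the JSONL body
        create_object(json.loads(line))
        count += 1
    return count
```

Passing the API calls in as callables keeps the sketch independent of any particular HTTP client and makes the flow easy to exercise with fakes.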
Prerequisites
- An active BrightData account.
- A BrightData API token (found in Dashboard → Account → API Token).
- Access to the dataset(s) you want to sync (subscription required for most datasets).
Configuration
Connection-Level Fields
| Field | Required | Description |
|---|---|---|
| `api_token` | Yes | BrightData API token (encrypted at rest) |
| `customer_id` | No | BrightData customer ID for zone-level auth |
| `default_output_format` | No | `jsonl` (default) or `json` |
| `country` | No | ISO 3166-1 alpha-2 code for geo-targeting (e.g., `us`) |
Sync-Level Fields
| Field | Required | Description |
|---|---|---|
| `source_path` | Yes | BrightData dataset ID (e.g., `gd_l1vikfnt1wgvvqz95w`) |
| `sync_mode` | No | `continuous`, `one_time`, or `scheduled` |
| `polling_interval_seconds` | No | Seconds between scheduled runs |
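As a rough illustration of how the connection-level and sync-level fields fit together, the sketch below checks both sets of fields against the rules in the tables. The `validate_config` helper and the plain-dict layout are assumptions made for this example; the source does not show how Mixpeek accepts or validates these values.

```python
import re
from typing import Any, Dict

OUTPUT_FORMATS = {"jsonl", "json"}
SYNC_MODES = {"continuous", "one_time", "scheduled"}


def validate_config(connection: Dict[str, Any], sync: Dict[str, Any]) -> None:
    """Raise ValueError if a documented field is missing or malformed."""
    if not connection.get("api_token"):
        raise ValueError("api_token is required")
    if connection.get("default_output_format", "jsonl") not in OUTPUT_FORMATS:
        raise ValueError("default_output_format must be 'jsonl' or 'json'")
    country = connection.get("country")
    # Shape check only: two letters, not a lookup against the ISO 3166-1 list.
    if country is not None and not re.fullmatch(r"[A-Za-z]{2}", country):
        raise ValueError("country must be a two-letter ISO 3166-1 alpha-2 code")

    if not sync.get("source_path"):
        raise ValueError("source_path (the dataset ID) is required")
    mode = sync.get("sync_mode")
    if mode is not None and mode not in SYNC_MODES:
        raise ValueError("sync_mode must be continuous, one_time, or scheduled")
    interval = sync.get("polling_interval_seconds")
    if interval is not None and interval <= 0:
        raise ValueError("polling_interval_seconds must be positive")


# Placeholder token; the dataset ID is the LinkedIn example from this page.
connection = {"api_token": "YOUR_API_TOKEN", "country": "us"}
sync = {
    "source_path": "gd_l1vikfnt1wgvvqz95w",
    "sync_mode": "scheduled",
    "polling_interval_seconds": 86400,  # once a day
}
validate_config(connection, sync)  # passes silently for a well-formed config
```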
Setup
Get your BrightData API token
- Log in to your BrightData Dashboard.
- Go to Account Settings → API Tokens.
- Create a new token or copy an existing one.
Find your dataset ID
- Open the BrightData Marketplace.
- Select the dataset you want to sync (e.g., LinkedIn Company Profiles, Amazon Products).
- Copy the dataset ID from the URL or dataset detail page.
- LinkedIn Company Profiles: `gd_l1vikfnt1wgvvqz95w`
- Amazon Product Data: `gd_l7q7dkf244hwjntr0`
- Google Maps Business Data: `gd_l7q7dkf244hwjntr1`
Advanced Configuration
Geo-Targeting
Restrict data collection to a specific country by setting the connection-level `country` field to an ISO 3166-1 alpha-2 code (e.g., `us`).
Schema Mapping
Map BrightData record fields to Mixpeek document fields using the sync’s `schema_mapping`.
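To make the idea of a field mapping concrete, here is a minimal sketch that assumes `schema_mapping` is a flat dictionary from BrightData field names to Mixpeek field names. That shape is an assumption; the actual `schema_mapping` format may differ. The `apply_schema_mapping` helper and the record field names (`name`, `about`, `followers`) are hypothetical.

```python
from typing import Any, Dict


def apply_schema_mapping(
    record: Dict[str, Any], schema_mapping: Dict[str, str]
) -> Dict[str, Any]:
    """Copy mapped fields from a record into a new document dict.

    Assumes a flat {source_field: target_field} mapping. Fields that are
    missing from the record are skipped rather than set to None.
    """
    return {
        target: record[source]
        for source, target in schema_mapping.items()
        if source in record
    }


# Hypothetical record and mapping, for illustration only.
record = {"name": "Acme Corp", "about": "Industrial supplies", "followers": 1200}
mapping = {"name": "title", "about": "description", "website": "url"}
document = apply_schema_mapping(record, mapping)
# document == {"title": "Acme Corp", "description": "Industrial supplies"}
```

Unmapped fields such as `followers` are dropped from the mapped document in this sketch.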
File Filters
Filter which records are ingested using the standard Mixpeek file filter fields.
Data Model
Each BrightData record becomes a Mixpeek bucket object with:
- Blob: Full JSON record body (stored as `application/json`)
- Metadata: Top-level string, integer, float, and boolean fields from the record
- `source_provider`: `brightdata`
- `source_object_id`: `<snapshot_id>:<record_id>` (deduplicated across syncs)
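The data model described above can be illustrated with a short sketch that turns one record into a dictionary of the same shape. The output keys `blob`, `content_type`, and `metadata`, the `to_bucket_object` helper, and the `id_field` parameter are assumptions for this example. In particular, the source does not say which record field supplies `<record_id>`.

```python
import json
from typing import Any, Dict


def to_bucket_object(
    record: Dict[str, Any], snapshot_id: str, id_field: str = "id"
) -> Dict[str, Any]:
    """Build a dict shaped like the data model described above.

    `id_field` is an assumption: it names the record field used as <record_id>.
    """
    # Keep only top-level scalar fields; nested dicts, lists, and None values
    # stay in the blob but are not copied into metadata.
    metadata = {
        key: value
        for key, value in record.items()
        if isinstance(value, (str, int, float, bool))
    }
    return {
        "blob": json.dumps(record),          # full JSON record body
        "content_type": "application/json",
        "metadata": metadata,
        "source_provider": "brightdata",
        "source_object_id": f"{snapshot_id}:{record[id_field]}",
    }
```

Because `bool` is a subclass of `int` in Python, the single `isinstance` check covers all four scalar types listed in the data model.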
Sync Modes
| Mode | Description | When to Use |
|---|---|---|
| `continuous` | Polls every `polling_interval_seconds` | Real-time monitoring, frequently updated datasets |
| `one_time` | Single import, then completes | One-off data migrations, historical backfills |
| `scheduled` | Runs on a fixed interval | Daily/weekly dataset refreshes |
Troubleshooting
Snapshot timeout error
BrightData snapshots expire after 1 hour by default. If your dataset is large:
- Reduce the number of records by adding geo-targeting (`country` field)
- Use a higher-tier BrightData subscription with faster processing
- Contact BrightData support to increase your snapshot limits
401 Unauthorized on connection test
No records ingested after sync completes
- Verify that the dataset ID in `source_path` is correct
- Check that your BrightData subscription includes this dataset
- Inspect the sync job logs in Mixpeek Studio for detailed error messages
Sync shows 'rate_limit_hits' in metrics
The BrightData API enforces rate limits per subscription tier. Increase `polling_interval_seconds` or upgrade your BrightData plan for higher throughput.
