Overview
The Email connector lets you ingest documents by forwarding emails to a dedicated address. Each email becomes a bucket object with the body as a text blob, each attachment as a typed blob, and the original .eml preserved for chain of custody.
This is built for compliance-oriented workflows — legal document intake, healthcare record forwarding, secure support inboxes — where email is the transport and the documents (attachments) are the payload.
Prerequisites
- A Mixpeek account with an active namespace
- A bucket and sync configured for the email connection
- A Cloudflare account with mixpeek.com (or your custom domain) in Cloudflare DNS. Email Routing is free on all plans.
How It Works
- Create an email connection: Mixpeek assigns a unique inbound address (e.g., conn_abc123@inbound.mixpeek.com)
- Cloudflare receives the email: MX records point to Cloudflare Email Routing, which routes to a Worker
- Worker POSTs raw .eml: The Cloudflare Email Worker reads the raw RFC 2822 bytes and POSTs them to the Mixpeek webhook
- Mixpeek parses and stores: MIME parsing extracts headers → metadata, body → text blob, attachments → S3-backed blobs, raw .eml → S3
Customer Email Client
│
▼
Cloudflare MX (inbound.mixpeek.com)
│
▼
Cloudflare Email Routing (catch-all)
│
▼
mixpeek-email-ingest Worker
│ POST raw .eml bytes
▼
api.mixpeek.com/v1/webhooks/email/{connection_id}
│ MIME parse → S3 upload → object + blobs
▼
Bucket Objects (S3 + MongoDB)
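To exercise the webhook step without Cloudflare in the loop, the Worker's POST can be approximated locally. The following is a minimal sketch, not the Worker's actual code: the URL pattern comes from the diagram above, while the `Content-Type` value and the helper name `build_webhook_request` are assumptions. It does not sign the request, because the signature header format is not documented here.

```python
import urllib.request

# URL pattern taken from the flow diagram; everything else is illustrative.
WEBHOOK_BASE = "https://api.mixpeek.com/v1/webhooks/email"


def build_webhook_request(connection_id: str, raw_eml: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the raw .eml bytes."""
    return urllib.request.Request(
        url=f"{WEBHOOK_BASE}/{connection_id}",
        data=raw_eml,
        method="POST",
        # Assumption: message/rfc822 is a sensible content type for raw .eml bytes.
        headers={"Content-Type": "message/rfc822"},
    )


raw = b"From: a@example.com\r\nSubject: hi\r\n\r\nHello\r\n"
request = build_webhook_request("your_connection_id", raw)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes the payload easy to inspect before anything leaves your machine.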
Configuration
Connection-level fields
| Field | Required | Default | Description |
|---|---|---|---|
| allowed_senders | No | [] (all) | Sender allowlist. Exact addresses or domain wildcards (*@company.com). Empty = accept all. |
| store_raw_eml | No | true | Store the original .eml file as an additional blob for chain of custody. |
Auto-provisioned fields (read-only)
| Field | Description |
|---|---|
| inbound_address | System-assigned email address for this connection (e.g., conn_abc123@inbound.mixpeek.com) |
| webhook_secret | Auto-generated HMAC-SHA256 signing secret for webhook verification |
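For readers unfamiliar with HMAC-SHA256 verification, the general technique looks like the sketch below. It assumes the signature is a hex digest computed over the raw request body; the exact encoding and the header that carries it are not specified on this page, so treat `sign_payload` and `verify_signature` as hypothetical helpers.

```python
import hashlib
import hmac


def sign_payload(secret: str, payload: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest of the payload (assumed format)."""
    return hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()


def verify_signature(secret: str, payload: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, signature)


body = b"raw .eml bytes"
signature = sign_payload("example_secret", body)
print(verify_signature("example_secret", body, signature))
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters matched.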
Setup
Create an email connection
from mixpeek import Mixpeek

mp = Mixpeek("your_api_key")

connection = mp.connections.create(
    name="Legal Intake Inbox",
    provider_type="email",
    provider_config={
        "credentials": {"type": "webhook_secret"},
        "allowed_senders": ["*@lawfirm.com", "paralegal@partner.com"],
        "store_raw_eml": True,
    },
)

print(f"Inbound address: {connection.provider_config['inbound_address']}")
print(f"Webhook URL: https://api.mixpeek.com/v1/webhooks/email/{connection.connection_id}")
Create a bucket sync
Link the email connection to a bucket so incoming emails become objects:
sync = mp.buckets.syncs.create(
    bucket_id="your_bucket_id",
    connection_id=connection.connection_id,
    source_path="inbox",
)
Deploy the Cloudflare Email Worker
The Email Worker receives emails at *@inbound.mixpeek.com and POSTs the raw .eml bytes to the Mixpeek webhook. The worker source is in server/infra/cloudflare/email-worker/.
cd server/infra/cloudflare/email-worker
npm install
wrangler login
wrangler deploy
Optionally set a global signing key:
wrangler secret put WEBHOOK_SIGNING_KEY
Enable Cloudflare Email Routing
In the Cloudflare Dashboard:
- Go to your domain (mixpeek.com) → Email Routing
- Enable Email Routing: Cloudflare auto-adds MX records for inbound.mixpeek.com
- Go to Routing rules → Catch-all address
- Set action to Send to a Worker → select mixpeek-email-ingest
Cloudflare Email Routing is free on all plans. MX records are managed automatically, so no manual DNS configuration is needed.
Send a test email
Send an email with an attachment to the inbound address and verify the object appears in your bucket.
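If you prefer to script the test message rather than use a mail client, a small attachment-bearing email can be built with Python's standard library. This is a sketch: the helper name `build_test_email`, the subject line, and the attachment contents are placeholders, and actually sending it requires an SMTP server you control.

```python
from email.message import EmailMessage


def build_test_email(sender: str, inbound_address: str) -> EmailMessage:
    """Build a plain-text email with one small binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = inbound_address
    msg["Subject"] = "Email connector test"
    msg.set_content("Test body for the email connector.")
    msg.add_attachment(
        b"sample attachment bytes\n",
        maintype="application",
        subtype="octet-stream",
        filename="sample.bin",
    )
    return msg


message = build_test_email("you@example.com", "conn_abc123@inbound.mixpeek.com")
print(len(message.as_bytes()), "bytes")
# To send (hypothetical host): smtplib.SMTP("smtp.example.com").send_message(message)
```

If the connection has an `allowed_senders` list, the `From` address used here needs to match one of its entries.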
Object Structure
Each email becomes one bucket object with multiple blobs:
| Blob Property | Type | Content |
|---|---|---|
| email_body | text | Email body (plain text preferred, HTML fallback) |
| attachment_0, attachment_1, … | varies | Each attachment, typed by MIME (image, pdf, video, etc.) |
| raw_eml | text | Original .eml file stored in S3 (if store_raw_eml is enabled) |
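The blob layout above can be reproduced locally to preview how a given .eml would be split. The sketch below uses Python's standard `email` package and mirrors the blob property names from the table; it is an approximation of the behavior described here, not Mixpeek's parser, and `split_email` is a hypothetical name.

```python
from email import message_from_bytes, policy


def split_email(raw_eml: bytes, store_raw_eml: bool = True) -> dict:
    """Split a raw .eml into body, attachment, and raw blobs."""
    msg = message_from_bytes(raw_eml, policy=policy.default)

    # Prefer the plain-text body and fall back to HTML.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    blobs = {"email_body": body_part.get_content() if body_part else ""}

    for index, part in enumerate(msg.iter_attachments()):
        blobs[f"attachment_{index}"] = {
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            # Decoded bytes; None for nested multipart attachments.
            "data": part.get_payload(decode=True),
        }

    if store_raw_eml:
        blobs["raw_eml"] = raw_eml
    return blobs


sample = b"From: a@example.com\r\nSubject: s\r\n\r\nJust text\r\n"
print(sorted(split_email(sample)))
```

Running this against a problem message is a quick way to check whether missing attachments are absent from the raw bytes or lost further downstream.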
Parsed email headers are set as root-level fields on the object and can be mapped to your bucket schema:
| Field | Type | Description |
|---|---|---|
| email_from | string | Sender address |
| email_to | list[string] | Recipient addresses |
| email_cc | list[string] | CC addresses |
| email_subject | string | Subject line |
| email_date | string (ISO 8601) | Date the email was sent |
| email_message_id | string | RFC 2822 Message-ID (used for deduplication) |
| email_in_reply_to | string | Parent message ID (for threading) |
| email_references | list[string] | Thread reference IDs |
| email_attachment_count | integer | Number of attachments |
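To see what values these fields would likely hold for a given message, the header-to-field mapping can be approximated with the standard library. This is a sketch under assumptions: the field names come from the table above, but details such as whether Message-IDs keep their angle brackets are guesses, and `extract_metadata` is a hypothetical helper.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses, parseaddr, parsedate_to_datetime


def extract_metadata(raw_eml: bytes) -> dict:
    """Map message headers to the root-level field names in the table."""
    msg = message_from_bytes(raw_eml, policy=policy.default)

    def text(header: str):
        value = msg.get(header)
        return str(value).strip() if value is not None else None

    def addresses(header: str) -> list:
        values = [str(v) for v in msg.get_all(header, [])]
        return [addr for _, addr in getaddresses(values) if addr]

    try:
        sent = parsedate_to_datetime(text("Date")).isoformat()
    except (TypeError, ValueError):
        sent = None  # missing or unparseable Date header

    return {
        "email_from": parseaddr(text("From") or "")[1] or None,
        "email_to": addresses("To"),
        "email_cc": addresses("Cc"),
        "email_subject": text("Subject"),
        "email_date": sent,
        # Assumption: IDs are kept as they appear in the header.
        "email_message_id": text("Message-ID"),
        "email_in_reply_to": text("In-Reply-To"),
        "email_references": (text("References") or "").split(),
        "email_attachment_count": sum(1 for _ in msg.iter_attachments()),
    }
```

Calling `extract_metadata(raw_bytes)` on a saved .eml returns a plain dictionary that can be compared against the object Mixpeek creates.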
Schema Mapping
Map email fields to your collection schema to make them searchable:
{
  "schema": {
    "fields": [
      {"name": "email_from", "type": "text"},
      {"name": "email_subject", "type": "text"},
      {"name": "email_date", "type": "text"},
      {"name": "email_attachment_count", "type": "text"}
    ]
  }
}
Use attribute_filter in your retriever to query by sender, date, or subject:
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "conditions": {
      "AND": [
        {"field": "email_from", "operator": "contains", "value": "@lawfirm.com"},
        {"field": "email_date", "operator": "gte", "value": "2026-01-01"}
      ]
    }
  }
}
Security
| Feature | Description |
|---|---|
| Sender allowlist | Only accept emails from specified addresses or domains |
| Webhook signature | HMAC-SHA256 verification of inbound payloads |
| Deduplication | Duplicate emails (same Message-ID) are skipped |
| Chain of custody | Raw .eml uploaded to S3 with SHA-256 hash for forensic integrity |
| Credential encryption | Webhook secret encrypted at rest (Fernet / CSFLE) |
| Audit logging | All connection events logged to ClickHouse (365-day retention) |
Email headers and bodies may contain PII (names, email addresses, phone numbers). Consider enabling PII redaction in your collection pipeline or restricting access to the namespace containing email data.
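The chain-of-custody row in the table above relies on a SHA-256 hash of the raw .eml. A sketch of how such a hash can be computed and later re-checked is shown below; where Mixpeek records the hash is not specified here, so `recorded_hash` simply stands in for whatever value you retrieve, and both function names are hypothetical.

```python
import hashlib
import hmac


def eml_sha256(raw_eml: bytes) -> str:
    """Return the hex SHA-256 digest of the raw .eml bytes."""
    return hashlib.sha256(raw_eml).hexdigest()


def eml_matches_hash(raw_eml: bytes, recorded_hash: str) -> bool:
    """Re-hash the bytes and compare against a previously recorded digest."""
    return hmac.compare_digest(eml_sha256(raw_eml), recorded_hash.lower())


original = b"From: a@example.com\r\nSubject: s\r\n\r\nBody\r\n"
digest = eml_sha256(original)
print(eml_matches_hash(original, digest))
print(eml_matches_hash(original + b"tampered", digest))
```

Because the hash covers the exact bytes, any re-encoding of the message (for example, normalizing line endings) will produce a different digest.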
Compliance Notes
| Requirement | How Mixpeek addresses it |
|---|---|
| HIPAA — encryption in transit | Cloudflare enforces TLS on MX; webhook endpoint requires HTTPS (TLS 1.2+) |
| HIPAA — encryption at rest | Credentials encrypted via CSFLE; all blobs (body, attachments, raw .eml) stored in encrypted S3 |
| HIPAA — audit trail | All access logged to ClickHouse audit service |
| eDiscovery — immutability | Raw .eml in S3 with SHA-256 hash, stored alongside parsed content |
| eDiscovery — chain of custody | Source tracking: source_provider=email, source_object_id=email://{message_id} |
| SOC 2 — access control | Per-namespace RBAC (ADMIN, MEMBER, VIEWER) with granular operations |
Mixpeek does not currently hold a HIPAA BAA. If you need a BAA for PHI handling, contact us at sales@mixpeek.com to discuss your requirements.
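The source-tracking format `email://{message_id}` and the Message-ID deduplication described earlier can be illustrated together. The sketch below is a toy in-memory version, not Mixpeek's storage logic: the class `SeenMessages` is hypothetical, and whether the stored ID keeps its angle brackets is an assumption.

```python
def source_object_id(message_id: str) -> str:
    """Build the source identifier in the email://{message_id} format."""
    # Assumption: the ID is used as it appears in the header, minus whitespace.
    return f"email://{message_id.strip()}"


class SeenMessages:
    """In-memory record of source IDs that have already been ingested."""

    def __init__(self) -> None:
        self._seen = set()

    def is_duplicate(self, message_id: str) -> bool:
        """Return True if this Message-ID was seen before, else record it."""
        key = source_object_id(message_id)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


seen = SeenMessages()
print(seen.is_duplicate("<m1@lawfirm.com>"))
print(seen.is_duplicate("<m1@lawfirm.com>"))
```

A forwarded copy of an email normally gets a new Message-ID, so it would be treated as a distinct object under this scheme.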
Troubleshooting
| Issue | Solution |
|---|---|
| 403 — sender not in allowlist | Add the sender’s address or domain to allowed_senders |
| 404 — connection not found | Verify the connection_id in the webhook URL matches an active email connection |
| 400 — no active bucket sync | Create a bucket sync linked to this email connection |
| Duplicate emails skipped | Expected behavior — emails with the same Message-ID are deduplicated |
| Attachments not appearing | Check that the email service is sending the full raw RFC 2822 message, not a stripped-down version |