
Overview

The Email connector lets you ingest documents by forwarding emails to a dedicated address. Each email becomes a bucket object with the body as a text blob, each attachment as a typed blob, and the original .eml preserved for chain of custody. It is built for compliance-oriented workflows such as legal document intake, healthcare record forwarding, and secure support inboxes, where email is the transport and the attached documents are the payload.

Prerequisites

  • A Mixpeek account with an active namespace
  • A bucket and sync configured for the email connection
  • A Cloudflare account with mixpeek.com (or your custom domain) managed in Cloudflare DNS. Email Routing is free on all plans.

How It Works

  1. Create an email connection — Mixpeek assigns a unique inbound address (e.g., conn_abc123@inbound.mixpeek.com)
  2. Cloudflare receives the email — MX records point to Cloudflare Email Routing, which routes to a Worker
  3. Worker POSTs raw .eml — The Cloudflare Email Worker reads the raw RFC 2822 bytes and POSTs them to the Mixpeek webhook
  4. Mixpeek parses and stores — MIME parsing extracts headers → metadata, body → text blob, attachments → S3-backed blobs, raw .eml → S3

  Customer Email Client
        │
        ▼
  Cloudflare MX (inbound.mixpeek.com)
        │
        ▼
  Cloudflare Email Routing (catch-all)
        │
        ▼
  mixpeek-email-ingest Worker
        │  POST raw .eml bytes
        ▼
  api.mixpeek.com/v1/webhooks/email/{connection_id}
        │  MIME parse → S3 upload → object + blobs
        ▼
  Bucket Objects (S3 + MongoDB)
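
The parsing step in the flow above can be approximated with Python's standard `email` package. This is only a sketch of the general technique, not Mixpeek's actual parser: `parse_eml` and `sample_eml` are hypothetical names, and the returned dictionary simply borrows the blob property names listed under Object Structure.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def parse_eml(raw: bytes) -> dict:
    """Split a raw .eml into body text, attachments, and the untouched bytes."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body and fall back to HTML.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    attachments = []
    for index, part in enumerate(msg.iter_attachments()):
        attachments.append({
            "property": f"attachment_{index}",
            "filename": part.get_filename(),
            "content_type": part.get_content_type(),
            "data": part.get_payload(decode=True),  # decoded bytes
        })

    return {"email_body": body, "attachments": attachments, "raw_eml": raw}


def sample_eml() -> bytes:
    """Build a small message with one PDF-like attachment for local testing."""
    msg = EmailMessage()
    msg["From"] = "paralegal@partner.com"
    msg["To"] = "conn_abc123@inbound.mixpeek.com"
    msg["Subject"] = "Signed contract"
    msg.set_content("Contract attached.")
    msg.add_attachment(b"%PDF-1.4 sample", maintype="application",
                       subtype="pdf", filename="contract.pdf")
    return msg.as_bytes()
```

Calling `parse_eml(sample_eml())` gives one body string and one attachment entry, which is a convenient way to check what a given .eml contains before forwarding it.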

Configuration

Connection-level fields

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| allowed_senders | No | [] (all) | Sender allowlist. Exact addresses or domain wildcards (*@company.com). Empty = accept all. |
| store_raw_eml | No | true | Store the original .eml file as an additional blob for chain of custody. |
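
To make the allowed_senders semantics concrete, here is a minimal sketch of how exact addresses and domain wildcards could be matched. It is an illustration only: `sender_allowed` is a hypothetical helper, and the case-insensitive comparison is an assumption that the source does not confirm.

```python
from fnmatch import fnmatchcase


def sender_allowed(sender: str, allowed_senders: list[str]) -> bool:
    """Return True if the sender passes the allowlist; an empty list accepts all."""
    if not allowed_senders:
        return True
    # Assumption: addresses are compared case-insensitively.
    normalized = sender.strip().lower()
    return any(fnmatchcase(normalized, pattern.lower()) for pattern in allowed_senders)
```

With the allowlist from the setup example, `sender_allowed("alice@lawfirm.com", ["*@lawfirm.com", "paralegal@partner.com"])` is true, while `other@partner.com` is rejected.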

Auto-provisioned fields (read-only)

| Field | Description |
| --- | --- |
| inbound_address | System-assigned email address for this connection (e.g., conn_abc123@inbound.mixpeek.com) |
| webhook_secret | Auto-generated HMAC-SHA256 signing secret for webhook verification |
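
HMAC-SHA256 verification generally works as sketched below. The source does not say which header carries the signature or how it is encoded, so this example assumes a hex digest computed over the raw request body; `sign_payload` and `verify_signature` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_payload(secret: str, payload: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest over the raw payload bytes."""
    return hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()


def verify_signature(secret: str, payload: bytes, signature: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, signature)
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information about how many leading characters of the signature matched.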

Setup

1. Create an email connection

from mixpeek import Mixpeek

mp = Mixpeek("your_api_key")

connection = mp.connections.create(
    name="Legal Intake Inbox",
    provider_type="email",
    provider_config={
        "credentials": {"type": "webhook_secret"},
        "allowed_senders": ["*@lawfirm.com", "paralegal@partner.com"],
        "store_raw_eml": True,
    },
)

print(f"Inbound address: {connection.provider_config['inbound_address']}")
print(f"Webhook URL: https://api.mixpeek.com/v1/webhooks/email/{connection.connection_id}")

2. Create a bucket sync

Link the email connection to a bucket so incoming emails become objects:
sync = mp.buckets.syncs.create(
    bucket_id="your_bucket_id",
    connection_id=connection.connection_id,
    source_path="inbox",
)

3. Deploy the Cloudflare Email Worker

The Email Worker receives emails at *@inbound.mixpeek.com and POSTs the raw .eml bytes to the Mixpeek webhook. The worker source is in server/infra/cloudflare/email-worker/.
cd server/infra/cloudflare/email-worker
npm install
wrangler login
wrangler deploy
Optionally set a global signing key:
wrangler secret put WEBHOOK_SIGNING_KEY

4. Enable Cloudflare Email Routing

In the Cloudflare Dashboard:
  1. Go to your domain (mixpeek.com) → Email Routing
  2. Enable Email Routing — Cloudflare auto-adds MX records for inbound.mixpeek.com
  3. Go to Routing rules → Catch-all address
  4. Set action to Send to a Worker → select mixpeek-email-ingest
Cloudflare Email Routing is free on all plans. MX records are managed automatically — no manual DNS configuration needed.

5. Send a test email

Send an email with an attachment to the inbound address and verify the object appears in your bucket.
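
If you prefer to script the test, a message with an attachment can be built with Python's standard library, as in this sketch. `build_test_email` and `send_via_smtp` are hypothetical helpers, and the SMTP host, port, and credentials are placeholders for your own mail provider's settings, not anything Mixpeek supplies.

```python
import smtplib
from email.message import EmailMessage


def build_test_email(sender: str, inbound_address: str) -> EmailMessage:
    """Build a short message with a single small attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = inbound_address
    msg["Subject"] = "Email connector test"
    msg.set_content("Test message for the email connection.")
    msg.add_attachment(
        b"hello from the test attachment\n",
        maintype="application",
        subtype="octet-stream",
        filename="test.txt",
    )
    return msg


def send_via_smtp(msg: EmailMessage, host: str, port: int,
                  username: str, password: str) -> None:
    """Send through your own SMTP relay (placeholder settings, STARTTLS assumed)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)
```

The sender address must pass the connection's allowed_senders list, otherwise the webhook rejects the message with a 403.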

Object Structure

Each email becomes one bucket object with multiple blobs:
| Blob property | Type | Content |
| --- | --- | --- |
| email_body | text | Email body (plain text preferred, HTML fallback) |
| attachment_0, attachment_1, … | varies | Each attachment, typed by MIME (image, pdf, video, etc.) |
| raw_eml | text | Original .eml file stored in S3 (if store_raw_eml is enabled) |

Email metadata fields

These are set as root-level fields on the object and can be mapped to your bucket schema:
| Field | Type | Description |
| --- | --- | --- |
| email_from | string | Sender address |
| email_to | list[string] | Recipient addresses |
| email_cc | list[string] | CC addresses |
| email_subject | string | Subject line |
| email_date | string (ISO 8601) | Date the email was sent |
| email_message_id | string | RFC 2822 Message-ID (used for deduplication) |
| email_in_reply_to | string | Parent message ID (for threading) |
| email_references | list[string] | Thread reference IDs |
| email_attachment_count | integer | Number of attachments |
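
As an illustration of where these values come from, the sketch below derives the same field names from standard email headers using Python's `email` package. It is an approximation under stated assumptions, not Mixpeek's implementation: `extract_metadata` is a hypothetical function, and details such as how missing headers are represented may differ in the real service.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses, parsedate_to_datetime


def extract_metadata(raw: bytes) -> dict:
    """Map standard headers of a raw .eml onto the metadata field names."""
    msg = message_from_bytes(raw, policy=policy.default)

    def text(header: str):
        value = msg[header]
        return str(value) if value is not None else None

    def addresses(header: str) -> list[str]:
        values = [str(v) for v in msg.get_all(header, [])]
        return [addr for _, addr in getaddresses(values) if addr]

    date_value = text("Date")
    senders = addresses("From")
    return {
        "email_from": senders[0] if senders else None,
        "email_to": addresses("To"),
        "email_cc": addresses("Cc"),
        "email_subject": text("Subject"),
        # ISO 8601 string, keeping the sender's UTC offset when present.
        "email_date": parsedate_to_datetime(date_value).isoformat() if date_value else None,
        "email_message_id": text("Message-ID"),
        "email_in_reply_to": text("In-Reply-To"),
        "email_references": (text("References") or "").split(),
        "email_attachment_count": sum(1 for _ in msg.iter_attachments()),
    }
```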

Schema Mapping

Map email fields to your collection schema to make them searchable:
{
  "schema": {
    "fields": [
      {"name": "email_from", "type": "text"},
      {"name": "email_subject", "type": "text"},
      {"name": "email_date", "type": "text"},
      {"name": "email_attachment_count", "type": "text"}
    ]
  }
}
Use attribute_filter in your retriever to query by sender, date, or subject:
{
  "stage_type": "filter",
  "stage_id": "attribute_filter",
  "parameters": {
    "conditions": {
      "AND": [
        {"field": "email_from", "operator": "contains", "value": "@lawfirm.com"},
        {"field": "email_date", "operator": "gte", "value": "2026-01-01"}
      ]
    }
  }
}

Security

| Feature | Description |
| --- | --- |
| Sender allowlist | Only accept emails from specified addresses or domains |
| Webhook signature | HMAC-SHA256 verification of inbound payloads |
| Deduplication | Duplicate emails (same Message-ID) are skipped |
| Chain of custody | Raw .eml uploaded to S3 with SHA-256 hash for forensic integrity |
| Credential encryption | Webhook secret encrypted at rest (Fernet / CSFLE) |
| Audit logging | All connection events logged to ClickHouse (365-day retention) |
Email headers and bodies may contain PII (names, email addresses, phone numbers). Consider enabling PII redaction in your collection pipeline or restricting access to the namespace containing email data.
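
The deduplication and chain-of-custody features listed in this section combine two simple ideas: skip a Message-ID that has already been seen, and record a SHA-256 digest of the raw bytes so they can be re-verified later. The sketch below shows both with an in-memory dictionary. `IngestLedger` is a hypothetical class for illustration; the real service persists this state in its own storage.

```python
import hashlib
from typing import Optional


class IngestLedger:
    """In-memory stand-in for Message-ID deduplication and custody hashes."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}

    def record(self, message_id: str, raw_eml: bytes) -> Optional[str]:
        """Return the SHA-256 hex digest for a new email, or None for a duplicate."""
        if message_id in self._digests:
            return None  # same Message-ID already ingested, so skip it
        digest = hashlib.sha256(raw_eml).hexdigest()
        self._digests[message_id] = digest
        return digest

    def verify(self, message_id: str, raw_eml: bytes) -> bool:
        """Check that stored bytes still match the digest recorded at ingest."""
        return self._digests.get(message_id) == hashlib.sha256(raw_eml).hexdigest()
```

Recomputing the digest of a stored .eml and comparing it with the recorded value is what lets a reviewer confirm the file has not changed since intake.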

Compliance Notes

| Requirement | How Mixpeek addresses it |
| --- | --- |
| HIPAA: encryption in transit | Cloudflare enforces TLS on MX; webhook endpoint requires HTTPS (TLS 1.2+) |
| HIPAA: encryption at rest | Credentials encrypted via CSFLE; all blobs (body, attachments, raw .eml) stored in encrypted S3 |
| HIPAA: audit trail | All access logged to ClickHouse audit service |
| eDiscovery: immutability | Raw .eml in S3 with SHA-256 hash, stored alongside parsed content |
| eDiscovery: chain of custody | Source tracking: source_provider=email, source_object_id=email://{message_id} |
| SOC 2: access control | Per-namespace RBAC (ADMIN, MEMBER, VIEWER) with granular operations |
Mixpeek does not currently hold a HIPAA BAA. If you need a BAA for PHI handling, contact us at sales@mixpeek.com to discuss your requirements.

Troubleshooting

| Issue | Solution |
| --- | --- |
| 403: sender not in allowlist | Add the sender’s address or domain to allowed_senders |
| 404: connection not found | Verify the connection_id in the webhook URL matches an active email connection |
| 400: no active bucket sync | Create a bucket sync linked to this email connection |
| Duplicate emails skipped | Expected behavior: emails with the same Message-ID are deduplicated |
| Attachments not appearing | Check that the email service is sending the full raw RFC 2822 message, not a stripped-down version |