modernbert-embed-base
by nomic-ai
ModernBERT-powered text embeddings -- 8192 tokens, Matryoshka dimensions, fast inference
nomic-ai/modernbert-embed-base
mixpeek://text_extractor@v1/nomic_modernbert_embed_base_v1
Overview
ModernBERT Embed Base is Nomic AI's text embedding model built on the ModernBERT architecture, which modernizes the BERT encoder with rotary position embeddings, Flash Attention, and unpadded variable-length batching. The result is an embedding model that handles 8192-token inputs with faster inference than comparably sized alternatives.
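Here is a rough illustration of rotary position embeddings. The TypeScript sketch below rotates consecutive pairs of vector components by an angle proportional to the token position. It is a generic RoPE sketch, not ModernBERT's code:

- The function names are hypothetical.
- The base of 10000 comes from the original RoPE formulation, not from ModernBERT's configuration.
- It uses the interleaved-pair layout. Some implementations pair dimension i with i + d/2 instead.

The check at the end shows the property RoPE is designed for: the query-key score depends only on the distance between the two positions.

```typescript
// Generic RoPE sketch (interleaved pairs). Assumption: base = 10000, taken
// from the original RoPE formulation, not from ModernBERT's config.
function applyRope(x: number[], position: number, base = 10000): number[] {
  if (x.length % 2 !== 0) throw new Error("RoPE needs an even dimension");
  const d = x.length;
  const out = new Array<number>(d);
  for (let i = 0; i < d / 2; i++) {
    // Pair (x[2i], x[2i+1]) is rotated by position * theta_i,
    // with theta_i = base^(-2i/d): lower pairs rotate faster.
    const theta = Math.pow(base, (-2 * i) / d);
    const angle = position * theta;
    const cos = Math.cos(angle);
    const sin = Math.sin(angle);
    const a = x[2 * i];
    const b = x[2 * i + 1];
    out[2 * i] = a * cos - b * sin;
    out[2 * i + 1] = a * sin + b * cos;
  }
  return out;
}

function dot(u: number[], v: number[]): number {
  return u.reduce((s, ui, i) => s + ui * v[i], 0);
}

// Placeholder query/key vectors; only their relative offset (3) matters.
const q = [0.1, 0.2, 0.3, 0.4];
const k = [0.5, -0.1, 0.2, 0.7];
const s1 = dot(applyRope(q, 5), applyRope(k, 2));
const s2 = dot(applyRope(q, 13), applyRope(k, 10));
console.log(Math.abs(s1 - s2) < 1e-9); // true
```

Rotations preserve vector length, so RoPE changes only the angle between queries and keys, not their magnitudes.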
At 149M parameters, it outperforms Nomic's previous embedding models on MTEB while being significantly cheaper to run. It supports Matryoshka representations at 768 and 256 dimensions. On Mixpeek, it provides a strong baseline text embedding model that balances quality, speed, and context length for document-heavy retrieval pipelines.
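Some documents are longer than the 8192-token window. A common approach is to split the token sequence into overlapping windows and embed each window separately. The sketch below assumes tokenization has already happened. The helper `chunkTokens`, the 256-token overlap and the default window size are illustrative choices, not Mixpeek behavior. In practice the window should leave room for any special tokens the tokenizer adds.

```typescript
// Hypothetical helper: split token ids into windows of at most maxTokens,
// each window overlapping the previous one by `overlap` tokens.
function chunkTokens<T>(tokens: T[], maxTokens = 8192, overlap = 256): T[][] {
  if (overlap >= maxTokens) throw new Error("overlap must be smaller than maxTokens");
  const chunks: T[][] = [];
  const stride = maxTokens - overlap;
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + maxTokens));
    // Stop once a window reaches the end, so no tail-only duplicate window is emitted.
    if (start + maxTokens >= tokens.length) break;
  }
  return chunks;
}

// Placeholder "token ids" for a 20,000-token document.
const ids = Array.from({ length: 20000 }, (_, i) => i);
const chunks = chunkTokens(ids);
console.log(chunks.map((c) => c.length)); // [ 8192, 8192, 4128 ]
```

Each window becomes its own embedding. Keeping the chunk index alongside each vector lets search hits be mapped back to their position in the source document.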
Architecture
ModernBERT encoder, 149M parameters. Rotary position embeddings for 8192-token context. Flash Attention 2 for efficient long-sequence processing. Matryoshka dimensions: 768 (full) and 256 (compressed).
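Matryoshka training concentrates information in the leading dimensions. A 256-dimensional vector can therefore be taken from the first 256 components of the 768-dimensional output. Below is a minimal sketch, assuming vectors arrive as plain number arrays. `truncateEmbedding` is a hypothetical helper. Re-normalizing after truncation is the usual step, so cosine and dot-product scores stay comparable.

```typescript
// Hypothetical helper: keep the first `dims` components, then L2-normalize.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  if (dims > embedding.length) {
    throw new Error("cannot truncate to more dimensions than the input has");
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, v) => s + v * v, 0));
  return norm === 0 ? head : head.map((v) => v / norm);
}

// Synthetic 768-d vector standing in for real model output.
const full = Array.from({ length: 768 }, (_, i) => Math.sin(i + 1));
const small = truncateEmbedding(full, 256);
console.log(small.length); // 256
```

Stored as 32-bit floats, the 256-dimensional vector takes 1024 bytes instead of 3072, a 3x smaller index. This typically comes at some cost in retrieval quality.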
Mixpeek SDK Integration
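```typescript
import { Mixpeek } from "mixpeek";

// Replace API_KEY with your Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Ingest a PDF into a collection and embed its text with modernbert-embed-base.
// Top-level await requires an ES module context.
await mx.collections.ingest({
  collection_id: "my-collection",
  source: { url: "https://example.com/documentation.pdf" },
  feature_extractors: [
    {
      name: "text_embedding",
      version: "v1",
      params: {
        model_id: "nomic-ai/modernbert-embed-base",
      },
    },
  ],
});
```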
Capabilities
- 8192 token context window
- Matryoshka dimension reduction (768/256)
- Flash Attention 2 for fast inference
- Variable-length batching (no padding waste)
- Strong MTEB performance at 149M params
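The "no padding waste" point can be made concrete. A padded batch pads every sequence to the length of the longest one. An unpadded batch instead concatenates the sequences and records their boundaries as cumulative offsets. This is the `cu_seqlens` layout used by Flash Attention's variable-length kernels. The sketch below only builds that layout and counts tokens. The helper names and token ids are placeholders, and it does not reproduce ModernBERT's actual batching code.

```typescript
// Packed layout: sequence i occupies tokens[cuSeqlens[i] .. cuSeqlens[i + 1]).
interface PackedBatch {
  tokens: number[];
  cuSeqlens: number[];
  maxSeqlen: number;
}

// Hypothetical helper: concatenate sequences and record cumulative offsets.
function packBatch(seqs: number[][]): PackedBatch {
  const tokens: number[] = [];
  const cuSeqlens = [0];
  let maxSeqlen = 0;
  for (const s of seqs) {
    tokens.push(...s);
    cuSeqlens.push(tokens.length);
    maxSeqlen = Math.max(maxSeqlen, s.length);
  }
  return { tokens, cuSeqlens, maxSeqlen };
}

// Token slots processed with padding-to-max versus with packing.
function paddingWaste(seqs: number[][]): { padded: number; packed: number } {
  const maxLen = Math.max(0, ...seqs.map((s) => s.length));
  const packed = seqs.reduce((n, s) => n + s.length, 0);
  return { padded: maxLen * seqs.length, packed };
}

// Placeholder token ids; real ids come from the model's tokenizer.
const batch = [[101, 7, 8, 102], [101, 9, 102], [101, 1, 2, 3, 4, 5, 6, 102]];
const pb = packBatch(batch);
console.log(pb.cuSeqlens); // [ 0, 4, 7, 15 ]
console.log(paddingWaste(batch)); // { padded: 24, packed: 15 }
```

In this toy batch, padding would process 24 token slots to hold 15 real tokens. The packed form processes exactly 15.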
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| MTEB Retrieval (en) | nDCG@10 | 54.7 | Nomic AI, 2025 -- Model Card |
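nDCG@10 rewards placing relevant documents near the top of the first ten results. It discounts each hit by log2(rank + 1) and normalizes by the best achievable ordering. The sketch below computes it for a single query using linear gain, where the gain equals the graded relevance label. This is the variant trec_eval-style tools use. Other libraries use 2^rel - 1, which gives the same result for binary labels. The function names are hypothetical. Benchmark figures such as the 54.7 above are averages over many queries and datasets, reported on a 0-100 scale.

```typescript
// DCG@k with linear gain: sum of rel_i / log2(rank_i + 1) over the top k ranks.
function dcgAtK(relevances: number[], k: number): number {
  let dcg = 0;
  for (let i = 0; i < Math.min(k, relevances.length); i++) {
    dcg += relevances[i] / Math.log2(i + 2); // rank = i + 1
  }
  return dcg;
}

// ranked: relevance of each retrieved document, in ranked order.
// judged: relevance labels of all judged documents for the query.
function ndcgAtK(ranked: number[], judged: number[], k = 10): number {
  const ideal = [...judged].sort((a, b) => b - a);
  const idcg = dcgAtK(ideal, k);
  return idcg === 0 ? 0 : dcgAtK(ranked, k) / idcg;
}

// One relevant document, retrieved at rank 2 instead of rank 1.
console.log(ndcgAtK([0, 1, 0], [1]).toFixed(4)); // 0.6309
```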
Research Paper
ModernBERT (arxiv.org)
Build a pipeline with modernbert-embed-base
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.
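The retrieval side can be sketched as ranking stored document vectors by cosine similarity to a query vector. This is a brute-force illustration with made-up three-dimensional vectors and hypothetical helper names. It does not use Mixpeek's retrieval stages. Production systems typically rely on an approximate nearest-neighbor index rather than scanning every document.

```typescript
interface Doc {
  id: string;
  embedding: number[];
}

// Cosine similarity; returns 0 if either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Score every document, sort by descending similarity, keep the top k.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-d vectors; real ones would be 768-d (or 256-d after truncation).
const docs: Doc[] = [
  { id: "intro", embedding: [1, 0, 0] },
  { id: "api", embedding: [0.7, 0.7, 0] },
  { id: "faq", embedding: [0, 0, 1] },
];
console.log(topK([1, 0.2, 0], docs, 2).map((r) => r.id)); // [ 'intro', 'api' ]
```

If embeddings are L2-normalized before storage, the cosine reduces to a plain dot product.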