nomic-embed-text-v2-moe
by nomic-ai
First Mixture-of-Experts text embedding model with 100-language multilingual support
nomic-ai/nomic-embed-text-v2-moe
mixpeek://text_extractor@v1/nomic_embed_v2_moe_v1
Overview
Nomic Embed Text v2 MoE is the first general-purpose Mixture-of-Experts text embedding model. It uses 8 experts with top-2 routing, so only 305M of its 475M total parameters are active at inference. Trained on 1.6B high-quality pairs selected with consistency filtering, it is reported to achieve state-of-the-art performance on both BEIR (52.86 nDCG@10) and MIRACL (65.80 nDCG@10) while remaining competitive with models twice its size.
On Mixpeek, nomic-embed-text-v2-moe provides efficient multilingual text embeddings for search pipelines that span ~100 languages, with Matryoshka dimension support (768 down to 256) for flexible storage and retrieval tradeoffs.
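To make the storage tradeoff concrete, here is a minimal sketch of Matryoshka truncation: keep the leading components of a vector and re-normalize. The helper names `truncateEmbedding` and `float32Bytes` are hypothetical, the float32 storage format is an assumption, and the input is a stand-in vector rather than real model output. Check the model card for any recommended post-processing before relying on this.

```typescript
// Truncate a Matryoshka-trained embedding to its first `dim` components and
// re-normalize to unit length so dot-product / cosine scores stay comparable.
// (Hypothetical helper, not part of the Mixpeek SDK.)
function truncateEmbedding(embedding: number[], dim: number): number[] {
  if (!Number.isInteger(dim) || dim <= 0 || dim > embedding.length) {
    throw new RangeError(`dim must be an integer in 1..${embedding.length}`);
  }
  const head = embedding.slice(0, dim);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return head; // all-zero prefix: nothing to normalize
  return head.map((x) => x / norm);
}

// Raw storage per vector, assuming float32 components and ignoring index overhead.
function float32Bytes(dim: number): number {
  return dim * 4;
}

// Stand-in 768-dimensional vector (not real model output).
const fullVector = Array.from({ length: 768 }, (_, i) => Math.sin(i + 1));
const smallVector = truncateEmbedding(fullVector, 256);

console.log(smallVector.length);                    // 256
console.log(float32Bytes(768), float32Bytes(256));  // 3072 1024
```

Under these assumptions, truncating from 768 to 256 dimensions cuts raw vector storage to one third, at some cost in retrieval quality that should be measured on your own data.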
Architecture
Mixture-of-Experts transformer encoder with 8 experts, top-2 routing. 475M total parameters, 305M active during inference. Trained with weakly-supervised contrastive pretraining followed by supervised fine-tuning. Matryoshka representation learning for flexible output dimensions.
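The top-2 routing mentioned above can be sketched for a single token as follows. This is one common generic variant of top-k MoE routing (softmax over the selected logits only), not necessarily the exact scheme used in this model; `routeTopK`, `moeForward`, and the toy experts are hypothetical names introduced for illustration.

```typescript
// One routing decision: which expert to run and how much to weight its output.
interface ExpertChoice {
  expert: number; // index of the selected expert
  weight: number; // mixing weight; weights of the selected experts sum to 1
}

// Pick the k experts with the largest router logits, then softmax over just
// those k logits. (Generic sketch; real implementations vary.)
function routeTopK(routerLogits: number[], k = 2): ExpertChoice[] {
  if (!Number.isInteger(k) || k <= 0 || k > routerLogits.length) {
    throw new RangeError(`k must be an integer in 1..${routerLogits.length}`);
  }
  const ranked = routerLogits
    .map((logit, expert) => ({ expert, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k);
  const maxLogit = Math.max(...ranked.map((r) => r.logit)); // numerical stability
  const exps = ranked.map((r) => Math.exp(r.logit - maxLogit));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return ranked.map((r, i) => ({ expert: r.expert, weight: (exps[i] ?? 0) / total }));
}

type Expert = (input: number[]) => number[];

// Run only the selected experts and combine their outputs as a weighted sum.
// Assumes every expert returns a vector of the same length as its input.
function moeForward(input: number[], routerLogits: number[], experts: Expert[]): number[] {
  const out = new Array<number>(input.length).fill(0);
  for (const { expert, weight } of routeTopK(routerLogits, 2)) {
    const run = experts[expert];
    if (!run) throw new RangeError(`no expert at index ${expert}`);
    run(input).forEach((v, i) => {
      out[i] = (out[i] ?? 0) + weight * v;
    });
  }
  return out;
}

// Toy setup: 8 experts, expert i scales its input by i.
const toyExperts: Expert[] = Array.from({ length: 8 }, (_, i) =>
  (input: number[]) => input.map((v) => v * i),
);
const toyLogits = [0, 1, 3, 2, -1, -2, 0.5, -0.5];
console.log(routeTopK(toyLogits)); // selects experts 2 and 3
```

Because only 2 of the 8 expert blocks run per token, the compute per token tracks the active parameter count rather than the total.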
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "text_embedding",
    version: "v1",
    parameters: { model_id: "nomic-ai/nomic-embed-text-v2-moe" },
  },
});

Capabilities
- MoE efficiency: 305M active / 475M total parameters
- ~100 language multilingual support
- 768-dimensional embeddings with Matryoshka truncation to 256
- State-of-the-art on BEIR and MIRACL benchmarks
- Apache 2.0 fully open-source
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| BEIR | nDCG@10 | 52.86 | Nomic AI, 2025 — nomic-embed-text-v2 paper |
| MIRACL | nDCG@10 | 65.80 | Nomic AI, 2025 — nomic-embed-text-v2 paper |
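For readers unfamiliar with the metric, the sketch below computes nDCG@k for a single query using the linear-gain form with a log2 discount. The function names are hypothetical, and benchmark harnesses may differ in details such as the gain function; the scores in the table are presumably this quantity averaged over queries and datasets, reported on a 0 to 100 scale.

```typescript
// DCG@k, linear-gain form: sum of rel_i / log2(i + 1) over ranks i = 1..k.
// `idx` is zero-based, so rank = idx + 1 and the discount is log2(idx + 2).
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, idx) => sum + rel / Math.log2(idx + 2), 0);
}

// nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.
// `rankedGrades` holds relevance grades in the order the system returned documents;
// `judgedGrades` holds the grades of every judged document for the query.
function ndcgAtK(rankedGrades: number[], judgedGrades: number[], k = 10): number {
  const ideal = [...judgedGrades].sort((a, b) => b - a);
  const idcg = dcgAtK(ideal, k);
  return idcg === 0 ? 0 : dcgAtK(rankedGrades, k) / idcg;
}

// Toy query: two relevant documents exist; the system returned them at ranks 1 and 3.
console.log(ndcgAtK([1, 0, 1], [1, 1]).toFixed(4)); // 0.9197
```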
Research Paper
Training Sparse Mixture Of Experts Text Embedding Models
arxiv.org
Build a pipeline with nomic-embed-text-v2-moe
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.