zembed-1-embedding
by zeroentropy
Domain-specialist 4B embedding model for finance, legal, healthcare, and code, distilled from a reranker
zeroentropy/zembed-1-embedding
mixpeek://text_extractor@v1/zeroentropy_zembed_1_v1
Overview
zEmbed-1 is a 4B-parameter text embedding model built on Qwen3-4B and distilled from ZeroEntropy's zerank-2 reranker using an Elo-inspired training methodology. It leads on domain-specific retrieval benchmarks, outperforming Cohere Embed v4, OpenAI text-embedding-3-large, and Gemini Embedding with reported scores of 0.4476 on finance, 0.6260 on healthcare, 0.6723 on legal, 0.6452 on code, and 0.5283 on STEM retrieval tasks.
On Mixpeek, zEmbed-1 is the top choice for regulated-industry search where domain accuracy matters more than model size. Its flexible output dimensions (2560 down to 40) and support for binary quantization enable deployment from cloud to edge.
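A common way to binarize an embedding is to keep only the sign of each dimension (1 bit instead of a 32-bit float) and compare vectors by Hamming distance. The sketch below assumes that sign-threshold scheme; the source does not say which quantization method zEmbed-1 or Mixpeek uses, and the 8-dimensional vectors are made-up stand-ins for real embeddings.

```python
from typing import List


def binarize(vec: List[float]) -> int:
    """Pack one sign bit per dimension into an int (bit i is 1 if vec[i] > 0)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits; a lower distance means more similar vectors."""
    return bin(a ^ b).count("1")


# Toy vectors (hypothetical): doc_a shares every sign with the query, doc_b none.
query = [0.12, -0.40, 0.33, 0.05, -0.21, 0.18, -0.07, 0.29]
doc_a = [0.10, -0.35, 0.30, 0.02, -0.25, 0.20, -0.01, 0.31]
doc_b = [-0.22, 0.41, -0.10, -0.30, 0.19, -0.05, 0.27, -0.14]

q = binarize(query)
print(hamming(q, binarize(doc_a)))  # → 0
print(hamming(q, binarize(doc_b)))  # → 8
```

Hamming distance reduces to an XOR plus a popcount, which is why binary vectors suit CPU-only or edge deployments, at the cost of some ranking precision compared with full-precision cosine similarity.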
Architecture
zEmbed-1 uses a Qwen3-4B backbone (4B parameters) with a 32K-token context window and task-specific prompting through separate encode_query() and encode_document() calls. A flexible projection head supports seven output dimensions: 2560, 1280, 640, 320, 160, 80, and 40. The model was trained with the zELO methodology, which uses adjusted Elo ratings for relevance scoring, and distilled from the zerank-2 cross-encoder.
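The source describes zELO only as using adjusted Elo ratings, without giving the adjustments. For orientation, the sketch below shows the standard (unadjusted) Elo update, which turns pairwise win/loss judgments into scalar scores. The K-factor of 32, the starting rating of 1000, and the judgments themselves are illustrative assumptions, not zELO's parameters.

```python
from typing import Dict, List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Standard Elo update; score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta  # zero-sum: B loses what A gains


# Hypothetical candidates for one query, all starting at the same rating.
ratings: Dict[str, float] = {"doc1": 1000.0, "doc2": 1000.0, "doc3": 1000.0}
# Hypothetical pairwise judgments as (winner, loser), e.g. produced by a reranker.
judgments: List[Tuple[str, str]] = [("doc1", "doc2"), ("doc1", "doc3"), ("doc2", "doc3")]

for winner, loser in judgments:
    ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser], 1.0)

ranked = sorted(ratings, key=ratings.get, reverse=True)
print(ranked)  # → ['doc1', 'doc2', 'doc3']
```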
Mixpeek SDK Integration
from mixpeek import Mixpeek

mixpeek = Mixpeek(api_key="YOUR_API_KEY")

# Ingest documents from an S3 bucket and embed them with zEmbed-1
mixpeek.ingest.documents(
    collection="sec_filings",
    source={"type": "s3", "bucket": "finance-docs"},
    pipeline={"embedding": {"model": "mixpeek://text_extractor@v1/zeroentropy_zembed_1_v1"}},
)
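The page does not show Mixpeek's query API, so rather than guess at it, the sketch below covers only the generic scoring step. With an asymmetric model, the query is embedded on the query side (encode_query()) and documents on the document side (encode_document()), and candidates are then typically ranked by cosine similarity. The three-dimensional vectors and document IDs are hypothetical placeholders for real model outputs.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec: List[float], doc_vecs: Dict[str, List[float]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every document against the query and return the top_k by similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Placeholders for a query-side embedding and document-side embeddings.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "10-K_2023": [0.8, 0.2, 0.1],
    "10-Q_Q2": [0.5, 0.5, 0.0],
    "press_release": [0.0, 0.2, 0.9],
}

for doc_id, score in rank(query_vec, doc_vecs):
    print(f"{doc_id}: {score:.3f}")
```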
Capabilities
- Best-in-class domain retrieval for finance, healthcare, legal, code, and STEM
- 32K token context length
- 7 flexible embedding dimensions (2560 down to 40)
- 50+ language support with >50% non-English training data
- Binary quantization support for edge deployment
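The "cloud to edge" range follows from simple storage arithmetic. The sketch below computes raw bytes per vector at each supported dimension, comparing float32 with one bit per dimension. It assumes uncompressed float32 as the full-precision baseline and ignores index and metadata overhead.

```python
from typing import List

# The seven output dimensions listed for zEmbed-1.
DIMENSIONS: List[int] = [2560, 1280, 640, 320, 160, 80, 40]


def bytes_per_vector(dim: int, precision: str) -> int:
    """Raw bytes for one vector, excluding index or metadata overhead."""
    if precision == "float32":
        return dim * 4  # 4 bytes per dimension
    if precision == "binary":
        return (dim + 7) // 8  # 1 bit per dimension, rounded up to whole bytes
    raise ValueError(f"unsupported precision: {precision}")


for dim in DIMENSIONS:
    full = bytes_per_vector(dim, "float32")
    packed = bytes_per_vector(dim, "binary")
    print(f"{dim:>4} dims: float32 {full:>5} B, binary {packed:>3} B ({full // packed}x smaller)")
```

Because one million vectors of N bytes each occupy N MB, the same figures read as megabytes per million vectors: roughly 10.24 GB for full 2560-dimensional float32 vectors versus 5 MB for 40-dimensional binary vectors.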
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| Finance Retrieval | nDCG@10 | 0.4476 | Model card |
| Healthcare Retrieval | nDCG@10 | 0.6260 | Model card |
| Legal Retrieval | nDCG@10 | 0.6723 | Model card |
| Code Retrieval | nDCG@10 | 0.6452 | Model card |
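nDCG@10 rewards placing relevant documents near the top of the first ten results. The sketch uses the common linear-gain form, DCG@k = sum of rel_i / log2(i + 1) over ranks i = 1..k, normalized by the DCG of the ideal ordering of all judged documents. Some evaluation suites use the exponential gain 2^rel - 1 instead, and the source does not say which variant produced the scores above; the relevance grades here are hypothetical.

```python
import math
from typing import List


def dcg_at_k(relevances: List[float], k: int) -> float:
    """Linear-gain DCG: sum of rel_i / log2(i + 1) over ranks i = 1..k."""
    # enumerate() is 0-based, so rank i = idx + 1 and the discount is log2(idx + 2).
    return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: List[float], judged: List[float], k: int = 10) -> float:
    """ranked: grades of the returned results in rank order.
    judged: grades of every judged document for the query (defines the ideal)."""
    ideal = dcg_at_k(sorted(judged, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: graded judgments (0-3) and one system's ranking of them.
judged = [3, 2, 2, 1, 0, 0, 0]
ranked = [2, 3, 0, 1, 2, 0, 0]
print(round(ndcg_at_k(ranked, judged), 4))
```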
Research Paper
Model paper or technical report
arxiv.org