modernbert-embed-base

by nomic-ai

ModernBERT-powered text embeddings -- 8192 tokens, Matryoshka dimensions, fast inference

2.8M downloads/month

149M parameters

Identifiers

Model ID

nomic-ai/modernbert-embed-base

Feature URI

mixpeek://text_extractor@v1/nomic_modernbert_embed_base_v1

Overview

ModernBERT Embed Base is Nomic AI's text embedding model built on the ModernBERT architecture, which modernizes the BERT encoder with rotary position embeddings, Flash Attention, and unpadded variable-length batching. The result is an embedding model that handles 8192-token inputs with faster inference than comparably sized alternatives.

At 149M parameters, it outperforms Nomic's previous embedding models on MTEB while being significantly cheaper to run. It supports Matryoshka representations: the full 768-dimensional embedding can be truncated to 256 dimensions when a smaller vector is preferred. On Mixpeek, it serves as a strong baseline text embedding model that balances quality, speed, and context length for document-heavy retrieval pipelines.
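
To illustrate what using the 256-dimension Matryoshka representation can look like on the client side, here is a minimal sketch. It assumes the common Matryoshka convention of keeping the leading components of the full vector and re-normalizing to unit length; the function names truncateEmbedding and l2Norm are hypothetical helpers, not part of any Mixpeek or Nomic API, and the input vector is a synthetic stand-in for a real model output.

```typescript
// Hypothetical helper: Euclidean (L2) norm of a vector.
function l2Norm(v: number[]): number {
  return Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
}

// Hypothetical helper: keep the first `dim` components of an embedding
// and rescale the result to unit length (assumed Matryoshka convention).
function truncateEmbedding(embedding: number[], dim: number): number[] {
  if (!Number.isInteger(dim) || dim <= 0 || dim > embedding.length) {
    throw new RangeError(`dim must be an integer in 1..${embedding.length}, got ${dim}`);
  }
  const head = embedding.slice(0, dim);
  const norm = l2Norm(head);
  if (norm === 0) {
    return head; // all-zero vector: nothing to normalize
  }
  return head.map((x) => x / norm);
}

// Synthetic 768-dimensional vector standing in for a real embedding.
const full: number[] = Array.from({ length: 768 }, (_, i) => Math.sin(i + 1));

// Compressed 256-dimensional representation.
const compact: number[] = truncateEmbedding(full, 256);

console.log(compact.length); // 256
```

Re-normalizing after truncation keeps dot-product and cosine scores on a comparable scale. Whether a given pipeline needs this step depends on how it produces and compares vectors.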

Architecture

ModernBERT encoder, 149M parameters. Rotary position embeddings for 8192-token context. Flash Attention 2 for efficient long-sequence processing. Matryoshka dimensions: 768 (full) and 256 (compressed).
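
The practical difference between the two Matryoshka sizes is easiest to see as storage arithmetic. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per value) and ignores any index overhead; the corpus size of one million vectors and the helper name rawVectorBytes are illustrative assumptions, not figures from Mixpeek or Nomic.

```typescript
// Hypothetical helper: raw storage for `numVectors` embeddings of size `dim`.
// Assumes 4 bytes per value (float32) and no index or metadata overhead.
function rawVectorBytes(numVectors: number, dim: number, bytesPerValue: number = 4): number {
  return numVectors * dim * bytesPerValue;
}

const numVectors = 1_000_000; // illustrative corpus size

const fullBytes = rawVectorBytes(numVectors, 768);    // 3,072,000,000 bytes
const compactBytes = rawVectorBytes(numVectors, 256); // 1,024,000,000 bytes

console.log(fullBytes / compactBytes); // 3
```

Under these assumptions the 256-dimension representation needs one third of the raw vector storage of the full 768-dimension one.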

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

// "API_KEY" is a placeholder; substitute your own Mixpeek API key.
const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "text_embedding",
    version: "v1",
    parameters: { model_id: "nomic-ai/modernbert-embed-base" },
  },
});