ReMatch-3B

by FireRedTeam

Multimodal retriever trained with generative matching for stronger query-item alignment

8 downloads/month

5 likes

3B parameters

Identifiers

Model ID

FireRedTeam/ReMatch-3B

Feature URI

mixpeek://image_extractor@v1/fireredteam_rematch_3b_v1

Overview

ReMatch turns a multimodal LLM into a retrieval model by adding a chat-style generative matching objective. Instead of relying only on contrastive pairs, it teaches the model to reason about whether a query and candidate match, then distills that signal into retrieval embeddings.

On Mixpeek, ReMatch suits agent retrieval when queries are specific, compositional, or combine visual and textual constraints, such as finding a frame in which a person performs one action while an object appears in a particular place.

Architecture

ReMatch-3B is a 3B-parameter multimodal retriever that uses learnable representation tokens and a generative matching training objective. According to the model card, the model supports English and Chinese.

Mixpeek SDK Integration

import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

// Managed: create a collection over a bucket; Mixpeek runs this model's extractor
const collection = await mx.collections.create({
  namespace_id: "my-namespace",
  collection_name: "my-collection",
  source: { type: "bucket", bucket_ids: ["bkt_your_bucket"] },
  feature_extractor: {
    feature_extractor_name: "multimodal_embedding",
    version: "v1",
    parameters: { model_id: "FireRedTeam/ReMatch-3B" },
  },
});

Capabilities

Multimodal retrieval from image and text inputs
Generative matching objective for hard query-candidate pairs
Single-vector retrieval path with richer alignment than plain contrastive training
Apache 2.0 license
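
To make the single-vector retrieval path concrete, here is a minimal sketch of how ranking over one-embedding-per-item typically works. It is an illustration under stated assumptions, not Mixpeek's or ReMatch's actual scoring code: the three-dimensional vectors are made-up placeholders rather than real model outputs, cosine similarity is assumed as the scoring function, and the names `Item`, `cosineSimilarity`, and `topK` are hypothetical helpers introduced here.

```typescript
// Toy single-vector retrieval: each item has exactly one embedding,
// and candidates are ranked by cosine similarity to the query embedding.
// All vectors are made-up placeholders, NOT real ReMatch-3B outputs.

type Item = { id: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
// Returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query and keep the k highest-scoring ones.
function topK(
  query: number[],
  items: Item[],
  k: number
): { id: string; score: number }[] {
  return items
    .map((item) => ({
      id: item.id,
      score: cosineSimilarity(query, item.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical indexed frames and a hypothetical query embedding.
const items: Item[] = [
  { id: "frame_001", embedding: [1, 0, 0] },
  { id: "frame_002", embedding: [0.6, 0.8, 0] },
  { id: "frame_003", embedding: [0, 0, 1] },
];
const queryEmbedding = [0.8, 0.6, 0];

const results = topK(queryEmbedding, items, 2);
console.log(results);
```

With these placeholder vectors, `frame_002` ranks ahead of `frame_001`, and `frame_003` is dropped because it is orthogonal to the query. In a managed setup the embedding and ranking steps run inside the platform, so this sketch only shows the shape of the computation.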