ReMatch-3B
by FireRedTeam
Multimodal retriever trained with generative matching for stronger query-item alignment
FireRedTeam/ReMatch-3B
mixpeek://image_extractor@v1/fireredteam_rematch_3b_v1
Overview
ReMatch turns a multimodal LLM into a retrieval model by adding a chat-style generative matching objective. Instead of relying only on contrastive pairs, it teaches the model to reason about whether a query and candidate match, then distills that signal into retrieval embeddings.
On Mixpeek, ReMatch is relevant for agent retrieval when queries are specific, compositional, or visual-textual, such as finding a frame where a person is doing one action while an object appears in a certain place.
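The contrast between contrastive training and a matching objective can be made concrete with two generic loss functions. The sketch below is illustrative only: it uses a textbook InfoNCE-style contrastive loss and a plain binary cross-entropy on a "match" probability. It is not the ReMatch training code, and the function names, the temperature value, and the toy numbers are all assumptions. The exact formulation is in the paper.

```typescript
// Generic InfoNCE-style contrastive loss (illustrative, not ReMatch's code).
// The positive candidate's similarity competes against every candidate
// in the batch through a softmax.
function contrastiveLoss(
  similarities: number[],
  positiveIndex: number,
  temperature: number
): number {
  const scaled = similarities.map((s) => s / temperature);
  // Subtract the max before exponentiating for numerical stability.
  const max = Math.max(...scaled);
  const logSumExp =
    max + Math.log(scaled.reduce((acc, s) => acc + Math.exp(s - max), 0));
  return logSumExp - scaled[positiveIndex];
}

// Generic binary matching loss (assumption: the model's verdict is reduced
// to a single probability that the query and candidate match).
function matchingLoss(probMatch: number, isMatch: boolean): number {
  const eps = 1e-12;
  // Clamp to avoid log(0).
  const p = Math.min(Math.max(probMatch, eps), 1 - eps);
  return isMatch ? -Math.log(p) : -Math.log(1 - p);
}

// Toy values: one positive among three candidates, and one pairwise verdict.
const batchLoss = contrastiveLoss([0.9, 0.2, 0.1], 0, 0.1);
const pairLoss = matchingLoss(0.8, true);
console.log({ batchLoss, pairLoss });
```

The contrastive loss scores a candidate only relative to the rest of the batch, while the matching loss judges each query-candidate pair on its own. That pairwise judgment is the kind of signal the overview describes as being distilled into the retrieval embeddings.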
Architecture
3B multimodal retriever with learnable representation tokens and a generative matching training objective. The model supports English and Chinese according to the model card.
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "evidence-library",
  source: { url: "https://example.com/camera-stills/" },
  feature_extractors: [
    {
      feature: "multimodal_embedding",
      model: "FireRedTeam/ReMatch-3B"
    }
  ]
});
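When the same ingest call is issued for several collections, it can help to build the request object in one place. The sketch below is a minimal, hypothetical helper: `IngestRequest`, `FeatureExtractor`, and `buildReMatchIngestRequest` are names invented here, not part of the Mixpeek SDK, and the interfaces mirror only the fields that appear in the snippet above. The SDK may accept or require other fields.

```typescript
// Hypothetical types covering only the fields shown in the ingest snippet.
interface FeatureExtractor {
  feature: string;
  model: string;
}

interface IngestRequest {
  collection_id: string;
  source: { url: string };
  feature_extractors: FeatureExtractor[];
}

// Hypothetical helper (not a Mixpeek SDK function): builds the request
// object for a ReMatch-3B multimodal embedding extractor.
function buildReMatchIngestRequest(
  collectionId: string,
  sourceUrl: string
): IngestRequest {
  // Basic sanity checks before anything is sent over the network.
  if (collectionId.trim() === "") {
    throw new Error("collectionId must not be empty");
  }
  if (!/^https?:\/\//.test(sourceUrl)) {
    throw new Error("sourceUrl must start with http:// or https://");
  }
  return {
    collection_id: collectionId,
    source: { url: sourceUrl },
    feature_extractors: [
      { feature: "multimodal_embedding", model: "FireRedTeam/ReMatch-3B" },
    ],
  };
}

const request = buildReMatchIngestRequest(
  "evidence-library",
  "https://example.com/camera-stills/"
);
console.log(JSON.stringify(request, null, 2));
```

The resulting object has the same shape as the argument passed to `mx.collections.ingest` above, so it could be handed to that call directly, assuming the SDK accepts exactly these fields.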
Capabilities
- Multimodal retrieval from image and text inputs
- Generative matching objective for hard query-candidate pairs
- Single-vector retrieval path with richer alignment than plain contrastive training
- Apache 2.0 license
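The single-vector retrieval path listed above means each query and each candidate is represented by one embedding, so ranking reduces to a similarity comparison. The sketch below illustrates that idea with cosine similarity over toy three-dimensional vectors. The `Candidate` type, the function names, and the choice of cosine similarity are assumptions made for illustration; real embeddings would come from the model, and the similarity function used in production may differ.

```typescript
// Hypothetical candidate record: one embedding per item.
type Candidate = { id: string; embedding: number[] };

// Cosine similarity between two vectors of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every candidate against the query and keep the topK best.
function rankCandidates(
  query: number[],
  candidates: Candidate[],
  topK: number
): { id: string; score: number }[] {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors for illustration only.
const queryEmbedding = [1, 0, 1];
const frames: Candidate[] = [
  { id: "frame-001", embedding: [1, 0, 1] },
  { id: "frame-002", embedding: [0, 1, 0] },
  { id: "frame-003", embedding: [1, 1, 1] },
];
const ranked = rankCandidates(queryEmbedding, frames, 2);
console.log(ranked.map((r) => r.id));
```

With these toy vectors, `frame-001` ranks first because it points in the same direction as the query, and `frame-003` ranks second. A brute-force scan like this is only practical for small collections; larger ones would typically rely on an approximate nearest-neighbor index.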
Use Cases on Mixpeek
Benchmarks
| Dataset | Metric | Score | Source |
|---|---|---|---|
| CVPR 2026 model card | Status | Accepted | Hugging Face model card |
Specification
Research Paper
ReMatch: Boosting Representation through Matching for Multimodal Retrieval
arxiv.org
Build a pipeline with ReMatch-3B
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.