BGE-VL-base
by BAAI
Lightweight vision-language embeddings for image and document retrieval
BAAI/BGE-VL-base
mixpeek://image_extractor@v1/baai_bge_vl_base_v1
Overview
BGE-VL Base is BAAI's compact vision-language embedding model for image-text retrieval and visual document search. It gives teams a smaller open model option when CLIP-style embeddings are too generic and larger multimodal retrievers are unnecessary.
On Mixpeek, BGE-VL Base can index screenshots, product images, scanned pages, and video keyframes so an agent can retrieve visual evidence with natural-language queries before asking a VLM to reason over the result.
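The retrieve-then-reason flow described above can be sketched as follows. This is a minimal illustration, not Mixpeek's API: `answerWithEvidence`, `Retrieve`, `Reason`, the stub functions, and the `minScore` cutoff are all hypothetical names introduced here, and the calls are kept synchronous for readability even though a real retriever or VLM call would normally be asynchronous.

```typescript
// All names below are hypothetical; none come from the Mixpeek SDK.
type Evidence = { id: string; score: number };
type Retrieve = (query: string, k: number) => Evidence[];
type Reason = (question: string, evidenceIds: string[]) => string;

// Retrieve visual evidence first, then hand only the confident hits to the VLM.
function answerWithEvidence(
  question: string,
  retrieve: Retrieve,
  reason: Reason,
  k = 3,
  minScore = 0.5,
): { answer: string | null; evidence: Evidence[] } {
  const evidence = retrieve(question, k).filter((hit) => hit.score >= minScore);
  // With no usable evidence, skip the VLM call instead of letting it guess.
  if (evidence.length === 0) return { answer: null, evidence };
  const answer = reason(question, evidence.map((hit) => hit.id));
  return { answer, evidence };
}

// Stub retriever standing in for an embedding search over indexed screenshots.
const stubRetrieve: Retrieve = (_query, k) =>
  [
    { id: "checkout-error.png", score: 0.82 },
    { id: "pricing-page.png", score: 0.31 },
  ].slice(0, k);

// Stub VLM that only reports which evidence it was given.
const stubReason: Reason = (question, ids) =>
  `Answer to "${question}" based on ${ids.join(", ")}`;

const outcome = answerWithEvidence(
  "Which screen shows the payment failure?",
  stubRetrieve,
  stubReason,
);
console.log(outcome.answer);
```

Filtering by score before the reasoning step is one possible design choice: it keeps weak matches out of the VLM prompt, at the cost of returning no answer when nothing clears the threshold.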
Architecture
BGE-VL Base is a Sentence Transformers-compatible vision-language embedding model with a compact parameter footprint. It maps images and text into a shared embedding space, so a text query can be compared directly against image vectors for semantic similarity search.
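To make the shared-space idea concrete, here is a small sketch of ranking image embeddings against a text-query embedding by cosine similarity. The vectors are toy 3-dimensional values chosen for illustration (real embeddings are far longer and would come from the model), and `cosineSimilarity`, `topK`, and the file names are hypothetical helpers, not part of any SDK.

```typescript
// Hypothetical pre-computed embeddings; real model output would be much longer.
type Embedded = { id: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must share a dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every item against the query and keep the k best matches.
function topK(query: number[], items: Embedded[], k: number): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors standing in for image embeddings.
const imageIndex: Embedded[] = [
  { id: "invoice-scan.png", vector: [0.9, 0.1, 0.0] },
  { id: "login-screenshot.png", vector: [0.1, 0.9, 0.2] },
  { id: "product-photo.jpg", vector: [0.0, 0.2, 0.9] },
];

// Toy vector standing in for the text embedding of a query such as "sign-in page".
const queryVector = [0.2, 0.8, 0.1];

const results = topK(queryVector, imageIndex, 2);
console.log(results.map((r) => r.id)); // login-screenshot.png ranks first
```

A brute-force scan like this is only reasonable for small collections; at scale, the same comparison is typically delegated to a vector index.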
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";

const mx = new Mixpeek({ apiKey: "API_KEY" });

await mx.collections.ingest({
  collection_id: "visual-docs",
  source: { url: "s3://docs/screenshots/" },
  feature_extractors: [
    {
      feature: "visual_embeddings",
      model: "BAAI/BGE-VL-base"
    }
  ]
});
Capabilities
- Image-text retrieval with compact inference cost
- Visual document and screenshot search
- Sentence Transformers integration
- MIT license
Specification
Research Paper
BGE-VL Base
arxiv.org
Build a pipeline with BGE-VL-base
Add this model to a processing pipeline alongside other extractors, and combine it with retrieval stages for end-to-end search.