moondream3-preview
by moondream
Compact visual reasoning model for fast image QA and scene captions
moondream/moondream3-preview
mixpeek://image_extractor@v1/moondream3_preview_v1
Overview
Moondream3 Preview is a compact image-text model from Moondream focused on visual question answering, captioning, and deployable visual reasoning. It continues the Moondream line's emphasis on small, easy-to-deploy models while retaining enough visual reasoning quality for production perception pipelines.
On Mixpeek, Moondream3 is a useful second-stage model after cheap embedding retrieval. Use it to caption candidate images, answer bounded visual questions, or extract concise observations that an agent can cite.
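A minimal sketch of this retrieve-then-inspect pattern, assuming every image already has a precomputed embedding vector. The names IndexedImage, retrieveTopK, inspectCandidates, and askVlm are hypothetical. askVlm stands in for whatever call actually sends an image and a question to Moondream3. Cosine similarity is one common choice for the cheap first stage, not necessarily what Mixpeek uses internally.

```typescript
// Hypothetical record: an image id plus its precomputed embedding.
interface IndexedImage {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Stage 1: cheap retrieval. Score every image and keep the k most similar.
function retrieveTopK(query: number[], index: IndexedImage[], k: number): IndexedImage[] {
  return index
    .map((img) => ({ img, score: cosine(query, img.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.img);
}

// Stage 2: run the expensive VLM only on the surviving candidates.
// askVlm is a hypothetical stand-in for a Moondream3 caption or VQA call.
async function inspectCandidates(
  query: number[],
  index: IndexedImage[],
  k: number,
  question: string,
  askVlm: (imageId: string, question: string) => Promise<string>,
): Promise<{ id: string; answer: string }[]> {
  const candidates = retrieveTopK(query, index, k);
  return Promise.all(
    candidates.map(async (c) => ({ id: c.id, answer: await askVlm(c.id, question) })),
  );
}
```

The design goal is cost control: the similarity pass bounds how many images ever reach the VLM, so k becomes the main knob for trading answer coverage against latency.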
Architecture
Moondream3 Preview is an image-text-to-text model loaded through Hugging Face Transformers with custom model code, so loading it requires trust_remote_code=True. It supports caption generation, visual question answering, and streaming output for interactive applications.
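Streaming output is typically consumed chunk by chunk so an interface can show a partial caption before generation finishes. The sketch below assumes the stream arrives as an AsyncIterable of text chunks. renderStream and fakeCaptionStream are hypothetical names, and fakeCaptionStream only simulates a model so that the example runs offline.

```typescript
// Consume a stream of text chunks, report the growing text after each chunk,
// and resolve with the complete text once the stream ends.
async function renderStream(
  chunks: AsyncIterable<string>,
  onChunk: (partial: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
    onChunk(text); // e.g. update a UI element with the partial caption
  }
  return text;
}

// Hypothetical stand-in for a streaming caption call: yields fixed pieces in order.
async function* fakeCaptionStream(pieces: string[]): AsyncGenerator<string> {
  for (const p of pieces) {
    yield p;
  }
}
```

Passing the accumulated text (rather than only the newest chunk) to onChunk keeps rendering idempotent: the UI can simply replace its contents on every update.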
Mixpeek SDK Integration
import { Mixpeek } from "mixpeek";
const mx = new Mixpeek({ apiKey: "API_KEY" });
// Ingest images from S3 and generate a short Moondream3 caption for each one.
await mx.collections.ingest({
  collection_id: "image-library",
  source: { url: "s3://assets/images/" },
  feature_extractors: [
    {
      feature: "scene_caption",
      model: "moondream/moondream3-preview",
      params: { caption_length: "short" },
    },
  ],
});
Capabilities
- Image captioning with short and detailed modes
- Visual question answering over retrieved images
- Compact deployment compared with large VLMs
- Streaming generation support
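One way to keep the caption and question-answering capabilities explicit in application code is a discriminated union of task types. In this sketch VisionTask, describeTask, and normalizeAnswer are hypothetical, and the prompt wording is an assumption rather than Moondream's own prompt format. normalizeAnswer shows how the answer to a bounded question can be checked against an allowed label set, so an agent only cites answers it recognizes.

```typescript
// Hypothetical task shapes mirroring the caption modes and VQA listed above.
type VisionTask =
  | { kind: "caption"; length: "short" | "detailed" }
  | { kind: "query"; question: string; allowed?: string[] };

// Turn a task into prompt text (wording is an illustrative assumption).
function describeTask(task: VisionTask): string {
  switch (task.kind) {
    case "caption":
      return task.length === "short"
        ? "Describe the image in one sentence."
        : "Describe the image in detail.";
    case "query":
      return task.allowed
        ? `${task.question} Answer with one of: ${task.allowed.join(", ")}.`
        : task.question;
    default: {
      // Compile-time exhaustiveness check: adding a new task kind breaks this line.
      const unreachable: never = task;
      throw new Error(`unknown task: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Map a free-text answer onto the allowed labels (case-insensitive,
// ignoring surrounding whitespace and trailing punctuation); null if no match.
function normalizeAnswer(raw: string, allowed: string[]): string | null {
  const cleaned = raw.trim().toLowerCase().replace(/[.!?]+$/, "");
  return allowed.find((a) => a.toLowerCase() === cleaned) ?? null;
}
```

Returning null for unrecognized answers lets the caller retry, fall back to a caption, or flag the image for review instead of passing an unbounded answer downstream.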
Use Cases on Mixpeek
Performance
Best used after first-stage retrieval or for high-throughput caption generation
Specification
Research Paper
Moondream3 Preview model card
arxiv.org
Build a pipeline with moondream3-preview
Add this model to a processing pipeline alongside other extractors. Combine with retrieval stages for end-to-end search.