144Kdl/month
314likes
Identifier
Model ID
HuggingFaceTB/SmolVLM2-2.2B-InstructTags
transformerssafetensorssmolvlmimage-text-to-textvideo-text-to-textconversationalendataset:HuggingFaceM4/the_cauldrondataset:HuggingFaceM4/Docmatixdataset:lmms-lab/LLaVA-OneVision-Datadataset:lmms-lab/M4-Instruct-Datadataset:HuggingFaceFV/finevideodataset:MAmmoTH-VL/MAmmoTH-VL-Instruct-12Mdataset:lmms-lab/LLaVA-Video-178Kdataset:orrzohar/Video-STaRdataset:Mutonix/Vriptdataset:TIGER-Lab/VISTA-400Kdataset:Enxin/MovieChat-1K_traindataset:ShareGPT4Video/ShareGPT4Videoarxiv:2504.05299base_model:HuggingFaceTB/SmolVLM-Instructbase_model:finetune:HuggingFaceTB/SmolVLM-Instructlicense:apache-2.0endpoints_compatibleregion:us
Use SmolVLM2-2.2B-Instruct on Mixpeek
Build multimodal processing pipelines with this model and others. Extract features, run inference, and set up retrieval, all through the Mixpeek pipeline builder.
Open Pipeline BuilderSpecification
OrganizationHuggingFaceTB
TaskImage Text To Text
Librarytransformers
Licenseapache-2.0
Downloads/mo144K
Likes314
View on HuggingFace
See model card, files, and community discussion