Foundation models trained on large corpora, often extended to multimodal use with image or audio inputs.
Large Language Models (LLMs) are trained on vast amounts of text data to understand and generate human-like language. They can be extended to multimodal tasks by incorporating image or audio inputs, enabling cross-modal understanding and generation.
LLMs use transformer architectures to process and generate text. They can be fine-tuned for specific tasks or extended with additional modalities using techniques like cross-attention and multimodal embeddings.
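To make the cross-attention idea concrete, here is a minimal NumPy sketch in which text token states attend over image patch features. It is an illustration under stated assumptions, not how any particular model implements it: the function name `cross_attention`, the dimensions, and the random projection matrices are all hypothetical, and real systems use learned weights, multiple heads, and trained encoders.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_states, image_feats, w_q, w_k, w_v):
    """Single-head cross-attention: text queries attend over image keys/values.

    Hypothetical helper for illustration only.
    """
    q = text_states @ w_q                      # (T, d) queries from the text side
    k = image_feats @ w_k                      # (P, d) keys from the image side
    v = image_feats @ w_v                      # (P, d) values from the image side
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T, P) scaled dot products
    weights = softmax(scores, axis=-1)         # each text token's weights over patches
    return weights @ v, weights                # (T, d) fused states, (T, P) weights

# Assumed toy sizes: 4 text tokens, 6 image patches.
rng = np.random.default_rng(0)
T, P, d_text, d_img, d = 4, 6, 8, 10, 8
text_states = rng.normal(size=(T, d_text))
image_feats = rng.normal(size=(P, d_img))

# Random stand-ins for projection matrices that would normally be learned.
w_q = rng.normal(size=(d_text, d))
w_k = rng.normal(size=(d_img, d))
w_v = rng.normal(size=(d_img, d))

fused, weights = cross_attention(text_states, image_feats, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each row of `weights` sums to 1, so every text token ends up with a weighted mixture of image patch values. That mixture is what lets a text-only transformer layer condition on another modality.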