A neural network mechanism that dynamically computes relevance weights between input elements, allowing models to focus on the most informative parts. Attention is the core building block of transformer models that power modern multimodal AI.
Attention computes a weighted sum of value vectors, where the weights are determined by the compatibility between a query vector and key vectors. For each query, the mechanism scores every key to determine how much attention to pay to each corresponding value. This allows the model to dynamically focus on different input parts depending on context, rather than processing all information equally.
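The query/key/value mechanics can be made concrete with a minimal pure-Python sketch for a single query. It assumes an unscaled dot product as the compatibility function and softmax normalization of the scores. The `attend` name and the toy vectors are illustrative only and are not taken from any particular library.

```python
import math

def attend(query, keys, values):
    """Attention for one query: a weighted sum of value vectors.

    Assumption for illustration: compatibility is a plain dot product,
    and a softmax turns the scores into weights that sum to 1.
    """
    # Score every key against the query.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax over the scores (subtract the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors, one output dimension at a time.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy example: the first and third keys match the query, the second does not.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in output])
```

Because the first and third keys are equally compatible with the query, they receive equal weights that are larger than the weight on the second key, so the output is pulled toward their values rather than being a uniform average of all three.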
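Scaled dot-product attention computes scores as QK^T / sqrt(d_k), where d_k is the key dimension, applies a softmax over each query's scores to turn them into weights, and multiplies the result by V: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Dividing by sqrt(d_k) keeps the dot products from growing with the dimension and saturating the softmax. Multi-head attention runs several attention operations in parallel, each with its own learned projections of the queries, keys, and values, so that different heads can capture different types of relationships; the head outputs are concatenated and projected back to the model dimension. Self-attention derives queries, keys, and values from the same input sequence. Cross-attention takes queries from one source and keys and values from another (for example, text queries attending over image features), which is what enables multimodal fusion.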
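A minimal NumPy sketch of the formula above, plus multi-head attention used both as self-attention and as cross-attention. It assumes randomly initialized (untrained) projection matrices and omits masking, biases, dropout, and batching. The function names, shapes, and the "text" and "image" example inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V with Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_q, n_k) compatibility scores
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V, weights          # output: (n_q, d_v)

def multi_head_attention(x_q, x_kv, params, num_heads):
    """Pass the same array as x_q and x_kv for self-attention, or
    different arrays (e.g. text queries, image keys/values) for cross-attention."""
    W_q, W_k, W_v, W_o = params
    d_model = W_q.shape[1]
    assert d_model % num_heads == 0, "d_model must split evenly across heads"
    d_head = d_model // num_heads
    # Learned projections (random here): queries from x_q, keys/values from x_kv.
    Q, K, V = x_q @ W_q, x_kv @ W_k, x_kv @ W_v
    heads = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)  # this head's slice of the projections
        out, _ = scaled_dot_product_attention(Q[:, s], K[:, s], V[:, s])
        heads.append(out)
    # Concatenate the heads and project back to d_model.
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(0)
d_model, num_heads = 8, 2
text = rng.normal(size=(4, 8))    # 4 hypothetical text tokens, 8 features each
image = rng.normal(size=(6, 16))  # 6 hypothetical image patches, 16 features each

self_params = (rng.normal(size=(8, d_model)), rng.normal(size=(8, d_model)),
               rng.normal(size=(8, d_model)), rng.normal(size=(d_model, d_model)))
cross_params = (rng.normal(size=(8, d_model)), rng.normal(size=(16, d_model)),
                rng.normal(size=(16, d_model)), rng.normal(size=(d_model, d_model)))

self_out = multi_head_attention(text, text, self_params, num_heads)
cross_out = multi_head_attention(text, image, cross_params, num_heads)
print(self_out.shape, cross_out.shape)  # (4, 8) (4, 8)
```

In the cross-attention call, each of the four text positions attends over all six image patches, yet the output keeps one row per query. This is why cross-attention can inject information from another modality into a sequence without changing its length.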