A data structure that maps content tokens to the documents containing them, enabling fast full-text search. Inverted indices are the foundation of keyword search and complement vector-based retrieval in hybrid multimodal search systems.
An inverted index reverses the relationship between documents and terms. Instead of listing which terms appear in each document, it lists which documents contain each term. For a given search term, the index returns all matching documents directly. When they are stored, it also returns each document's term frequency and term positions. Multi-term Boolean queries combine posting lists across terms: AND intersects them, OR takes their union, and NOT removes the documents that appear in another term's list.
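The sketch below shows the reversal and the Boolean operations. It assumes a deliberately simple tokenizer (lowercase, split on non-alphanumeric characters). The function names (`build_index`, `intersect`, `union`, `difference`) are hypothetical and do not come from any particular engine. Each posting stores a document ID and the term's positions in that document, so term frequency is the length of the positions list. AND and OR are linear merges over sorted document-ID lists.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to a doc-ID-sorted list of (doc_id, positions) postings."""
    index = defaultdict(dict)  # term -> {doc_id: [positions]}
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    # Term frequency in a document is len(positions).
    return {term: sorted(postings.items()) for term, postings in index.items()}

def doc_ids(index, term):
    """Sorted doc IDs for a term (empty if the term is not in the dictionary)."""
    return [doc_id for doc_id, _ in index.get(term, [])]

def intersect(a, b):
    """AND: linear merge keeping IDs present in both sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def union(a, b):
    """OR: linear merge of two sorted lists, emitting each ID once."""
    i = j = 0
    out = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] < b[j]):
            out.append(a[i])
            i += 1
        elif i == len(a) or b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:  # equal IDs: emit once, advance both
            out.append(a[i])
            i += 1
            j += 1
    return out

def difference(a, b):
    """a AND NOT b: IDs in a that do not occur in b (set lookup for brevity)."""
    excluded = set(b)
    return [d for d in a if d not in excluded]

docs = ["the quick brown fox", "the lazy dog", "quick brown dogs", "a quick dog"]
idx = build_index(docs)
print(idx["quick"])                                               # [(0, [1]), (2, [0]), (3, [1])]
print(intersect(doc_ids(idx, "quick"), doc_ids(idx, "brown")))   # [0, 2]
print(union(doc_ids(idx, "brown"), doc_ids(idx, "dog")))         # [0, 1, 2, 3]
print(difference(doc_ids(idx, "quick"), doc_ids(idx, "brown")))  # [3]
```

The merges run in time linear in the combined list lengths because both inputs are sorted by document ID. Sorted postings are also what make the compression and skipping techniques below effective.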
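An inverted index consists of two parts. The dictionary holds the set of unique terms, kept in sorted or otherwise searchable order. The posting lists record, for each term, the sorted IDs of the documents that contain it. Postings may also carry term frequencies, positions, and payloads. Compression techniques such as variable-byte encoding and PFOR reduce index size, and they are usually applied to the gaps between consecutive document IDs rather than to the raw IDs. Skip lists and block-max techniques accelerate query processing by letting the engine jump over postings that cannot affect the result. Lucene and Tantivy are widely used inverted-index libraries, and Elasticsearch is built on Lucene.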
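The example below illustrates why gap encoding pairs well with variable-byte compression. It is a minimal sketch of one common variable-byte layout, chosen as an assumption: 7 data bits per byte, low-order group first, with the high bit set when more bytes follow. Engines differ in the exact byte layout. PFOR is not shown. The function names are hypothetical.

```python
def delta_encode(doc_ids):
    """Replace sorted doc IDs with gaps from the previous ID (first ID kept as-is)."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps

def delta_decode(gaps):
    """Rebuild doc IDs with a running sum over the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

def vbyte_encode(numbers):
    """Encode non-negative ints with 7 data bits per byte; high bit = more bytes follow."""
    out = bytearray()
    for n in numbers:
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            n >>= 7
        out.append(n)  # final byte, continuation flag clear
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(n)
            n, shift = 0, 0
    return numbers

postings = [100000, 100003, 100010, 100150]
gaps = delta_encode(postings)
print(gaps)                           # [100000, 3, 7, 140]
print(len(vbyte_encode(postings)))    # 12 bytes: each raw ID needs 3 bytes
print(len(vbyte_encode(gaps)))        # 7 bytes: small gaps fit in 1-2 bytes
print(delta_decode(vbyte_decode(vbyte_encode(gaps))) == postings)  # True
```

Dense posting lists produce small gaps, so most gaps fit in a single byte. Decoding is sequential, which is one reason engines add skip structures over blocks of compressed postings.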