1. Create a Bucket
2. Create a Collection
- sentence: Best for Q&A
- paragraph: Best for long-form content
- fixed: Predictable token windows
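To make the difference between the three chunking strategies concrete, here is a minimal sketch in plain Python. The function names (`split_sentences`, `split_paragraphs`, `split_fixed`) and the splitting rules are assumptions for illustration only; the platform's own chunkers may tokenize and split differently.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence chunks: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_paragraphs(text: str) -> list[str]:
    """Paragraph chunks: break on one or more blank lines."""
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


def split_fixed(text: str, size: int = 5) -> list[str]:
    """Fixed chunks: windows of `size` whitespace-separated tokens (hypothetical tokenizer)."""
    if size <= 0:
        raise ValueError("size must be positive")
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]
```

Sentence chunks keep each answerable statement on its own, paragraph chunks preserve surrounding context, and fixed chunks give every window the same token budget regardless of punctuation.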
3. Ingest Documents
4. Create a Retriever
5. Search
Hybrid Search (Vector + BM25)
Combine semantic (vector) and keyword (BM25) matching in one stage. BM25 is not a separate feature: set lexical: true on a search to match the query against the namespace’s full-text index instead of embedding it. Use rrf fusion so the score-scale mismatch between cosine similarity and BM25 doesn’t matter.
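The reason rrf sidesteps the score-scale mismatch is that reciprocal rank fusion looks only at each document's rank in every result list, never at the raw scores. The sketch below is a generic, self-contained illustration of that idea; the function name `rrf_fuse` and the constant `k=60` (a commonly used default) are assumptions, not the platform's confirmed implementation.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Raw similarity or BM25 scores are never used.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: one from a vector search, one from a lexical search.
vector_hits = ["a", "b", "c"]
lexical_hits = ["b", "d", "a"]
fused = rrf_fuse([vector_hits, lexical_hits])
```

Documents that rank well in both lists rise to the top, while a document that appears in only one list still receives credit for its position there.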
Lexical search requires a text payload index on the field. See Text Indexes (BM25) and the Feature Search reference.
Pre-Filter by Metadata
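As a rough illustration of pre-filtering, the sketch below narrows the candidate set by exact metadata match first and only then computes cosine similarity on the survivors. The document layout (`id`, `metadata`, `vector`), the function names, and the in-memory list are all assumptions for illustration; a real search would apply the filter inside the index rather than in Python.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prefilter_search(docs, query_vec, filters, top_k=3):
    """Keep only docs whose metadata matches every filter, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(key) == value for key, value in filters.items())
    ]
    scored = [(d["id"], cosine(d["vector"], query_vec)) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical documents with a single metadata field.
docs = [
    {"id": "d1", "metadata": {"lang": "en"}, "vector": [1.0, 0.0]},
    {"id": "d2", "metadata": {"lang": "de"}, "vector": [1.0, 0.0]},
    {"id": "d3", "metadata": {"lang": "en"}, "vector": [0.0, 1.0]},
]
hits = prefilter_search(docs, [1.0, 0.0], {"lang": "en"})
```

Here `d2` is never scored even though its vector matches the query exactly, because it fails the metadata filter before the similarity step runs.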
Filter on metadata before vector search for efficiency.
Reranking
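Reranking is a second pass: a cheaper first stage retrieves candidates, and a more accurate scorer that reads the query and each candidate together reorders them. The sketch below shows only that control flow. `rerank` and `token_overlap` are hypothetical names, and `token_overlap` is a toy stand-in so the example runs without a model; in practice a real cross-encoder (for example, one loaded through a library such as sentence-transformers) would take its place.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, candidate) pair jointly and return the best top_k."""
    scored = [(text, score_fn(query, text)) for text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def token_overlap(query: str, text: str) -> float:
    """Toy scorer (NOT a cross-encoder): fraction of query tokens present in text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / len(query_tokens) if query_tokens else 0.0


# Hypothetical first-stage results to be reordered.
first_stage = ["billing overview", "how to reset your password", "password policy"]
top = rerank("reset password", first_stage, token_overlap, top_k=2)
```

Because the scorer is passed in as a function, swapping the toy scorer for a model-backed one does not change the surrounding retrieval code.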
Use a cross-encoder for better accuracy.
Model Options
| Model | Speed | Use Case |
|---|---|---|
| multilingual-e5-base | Fast | High-volume |
| multilingual-e5-large-instruct | Medium | General-purpose |
| bge-large-en-v1.5 | Medium | English-only |
| openai/text-embedding-3-large | Slow | Premium |
Next steps
Classify documents
Auto-categorize documents against a taxonomy of reference categories.
Discover topics
Cluster document embeddings to surface topic groups without predefined categories.
Get notified
Trigger alerts when new documents match a condition.
Schedule jobs
Re-cluster or re-enrich on a cron or interval as new content lands.

