MongoDB
Add MongoDB sync capability to Mixpeek
Integrating MongoDB with Mixpeek
Pick an Embedding Model
This step involves selecting the most suitable embedding model for your project from a range of available options, each with its own characteristics and performance metrics.
Prepare MongoDB Cluster
Setting up a MongoDB Cluster on MongoDB Atlas provides a scalable and secure database environment for storing your application’s data.
Create Mixpeek Pipeline
In this phase, you’ll configure the Mixpeek pipeline, which is crucial for processing and integrating your data with the chosen embedding model.
Build your Apps
With the backend ready, you can now focus on developing your application’s frontend, ensuring users can interact smoothly with your service.
1. Pick an Embedding Model
- Pick an embedding model from the table below
Model Name | Dimensions | Max Tokens | MTEB Ranking |
---|---|---|---|
jinaai/jina-embeddings-v2-base-en | 768 | 8192 | 74.731 |
sentence-transformers/all-MiniLM-L6-v2 | 384 | TBA | TBA |
nomic-ai/nomic-embed-text-v1 | 768 | 8192 | TBA |
- To check which model best suits your application's needs, consult the Embedding Benchmarks scripts.
2. Create MongoDB Cluster via MongoDB Atlas
- Navigate to MongoDB Atlas.
- If you don’t already have an account, create one. Otherwise, sign in.
- Follow the prompts to create a new cluster.
3. Create Vector Search Index
- In MongoDB Atlas, navigate to Atlas Search.
- Choose Create Search Index and then JSON Editor.
- Select the collection you wish to index.
- Enter the JSON configuration for your index. Replace "embedding_768" with the name of the field where your embeddings will be stored, and adjust "numDimensions" to match the dimensions of your chosen embedding model, as listed in step 1 or on the MTEB leaderboard.
Sample Index Configuration:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding_768",
      "numDimensions": 768,
      "similarity": "euclidean"
    }
  ]
}
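As a rough sketch of how the index definition follows from the model choice, the helper below builds the same JSON from a model name and a field name. `MODEL_DIMENSIONS` and `buildVectorIndex` are hypothetical names used only for illustration, not part of Atlas or Mixpeek; the dimensions are copied from the table in step 1.

```javascript
// Dimensions copied from the table in step 1; extend this map for other models.
const MODEL_DIMENSIONS = {
  "jinaai/jina-embeddings-v2-base-en": 768,
  "sentence-transformers/all-MiniLM-L6-v2": 384,
};

// Hypothetical helper: returns an index definition in the shape of the sample above.
function buildVectorIndex(modelName, embeddingField, similarity = "euclidean") {
  const numDimensions = MODEL_DIMENSIONS[modelName];
  if (numDimensions === undefined) {
    throw new Error(`Unknown embedding model: ${modelName}`);
  }
  return {
    fields: [
      { type: "vector", path: embeddingField, numDimensions, similarity },
    ],
  };
}

// Print the definition so it can be pasted into the Atlas JSON Editor.
console.log(
  JSON.stringify(
    buildVectorIndex("sentence-transformers/all-MiniLM-L6-v2", "embedding"),
    null,
    2
  )
);
```

Deriving "numDimensions" from the model name keeps the index and the pipeline's embedding model from drifting apart, which is the most likely cause of a dimension mismatch.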
4. Create Mixpeek Pipeline
This is where all the processing logic lives. Much of it is default, opinionated logic. If you need something custom, we recommend using the Workflows service, which can be called inside the pipeline.
curl --location 'https://api.mixpeek.com/pipelines' \
  --header 'Authorization: Bearer API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "source_destination_mappings": [
      {
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "source": {
          "field": "resume_url",
          "type": "file_url",
          "settings": {}
        },
        "destination": {
          "collection": "resume_embeddings",
          "field": "text",
          "embedding": "embedding"
        }
      }
    ]
  }'
The pipeline automatically pulls from the connections you defined in the Update Connections method.
You can define multiple mappings as objects in the source_destination_mappings array.
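To illustrate a request with more than one mapping, here is a minimal sketch of the same call made from Node.js (18 or later, for the built-in `fetch`). The helper names `buildPipelinePayload` and `createPipeline` are hypothetical, and so is the second mapping (`cover_letter_url`, `cover_letter_embeddings`); only the endpoint, headers, and first mapping come from the curl request above. The shape of the response is an assumption and should be checked against the actual API.

```javascript
// Hypothetical helper: wraps one or more mappings in the request body used by the curl example.
function buildPipelinePayload(mappings) {
  return { source_destination_mappings: mappings };
}

const payload = buildPipelinePayload([
  {
    embedding_model: "sentence-transformers/all-MiniLM-L6-v2",
    source: { field: "resume_url", type: "file_url", settings: {} },
    destination: { collection: "resume_embeddings", field: "text", embedding: "embedding" },
  },
  {
    // Hypothetical second mapping, shown only to illustrate the array form.
    embedding_model: "sentence-transformers/all-MiniLM-L6-v2",
    source: { field: "cover_letter_url", type: "file_url", settings: {} },
    destination: { collection: "cover_letter_embeddings", field: "text", embedding: "embedding" },
  },
]);

// Sends the payload to the endpoint from the curl example.
async function createPipeline(apiKey, body) {
  const response = await fetch("https://api.mixpeek.com/pipelines", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Pipeline creation failed with status ${response.status}`);
  }
  // Assumes a JSON response; inspect it to find the pipeline ID.
  return response.json();
}
```

A call such as `createPipeline(process.env.MIXPEEK_API_KEY, payload)` would submit both mappings in one request; the environment variable name is only an example.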
5. Create MongoDB Trigger
The trigger lets us process changes in real time using the pipeline we defined above.
exports = async function(changeEvent) {
  // Documentation on ChangeEvents: https://docs.mongodb.com/manual/reference/change-events/
  // Replace PIPELINE_ID with the ID of your Mixpeek pipeline
  const pipelineId = "PIPELINE_ID";
  const webhookUrl = `https://api.mixpeek.com/pipelines/${pipelineId}`;
  // Assuming changeEvent.fullDocument contains the document you want to send
  try {
    // Headers must be an object that maps each header name to an array of strings
    const headers = {
      Authorization: [ "Bearer API_KEY" ],
      "Content-Type": [ "application/json" ],
    };
    // The body of your request, turned into a JSON string
    const body = JSON.stringify(changeEvent.fullDocument);
    // Making the HTTP POST request
    const response = await context.http.post({
      url: webhookUrl, // The URL you are sending the request to
      headers: headers, // The headers object
      body: body // Already a JSON string, so encodeBodyAsJSON is omitted to avoid encoding it twice
    });
    // Logging the response status to verify successful delivery
    console.log("Payload sent, received response status: ", response.status);
  } catch(err) {
    console.error("Error sending document payload: ", err.message);
  }
};
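The function above assumes every change event carries a `fullDocument`. Delete events do not include one, and update events include it only when the trigger is configured to fetch the full document, so a guard can help avoid sending empty payloads. The sketch below is one possible check; `shouldForward` is a hypothetical helper name, and the list of forwarded operations is an assumption to adjust for your use case.

```javascript
// Hypothetical guard: true only when the change event carries a document to forward.
function shouldForward(changeEvent) {
  const forwardedOperations = ["insert", "update", "replace"];
  return (
    forwardedOperations.includes(changeEvent.operationType) &&
    changeEvent.fullDocument != null
  );
}
```

Calling `if (!shouldForward(changeEvent)) return;` at the top of the trigger function would skip events that have nothing to embed.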