Integrating MongoDB with Mixpeek

1

Pick an Embedding Model

Choose the embedding model that best fits your project. The available models differ in dimensions, token limits, and benchmark performance.

2

Prepare MongoDB Cluster

Setting up a MongoDB Cluster on MongoDB Atlas provides a scalable and secure database environment for storing your application’s data.

3

Create Mixpeek Pipeline

Configure the Mixpeek pipeline, which processes your data and embeds it with the chosen embedding model.

4

Build your Apps

With the backend ready, you can now focus on developing your application’s frontend, ensuring users can interact smoothly with your service.

1. Pick an Embedding Model

  1. Pick an embedding model from the table below.

Model Name                                Dimensions  Max Tokens  MTEB Ranking
jinaai/jina-embeddings-v2-base-en         768         8192        74.731
sentence-transformers/all-MiniLM-L6-v2    384         TBA         TBA
nomic-ai/nomic-embed-text-v1              512         8192        TBA

  2. Select an embedding model that suits your application’s needs.

Consult the Embedding Benchmarks scripts.

2. Create MongoDB Cluster via MongoDB Atlas

  1. Navigate to MongoDB Atlas.
  2. If you don’t already have an account, create one. Otherwise, sign in.
  3. Follow the prompts to create a new cluster.

3. Create Vector Search Index

  1. In MongoDB Atlas, navigate to Atlas Search.
  2. Choose Create Search Index and then JSON Editor.
  3. Select the collection you wish to index.
  4. Enter the JSON configuration for your index. Replace "embedding_768" with the name of the field where your embeddings will be stored, and adjust "numDimensions" to match the dimensions of your chosen embedding model, as listed on the MTEB leaderboard.

Sample Index Configuration:

{
  "fields":[
    {
      "type": "vector",
      "path": "embedding_768",
      "numDimensions": 768,
      "similarity": "euclidean"
    }
  ]
}
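
The two values that change per model are the field path and the dimension count. The following is a minimal sketch of generating the same configuration in code, assuming the shape of the sample above; `buildVectorIndexDefinition` is a hypothetical helper name, not part of Atlas or Mixpeek. The three similarity values are the ones Atlas Vector Search accepts for a vector field.

```javascript
// Similarity functions accepted for an Atlas Vector Search "vector" field.
const SIMILARITIES = ["euclidean", "cosine", "dotProduct"];

// Hypothetical helper: build an index definition in the same shape as the sample.
// `path` is the field that stores embeddings; `numDimensions` is the model's output size.
function buildVectorIndexDefinition(path, numDimensions, similarity = "euclidean") {
  if (!Number.isInteger(numDimensions) || numDimensions <= 0) {
    throw new RangeError("numDimensions must be a positive integer");
  }
  if (!SIMILARITIES.includes(similarity)) {
    throw new RangeError(`similarity must be one of: ${SIMILARITIES.join(", ")}`);
  }
  return { fields: [{ type: "vector", path, numDimensions, similarity }] };
}

// Example: a 384-dimension model such as sentence-transformers/all-MiniLM-L6-v2,
// with embeddings stored in a field named "embedding".
const definition = buildVectorIndexDefinition("embedding", 384);
console.log(JSON.stringify(definition, null, 2));
```

The printed JSON can be pasted into the JSON Editor. Validating the dimension count up front helps catch a mismatch between the index and the model before any documents are embedded.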

4. Create Mixpeek Pipeline

This is where all the processing logic lives. Much of this is default, opinionated logic. If you need something custom, we recommend using the Workflows service, which can be called inside the pipeline.

curl --location 'https://api.mixpeek.com/pipelines' \
--header 'Authorization: Bearer API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "source_destination_mappings": [
        {
            "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
            "source": {
                "field": "resume_url",
                "type": "file_url",
                "settings": {}
            },
            "destination": {
                "collection": "resume_embeddings",
                "field": "text",
                "embedding": "embedding"
            }
        }
    ]
}'

The pipeline automatically pulls from the connections you defined in the Update Connections method. You can define multiple pipelines by adding more objects to the source_destination_mappings array.
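
To illustrate a request body with more than one entry in source_destination_mappings, here is a minimal sketch that assembles the payload in JavaScript. It reuses the field names from the curl request above. The `buildMapping` helper and the second mapping (`cover_letter_url`, `cover_letter_embeddings`) are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: build one entry of source_destination_mappings,
// using the same keys as the curl request.
function buildMapping({ embeddingModel, sourceField, collection, textField, embeddingField }) {
  return {
    embedding_model: embeddingModel,
    source: { field: sourceField, type: "file_url", settings: {} },
    destination: { collection, field: textField, embedding: embeddingField },
  };
}

const payload = {
  source_destination_mappings: [
    // Same mapping as the curl request.
    buildMapping({
      embeddingModel: "sentence-transformers/all-MiniLM-L6-v2",
      sourceField: "resume_url",
      collection: "resume_embeddings",
      textField: "text",
      embeddingField: "embedding",
    }),
    // Hypothetical second mapping, added for illustration only.
    buildMapping({
      embeddingModel: "sentence-transformers/all-MiniLM-L6-v2",
      sourceField: "cover_letter_url",
      collection: "cover_letter_embeddings",
      textField: "text",
      embeddingField: "embedding",
    }),
  ],
};

// This string is what would be sent as the --data argument.
console.log(JSON.stringify(payload, null, 2));
```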

5. Create MongoDB Trigger

The trigger lets you process changes in real time using the pipeline defined above.

exports = async function(changeEvent) {
  // Documentation on ChangeEvents: https://docs.mongodb.com/manual/reference/change-events/
  // Replace PIPELINE_ID with the ID of the pipeline you created in step 4
  const pipeline_id = "PIPELINE_ID";
  const webhookUrl = `https://api.mixpeek.com/pipelines/${pipeline_id}`;

  // Assuming changeEvent.fullDocument contains the document you want to send

  try {
    // context.http expects each header value as an array of strings
    const headers = {
      "Authorization": [ "Bearer API_KEY" ],
      "Content-Type": [ "application/json" ]
    };

    // The body of your request, turned into a JSON string
    const body = JSON.stringify(changeEvent.fullDocument);

    // Making the HTTP POST request. The body is already a JSON string,
    // so encodeBodyAsJSON is omitted to avoid encoding it a second time.
    const response = await context.http.post({
      url: webhookUrl, // The URL you are sending the request to
      headers: headers, // The headers object
      body: body // The body of the request, which must be a string
    });

    // Logging the response status to verify successful delivery
    console.log("Payload sent, received response status: ", response.status);
  } catch(err) {
    console.error("Error sending document payload: ", err.message);
  }
};
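
The function above assumes changeEvent.fullDocument is present. Delete events carry no full document, and update events include one only when the trigger is configured to look it up. The following is a minimal sketch of a guard for that case; `shouldForward` and `FORWARDED_OPERATIONS` are hypothetical names, and the choice of which operation types to forward is an assumption you may want to adjust.

```javascript
// Operation types whose change events can carry a document worth embedding.
const FORWARDED_OPERATIONS = new Set(["insert", "update", "replace"]);

// Hypothetical guard: forward an event only if its operation type is one we
// care about and the full document is actually attached to the event.
function shouldForward(changeEvent) {
  return (
    FORWARDED_OPERATIONS.has(changeEvent.operationType) &&
    changeEvent.fullDocument != null
  );
}

// Example events, reduced to the two fields the guard reads.
const sampleEvents = [
  { operationType: "insert", fullDocument: { _id: 1, resume_url: "https://example.com/a.pdf" } },
  { operationType: "delete" },
  { operationType: "update" }, // update delivered without a full document
];

for (const event of sampleEvents) {
  console.log(event.operationType, shouldForward(event));
}
```

Inside the trigger, the guard would run first, for example `if (!shouldForward(changeEvent)) return;`, so that no request is sent with an empty body.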

6. Test Integration

Work in progress