# Execute Raw Inference

> Execute raw inference with provider+model or custom plugin.

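This endpoint provides direct access to inference services without the overhead of the retriever framework. It supports two modes:

1. **Provider + Model**: Call a standard provider (`openai`, `google`, `anthropic`) with one of its models.
2. **Custom Plugin**: Call one of your custom inference plugins by `inference_name`, or pass a `feature_uri` that resolves to an `inference_name`. A `feature_uri` can also point at a built-in extractor.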
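A minimal client-side sketch of choosing a mode. `build_inference_request` is a hypothetical helper, not part of any Mixpeek SDK. It refuses bodies that mix provider + model with a plugin identifier, or that set both `inference_name` and `feature_uri`. That strictness is an assumption made for clarity; the schema only describes each field as an alternative to the others.

```python
from typing import Any, Optional


def build_inference_request(
    inputs: dict[str, Any],
    *,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    inference_name: Optional[str] = None,
    feature_uri: Optional[str] = None,
    parameters: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a /v1/inference body that uses exactly one addressing mode."""
    # Mode 1: provider + model, which must be supplied together.
    uses_provider = provider is not None or model is not None
    # Mode 2: a plugin addressed by inference_name or by feature_uri.
    plugin_ids = [x for x in (inference_name, feature_uri) if x is not None]

    if uses_provider and plugin_ids:
        raise ValueError("use provider + model or a plugin identifier, not both")
    if uses_provider and (provider is None or model is None):
        raise ValueError("provider and model must be set together")
    if not uses_provider and len(plugin_ids) != 1:
        raise ValueError("set exactly one of inference_name or feature_uri")

    body: dict[str, Any] = {"inputs": inputs, "parameters": parameters or {}}
    if uses_provider:
        body["provider"] = provider
        body["model"] = model
    elif inference_name is not None:
        body["inference_name"] = inference_name
    else:
        body["feature_uri"] = feature_uri
    return body
```

For example, `build_inference_request({"text": "hello world"}, feature_uri="mixpeek://my_custom_embedder@1.0.0/embedding")` produces the body shown in the "Custom Plugin (by feature_uri)" example.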

## Supported Providers

- **openai**: GPT models, embeddings, Whisper transcription
- **google**: Gemini models, Vertex multimodal embeddings (1408D)
- **anthropic**: Claude models

## Examples

### Custom Plugin (by inference_name)
```json
{
    "inference_name": "my_text_embedder_1_0_0",
    "inputs": {"text": "hello world"},
    "parameters": {}
}
```

### Custom Plugin (by feature_uri)
```json
{
    "feature_uri": "mixpeek://my_custom_embedder@1.0.0/embedding",
    "inputs": {"text": "hello world"},
    "parameters": {}
}
```

### Built-in Embedder (by feature_uri)
```json
{
    "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
    "inputs": {"text": "hello world"},
    "parameters": {}
}
```
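The request schema gives the `feature_uri` format as `mixpeek://{extractor}@{version}/{vector_index_name}`. The sketch below, a hypothetical `parse_feature_uri` helper, splits a URI into those three parts so malformed values can be caught before a request is sent. It only checks the shape; resolving the URI to an `inference_name` still happens on the server.

```python
from typing import NamedTuple

PREFIX = "mixpeek://"


class FeatureUri(NamedTuple):
    extractor: str
    version: str
    vector_index_name: str


def parse_feature_uri(uri: str) -> FeatureUri:
    """Split mixpeek://{extractor}@{version}/{vector_index_name} into parts."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"feature URI must start with {PREFIX!r}: {uri!r}")
    rest = uri[len(PREFIX):]
    # The vector index name follows the first "/".
    head, slash, index_name = rest.partition("/")
    # The extractor and its version are separated by "@".
    extractor, at, version = head.partition("@")
    if not (slash and at and extractor and version and index_name):
        raise ValueError(f"expected mixpeek://extractor@version/index: {uri!r}")
    return FeatureUri(extractor, version, index_name)
```

Plain string splitting is used instead of `urllib.parse.urlsplit`, which would treat `extractor@version` as user info plus a host name and lowercase the version.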

### Chat Completion
```json
{
    "provider": "openai",
    "model": "gpt-4o-mini",
    "inputs": {"prompts": ["What is AI?"]},
    "parameters": {"temperature": 0.7, "max_tokens": 500}
}
```
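A minimal sketch of sending any of these bodies with Python's standard library. The URL comes from the spec's production server entry. The `Authorization: Bearer <key>` header is an assumption, since this page only says that a missing or invalid API key returns 401; check the authentication docs for the exact header. `make_inference_request` and `send` are hypothetical names.

```python
import json
import urllib.request
from typing import Any

# Production server from the OpenAPI spec, plus the endpoint path.
API_URL = "https://api.mixpeek.com/v1/inference"


def make_inference_request(body: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /v1/inference."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; verify against the auth docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict[str, Any]:
    """Send the request and decode the JSON response body."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `send(make_inference_request(body, api_key))["data"]` holds the inference results; `latency_ms`, and `tokens_used` when available, are returned alongside it.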

### Text Embedding (OpenAI)
```json
{
    "provider": "openai",
    "model": "text-embedding-3-large",
    "inputs": {"text": "machine learning"},
    "parameters": {}
}
```
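The `inputs` schema also accepts `{texts: [str]}` for embedding several strings in one call. This hypothetical generator splits a long list into request bodies of at most `batch_size` texts each. The default of 64 is a placeholder, not a documented limit.

```python
from typing import Any, Iterator, Sequence


def embedding_batches(
    texts: Sequence[str],
    model: str = "text-embedding-3-large",
    batch_size: int = 64,  # hypothetical; no batch limit is documented on this page
) -> Iterator[dict[str, Any]]:
    """Yield one /v1/inference body per slice of `texts`."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield {
            "provider": "openai",
            "model": model,
            "inputs": {"texts": list(texts[start:start + batch_size])},
            "parameters": {},
        }
```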

### Text Embedding (Google Vertex Multimodal - 1408D)
```json
{
    "provider": "google",
    "model": "multimodalembedding",
    "inputs": {"text": "machine learning"},
    "parameters": {}
}
```

### Image Embedding (Google Vertex Multimodal - 1408D)
```json
{
    "provider": "google",
    "model": "multimodalembedding",
    "inputs": {"image_url": "https://example.com/image.jpg"},
    "parameters": {}
}
```
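Vertex multimodal embeddings for text, images, and video share one 1408-dimensional space, so vectors from the examples above can be compared directly, for instance with cosine similarity. The sketch assumes you have already pulled each embedding out of the response's `data` field as a list of floats; the exact shape of `data` varies by modality and is not specified on this page.

```python
import math
from typing import Sequence

EXPECTED_DIM = 1408  # Vertex multimodal embedding size stated on this page


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two 1408-d embeddings, in [-1, 1]."""
    if len(a) != EXPECTED_DIM or len(b) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM}-d vectors, got {len(a)} and {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

The length check catches the common mistake of comparing a 1408-d Vertex vector with an embedding from a different model.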

### Image Embedding from Base64
```json
{
    "provider": "google",
    "model": "multimodalembedding",
    "inputs": {"image_base64": "<base64-encoded-image>"},
    "parameters": {}
}
```

### Video Embedding (Google Vertex Multimodal - 1408D)
```json
{
    "provider": "google",
    "model": "multimodalembedding",
    "inputs": {"video_url": "https://example.com/video.mp4"},
    "parameters": {}
}
```

### Video Embedding from Base64
```json
{
    "provider": "google",
    "model": "multimodalembedding",
    "inputs": {"video_base64": "<base64-encoded-video>"},
    "parameters": {}
}
```
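For the base64 variants, the file bytes must be encoded before they go into `inputs`. A minimal sketch, assuming the API expects a plain base64 string without a `data:` URI prefix (the examples only show a placeholder). `media_embedding_request` is a hypothetical helper.

```python
import base64
from pathlib import Path
from typing import Any

# Input keys taken from the base64 examples above.
BASE64_KEYS = {"image": "image_base64", "video": "video_base64"}


def media_embedding_request(path: str, kind: str) -> dict[str, Any]:
    """Build a multimodalembedding body with the file inlined as base64."""
    key = BASE64_KEYS.get(kind)
    if key is None:
        raise ValueError(f"kind must be 'image' or 'video', got {kind!r}")
    # Assumption: plain base64 text, no "data:<mime>;base64," prefix.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "provider": "google",
        "model": "multimodalembedding",
        "inputs": {key: encoded},
        "parameters": {},
    }
```

Base64 grows the payload by about a third, so for large videos the `video_url` form may be preferable.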

### Audio Transcription
```json
{
    "provider": "openai",
    "model": "whisper-1",
    "inputs": {"audio_url": "https://example.com/audio.mp3"},
    "parameters": {}
}
```

### Vision (Multimodal LLM)
```json
{
    "provider": "openai",
    "model": "gpt-4o",
    "inputs": {
        "prompts": ["Describe this image"],
        "image_url": "https://example.com/image.jpg"
    },
    "parameters": {"temperature": 0.5}
}
```
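The request schema below also defines `enable_semantic_cache` and `cache_delta`, which none of the examples use; semantic caching applies only to chat/completion models. This hypothetical helper adds both fields to a chat body and range-checks `cache_delta` the same way the schema does (0.0 to 1.0). Whether a reply came from the cache is reported in the response's `cached` field.

```python
from typing import Any, Optional


def with_semantic_cache(body: dict[str, Any], cache_delta: Optional[float] = None) -> dict[str, Any]:
    """Return a copy of a chat request body with vCache enabled."""
    if cache_delta is not None and not 0.0 <= cache_delta <= 1.0:
        raise ValueError("cache_delta must be between 0.0 and 1.0")
    # Copy so the caller's body is left unchanged.
    updated = {**body, "enable_semantic_cache": True}
    if cache_delta is not None:
        # Lower values are more conservative; omit to use the server default (0.02).
        updated["cache_delta"] = cache_delta
    return updated
```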

## Request and Response

- **Request body**: a raw inference request (`RawInferenceRequest`)
- **Returns**: an inference response (`RawInferenceResponse`) with results and metadata

## Errors

- **400 Bad Request**: invalid provider, model, or inputs
- **401 Unauthorized**: missing or invalid API key
- **403 Forbidden**
- **404 Not Found**
- **422 Validation Error**: the request body failed schema validation
- **429 Too Many Requests**: rate limit exceeded
- **500 Internal Server Error**: inference execution failed
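Error bodies follow `ErrorResponse` (`success`, `status`, and an `error` object with `message`, `type`, and optional `code` and `details`), except 422, which uses FastAPI-style `HTTPValidationError` (a `detail` list of `loc`, `msg`, `type` entries). A sketch of turning either into a readable message. Treating 429 and 500 as retryable is a policy choice for illustration; the spec does not say which errors are safe to retry.

```python
import json
from typing import Any

# Illustrative retry policy, not something the spec defines.
RETRYABLE_STATUSES = {429, 500}


def describe_error(status: int, body: bytes) -> tuple[str, bool]:
    """Return (message, retryable) for a non-2xx /v1/inference response."""
    retryable = status in RETRYABLE_STATUSES
    try:
        payload: Any = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return f"HTTP {status}: non-JSON body", retryable
    if not isinstance(payload, dict):
        return f"HTTP {status}: unexpected body", retryable
    if status == 422 and isinstance(payload.get("detail"), list):
        # HTTPValidationError: each entry has loc (path), msg, and type.
        problems = "; ".join(
            ".".join(str(part) for part in item.get("loc", [])) + ": " + item.get("msg", "")
            for item in payload["detail"]
        )
        return f"HTTP 422: {problems}", False
    # ErrorResponse: prefer the fine-grained code, fall back to the stable type.
    error = payload.get("error") or {}
    label = error.get("code") or error.get("type") or "UnknownError"
    return f"HTTP {status} {label}: {error.get('message', '')}", retryable
```

With `urllib`, catch `urllib.error.HTTPError` as `err` and call `describe_error(err.code, err.read())`.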



## OpenAPI

````yaml post /v1/inference
openapi: 3.1.0
info:
  title: Mixpeek API
  description: >-
    This is the Mixpeek API, providing access to various endpoints for data
    processing and retrieval.
  termsOfService: https://mixpeek.com/terms
  contact:
    name: Mixpeek Support
    url: https://mixpeek.com/contact
    email: info@mixpeek.com
  version: '0.82'
servers:
  - url: https://api.mixpeek.com
    description: Production
security: []
paths:
  /v1/inference:
    post:
      tags:
        - Inference
      summary: Execute Raw Inference
      description: >-
        Execute raw inference with provider+model or custom plugin.


        This endpoint provides direct access to inference services without
        the retriever framework overhead. Supports two modes:


        1. **Provider + Model**: Use standard providers (openai, google,
        anthropic)

        2. **Custom Plugin**: Use your custom inference plugins by
        inference_name


        ## Supported Providers


        - **openai**: GPT models, embeddings, Whisper transcription

        - **google**: Gemini models, Vertex multimodal embeddings (1408D)

        - **anthropic**: Claude models


        ## Examples


        ### Custom Plugin (by inference_name)

        ```json

        {
            "inference_name": "my_text_embedder_1_0_0",
            "inputs": {"text": "hello world"},
            "parameters": {}
        }

        ```


        ### Custom Plugin (by feature_uri)

        ```json

        {
            "feature_uri": "mixpeek://my_custom_embedder@1.0.0/embedding",
            "inputs": {"text": "hello world"},
            "parameters": {}
        }

        ```


        ### Builtin Embedder (by feature_uri)

        ```json

        {
            "feature_uri": "mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1",
            "inputs": {"text": "hello world"},
            "parameters": {}
        }

        ```


        ### Chat Completion

        ```json

        {
            "provider": "openai",
            "model": "gpt-4o-mini",
            "inputs": {"prompts": ["What is AI?"]},
            "parameters": {"temperature": 0.7, "max_tokens": 500}
        }

        ```


        ### Text Embedding (OpenAI)

        ```json

        {
            "provider": "openai",
            "model": "text-embedding-3-large",
            "inputs": {"text": "machine learning"},
            "parameters": {}
        }

        ```


        ### Text Embedding (Google Vertex Multimodal - 1408D)

        ```json

        {
            "provider": "google",
            "model": "multimodalembedding",
            "inputs": {"text": "machine learning"},
            "parameters": {}
        }

        ```


        ### Image Embedding (Google Vertex Multimodal - 1408D)

        ```json

        {
            "provider": "google",
            "model": "multimodalembedding",
            "inputs": {"image_url": "https://example.com/image.jpg"},
            "parameters": {}
        }

        ```


        ### Image Embedding from Base64

        ```json

        {
            "provider": "google",
            "model": "multimodalembedding",
            "inputs": {"image_base64": "<base64-encoded-image>"},
            "parameters": {}
        }

        ```


        ### Video Embedding (Google Vertex Multimodal - 1408D)

        ```json

        {
            "provider": "google",
            "model": "multimodalembedding",
            "inputs": {"video_url": "https://example.com/video.mp4"},
            "parameters": {}
        }

        ```


        ### Video Embedding from Base64

        ```json

        {
            "provider": "google",
            "model": "multimodalembedding",
            "inputs": {"video_base64": "<base64-encoded-video>"},
            "parameters": {}
        }

        ```


        ### Audio Transcription

        ```json

        {
            "provider": "openai",
            "model": "whisper-1",
            "inputs": {"audio_url": "https://example.com/audio.mp3"},
            "parameters": {}
        }

        ```


        ### Vision (Multimodal LLM)

        ```json

        {
            "provider": "openai",
            "model": "gpt-4o",
            "inputs": {
                "prompts": ["Describe this image"],
                "image_url": "https://example.com/image.jpg"
            },
            "parameters": {"temperature": 0.5}
        }

        ```


        Args:
            request: FastAPI request object (populated by middleware)
            payload: Raw inference request

        Returns:
            Inference response with results and metadata

        Raises:
            400 Bad Request: Invalid provider, model, or inputs
            401 Unauthorized: Missing or invalid API key
            429 Too Many Requests: Rate limit exceeded
            500 Internal Server Error: Inference execution failed
      operationId: execute_raw_inference_v1_inference_post
      parameters: []
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/RawInferenceRequest'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RawInferenceResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    RawInferenceRequest:
      properties:
        provider:
          anyOf:
            - type: string
            - type: 'null'
          title: Provider
          description: >-
            Provider name: openai, google, anthropic (required if neither
            inference_name nor feature_uri is set)
          examples:
            - openai
            - google
            - anthropic
        model:
          anyOf:
            - type: string
            - type: 'null'
          title: Model
          description: >-
            Model identifier specific to the provider (required if neither
            inference_name nor feature_uri is set)
          examples:
            - gpt-4o-mini
            - gemini-1.5-flash
            - claude-3-5-sonnet
            - text-embedding-3-large
            - whisper-1
        inference_name:
          anyOf:
            - type: string
            - type: 'null'
          title: Inference Name
          description: Custom plugin inference name (alternative to provider+model)
          examples:
            - my_text_embedder_1_0_0
            - custom_reranker_2_0_0
        feature_uri:
          anyOf:
            - type: string
            - type: 'null'
          title: Feature Uri
          description: >-
            Feature URI to resolve to inference_name (alternative to
            inference_name). Format:
            mixpeek://{extractor}@{version}/{vector_index_name}
          examples:
            - mixpeek://text_extractor@v1/multilingual_e5_large_instruct_v1
            - mixpeek://my_custom_embedder@1.0.0/embedding
        inputs:
          additionalProperties: true
          type: object
          title: Inputs
          description: >-
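            Model-specific inputs. Chat: {prompts: [str]}, Embeddings: {text:
            str} or {texts: [str]}, Multimodal embeddings: {text: str},
            {image_url: str}, {image_base64: str}, {video_url: str}, or
            {video_base64: str}, Transcription: {audio_url: str}, Vision:
            {prompts: [str], image_url: str}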
          examples:
            - prompts:
                - What is the capital of France?
            - text: machine learning
            - audio_url: https://example.com/audio.mp3
        parameters:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Parameters
          description: >-
            Optional parameters for inference. Common: temperature (float),
            max_tokens (int), schema (dict for structured output)
          examples:
            - max_tokens: 500
              temperature: 0.7
        enable_semantic_cache:
          type: boolean
          title: Enable Semantic Cache
          description: >-
            Enable semantic caching (vCache) for LLM chat operations. When
            enabled, semantically similar prompts may return cached responses,
            reducing latency and cost. Only applies to chat/completion models.
          default: false
        cache_delta:
          anyOf:
            - type: number
              maximum: 1
              minimum: 0
            - type: 'null'
          title: Cache Delta
          description: >-
            Maximum error rate for semantic cache (0.0-1.0). Lower values are
            more conservative. Default uses system setting (0.02 = 2%).
      type: object
      required:
        - inputs
      title: RawInferenceRequest
      description: >-
        Request for raw inference without retriever framework.


        This endpoint provides direct access to inference services with minimal
        configuration.

        Ideal for simple LLM calls, embeddings, transcription, or vision tasks
        without requiring collection setup or retriever configuration.


        You can use one of:

        - `provider` + `model` for standard providers (openai, google,
        anthropic)

        - `inference_name` for custom plugins

        - `feature_uri`, which resolves to an `inference_name`


        Examples:
            # Chat completion (provider + model)
            {
                "provider": "openai",
                "model": "gpt-4o-mini",
                "inputs": {"prompts": ["What is AI?"]},
                "parameters": {"temperature": 0.7, "max_tokens": 500}
            }

            # Text embedding (provider + model)
            {
                "provider": "openai",
                "model": "text-embedding-3-large",
                "inputs": {"text": "machine learning"},
                "parameters": {}
            }

            # Custom plugin (inference_name)
            {
                "inference_name": "my_text_embedder_1_0_0",
                "inputs": {"text": "hello world"},
                "parameters": {}
            }

            # Audio transcription
            {
                "provider": "openai",
                "model": "whisper-1",
                "inputs": {"audio_url": "https://example.com/audio.mp3"},
                "parameters": {}
            }

            # Vision (multimodal)
            {
                "provider": "openai",
                "model": "gpt-4o",
                "inputs": {
                    "prompts": ["Describe this image"],
                    "image_url": "https://example.com/image.jpg"
                },
                "parameters": {"temperature": 0.5}
            }
    RawInferenceResponse:
      properties:
        data:
          title: Data
          description: Inference results (structure varies by modality)
        provider:
          type: string
          title: Provider
          description: Provider that was used
        model:
          type: string
          title: Model
          description: Model that was used
        tokens_used:
          anyOf:
            - additionalProperties:
                type: integer
              type: object
            - type: 'null'
          title: Tokens Used
          description: Token usage statistics (if available)
          examples:
            - completion: 120
              prompt: 15
              total: 135
        latency_ms:
          type: number
          title: Latency Ms
          description: Total inference latency in milliseconds
        cached:
          type: boolean
          title: Cached
          description: Whether the response was served from semantic cache (vCache)
          default: false
      type: object
      required:
        - data
        - provider
        - model
        - latency_ms
      title: RawInferenceResponse
      description: |-
        Response from raw inference.

        Returns the inference results along with metadata about the request.
    ErrorResponse:
      properties:
        success:
          type: boolean
          title: Success
          description: Always false for error responses
          default: false
        status:
          type: integer
          title: Status
          description: HTTP status code for this error
        error:
          $ref: '#/components/schemas/ErrorDetail'
          description: Error details payload
      type: object
      required:
        - status
        - error
      title: ErrorResponse
      description: Error response model.
      examples:
        - error:
            details:
              id: ns_123
              resource: namespace
            message: Namespace not found
            type: NotFoundError
          status: 404
          success: false
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    ErrorDetail:
      properties:
        message:
          type: string
          title: Message
          description: Human-readable error message
        type:
          type: string
          title: Type
          description: Stable error type identifier (machine-readable)
        code:
          anyOf:
            - type: string
            - type: 'null'
          title: Code
          description: >-
            Fine-grained error code for programmatic handling (e.g.,
            namespace_name_taken, feature_extractor_not_found). Present only
            when consumers may need to branch on a specific error condition.
        details:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Details
          description: >-
            Optional structured details to help debugging (validation errors,
            IDs, etc.)
      type: object
      required:
        - message
        - type
      title: ErrorDetail
      description: Error detail model.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError

````