# Test taxonomy configuration (validation only)

> ⚠️ VALIDATION ENDPOINT ONLY - Not for production enrichment!

This endpoint validates a taxonomy configuration against a small set of sample documents (typically 1-5).
Results are returned immediately and are NOT persisted to any collection.
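
As a rough illustration, a request to this endpoint could be assembled as shown below. This is a sketch, not an official client: the helper name `build_validation_request`, the client-side 1-5 document guard, the sample document, and the `Authorization: Bearer` header are assumptions. The path, the `version` query parameter, and the body fields `taxonomy`, `source_documents`, and `join_mode` come from the OpenAPI schema on this page. The request is only constructed, never sent.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.mixpeek.com"  # production server listed in the spec


def build_validation_request(taxonomy_identifier, taxonomy, source_documents,
                             version=None, api_key="YOUR_API_KEY"):
    """Build (without sending) a POST /v1/taxonomies/execute/{taxonomy_identifier} request."""
    # Client-side guard (an assumption, not an API rule): keep the sample small.
    if not 1 <= len(source_documents) <= 5:
        raise ValueError("pass 1-5 sample documents; this endpoint is for validation only")

    url = f"{BASE_URL}/v1/taxonomies/execute/{quote(taxonomy_identifier, safe='')}"
    if version is not None:
        url += "?" + urlencode({"version": version})

    body = {
        "taxonomy": taxonomy,
        "source_documents": source_documents,
        "join_mode": "on_demand",  # the schema states batch mode is not supported via the API
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
    }
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers=headers, method="POST")


# Taxonomy values mirror the ExecuteTaxonomyRequest example in the spec;
# the sample document is made up for illustration.
taxonomy = {
    "taxonomy_name": "product_tags",
    "config": {
        "taxonomy_type": "flat",
        "retriever_id": "ret_clip_v1",
        "input_mappings": [
            {"input_key": "image_vector", "path": "features.clip", "source_type": "vector"}
        ],
        "source_collection": {"collection_id": "col_products_v1"},
    },
}
request = build_validation_request(
    "tax_abc123", taxonomy, [{"features": {"clip": [0.1, 0.2, 0.3]}}], version=1
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate makes the payload easy to inspect first.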

❌ DO NOT USE FOR:
- Enriching entire collections (use taxonomy_applications instead)
- Batch processing documents (this happens automatically during ingestion)
- Persisting enriched documents (attach the taxonomy via taxonomy_applications so documents are enriched at ingestion)

✅ USE THIS FOR:
- Testing that the taxonomy configuration is correct
- Validating that the retriever finds matching taxonomy nodes
- Checking that enrichment fields are applied properly
- Developing and debugging a taxonomy setup
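
One way to run these checks is to inspect the response, which the schema on this page describes as a `JoinResponse` with `stats` (`processed_docs`, `batches`, `errors`) and an optional `results` list. The sketch below is a minimal example under stated assumptions: the helper names `get_path` and `summarize_join_response`, the `document_id` key, and the enrichment field `metadata.tags` are illustrative. The schema does not specify the shape of each result document, so the dot-path lookup is a guess at how enriched fields might be located.

```python
def get_path(doc, dotted_path):
    """Return the value at a dot-notation path, or None if any segment is missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def summarize_join_response(response, expected_fields):
    """Summarize stats and count result documents lacking each expected field."""
    stats = response.get("stats", {})
    results = response.get("results") or []  # `results` may be null
    # A field whose value is None is counted as missing.
    missing = {
        field: sum(1 for doc in results if get_path(doc, field) is None)
        for field in expected_fields
    }
    errors = stats.get("errors", 0)
    return {
        "processed_docs": stats.get("processed_docs", 0),
        "errors": errors,
        "returned_docs": len(results),
        "missing_fields": missing,
        "ok": errors == 0 and bool(results) and not any(missing.values()),
    }


# Hypothetical response body used only to exercise the helper.
sample = {
    "stats": {"processed_docs": 2, "batches": 1, "errors": 0},
    "results": [
        {"document_id": "doc_1", "metadata": {"tags": ["shoes"]}},
        {"document_id": "doc_2", "metadata": {}},
    ],
}
summary = summarize_join_response(sample, ["metadata.tags"])
print(summary)
```

Here `ok` is false because one of the two sample documents lacks `metadata.tags`. That is the kind of signal worth catching before attaching the taxonomy to a collection.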

📚 FOR PRODUCTION ENRICHMENT:

Automatic (during ingestion):
  1. Create taxonomy: POST /taxonomies
  2. Attach to collection: PUT /collections/{id} with taxonomy_applications field
  3. Ingest documents: Documents are automatically enriched by the engine

On-the-fly (during retrieval):
  1. Add taxonomy_join stage to retriever pipeline
  2. Execute retriever: GET /retrievers/{id}/execute
  3. Results include enriched documents (not persisted)
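
The two production paths above can be written down as data, which is handy for scripts or runbooks. This is only a sketch: the function name `production_enrichment_plan` is hypothetical, the methods and paths are copied from the steps listed above, and request bodies are left out because this page does not specify them.

```python
def production_enrichment_plan(mode, collection_id="{id}", retriever_id="{id}"):
    """Return (method, path, note) steps for a production enrichment mode.

    Steps with no HTTP call listed above use None for method and path.
    """
    if mode == "ingestion":
        return [
            ("POST", "/taxonomies", "create the taxonomy"),
            ("PUT", f"/collections/{collection_id}", "set the taxonomy_applications field"),
            (None, None, "ingest documents; the engine enriches them automatically"),
        ]
    if mode == "retrieval":
        return [
            (None, None, "add a taxonomy_join stage to the retriever pipeline"),
            ("GET", f"/retrievers/{retriever_id}/execute",
             "results include enriched documents (not persisted)"),
        ]
    raise ValueError(f"unknown mode: {mode!r}")


for method, path, note in production_enrichment_plan("ingestion", collection_id="col_products_v1"):
    print(method or "-", path or "-", note)
```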

See the Collections and Retrievers API documentation for details.


## OpenAPI

````yaml post /v1/taxonomies/execute/{taxonomy_identifier}
openapi: 3.1.0
info:
  title: Mixpeek API
  description: >-
    This is the Mixpeek API, providing access to various endpoints for data
    processing and retrieval.
  termsOfService: https://mixpeek.com/terms
  contact:
    name: Mixpeek Support
    url: https://mixpeek.com/contact
    email: info@mixpeek.com
  version: '0.82'
servers:
  - url: https://api.mixpeek.com
    description: Production
security: []
paths:
  /v1/taxonomies/execute/{taxonomy_identifier}:
    post:
      tags:
        - Taxonomies
      summary: Test taxonomy configuration (validation only)
      description: >-
        ⚠️ VALIDATION ENDPOINT ONLY - Not for production enrichment!


        This endpoint validates a taxonomy configuration against a small set of
        sample documents (typically 1-5).

        Results are returned immediately and are NOT persisted to any collection.


        ❌ DO NOT USE FOR:

        - Enriching entire collections (use taxonomy_applications instead)

        - Batch processing documents (this happens automatically during ingestion)

        - Persisting enriched documents (attach the taxonomy via
        taxonomy_applications so documents are enriched at ingestion)


        ✅ USE THIS FOR:

        - Testing that the taxonomy configuration is correct

        - Validating that the retriever finds matching taxonomy nodes

        - Checking that enrichment fields are applied properly

        - Developing and debugging a taxonomy setup


        📚 FOR PRODUCTION ENRICHMENT:


        Automatic (during ingestion):
          1. Create taxonomy: POST /taxonomies
          2. Attach to collection: PUT /collections/{id} with taxonomy_applications field
          3. Ingest documents: Documents are automatically enriched by the engine

        On-the-fly (during retrieval):
          1. Add taxonomy_join stage to retriever pipeline
          2. Execute retriever: GET /retrievers/{id}/execute
          3. Results include enriched documents (not persisted)

        See the Collections and Retrievers API documentation for details.
      operationId: execute_taxonomy_v1_taxonomies_execute__taxonomy_identifier__post
      parameters:
        - name: taxonomy_identifier
          in: path
          required: true
          schema:
            type: string
            description: Taxonomy ID or name to validate
            title: Taxonomy Identifier
          description: Taxonomy ID or name to validate
          example: tax_abc123
        - name: version
          in: query
          required: false
          schema:
            anyOf:
              - type: integer
              - type: 'null'
            description: Optional taxonomy version (defaults to latest)
            title: Version
          description: Optional taxonomy version (defaults to latest)
          example: 1
      requestBody:
        required: true
        content:
          application/json:
            schema:
              anyOf:
                - $ref: '#/components/schemas/ExecuteTaxonomyRequest'
                - additionalProperties: true
                  type: object
              title: Payload
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JoinResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    ExecuteTaxonomyRequest:
      properties:
        taxonomy:
          $ref: '#/components/schemas/TaxonomyModel-Input'
          description: >-
            Full taxonomy model with configuration (fetched from DB by
            controller)
        retriever:
          anyOf:
            - $ref: '#/components/schemas/RetrieverModel-Input'
            - type: 'null'
          description: >-
            Optional retriever configuration override for testing. If omitted,
            uses the retriever configured in the taxonomy.
        source_documents:
          anyOf:
            - items:
                additionalProperties: true
                type: object
              type: array
            - type: 'null'
          title: Source Documents
          description: >-
            Sample documents to test enrichment (typically 1-5 docs). Results
            are returned immediately, not persisted. ⚠️ Do NOT pass
            source_collection_id expecting batch processing!
        source_collection_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Source Collection Id
          description: >-
            ⚠️ IGNORED IN ON_DEMAND MODE. This field exists for legacy
            compatibility only. To enrich collections, use taxonomy_applications
            on the collection.
        target_collection_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Target Collection Id
          description: >-
            ⚠️ IGNORED IN ON_DEMAND MODE. This field exists for legacy
            compatibility only. Results are never persisted via this endpoint.
        join_mode:
          $ref: '#/components/schemas/JoinMode'
          description: >-
            Must be 'on_demand'. BATCH mode is NOT supported via API. Batch
            enrichment is automatic (triggered by engine during ingestion).
          default: on_demand
        batch_size:
          type: integer
          maximum: 10000
          minimum: 1
          title: Batch Size
          description: Batch size for the scroll iterator
          default: 1000
        scroll_filters:
          anyOf:
            - $ref: '#/components/schemas/LogicalOperator-Input'
            - type: 'null'
          description: >-
            Additional filters applied to the source collection prior to
            enrichment.
      type: object
      required:
        - taxonomy
      title: ExecuteTaxonomyRequest
      description: >-
        Request model for on-demand taxonomy validation and testing ONLY.


        ⚠️ IMPORTANT: This endpoint is ONLY for testing taxonomy configuration
        with sample documents.


        DO NOT USE THIS FOR BATCH ENRICHMENT:

        ❌ Do NOT use this to enrich an entire collection

        ❌ Do NOT use source_collection_id expecting batch processing

        ❌ Do NOT use target_collection_id expecting persistence


        HOW TAXONOMY ENRICHMENT ACTUALLY WORKS:

        ✅ Automatic during ingestion: Attach taxonomies to collections via
        `taxonomy_applications`

        ✅ On-the-fly in retrieval: Add `taxonomy_join` stage to retriever
        pipelines


        This endpoint validates:

        - Taxonomy configuration is correct

        - Retriever can find matching taxonomy nodes

        - Enrichment fields are properly applied


        For production enrichment, see:

        - Collections API: attach taxonomies via `taxonomy_applications` field

        - Retrievers API: add `taxonomy_join` stage for on-the-fly enrichment
      examples:
        - batch_size: 1000
          join_mode: on_demand
          source_collection_id: col_catalog_v2
          target_collection_id: col_catalog_enriched_v2
          taxonomy:
            config:
              input_mappings:
                - input_key: image_vector
                  path: features.clip
                  source_type: vector
              retriever_id: ret_clip_v1
              source_collection:
                collection_id: col_products_v1
              taxonomy_type: flat
            input_mappings:
              - input_key: image_vector
                path: features.clip
                source_type: vector
            namespace_id: ns_123
            retriever_id: ret_clip_v1
            taxonomy_name: product_tags
    JoinResponse:
      properties:
        stats:
          $ref: '#/components/schemas/JoinStats'
        results:
          anyOf:
            - items:
                additionalProperties: true
                type: object
              type: array
            - type: 'null'
          title: Results
      type: object
      required:
        - stats
      title: JoinResponse
    ErrorResponse:
      properties:
        success:
          type: boolean
          title: Success
          description: Always false for error responses
          default: false
        status:
          type: integer
          title: Status
          description: HTTP status code for this error
        error:
          $ref: '#/components/schemas/ErrorDetail'
          description: Error details payload
      type: object
      required:
        - status
        - error
      title: ErrorResponse
      description: Error response model.
      examples:
        - error:
            details:
              id: ns_123
              resource: namespace
            message: Namespace not found
            type: NotFoundError
          status: 404
          success: false
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    TaxonomyModel-Input:
      properties:
        taxonomy_id:
          type: string
          title: Taxonomy Id
          description: Unique identifier for the taxonomy
        version:
          type: integer
          minimum: 1
          title: Version
          description: Monotonic version number of the taxonomy configuration
          default: 1
        taxonomy_name:
          type: string
          title: Taxonomy Name
          description: A unique name for the taxonomy within the namespace.
        description:
          anyOf:
            - type: string
            - type: 'null'
          title: Description
          description: Optional human-readable description.
        retriever_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Retriever Id
          description: Optional taxonomy-level retriever (prefer per-layer).
        input_mappings:
          anyOf:
            - items:
                $ref: '#/components/schemas/InputMapping'
              type: array
            - type: 'null'
          title: Input Mappings
          description: Optional taxonomy-level inputs (prefer per-layer).
        config:
          oneOf:
            - $ref: '#/components/schemas/FlatTaxonomyConfig-Input'
            - $ref: '#/components/schemas/HierarchicalTaxonomyConfig-Input'
          title: Config
          description: Configuration specific to the taxonomy type.
          discriminator:
            propertyName: taxonomy_type
            mapping:
              flat:
                $ref: '#/components/schemas/FlatTaxonomyConfig-Input'
              hierarchical:
                $ref: '#/components/schemas/HierarchicalTaxonomyConfig-Input'
        ready:
          type: boolean
          title: Ready
          description: >-
            Whether the taxonomy is ready for use. False for async inference
            (cluster/LLM) that needs processing. True for flat/explicit
            hierarchies.
          default: true
        created_at:
          type: string
          format: date-time
          title: Created At
          description: Creation timestamp for this taxonomy record
        metadata:
          additionalProperties: true
          type: object
          title: Metadata
          description: Additional user-defined metadata for the taxonomy
      type: object
      required:
        - taxonomy_name
        - config
      title: TaxonomyModel
      description: Primary Pydantic model representing a taxonomy definition.
      examples:
        - config:
            input_mappings:
              - input_key: image_vector
                path: features.clip_vit_l_14
                source_type: vector
            retriever_id: ret_clip_v1
            source_collection:
              collection_id: col_products_v1
            taxonomy_type: flat
          namespace_id: ns_123
          taxonomy_name: product_tags
          taxonomy_type: flat
        - config:
            build_mode: explicit
            input_mappings:
              - input_key: face_vec
                path: features.face
                source_type: vector
            retriever_id: ret_face_v1
            hierarchical_nodes:
              - collection_id: col_employees_v1
              - collection_id: col_executives_v1
                parent_collection_id: col_employees_v1
            taxonomy_type: hierarchical
          namespace_id: ns_123
          taxonomy_name: org_hierarchy
          taxonomy_type: hierarchical
    RetrieverModel-Input:
      properties:
        retriever_id:
          type: string
          title: Retriever Id
          description: Unique identifier for the retriever
        retriever_name:
          type: string
          title: Retriever Name
          description: Name of the retriever
        description:
          anyOf:
            - type: string
            - type: 'null'
          title: Description
          description: Description of the retriever
        visibility:
          $ref: '#/components/schemas/RetrieverVisibility'
          description: >-
            Visibility level: PRIVATE (owner only), PUBLIC (any valid API key),
            MARKETPLACE (requires subscription)
          default: private
        marketplace_listing_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Marketplace Listing Id
          description: Associated marketplace listing ID when visibility is MARKETPLACE
        requires_subscription:
          type: boolean
          title: Requires Subscription
          description: >-
            Whether this retriever requires an active subscription to access
            (marketplace only)
          default: false
        input_schema:
          $ref: '#/components/schemas/RetrieverSchema'
          description: Input schema for the retriever
        collection_ids:
          items:
            type: string
          type: array
          title: Collection Ids
          description: List of collection IDs
        stages:
          items:
            $ref: '#/components/schemas/StageInstanceConfig'
          type: array
          title: Stages
          description: List of stage configurations
        cache_config:
          anyOf:
            - $ref: '#/components/schemas/CacheConfig'
            - type: 'null'
          description: >-
            Cache configuration for this retriever. If not provided, caching is
            disabled.
        created_at:
          type: string
          format: date-time
          title: Created At
          description: When the retriever was created
        updated_at:
          type: string
          format: date-time
          title: Updated At
          description: When the retriever was last modified
        last_executed_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Last Executed At
          description: When the retriever was last executed
        enabled:
          type: boolean
          title: Enabled
          description: Whether the retriever is enabled (can be toggled on/off)
          default: true
        status:
          $ref: '#/components/schemas/RetrieverStatus'
          description: Current operational status
          default: active
        usage_stats:
          anyOf:
            - $ref: '#/components/schemas/UsageStatistics'
            - type: 'null'
          description: Usage and performance statistics
        collections:
          anyOf:
            - items:
                $ref: '#/components/schemas/CollectionDetail'
              type: array
            - type: 'null'
          title: Collections
          description: Expanded collection details with names and metadata
        metadata:
          additionalProperties: true
          type: object
          title: Metadata
          description: Custom key-value metadata
        tags:
          items:
            type: string
          type: array
          title: Tags
          description: Tags for organization and filtering
        created_by:
          anyOf:
            - $ref: '#/components/schemas/CreatorInfo'
            - type: 'null'
          description: Information about who created this retriever
        updated_by:
          anyOf:
            - $ref: '#/components/schemas/CreatorInfo'
            - type: 'null'
          description: Information about who last updated this retriever
        version:
          type: integer
          title: Version
          description: Version number (increments on each update)
          default: 1
        revision_history:
          items:
            $ref: '#/components/schemas/RevisionHistoryEntry'
          type: array
          title: Revision History
          description: History of changes (optional, last N changes)
        health:
          anyOf:
            - $ref: '#/components/schemas/HealthCheck'
            - type: 'null'
          description: Health status and diagnostics
      type: object
      required:
        - retriever_name
        - input_schema
        - collection_ids
        - stages
      title: RetrieverModel
      description: Retriever model.
    JoinMode:
      type: string
      enum:
        - on_demand
        - batch
      title: JoinMode
    LogicalOperator-Input:
      properties:
        AND:
          anyOf:
            - items:
                anyOf:
                  - $ref: '#/components/schemas/LogicalOperator-Input'
                  - $ref: '#/components/schemas/FilterCondition'
              type: array
            - type: 'null'
          title: And
          description: Logical AND operation - all conditions must be true
          example:
            - field: name
              operator: eq
              value: John
            - field: age
              operator: gte
              value: 30
        OR:
          anyOf:
            - items:
                anyOf:
                  - $ref: '#/components/schemas/LogicalOperator-Input'
                  - $ref: '#/components/schemas/FilterCondition'
              type: array
            - type: 'null'
          title: Or
          description: Logical OR operation - at least one condition must be true
          example:
            - field: status
              operator: eq
              value: active
            - field: role
              operator: eq
              value: admin
        NOT:
          anyOf:
            - items:
                anyOf:
                  - $ref: '#/components/schemas/LogicalOperator-Input'
                  - $ref: '#/components/schemas/FilterCondition'
              type: array
            - type: 'null'
          title: Not
          description: Logical NOT operation - all conditions must be false
          example:
            - field: department
              operator: eq
              value: HR
            - field: location
              operator: eq
              value: remote
        case_sensitive:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Case Sensitive
          description: Whether to perform case-sensitive matching
          default: false
          example: true
      additionalProperties: true
      type: object
      title: LogicalOperator
      description: >-
        Represents a logical operation (AND, OR, NOT) on filter conditions.


        Allows nesting with a defined depth limit.


        Also supports shorthand syntax where field names can be passed directly
        as key-value pairs for equality filtering (e.g., {"metadata.title":
        "value"}).
    JoinStats:
      properties:
        processed_docs:
          type: integer
          title: Processed Docs
          default: 0
        batches:
          type: integer
          title: Batches
          default: 0
        errors:
          type: integer
          title: Errors
          default: 0
      type: object
      title: JoinStats
    ErrorDetail:
      properties:
        message:
          type: string
          title: Message
          description: Human-readable error message
        type:
          type: string
          title: Type
          description: Stable error type identifier (machine-readable)
        code:
          anyOf:
            - type: string
            - type: 'null'
          title: Code
          description: >-
            Fine-grained error code for programmatic handling (e.g.,
            namespace_name_taken, feature_extractor_not_found). Present only
            when consumers may need to branch on a specific error condition.
        details:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Details
          description: >-
            Optional structured details to help debugging (validation errors,
            IDs, etc.)
      type: object
      required:
        - message
        - type
      title: ErrorDetail
      description: Error detail model.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    InputMapping:
      properties:
        input_key:
          type: string
          title: Input Key
          description: Key used in the constructed inputs payload.
        source_type:
          anyOf:
            - $ref: '#/components/schemas/InputSourceType'
            - type: 'null'
          description: Source of the value (payload, literal, vector).
        path:
          anyOf:
            - type: string
            - type: 'null'
          title: Path
          description: >-
            Dot-notation path inside payload/vector when source_type is PAYLOAD
            or VECTOR.
        override:
          anyOf:
            - {}
            - type: 'null'
          title: Override
          description: Static value used when source_type is LITERAL. Overrides any path.
      type: object
      required:
        - input_key
      title: InputMapping
      description: |-
        Declarative mapping for building inputs from various sources.

        - input_key: The key used in the constructed inputs payload
        - source_type: Where to fetch the value (payload, literal, vector)
        - path: Dot-notation path when source_type is PAYLOAD or VECTOR
        - override: Static value when source_type is LITERAL
      examples:
        - input_key: query_text
          path: content.title
          source_type: payload
        - input_key: lang
          override: en
          source_type: literal
        - input_key: image_vector
          path: features.clip_vit_l_14
          source_type: vector
    FlatTaxonomyConfig-Input:
      properties:
        taxonomy_type:
          type: string
          const: flat
          title: Taxonomy Type
          description: Discriminator identifying this as a flat taxonomy.
          default: flat
        retriever_id:
          type: string
          title: Retriever Id
          description: The retriever to use for matching against the source collection.
        input_mappings:
          items:
            $ref: '#/components/schemas/InputMapping'
          type: array
          title: Input Mappings
          description: Input mappings defining how to construct retriever inputs.
        source_collection:
          $ref: '#/components/schemas/SourceCollection-Input'
          description: The single source collection for this flat taxonomy.
        step_analytics:
          anyOf:
            - $ref: '#/components/schemas/StepAnalyticsConfig-Input'
            - type: 'null'
          description: >-
            Optional configuration for step transition analytics. Enables
            tracking how documents progress through taxonomy labels over time
            (e.g., email thread progression from 'inquiry' to 'closed_won'). If
            not provided, only basic assignment events are logged.
      type: object
      required:
        - retriever_id
        - input_mappings
        - source_collection
      title: FlatTaxonomyConfig
      description: >-
        Configuration for a *flat* taxonomy - single source collection with one
        retriever.
      examples:
        - input_mappings:
            - input_key: image_vector
              path: features.clip_vit_l_14
              source_type: vector
          retriever_id: ret_clip_v1
          source_collection:
            collection_id: col_products_v1
            enrichment_fields:
              - field_path: metadata.tags
                merge_mode: append
          taxonomy_type: flat
    HierarchicalTaxonomyConfig-Input:
      properties:
        taxonomy_type:
          type: string
          const: hierarchical
          title: Taxonomy Type
          description: Discriminator identifying this as a hierarchical taxonomy.
          default: hierarchical
        retriever_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Retriever Id
          description: Default retriever to use for all nodes unless overridden per-node.
        input_mappings:
          anyOf:
            - items:
                $ref: '#/components/schemas/InputMapping'
              type: array
            - type: 'null'
          title: Input Mappings
          description: Default input mappings for all nodes unless overridden per-node.
        inference_strategy:
          anyOf:
            - $ref: '#/components/schemas/HierarchyInferenceStrategy'
            - type: 'null'
          description: >-
            Strategy for inferring hierarchy structure from collections. Can be
            'schema' (overlap-based), 'cluster' (clustering-based), or 'llm'
            (AI-based). When set, will infer relationships from
            inference_collections.
        inference_collections:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Inference Collections
          description: >-
            Collection IDs to use for hierarchy inference. The
            inference_strategy will analyze these collections to discover
            relationships. Can be combined with hierarchical_nodes for hybrid
            configuration.
        llm_provider:
          anyOf:
            - type: string
              const: openai_chat_v1
            - type: 'null'
          title: Llm Provider
          description: LLM provider to use for hierarchy inference (default openai_chat_v1)
        llm_model:
          anyOf:
            - type: string
            - type: 'null'
          title: Llm Model
          description: LLM model name (e.g., gpt-4o-mini)
        llm_prompt_template:
          anyOf:
            - type: string
            - type: 'null'
          title: Llm Prompt Template
          description: >-
            Optional prompt template. Variables available: {collection_id},
            {collection_name}.
        llm_sample_size:
          type: integer
          title: Llm Sample Size
          description: Optional number of sample docs to include in prompts (0 = disabled).
          default: 0
        cluster_ids:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Cluster Ids
          description: Cluster IDs to use for CLUSTER inference strategy
        cluster_overlap_threshold:
          type: number
          title: Cluster Overlap Threshold
          description: >-
            Minimum overlap ratio to establish parent-child relationship between
            clusters
          default: 0.7
        hierarchical_nodes:
          anyOf:
            - items:
                $ref: '#/components/schemas/HierarchicalNode-Input'
              type: array
            - type: 'null'
          title: Hierarchical Nodes
          description: >-
            Explicit node definitions that either: 1) Define the entire
            hierarchy (when inference_strategy is null), 2) Add additional nodes
            to an inferred hierarchy, or 3) Override specific relationships in
            an inferred hierarchy. Supports true hybrid: infer from some
            collections, manually add others.
        step_analytics:
          anyOf:
            - $ref: '#/components/schemas/StepAnalyticsConfig-Input'
            - type: 'null'
          description: >-
            Optional configuration for step transition analytics. Enables
            tracking how documents progress through hierarchical taxonomy nodes
            over time (e.g., content workflow tracking from 'draft' to
            'published'). If not provided, only basic assignment events are
            logged.
      type: object
      title: HierarchicalTaxonomyConfig
      description: >-
        Hybrid hierarchical taxonomy configuration supporting inference with
        manual additions.


        All hierarchical taxonomies are hybrid:

        - Base hierarchy can be inferred via schema, clustering, or LLM

        - Additional collections can be explicitly added with specific
        retrievers

        - Supports mixing inference strategies with manual additions/overrides


        Examples:

        1. Pure inference: Set inference_strategy + inference_collections

        2. Pure manual: Set hierarchical_nodes only

        3. Hybrid: Set inference_strategy + inference_collections +
        hierarchical_nodes
           (infers base from collections, adds/overrides with explicit nodes)
      examples:
        - hierarchical_nodes:
            - collection_id: col_employees_v1
            - collection_id: col_executives_v1
              parent_collection_id: col_employees_v1
          input_mappings:
            - input_key: face_vec
              path: features.face
              source_type: vector
          retriever_id: ret_face_v1
          taxonomy_type: hierarchical
        - inference_collections:
            - col_people_v1
            - col_employees_v1
            - col_managers_v1
          inference_strategy: schema
          input_mappings:
            - input_key: face_vec
              path: features.face
              source_type: vector
          retriever_id: ret_face_v1
          taxonomy_type: hierarchical
        - hierarchical_nodes:
            - collection_id: col_special_products
              enrichment_fields:
                - field_path: premium_tags
                  merge_mode: append
              parent_collection_id: col_products_v1
              retriever_id: ret_special_v1
          inference_collections:
            - col_products_v1
            - col_categories_v1
          inference_strategy: cluster
          retriever_id: ret_clip_v1
          taxonomy_type: hierarchical
    RetrieverVisibility:
      type: string
      enum:
        - private
        - public
        - marketplace
      title: RetrieverVisibility
      description: Visibility level of a retriever determining who can access it.
    RetrieverSchema:
      properties:
        properties:
          additionalProperties:
            $ref: '#/components/schemas/RetrieverInputSchemaField-Input'
          type: object
          title: Properties
          description: Schema properties for retriever inputs
      additionalProperties: true
      type: object
      title: RetrieverSchema
      description: Schema definition for retriever inputs.
    StageInstanceConfig:
      properties:
        stage_name:
          type: string
          title: Stage Name
        stage_id:
          anyOf:
            - type: string
            - type: 'null'
          title: Stage Id
          description: Stage implementation ID (overrides stage_name for lookups)
        parameters:
          additionalProperties: true
          type: object
          title: Parameters
          description: Stage parameters
        pre_filters:
          anyOf:
            - $ref: '#/components/schemas/LogicalOperator-Input'
            - type: 'null'
          description: >-
            Filters to apply to the documents *before* this stage is
            executed. These filters are combined with any global retriever
            filters.
        post_filters:
          anyOf:
            - $ref: '#/components/schemas/LogicalOperator-Input'
            - type: 'null'
          description: >-
            Filters to apply to the documents *after* this stage is
            executed. These filters are applied to the results of this stage
            before passing to the next.
        stats:
          anyOf:
            - $ref: '#/components/schemas/StagePerformance-Input'
            - type: 'null'
          description: Performance statistics for this stage
      type: object
      required:
        - stage_name
      title: StageInstanceConfig
      description: >-
        User-provided configuration for a stage instance in a retriever
        pipeline.


        This model is used when creating a retriever to define the specific
        parameters for each stage.
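    # Illustrative sketch only (not part of the generated schema): a minimal
    # StageInstanceConfig payload. The stage name 'semantic_search' is borrowed
    # from the CacheConfig description; the 'limit' parameter is a hypothetical
    # placeholder, since valid parameters depend on the stage implementation.
    #
    #   stage_name: semantic_search
    #   parameters:
    #     limit: 10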
    CacheConfig:
      properties:
        enabled:
          type: boolean
          title: Enabled
          description: Whether caching is enabled for this retriever
          default: true
        ttl_seconds:
          type: integer
          minimum: 0
          title: Ttl Seconds
          description: 'Time-to-live for cached results in seconds. Default: 1 hour'
          default: 3600
        cache_stage_names:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Cache Stage Names
          description: >-
            List of stage names to cache results after. Stage names must match
            the stage_name field in the retriever's stages. If not specified,
            caches the final results after all stages. Examples:
            ['semantic_search'], ['semantic_search', 'rerank']
        exclude_fields:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Exclude Fields
          description: Fields to exclude from caching (e.g., PII fields)
        stats:
          anyOf:
            - $ref: '#/components/schemas/CacheStatistics'
            - type: 'null'
          description: Cache performance statistics
      type: object
      title: CacheConfig
      description: |-
        Configuration for retriever result caching.

        Controls how retriever results are cached to improve performance
        and reduce redundant compute.

        Caching can be configured at specific stages in the retriever pipeline.
        If no stages are specified, the final results are cached by default.
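    # Illustrative sketch only (not part of the generated schema): a CacheConfig
    # that caches results after one named stage for 10 minutes and excludes a
    # PII field. The stage name and the 'metadata.email' field are assumed
    # placeholders.
    #
    #   enabled: true
    #   ttl_seconds: 600
    #   cache_stage_names:
    #     - semantic_search
    #   exclude_fields:
    #     - metadata.email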
    RetrieverStatus:
      type: string
      enum:
        - active
        - draft
        - disabled
        - error
      title: RetrieverStatus
      description: Status of a retriever.
    UsageStatistics:
      properties:
        total_queries:
          type: integer
          title: Total Queries
          description: Total number of queries executed
          default: 0
        queries_last_24h:
          type: integer
          title: Queries Last 24H
          description: Number of queries in the last 24 hours
          default: 0
        avg_latency_ms:
          type: number
          title: Avg Latency Ms
          description: Average latency in milliseconds
          default: 0
        error_rate:
          type: number
          maximum: 1
          minimum: 0
          title: Error Rate
          description: Error rate as a fraction (0.0 - 1.0)
          default: 0
        last_error:
          anyOf:
            - type: string
            - type: 'null'
          title: Last Error
          description: Most recent error message for debugging
        cache_hit_rate:
          anyOf:
            - type: number
              maximum: 1
              minimum: 0
            - type: 'null'
          title: Cache Hit Rate
          description: Cache hit rate if caching is enabled (0.0 - 1.0)
      type: object
      title: UsageStatistics
      description: Usage statistics for a retriever.
    CollectionDetail:
      properties:
        collection_id:
          type: string
          title: Collection Id
          description: Collection identifier
        collection_name:
          type: string
          title: Collection Name
          description: Human-readable collection name
        document_count:
          anyOf:
            - type: integer
            - type: 'null'
          title: Document Count
          description: Number of documents in the collection
        enabled:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Enabled
          description: Whether the collection is active
        last_indexed_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Last Indexed At
          description: When the collection was last indexed
      type: object
      required:
        - collection_id
        - collection_name
      title: CollectionDetail
      description: Detailed information about a collection referenced by a retriever.
    CreatorInfo:
      properties:
        user_id:
          type: string
          title: User Id
          description: User identifier
        email:
          anyOf:
            - type: string
            - type: 'null'
          title: Email
          description: User email address
        name:
          anyOf:
            - type: string
            - type: 'null'
          title: Name
          description: User display name
      type: object
      required:
        - user_id
      title: CreatorInfo
      description: Information about who created or updated a resource.
    RevisionHistoryEntry:
      properties:
        version:
          type: integer
          title: Version
          description: Version number
        updated_at:
          type: string
          format: date-time
          title: Updated At
          description: When this version was created
        updated_by:
          anyOf:
            - type: string
            - type: 'null'
          title: Updated By
          description: User who made the change
        changes:
          anyOf:
            - type: string
            - type: 'null'
          title: Changes
          description: Description of changes made
      type: object
      required:
        - version
        - updated_at
      title: RevisionHistoryEntry
      description: A single entry in the revision history.
    HealthCheck:
      properties:
        status:
          $ref: '#/components/schemas/HealthStatus-Input'
          description: Current health status
          default: healthy
        last_check:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Last Check
          description: When the health was last checked
        issues:
          items:
            type: string
          type: array
          title: Issues
          description: List of current issues if any
      type: object
      title: HealthCheck
      description: Health check information for a retriever.
    FilterCondition:
      properties:
        field:
          type: string
          title: Field
          description: Field name to filter on
        operator:
          $ref: '#/components/schemas/FilterOperator'
          description: Comparison operator
          default: eq
        value:
          anyOf:
            - $ref: '#/components/schemas/DynamicValue'
            - {}
          title: Value
          description: Value to compare against
      type: object
      required:
        - field
        - value
      title: FilterCondition
      description: |-
        Represents a single filter condition.

        Attributes:
            field: The field to filter on
            operator: The comparison operator
            value: The value to compare against
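    # Illustrative sketch only (not part of the generated schema): a
    # FilterCondition using the 'gte' operator from FilterOperator. The field
    # name 'metadata.word_count' is a hypothetical document field; 'operator'
    # may be omitted, in which case it defaults to 'eq'.
    #
    #   field: metadata.word_count
    #   operator: gte
    #   value: 100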
    InputSourceType:
      type: string
      enum:
        - payload
        - literal
        - vector
        - blob
      title: InputSourceType
      description: Where the value for an input should be retrieved from.
    SourceCollection-Input:
      properties:
        collection_id:
          type: string
          title: Collection Id
          description: The ID of the source collection for the taxonomy.
        enrichment_fields:
          anyOf:
            - items:
                $ref: '#/components/schemas/EnrichmentField'
              type: array
            - type: 'null'
          title: Enrichment Fields
          description: >-
            Fields to copy from matched taxonomy node when enriching
            (append/replace semantics). If omitted, the full payload is copied.
      type: object
      required:
        - collection_id
      title: SourceCollection
      description: A source collection for a flat taxonomy.
      examples:
        - collection_id: col_products_v1
          enrichment_fields:
            - field_path: metadata.tags
              merge_mode: append
          input_mappings:
            - input_key: image_vector
              path: features.clip_vit_l_14
              source_type: vector
          retriever_id: ret_clip_v1
    StepAnalyticsConfig-Input:
      properties:
        timestamp_field:
          type: string
          title: Timestamp Field
          description: >-
            Document field containing event timestamp (e.g., 'Date',
            'created_at', 'metadata.timestamp')
          examples:
            - Date
            - metadata.timestamp
            - created_at
        sequence_id_field:
          type: string
          title: Sequence Id Field
          description: >-
            Document field that groups related items into a sequence (e.g.,
            'Thread-Index', 'session_id', 'user_id')
          examples:
            - Thread-Index
            - metadata.session_id
            - user_id
        step_key_source:
          $ref: '#/components/schemas/StepKeySource'
          description: >-
            How to determine the 'step' for each document (label, node_id, or
            custom field)
          default: assignment_label
        step_key_field_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Step Key Field Path
          description: >-
            Required if step_key_source='field_path'. Dot-notation path to step
            value in document.
          examples:
            - metadata.workflow_stage
            - status
            - enrichments.stage
        covariates:
          items:
            $ref: '#/components/schemas/CovariateConfig'
          type: array
          maxItems: 20
          title: Covariates
          description: >-
            Predictor fields to analyze for conversion lift (categorical,
            numeric, embedding, cluster)
        max_sequence_duration_days:
          anyOf:
            - type: integer
              maximum: 365
              minimum: 1
            - type: 'null'
          title: Max Sequence Duration Days
          description: >-
            Maximum allowed duration for a sequence. Sequences beyond this are
            flagged as data quality issues.
      type: object
      required:
        - timestamp_field
        - sequence_id_field
      title: StepAnalyticsConfig
      description: >-
        Configuration for step-by-step transition analytics on taxonomy
        assignments.


        Enables analysis of how documents progress through taxonomy labels as a
        temporal sequence, answering questions like:

        - How long from "inquiry" to "closed_won"?

        - What % of "inquiry" emails reach "proposal"?

        - Which sender domains correlate with faster progression?


        Use Cases:
            1. Email Thread Analysis:
               - Track progression: inquiry → followup → proposal → closed_won
               - Identify which subject lines correlate with faster closure

            2. Content Workflow Tracking:
               - Monitor: draft → review → approved → published
               - Find bottlenecks and optimization opportunities

            3. Safety Compliance Monitoring:
               - Trace: violation_detected → investigated → resolved
               - Track resolution times and success rates

        Attributes:
            timestamp_field: Document field containing event timestamp
            sequence_id_field: Field that groups related documents into sequences
            step_key_source: How to extract the step identifier (label/node_id/custom field)
            step_key_field_path: Required if step_key_source='field_path'
            covariates: List of predictor variables to analyze for conversion lift
            max_sequence_duration_days: Filter out sequences longer than this (data quality)

        Example:
            ```python
            # Email thread analysis configuration
            StepAnalyticsConfig(
                timestamp_field="Date",  # Email timestamp
                sequence_id_field="Thread-Index",  # Groups emails in same thread
                step_key_source="assignment_label",  # Use taxonomy label as step
                covariates=[
                    CovariateConfig(
                        field_path="sender_domain",
                        covariate_type="categorical",
                        name="Sender Domain"
                    ),
                    CovariateConfig(
                        field_path="word_count",
                        covariate_type="numeric",
                        name="Email Length"
                    )
                ],
                max_sequence_duration_days=90  # Ignore threads >90 days
            )
            ```
    HierarchyInferenceStrategy:
      type: string
      enum:
        - schema
        - cluster
        - llm
      title: HierarchyInferenceStrategy
      description: >-
        Strategy for inferring the base hierarchy structure.


        Can be combined with manual overrides via hierarchical_nodes for hybrid
        configuration:

        - SCHEMA: Infer based on overlapping collection schemas

        - CLUSTER: Infer based on clustering algorithms and overlap detection

        - LLM: Infer using AI/language models
    HierarchicalNode-Input:
      properties:
        collection_id:
          type: string
          pattern: ^col_[a-zA-Z0-9_]+$
          title: Collection Id
          description: >-
            REQUIRED. Collection ID representing this node in the hierarchy.
            Must reference an existing collection containing documents for this
            hierarchy level. Format: 'col_' prefix followed by
            alphanumeric/underscore characters. Used to: Match documents against
            this level, identify node in path, store enrichment data. Example:
            'col_executives' for executive level, 'col_products_phones' for
            phones category.
          examples:
            - col_executives
            - col_managers
            - col_employees
            - col_products_phones
            - col_electronics
        parent_collection_id:
          anyOf:
            - type: string
              pattern: ^col_[a-zA-Z0-9_]+$
            - type: 'null'
          title: Parent Collection Id
          description: >-
            OPTIONAL. Collection ID of the parent node in the hierarchy. When
            None: This is a root node (top of hierarchy). When set: References
            parent node's collection_id, creating parent-child relationship.
            Format: Same as collection_id ('col_' prefix). Used to: Build
            hierarchy tree, determine inheritance order, construct path arrays.
            Example: 'col_managers' is parent of 'col_executives',
            'col_products' is parent of 'col_electronics'. Validation: Must
            reference a valid collection_id from another node in same taxonomy.
          examples:
            - col_employees
            - col_managers
            - col_products
            - null
        label:
          anyOf:
            - type: string
            - type: 'null'
          title: Label
          description: >-
            OPTIONAL. Human-readable display name for this hierarchy node. Used
            in UI, visualizations, and taxonomy assignment results. NOT REQUIRED
            - When None: collection name or auto-generated label may be used.
            Format: Free text, typically title case, 2-50 characters. Examples:
            'Executive Leadership', 'Mobile Phones', 'Engineering Team'. Can be
            LLM-generated or manually specified during taxonomy creation.
          examples:
            - Executive Leadership
            - Management Team
            - All Employees
            - Mobile Phones
            - Electronics
            - null
        summary:
          anyOf:
            - type: string
            - type: 'null'
          title: Summary
          description: >-
            OPTIONAL. Brief description of this hierarchy level and its
            contents. Used for: Documentation, UI tooltips, understanding
            hierarchy structure. NOT REQUIRED - When None: no summary available
            for this node. Format: Free text, typically 1-3 sentences, up to 500
            characters. Can be LLM-generated or manually provided.
          examples:
            - Top-level executives with strategic decision-making authority
            - Mid-level managers overseeing teams and projects
            - High-end smartphones with premium features
            - null
        keywords:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          title: Keywords
          description: >-
            OPTIONAL. Keywords or tags describing this hierarchy level. Used
            for: Search, filtering, categorization, LLM understanding. NOT
            REQUIRED - When None: no keywords defined for this node. Format:
            List of strings, typically 3-10 keywords per node. Can be
            LLM-generated from collection contents or manually specified.
          examples:
            - - executive
              - leadership
              - C-suite
              - strategic
            - - manager
              - supervisor
              - team-lead
            - - smartphone
              - mobile
              - premium
              - flagship
            - null
        retriever_id:
          anyOf:
            - type: string
              pattern: ^ret_[a-zA-Z0-9_]+$
            - type: 'null'
          title: Retriever Id
          description: >-
            OPTIONAL. Retriever to use for matching documents at this hierarchy
            level. When None: Uses taxonomy-level retriever_id (inheritance from
            parent config). When set: Overrides taxonomy-level retriever for
            this specific node. Format: 'ret_' prefix followed by
            alphanumeric/underscore characters. Use for: Specialized matching
            at certain levels (e.g., face recognition for employees, semantic
            search for products). Must reference an existing RetrieverModel.
          examples:
            - ret_face_recognition
            - ret_clip_products
            - ret_text_semantic
            - null
        enrichment_fields:
          anyOf:
            - items:
                $ref: '#/components/schemas/EnrichmentField'
              type: array
            - type: 'null'
          title: Enrichment Fields
          description: >-
            OPTIONAL. Fields to enrich into documents when they match this
            hierarchy level. Specifies which properties from node collection to
            copy to matched documents. When None: No field-level enrichment
            (only taxonomy assignment recorded). Format: List of EnrichmentField
            objects with field_path and merge_mode. Inheritance: Child nodes
            inherit all parent enrichment_fields plus their own. Example:
            executives node adds 'executive_level' on top of inherited
            'employee_id', 'department'.
          examples:
            - - field_path: executive_level
                merge_mode: replace
              - field_path: budget_authority
                merge_mode: replace
            - - field_path: product_category
                merge_mode: replace
              - field_path: tags
                merge_mode: append
            - null
        input_mappings:
          anyOf:
            - items:
                $ref: '#/components/schemas/InputMapping'
              type: array
            - type: 'null'
          title: Input Mappings
          description: >-
            OPTIONAL. Custom input mappings for the retriever at this hierarchy
            level. Specifies how to construct retriever inputs from document
            features. When None: Uses taxonomy-level input_mappings
            (inheritance). When set: Overrides taxonomy-level mappings for this
            specific node. Format: List of InputMapping objects specifying
            input_key, source_type, path. Use for: Different matching strategies
            at different levels (e.g., face at employee level, text at
            department level).
          examples:
            - - input_key: face_embedding
                path: features.face
                source_type: vector
            - - input_key: text
                path: description
                source_type: payload
            - null
      type: object
      required:
        - collection_id
      title: HierarchicalNode
      description: >-
        A node in a hierarchical taxonomy representing one level in the tree
        structure.


        Each node represents a collection containing documents at a specific
        hierarchy level. Nodes form parent-child relationships to create
        multi-level taxonomies with property inheritance. When a document
        matches a child node, it inherits all properties from parent nodes up
        to the root.


        Use Cases:
            - Define organizational hierarchies: employees → managers → executives
            - Create product categorizations: products → electronics → phones → smartphones
            - Build classification trees: industries → technology → software
            - Implement face recognition hierarchies: people → employees → team members
            - Enable property inheritance: child nodes get all parent properties

        Hierarchy Behavior:
            - Root nodes: parent_collection_id = None (top of hierarchy)
            - Child nodes: parent_collection_id references parent node's collection_id
            - Property inheritance: Children inherit all parent enrichment_fields
            - Path construction: Creates path array from root to leaf
            - Multi-level matching: Documents matched at deepest applicable level

        Configuration:
            - Per-node retrievers: Override taxonomy-level retriever for specific nodes
            - Per-node enrichment: Override which fields to enrich at each level
            - Per-node input mappings: Customize retriever inputs per hierarchy level
            - Labels/summaries: Human-readable metadata for UI display

        Related Models:
            - HierarchicalTaxonomyConfig: Contains list of hierarchical nodes
            - TaxonomyAssignment: Result of matching documents to nodes
            - EnrichmentField: Specifies which fields to enrich from node

        Requirements:
            - collection_id: REQUIRED - must reference an existing collection
            - parent_collection_id: REQUIRED for non-root nodes (None for root)
            - All other fields: OPTIONAL with inheritance from taxonomy-level config
      examples:
        - collection_id: col_employees
          description: Root node - All employees (no parent)
          enrichment_fields:
            - field_path: employee_id
              merge_mode: replace
            - field_path: department
              merge_mode: replace
          input_mappings:
            - input_key: face_embedding
              path: features.face
              source_type: vector
          keywords:
            - employee
            - staff
            - personnel
          label: All Employees
          retriever_id: ret_face_recognition
          summary: Complete employee directory with basic information
        - collection_id: col_managers
          description: Mid-level node - Managers (child of employees)
          enrichment_fields:
            - field_path: team_size
              merge_mode: replace
            - field_path: reports_to
              merge_mode: replace
          keywords:
            - manager
            - supervisor
            - team-lead
          label: Management Team
          parent_collection_id: col_employees
          summary: Managers overseeing teams and projects
        - collection_id: col_executives
          description: Leaf node - Executives (child of managers)
          enrichment_fields:
            - field_path: executive_level
              merge_mode: replace
            - field_path: budget_authority
              merge_mode: replace
          keywords:
            - executive
            - leadership
            - C-suite
          label: Executive Leadership
          parent_collection_id: col_managers
          summary: C-suite and VP-level executives with strategic authority
        - collection_id: col_products
          description: Product hierarchy - Root category
          enrichment_fields:
            - field_path: product_id
              merge_mode: replace
            - field_path: tags
              merge_mode: append
          keywords:
            - product
            - catalog
            - inventory
          label: All Products
          retriever_id: ret_clip_products
          summary: Complete product catalog
        - collection_id: col_electronics
          description: Product hierarchy - Electronics category
          enrichment_fields:
            - field_path: warranty_years
              merge_mode: replace
          keywords:
            - electronics
            - devices
            - tech
          label: Electronics
          parent_collection_id: col_products
          summary: Electronic devices and accessories
        - collection_id: col_test_category
          description: Minimal node - Only required fields
          parent_collection_id: col_test_root
    RetrieverInputSchemaField-Input:
      properties:
        type:
          $ref: '#/components/schemas/RetrieverInputSchemaFieldType'
        default:
          anyOf:
            - {}
            - type: 'null'
          title: Default
        items:
          anyOf:
            - $ref: '#/components/schemas/RetrieverInputSchemaField-Input'
            - type: 'null'
        properties:
          anyOf:
            - additionalProperties:
                $ref: '#/components/schemas/RetrieverInputSchemaField-Input'
              type: object
            - type: 'null'
          title: Properties
        examples:
          anyOf:
            - items: {}
              type: array
            - type: 'null'
          title: Examples
          description: >-
            OPTIONAL. List of example values for this field. Used by Apps to
            show example inputs in the UI. Provide multiple diverse examples
            when possible.
        description:
          anyOf:
            - type: string
            - type: 'null'
          title: Description
        enum:
          anyOf:
            - items: {}
              type: array
            - type: 'null'
          title: Enum
        required:
          anyOf:
            - type: boolean
            - type: 'null'
          title: Required
          default: false
      additionalProperties: true
      type: object
      required:
        - type
      title: RetrieverInputSchemaField
      description: >-
        Schema field definition for retriever input parameters.


        Identical structure to BucketSchemaField but uses
        RetrieverInputSchemaFieldType, which includes additional reference
        types like document_reference.


        This allows retrievers to accept:

        1. Metadata inputs (strings, numbers, dates, etc.)

        2. File inputs (images, videos, documents for search)

        3. Reference inputs (document_reference for "find similar" queries)
    StagePerformance-Input:
      properties:
        avg_execution_ms:
          type: number
          title: Avg Execution Ms
          description: Average execution time in milliseconds
          default: 0
        execution_count:
          type: integer
          title: Execution Count
          description: Number of times executed
          default: 0
        error_count:
          type: integer
          title: Error Count
          description: Number of errors encountered
          default: 0
        last_executed_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Last Executed At
          description: Last time this stage was executed
      type: object
      title: StagePerformance
      description: Performance statistics for a retriever stage.
    CacheStatistics:
      properties:
        hit_count:
          type: integer
          title: Hit Count
          description: Number of cache hits
          default: 0
        miss_count:
          type: integer
          title: Miss Count
          description: Number of cache misses
          default: 0
        hit_rate:
          type: number
          maximum: 1
          minimum: 0
          title: Hit Rate
          description: Cache hit rate (0.0 - 1.0)
          default: 0
        size_bytes:
          type: integer
          title: Size Bytes
          description: Total size of cached data in bytes
          default: 0
        entry_count:
          type: integer
          title: Entry Count
          description: Number of entries in cache
          default: 0
        last_invalidated_at:
          anyOf:
            - type: string
              format: date-time
            - type: 'null'
          title: Last Invalidated At
          description: When the cache was last invalidated
      type: object
      title: CacheStatistics
      description: Statistics about cache performance.
    HealthStatus-Input:
      type: string
      enum:
        - healthy
        - degraded
        - unhealthy
      title: HealthStatus
      description: Health status of a retriever.
    FilterOperator:
      type: string
      enum:
        - eq
        - ne
        - gt
        - lt
        - gte
        - lte
        - in
        - nin
        - contains
        - starts_with
        - ends_with
        - regex
        - exists
        - is_null
        - text
        - phrase
      title: FilterOperator
      description: Supported filter operators across database implementations.
    DynamicValue:
      properties:
        type:
          type: string
          const: dynamic
          title: Type
          default: dynamic
        field:
          type: string
          title: Field
          description: >-
            The dot-notation path to the value in the runtime query request,
            e.g., 'inputs.user_id'
          examples:
            - inputs.query_text
            - filters.AND[0].value
      type: object
      required:
        - field
      title: DynamicValue
      description: A value that should be dynamically resolved from the query request.
    EnrichmentField:
      properties:
        field_path:
          type: string
          title: Field Path
          description: Dot-notation path of the field to copy from the taxonomy node.
        target_field:
          anyOf:
            - type: string
            - type: 'null'
          title: Target Field
          description: >-
            Optional target field name in the enriched document. If specified,
            the source field will be renamed to this name. If not specified, the
            field_path is used as the target name. Use this to rename fields
            during enrichment (e.g., label → visual_style).
          examples:
            - visual_style
            - style_description
            - brand_name
        merge_mode:
          $ref: '#/components/schemas/EnrichmentMergeMode'
          description: Whether to overwrite the target's value or append (for arrays).
          default: replace
      type: object
      required:
        - field_path
      title: EnrichmentField
      description: >-
        Field-level enrichment behaviour specification.


        Defines how to copy fields from taxonomy source nodes to enriched
        documents.

        Supports field renaming via target_field parameter.


        Examples:
            - Copy field as-is: {"field_path": "category", "merge_mode": "replace"}
            - Rename field: {"field_path": "label", "target_field": "visual_style", "merge_mode": "replace"}
            - Append to array: {"field_path": "tags", "merge_mode": "append"}
      examples:
        - field_path: metadata.tags
          merge_mode: append
        - field_path: category
          merge_mode: replace
        - field_path: label
          merge_mode: replace
          target_field: visual_style
        - field_path: description
          merge_mode: replace
          target_field: style_description
    StepKeySource:
      type: string
      enum:
        - assignment_label
        - assignment_node_id
        - field_path
      title: StepKeySource
      description: >-
        Defines how to extract the step key from documents for sequence
        analysis.


        The step key identifies which stage/state a document is in for
        transition analytics.


        Examples:
            ASSIGNMENT_LABEL: Use the taxonomy's assigned label (e.g., "inquiry", "proposal")
            ASSIGNMENT_NODE_ID: Use the taxonomy node ID (e.g., "node_sales_inquiry")
            FIELD_PATH: Use a custom document field (e.g., "metadata.workflow_stage")
    CovariateConfig:
      properties:
        field_path:
          type: string
          title: Field Path
          description: >-
            Dot-notation path to covariate field (e.g., 'sender_domain',
            'metadata.priority')
          examples:
            - sender_domain
            - metadata.word_count
            - features.clip_embedding
        covariate_type:
          $ref: '#/components/schemas/CovariateType'
          description: >-
            Covariate type, which determines the analysis strategy
            (categorical, numeric, embedding, or cluster_id)
        name:
          type: string
          maxLength: 100
          minLength: 1
          title: Name
          description: Human-readable name for this covariate in analytics results
          examples:
            - Sender Domain
            - Word Count
            - Visual Cluster
        binning_strategy:
          anyOf:
            - type: string
              enum:
                - quartiles
                - deciles
                - custom
            - type: 'null'
          title: Binning Strategy
          description: >-
            How to bin numeric values for lift analysis (only used for NUMERIC
            type)
          default: quartiles
        clustering_method:
          anyOf:
            - type: string
              enum:
                - kmeans
                - hdbscan
            - type: 'null'
          title: Clustering Method
          description: >-
            Clustering algorithm for embedding analysis (only used for EMBEDDING
            type)
          default: kmeans
        n_clusters:
          anyOf:
            - type: integer
              maximum: 100
              minimum: 2
            - type: 'null'
          title: N Clusters
          description: >-
            Number of clusters for embedding-based predictors (only used for
            EMBEDDING type)
          default: 10
      type: object
      required:
        - field_path
        - covariate_type
        - name
      title: CovariateConfig
      description: >-
        Configuration for a single covariate/predictor variable in step
        analytics.


        Covariates are used to identify which features predict conversion from
        one step to another. The system computes "lift" for each covariate
        value, showing whether it increases or decreases conversion likelihood.


        Attributes:
            field_path: Dot-notation path to the field in document or metadata (e.g., "sender_domain")
            covariate_type: How to analyze this covariate (categorical, numeric, embedding, cluster_id)
            name: Human-readable name for analytics results
            binning_strategy: For NUMERIC types, how to bin values (quartiles, deciles, custom)
            clustering_method: For EMBEDDING types, algorithm to use (kmeans, hdbscan)
            n_clusters: For EMBEDDING types, number of clusters to create

        Examples:
            ```python
            # Analyze sender domains (categorical)
            CovariateConfig(
                field_path="sender_domain",
                covariate_type="categorical",
                name="Email Domain"
            )

            # Analyze email length (numeric with quartile binning)
            CovariateConfig(
                field_path="word_count",
                covariate_type="numeric",
                name="Word Count",
                binning_strategy="quartiles"
            )

            # Analyze visual similarity (embedding clustering)
            CovariateConfig(
                field_path="features.clip_embedding",
                covariate_type="embedding",
                name="Visual Cluster",
                clustering_method="kmeans",
                n_clusters=10
            )
            ```
    RetrieverInputSchemaFieldType:
      type: string
      enum:
        - string
        - number
        - integer
        - float
        - boolean
        - object
        - array
        - date
        - datetime
        - text
        - image
        - audio
        - video
        - pdf
        - excel
        - document_reference
      title: RetrieverInputSchemaFieldType
      description: >-
        Supported data types for retriever input schema fields.


        Retriever input schemas define what parameters users can provide when
        executing a retriever. This includes all bucket schema types plus
        additional reference types.


        Types fall into three categories:


        1. **Metadata Types** (JSON types):
           - Standard JSON-compatible types
           - Examples: string, number, boolean, date
           - Inherited from BucketSchemaFieldType

        2. **File Types** (blobs):
           - Users can upload files/content as search inputs
           - Examples: text, image, video, pdf
           - Inherited from BucketSchemaFieldType

        3. **Reference Types** (structured metadata):
           - Special types for referencing existing documents
           - Examples: document_reference
           - Only available in retriever input schemas (NOT in bucket schemas)

        **DOCUMENT_REFERENCE Usage**:
            Accepts a document reference for "find similar" queries.

            Example - Find similar products retriever:
            {
                "reference_product": {
                    "type": "document_reference",
                    "description": "Find products similar to this one",
                    "required": true
                }
            }

            Execution input:
            {
                "inputs": {
                    "reference_product": {
                        "collection_id": "col_products",
                        "document_id": "doc_item_123"
                    }
                }
            }

            The system will use the pre-computed features from doc_item_123
            to find similar documents without re-processing.
    EnrichmentMergeMode:
      type: string
      enum:
        - replace
        - append
      title: EnrichmentMergeMode
      description: How a field from the taxonomy node is merged into the target document (replace overwrites the value, append adds to an array).
    CovariateType:
      type: string
      enum:
        - categorical
        - numeric
        - embedding
        - cluster_id
      title: CovariateType
      description: >-
        Type of covariate/predictor variable for conversion analysis.


        Different types enable different analysis strategies:

        - CATEGORICAL: String values, analyzed via grouping (e.g.,
        sender_domain, priority)

        - NUMERIC: Continuous values, binned into quartiles/deciles (e.g.,
        word_count, price)

        - EMBEDDING: Dense vectors, clustered for semantic analysis (e.g., CLIP
        embeddings)

        - CLUSTER_ID: Pre-computed cluster identifiers (e.g., topic_cluster,
        visual_cluster)


        Examples:
            ```python
            # Categorical: Which email domains convert better?
            CovariateConfig(field_path="sender_domain", covariate_type="categorical", name="Sender Domain")

            # Numeric: Do longer emails convert faster?
            CovariateConfig(field_path="word_count", covariate_type="numeric", name="Word Count")

            # Embedding: Do visually similar images follow similar paths?
            CovariateConfig(field_path="features.clip", covariate_type="embedding", name="Visual Cluster")

            # Cluster: Which topic clusters have highest conversion?
            CovariateConfig(field_path="metadata.topic_id", covariate_type="cluster_id", name="Topic Cluster")
            ```

````
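
The `EnrichmentField` schema above describes copying a field from a taxonomy node into a document, optionally renaming it with `target_field` and combining it according to `merge_mode`. The following is a minimal sketch of how those semantics could play out on plain dictionaries. It is not the Mixpeek engine's implementation: `get_path` and `apply_enrichment` are hypothetical helper names, and the sketch assumes the target is written as a flat top-level key and that `append` extends a list (wrapping scalar values). The real service may handle nested targets or non-array values differently.

```python
from typing import Any


def get_path(data: dict, path: str) -> Any:
    """Resolve a dot-notation path such as 'metadata.tags'; return None if missing."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_enrichment(document: dict, node: dict, fields: list[dict]) -> dict:
    """Copy fields from a taxonomy node into a shallow copy of the document.

    Assumption: the target is a flat top-level key, and "append" concatenates
    lists (scalars are wrapped in a list first).
    """
    enriched = dict(document)
    for spec in fields:
        value = get_path(node, spec["field_path"])
        if value is None:
            continue  # nothing to copy from this node
        # Fall back to field_path when no target_field is given.
        target = spec.get("target_field") or spec["field_path"]
        mode = spec.get("merge_mode", "replace")  # schema default is "replace"
        if mode == "append":
            existing = enriched.get(target, [])
            existing = existing if isinstance(existing, list) else [existing]
            incoming = value if isinstance(value, list) else [value]
            enriched[target] = existing + incoming
        else:
            enriched[target] = value
    return enriched


# Hypothetical taxonomy node and document, for illustration only.
node = {"label": "minimalist", "category": "furniture", "tags": ["wood"]}
doc = {"title": "Oak chair", "tags": ["chair"]}
fields = [
    {"field_path": "category"},
    {"field_path": "label", "target_field": "visual_style"},
    {"field_path": "tags", "merge_mode": "append"},
]
result = apply_enrichment(doc, node, fields)
print(result)
```

In this sketch `category` is copied as-is, `label` lands under `visual_style`, and the node's `tags` are appended to the document's existing `tags` without mutating the input document.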