
# Compute step transition analytics

> Analyze how documents progress from one taxonomy step to another.

This endpoint computes conversion rates, duration statistics, and predictor lifts
for documents transitioning between taxonomy labels.

## Use Cases

**Email Thread Analysis:**
- Question: How long from "inquiry" to "closed_won"?
- Question: What % of inquiries result in sales?
- Question: Which sender domains have the highest conversion rate?

**Content Workflow Tracking:**
- Question: Conversion rate from "draft" to "published"?
- Question: How long does content stay in review?
- Question: Which authors publish fastest?

**Safety Compliance Monitoring:**
- Question: Time from violation detection to resolution?
- Question: Success rate for remediation efforts?

## Requirements

- Taxonomy must have `step_analytics` configured (or provide `override_step_analytics`)
- Collection must contain documents enriched with this taxonomy
- Documents must contain the fields named by `timestamp_field` and `sequence_id_field` in the step analytics configuration
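If the taxonomy has no `step_analytics` configured, the same settings can travel with the request in `override_step_analytics`. The sketch below assembles such a request body as a Python dict, using field names from the `StepAnalyticsConfig` and `CovariateConfig` schemas in the OpenAPI spec on this page. The helper name `build_transition_request` and the document fields (`Date`, `Thread-Index`, `sender_domain`) are illustrative placeholders, not fields your collection is guaranteed to have.

```python
import json


def build_transition_request(collection_id: str, taxonomy_id: str,
                             from_step: str, to_step: str) -> dict:
    """Hypothetical helper: a StepTransitionRequest body with its own step_analytics config."""
    # Placeholder field names for an email collection; replace with your own.
    override = {
        "timestamp_field": "Date",              # required: event time of each document
        "sequence_id_field": "Thread-Index",    # required: groups documents into a sequence
        "step_key_source": "assignment_label",  # the default: use the taxonomy label as the step
        "covariates": [
            {
                "field_path": "sender_domain",
                "covariate_type": "categorical",
                "name": "Sender Domain",
            },
        ],
        "max_sequence_duration_days": 90,
    }
    return {
        "collection_id": collection_id,
        "taxonomy_id": taxonomy_id,
        "from_step": from_step,
        "to_step": to_step,
        "max_window_days": 90,
        "min_support": 10,
        "override_step_analytics": override,
    }


body = build_transition_request("col_emails", "tax_sales_stages", "inquiry", "closed_won")
print(json.dumps(body, indent=2))
```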

## Returns

**Conversion Metrics:**
- `count`: Total number of sequences that start at `from_step`
- `converted`: Number of those sequences that reach `to_step`
- `conversion_rate`: Fraction that converted (`converted / count`, between 0.0 and 1.0)
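To make these counts concrete, here is a rough in-memory illustration of one plausible way to derive them: group events by sequence, take the first `from_step` event, and look for a later `to_step` event. This is not Mixpeek's implementation; the `(sequence_id, step, timestamp)` event shape and the first-occurrence rule are assumptions, and sequences whose conversion exceeds `max_window_days` are dropped because the request schema says such sequences are excluded.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical event shape: (sequence_id, step, timestamp)
Event = tuple[str, str, datetime]


def conversion_metrics(events: list[Event], from_step: str, to_step: str,
                       max_window_days: int | None = None) -> dict:
    """Illustrate count, converted, and conversion_rate for grouped sequences."""
    by_seq: dict[str, list[tuple[datetime, str]]] = defaultdict(list)
    for seq_id, step, ts in events:
        by_seq[seq_id].append((ts, step))

    window = timedelta(days=max_window_days) if max_window_days else None
    count = converted = 0
    for steps in by_seq.values():
        steps.sort()  # chronological order within the sequence
        # Assumption: a sequence starts at its first from_step event
        start = next((ts for ts, s in steps if s == from_step), None)
        if start is None:
            continue  # never reached from_step, so not counted
        end = next((ts for ts, s in steps if s == to_step and ts >= start), None)
        if end is not None and window is not None and end - start > window:
            continue  # schema: sequences exceeding max_window_days are excluded
        count += 1
        if end is not None:
            converted += 1
    rate = converted / count if count else 0.0
    return {"count": count, "converted": converted, "conversion_rate": rate}


t0 = datetime(2024, 1, 1)
events = [
    ("thread-1", "inquiry", t0),
    ("thread-1", "closed_won", t0 + timedelta(days=4)),
    ("thread-2", "inquiry", t0),
    ("thread-3", "inquiry", t0),
    ("thread-3", "closed_won", t0 + timedelta(days=120)),  # outside a 90-day window
]
print(conversion_metrics(events, "inquiry", "closed_won", max_window_days=90))
# {'count': 2, 'converted': 1, 'conversion_rate': 0.5}
```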

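**Duration Statistics (`durations_sec`, in seconds; `null` when converted is 0):**
- `mean`, `median`: Average and median duration (`p50` repeats the median)
- `p90`, `p95`: 90th and 95th percentile durations
- `std_dev`, `min`, `max`: Standard deviation, shortest, and longest observed duration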
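The duration fields summarize per-sequence conversion times in seconds. The following sketch reproduces them with Python's `statistics` module. The percentile method (`statistics.quantiles` with `method="inclusive"`) and the use of the sample standard deviation are assumptions, so the numbers may differ slightly from what the API returns.

```python
from __future__ import annotations

import statistics


def duration_stats(durations_sec: list[float]) -> dict | None:
    """Summarize conversion durations (seconds) using the DurationStats field names."""
    if not durations_sec:
        return None  # mirrors durations_sec being null when nothing converted
    if len(durations_sec) == 1:
        pct = [durations_sec[0]] * 99  # quantiles() needs two or more points
    else:
        # 99 cut points: pct[k - 1] is the k-th percentile
        pct = statistics.quantiles(durations_sec, n=100, method="inclusive")
    median = statistics.median(durations_sec)
    return {
        "mean": statistics.fmean(durations_sec),
        "median": median,
        "p50": median,
        "p90": pct[89],
        "p95": pct[94],
        # Assumption: sample standard deviation; 0.0 for a single observation
        "std_dev": statistics.stdev(durations_sec) if len(durations_sec) > 1 else 0.0,
        "min": min(durations_sec),
        "max": max(durations_sec),
    }


days = [1, 2, 3, 4, 5]
print(duration_stats([86_400.0 * d for d in days]))
```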

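**Top Predictors:**
- Covariate values with the highest lift (up to 50)
- Lift values above 1.0 mean sequences with that value convert more often than the baseline; below 1.0, less often
- Reliability rests on the `min_support` threshold (a minimum number of sequences), not a formal significance test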
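A lift is a value's conversion rate divided by a baseline rate. The example response below fits a baseline equal to the overall `conversion_rate` (0.75 / 0.35 ≈ 2.14), so this sketch assumes that baseline for a single categorical covariate. Applying `min_support` per value, the `(value, converted)` input shape, and returning a lift of 1.0 when nothing converted are all assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def categorical_lift(sequences: list[tuple[str, bool]], field: str,
                     min_support: int = 10) -> list[dict]:
    """Lift per value of one categorical covariate.

    sequences: one (covariate_value, converted) pair per sequence that started
    at from_step. This input shape is hypothetical.
    """
    if not sequences:
        return []
    # Assumption: the baseline is the overall conversion rate
    baseline = sum(conv for _, conv in sequences) / len(sequences)
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for value, conv in sequences:
        totals[value] += 1
        wins[value] += int(conv)

    results = []
    for value, n in totals.items():
        if n < min_support:
            continue  # too few sequences to report this value
        rate = wins[value] / n
        results.append({
            "field": field,
            "value": value,
            "count": n,
            "conversion_rate": rate,
            # If nothing converted at all, treat every value as neutral (lift 1.0)
            "lift": rate / baseline if baseline else 1.0,
        })
    results.sort(key=lambda r: r["lift"], reverse=True)  # highest lift first
    return results


data = ([("enterprise.com", True)] * 8 + [("enterprise.com", False)] * 2
        + [("gmail.com", True)] * 2 + [("gmail.com", False)] * 8)
for row in categorical_lift(data, "Sender Domain", min_support=10):
    print(row["value"], round(row["lift"], 2))
# enterprise.com 1.6
# gmail.com 0.4
```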

## Example Request

```json
{
    "collection_id": "col_emails",
    "taxonomy_id": "tax_sales_stages",
    "from_step": "inquiry",
    "to_step": "closed_won",
    "max_window_days": 90,
    "min_support": 10
}
```
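A minimal way to send the example request from Python using only the standard library. The base URL and path come from the OpenAPI spec on this page; the `Authorization: Bearer` header and the `MIXPEEK_API_KEY` environment variable are assumptions, so check the authentication docs for the exact scheme. The hypothetical `build_request` helper prepares the request separately from sending it, so it can be inspected first.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.mixpeek.com"


def build_request(taxonomy_id: str, body: dict, api_key: str) -> urllib.request.Request:
    """Prepare POST /v1/taxonomies/{taxonomy_id}/analytics/transitions."""
    url = f"{BASE_URL}/v1/taxonomies/{taxonomy_id}/analytics/transitions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; confirm against the auth docs
            "Authorization": f"Bearer {api_key}",
        },
    )


def compute_transitions(taxonomy_id: str, body: dict) -> dict:
    """Send the request and parse the JSON response (needs network access)."""
    req = build_request(taxonomy_id, body, os.environ["MIXPEEK_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


body = {
    "collection_id": "col_emails",
    "taxonomy_id": "tax_sales_stages",
    "from_step": "inquiry",
    "to_step": "closed_won",
    "max_window_days": 90,
    "min_support": 10,
}
req = build_request("tax_sales_stages", body, api_key="sk_test")
print(req.get_method(), req.full_url)
# POST https://api.mixpeek.com/v1/taxonomies/tax_sales_stages/analytics/transitions
```

On 4xx and 5xx responses, `urlopen` raises `urllib.error.HTTPError`; per the spec, its body follows `ErrorResponse` (or `HTTPValidationError` for 422).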

## Example Response

```json
{
    "from_step": "inquiry",
    "to_step": "closed_won",
    "count": 1000,
    "converted": 350,
    "conversion_rate": 0.35,
    "durations_sec": {
        "mean": 432000.0,
        "median": 345600.0,
        "p50": 345600.0,
        "p90": 691200.0,
        "p95": 864000.0,
        "std_dev": 172800.0,
        "min": 86400.0,
        "max": 1209600.0
    },
    "top_predictors": [
        {
            "field": "Sender Domain",
            "value": "enterprise.com",
            "count": 150,
            "conversion_rate": 0.75,
            "lift": 2.14
        }
    ]
}
```
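Because durations are reported in seconds, a small helper can turn a parsed response into readable lines (in days) and double-check that `conversion_rate` equals `converted / count`. The `summarize` function and its output format are illustrative only.

```python
from __future__ import annotations

SECONDS_PER_DAY = 86_400


def summarize(resp: dict) -> list[str]:
    """Turn a StepTransitionResponse dict into short human-readable lines."""
    count, converted = resp["count"], resp["converted"]
    rate = resp["conversion_rate"]  # fraction in [0, 1]
    # Sanity check: conversion_rate should equal converted / count
    if count and abs(rate - converted / count) > 1e-6:
        raise ValueError("conversion_rate does not match converted / count")
    lines = [f"{resp['from_step']} -> {resp['to_step']}: "
             f"{converted}/{count} converted ({rate:.0%})"]
    durations = resp.get("durations_sec")  # null when nothing converted
    if durations:
        lines.append(f"median {durations['median'] / SECONDS_PER_DAY:.1f} days, "
                     f"p90 {durations['p90'] / SECONDS_PER_DAY:.1f} days")
    for p in resp.get("top_predictors", []):
        lines.append(f"{p['field']}={p['value']}: lift {p['lift']:.2f} (n={p['count']})")
    return lines


example = {
    "from_step": "inquiry", "to_step": "closed_won",
    "count": 1000, "converted": 350, "conversion_rate": 0.35,
    "durations_sec": {"mean": 432000.0, "median": 345600.0, "p50": 345600.0,
                      "p90": 691200.0, "p95": 864000.0, "std_dev": 172800.0,
                      "min": 86400.0, "max": 1209600.0},
    "top_predictors": [{"field": "Sender Domain", "value": "enterprise.com",
                        "count": 150, "conversion_rate": 0.75, "lift": 2.14}],
}
print("\n".join(summarize(example)))
# inquiry -> closed_won: 350/1000 converted (35%)
# median 4.0 days, p90 8.0 days
# Sender Domain=enterprise.com: lift 2.14 (n=150)
```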



## OpenAPI

````yaml post /v1/taxonomies/{taxonomy_id}/analytics/transitions
openapi: 3.1.0
info:
  title: Mixpeek API
  description: >-
    This is the Mixpeek API, providing access to various endpoints for data
    processing and retrieval.
  termsOfService: https://mixpeek.com/terms
  contact:
    name: Mixpeek Support
    url: https://mixpeek.com/contact
    email: info@mixpeek.com
  version: '0.82'
servers:
  - url: https://api.mixpeek.com
    description: Production
security: []
paths:
  /v1/taxonomies/{taxonomy_id}/analytics/transitions:
    post:
      tags:
        - Taxonomy Analytics
      summary: Compute step transition analytics
      description: >-
        Analyze how documents progress from one taxonomy step to another.


        This endpoint computes conversion rates, duration statistics, and
        predictor lifts for documents transitioning between taxonomy labels.


      operationId: >-
        compute_step_transitions_v1_taxonomies__taxonomy_id__analytics_transitions_post
      parameters:
        - name: taxonomy_id
          in: path
          required: true
          schema:
            type: string
            title: Taxonomy Id
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/StepTransitionRequest'
      responses:
        '200':
          description: Successful Response
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/StepTransitionResponse'
        '400':
          description: Bad Request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '403':
          description: Forbidden
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '404':
          description: Not Found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '422':
          description: Validation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/HTTPValidationError'
        '500':
          description: Internal Server Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    StepTransitionRequest:
      properties:
        collection_id:
          type: string
          title: Collection Id
          description: Collection to analyze for step transitions
        taxonomy_id:
          type: string
          title: Taxonomy Id
          description: Taxonomy ID (each taxonomy_id is immutable; cloning a taxonomy creates a new ID)
        from_step:
          type: string
          title: From Step
          description: Starting step label (e.g., 'inquiry', 'draft')
          examples:
            - inquiry
            - draft
            - violation_detected
        to_step:
          type: string
          title: To Step
          description: Ending step label (e.g., 'closed_won', 'published')
          examples:
            - closed_won
            - published
            - resolved
        max_window_days:
          anyOf:
            - type: integer
              maximum: 365
              minimum: 1
            - type: 'null'
          title: Max Window Days
          description: >-
            Maximum days between from_step and to_step. Sequences exceeding this
            are excluded.
        filters:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Filters
          description: 'Optional filters for events (e.g., {''metadata.region'': ''US''})'
        override_step_analytics:
          anyOf:
            - $ref: '#/components/schemas/StepAnalyticsConfig-Input'
            - type: 'null'
          description: Overrides the taxonomy's default step_analytics config for this query only
        min_support:
          type: integer
          minimum: 1
          title: Min Support
          description: Minimum number of sequences required for valid analysis
          default: 10
      type: object
      required:
        - collection_id
        - taxonomy_id
        - from_step
        - to_step
      title: StepTransitionRequest
      description: >-
        API request model for step transition analytics.


        This model extends the engine query model with API-specific validation
        and documentation.


        Use this to analyze how documents transition from one taxonomy step to
        another, computing conversion rates, durations, and predictor lifts.


        Example:
            ```json
            {
                "collection_id": "col_emails",
                "taxonomy_id": "tax_sales_stages",
                "from_step": "inquiry",
                "to_step": "closed_won",
                "max_window_days": 90,
                "min_support": 10
            }
            ```

        Response includes:
            - Conversion rate (fraction of sequences reaching to_step)
            - Duration statistics (mean, median, p90, p95)
            - Top predictors (covariates with highest lift)
    StepTransitionResponse:
      properties:
        from_step:
          type: string
          title: From Step
          description: Starting step
        to_step:
          type: string
          title: To Step
          description: Ending step
        count:
          type: integer
          minimum: 0
          title: Count
          description: Total number of sequences starting at from_step
        converted:
          type: integer
          minimum: 0
          title: Converted
          description: Number of sequences that reached to_step
        conversion_rate:
          type: number
          maximum: 1
          minimum: 0
          title: Conversion Rate
          description: Fraction of sequences that converted (converted / count), from 0 to 1
        durations_sec:
          anyOf:
            - $ref: '#/components/schemas/DurationStats'
            - type: 'null'
          description: Duration statistics (null if there were no conversions)
        top_predictors:
          items:
            $ref: '#/components/schemas/PredictorLift'
          type: array
          maxItems: 50
          title: Top Predictors
          description: Covariate values with the highest lift (sorted by absolute lift)
        metadata:
          additionalProperties: true
          type: object
          title: Metadata
          description: Additional metadata (collection_id, event counts, etc.)
      type: object
      required:
        - from_step
        - to_step
        - count
        - converted
        - conversion_rate
      title: StepTransitionResponse
      description: |-
        API response model for step transition analytics.

        Contains comprehensive statistics about the A→B transition including
        conversion metrics, duration analysis, and predictor insights.

        Example Response:
            ```json
            {
                "from_step": "inquiry",
                "to_step": "closed_won",
                "count": 1000,
                "converted": 350,
                "conversion_rate": 0.35,
                "durations_sec": {
                    "mean": 432000.0,
                    "median": 345600.0,
                    "p50": 345600.0,
                    "p90": 691200.0,
                    "p95": 864000.0,
                    "std_dev": 172800.0,
                    "min": 86400.0,
                    "max": 1209600.0
                },
                "top_predictors": [
                    {
                        "field": "Sender Domain",
                        "value": "enterprise.com",
                        "count": 150,
                        "conversion_rate": 0.75,
                        "lift": 2.14
                    }
                ],
                "metadata": {
                    "collection_id": "col_emails",
                    "taxonomy_id": "tax_sales_stages",
                    "total_events_analyzed": 5432
                }
            }
            ```
    ErrorResponse:
      properties:
        success:
          type: boolean
          title: Success
          description: Always false for error responses
          default: false
        status:
          type: integer
          title: Status
          description: HTTP status code for this error
        error:
          $ref: '#/components/schemas/ErrorDetail'
          description: Error details payload
      type: object
      required:
        - status
        - error
      title: ErrorResponse
      description: Error response model.
      examples:
        - error:
            details:
              id: ns_123
              resource: namespace
            message: Namespace not found
            type: NotFoundError
          status: 404
          success: false
    HTTPValidationError:
      properties:
        detail:
          items:
            $ref: '#/components/schemas/ValidationError'
          type: array
          title: Detail
      type: object
      title: HTTPValidationError
    StepAnalyticsConfig-Input:
      properties:
        timestamp_field:
          type: string
          title: Timestamp Field
          description: >-
            Document field containing event timestamp (e.g., 'Date',
            'created_at', 'metadata.timestamp')
          examples:
            - Date
            - metadata.timestamp
            - created_at
        sequence_id_field:
          type: string
          title: Sequence Id Field
          description: >-
            Document field that groups related items into a sequence (e.g.,
            'Thread-Index', 'session_id', 'user_id')
          examples:
            - Thread-Index
            - metadata.session_id
            - user_id
        step_key_source:
          $ref: '#/components/schemas/StepKeySource'
          description: >-
            How to determine the 'step' for each document (label, node_id, or
            custom field)
          default: assignment_label
        step_key_field_path:
          anyOf:
            - type: string
            - type: 'null'
          title: Step Key Field Path
          description: >-
            Required if step_key_source='field_path'. Dot-notation path to step
            value in document.
          examples:
            - metadata.workflow_stage
            - status
            - enrichments.stage
        covariates:
          items:
            $ref: '#/components/schemas/CovariateConfig'
          type: array
          maxItems: 20
          title: Covariates
          description: >-
            Predictor fields to analyze for conversion lift (categorical,
            numeric, embedding, cluster_id)
        max_sequence_duration_days:
          anyOf:
            - type: integer
              maximum: 365
              minimum: 1
            - type: 'null'
          title: Max Sequence Duration Days
          description: >-
            Maximum allowed duration for a sequence. Sequences beyond this are
            flagged as data quality issues.
      type: object
      required:
        - timestamp_field
        - sequence_id_field
      title: StepAnalyticsConfig
      description: >-
        Configuration for step-by-step transition analytics on taxonomy
        assignments.


        Enables analysis of how documents progress through taxonomy labels as a
        temporal sequence, answering questions like:

        - How long from "inquiry" to "closed_won"?

        - What % of "inquiry" emails reach "proposal"?

        - Which sender domains correlate with faster progression?


        Use Cases:
            1. Email Thread Analysis:
               - Track progression: inquiry → followup → proposal → closed_won
               - Identify which subject lines correlate with faster closure

            2. Content Workflow Tracking:
               - Monitor: draft → review → approved → published
               - Find bottlenecks and optimization opportunities

            3. Safety Compliance Monitoring:
               - Trace: violation_detected → investigated → resolved
               - Track resolution times and success rates

        Attributes:
            timestamp_field: Document field containing event timestamp
            sequence_id_field: Field that groups related documents into sequences
            step_key_source: How to extract the step identifier (label/node_id/custom field)
            step_key_field_path: Required if step_key_source='field_path'
            covariates: List of predictor variables to analyze for conversion lift
            max_sequence_duration_days: Filter out sequences longer than this (data quality)

        Example:
            ```python
            # Email thread analysis configuration
            StepAnalyticsConfig(
                timestamp_field="Date",  # Email timestamp
                sequence_id_field="Thread-Index",  # Groups emails in same thread
                step_key_source="assignment_label",  # Use taxonomy label as step
                covariates=[
                    CovariateConfig(
                        field_path="sender_domain",
                        covariate_type="categorical",
                        name="Sender Domain"
                    ),
                    CovariateConfig(
                        field_path="word_count",
                        covariate_type="numeric",
                        name="Email Length"
                    )
                ],
                max_sequence_duration_days=90  # Ignore threads >90 days
            )
            ```
    DurationStats:
      properties:
        mean:
          type: number
          title: Mean
          description: Average duration in seconds
        median:
          type: number
          title: Median
          description: Median duration in seconds
        p50:
          type: number
          title: P50
          description: 50th percentile (same as median)
        p90:
          type: number
          title: P90
          description: 90th percentile duration in seconds
        p95:
          type: number
          title: P95
          description: 95th percentile duration in seconds
        std_dev:
          type: number
          title: Std Dev
          description: Standard deviation in seconds
        min:
          type: number
          title: Min
          description: Minimum duration observed in seconds
        max:
          type: number
          title: Max
          description: Maximum duration observed in seconds
      type: object
      required:
        - mean
        - median
        - p50
        - p90
        - p95
        - std_dev
        - min
        - max
      title: DurationStats
      description: >-
        Statistical distribution of durations for successful step transitions.


        Provides comprehensive percentile analysis to understand timing
        patterns.


        Attributes:
            mean: Average duration (seconds)
            median: Middle value (50th percentile)
            p50: 50th percentile (same as median, included for consistency)
            p90: 90th percentile (90% of conversions finish at or below this duration)
            p95: 95th percentile (95% of conversions finish at or below this duration)
            std_dev: Standard deviation (measure of spread)
            min: Fastest observed duration
            max: Slowest observed duration

        Example:
            ```python
            DurationStats(
                mean=432000.0,     # 5 days average
                median=345600.0,   # 4 days median
                p50=345600.0,
                p90=691200.0,      # 8 days (90th percentile)
                p95=864000.0,      # 10 days (95th percentile)
                std_dev=172800.0,  # 2 days std dev
                min=86400.0,       # 1 day minimum
                max=1209600.0      # 14 days maximum
            )
            ```
    PredictorLift:
      properties:
        field:
          type: string
          title: Field
          description: Covariate field name
        value:
          type: string
          title: Value
          description: Specific value or bin label
        count:
          type: integer
          minimum: 0
          title: Count
          description: Number of sequences with this value
        conversion_rate:
          type: number
          maximum: 1
          minimum: 0
          title: Conversion Rate
          description: Conversion rate for this value
        lift:
          type: number
          title: Lift
          description: Lift relative to baseline (>1.0 = positive, <1.0 = negative)
      type: object
      required:
        - field
        - value
        - count
        - conversion_rate
        - lift
      title: PredictorLift
      description: >-
        Lift calculation for a specific covariate value.


        Lift compares the conversion rate of sequences with a specific value to
        the baseline conversion rate. Lift > 1.0 means sequences with this value
        convert more often than the baseline.


        Attributes:
            field: Name of the covariate (e.g., "Sender Domain", "Word Count Q3")
            value: Specific value or bin (e.g., "gmail.com", "Q3")
            count: Number of sequences with this value
            conversion_rate: Conversion rate for this value (0.0 to 1.0)
            lift: Conversion rate / baseline rate (1.0 = no effect, >1.0 = positive, <1.0 = negative)

        Example:
            ```python
            # Sender domain "enterprise.com" has 2.5x baseline conversion
            PredictorLift(
                field="Sender Domain",
                value="enterprise.com",
                count=150,
                conversion_rate=0.75,  # 75% conversion
                lift=2.5  # 2.5x the baseline rate
            )
            ```

        Interpretation:
            - lift = 1.5: sequences with this value convert at 1.5x the baseline rate (50% higher)
            - lift = 1.0: same conversion rate as the baseline
            - lift = 0.5: sequences with this value convert at half the baseline rate (50% lower)
    ErrorDetail:
      properties:
        message:
          type: string
          title: Message
          description: Human-readable error message
        type:
          type: string
          title: Type
          description: Stable error type identifier (machine-readable)
        code:
          anyOf:
            - type: string
            - type: 'null'
          title: Code
          description: >-
            Fine-grained error code for programmatic handling (e.g.,
            namespace_name_taken, feature_extractor_not_found). Present only
            when consumers may need to branch on a specific error condition.
        details:
          anyOf:
            - additionalProperties: true
              type: object
            - type: 'null'
          title: Details
          description: >-
            Optional structured details to help debugging (validation errors,
            IDs, etc.)
      type: object
      required:
        - message
        - type
      title: ErrorDetail
      description: Error detail model.
    ValidationError:
      properties:
        loc:
          items:
            anyOf:
              - type: string
              - type: integer
          type: array
          title: Location
        msg:
          type: string
          title: Message
        type:
          type: string
          title: Error Type
      type: object
      required:
        - loc
        - msg
        - type
      title: ValidationError
    StepKeySource:
      type: string
      enum:
        - assignment_label
        - assignment_node_id
        - field_path
      title: StepKeySource
      description: >-
        Defines how to extract the step key from documents for sequence
        analysis.


        The step key identifies which stage/state a document is in for
        transition analytics.


        Examples:
            assignment_label: Use the taxonomy's assigned label (e.g., "inquiry", "proposal")
            assignment_node_id: Use the taxonomy node ID (e.g., "node_sales_inquiry")
            field_path: Use a custom document field (e.g., "metadata.workflow_stage")
    CovariateConfig:
      properties:
        field_path:
          type: string
          title: Field Path
          description: >-
            Dot-notation path to covariate field (e.g., 'sender_domain',
            'metadata.priority')
          examples:
            - sender_domain
            - metadata.word_count
            - features.clip_embedding
        covariate_type:
          $ref: '#/components/schemas/CovariateType'
          description: >-
            Type of covariate; determines the analysis strategy
            (categorical/numeric/embedding/cluster_id)
        name:
          type: string
          maxLength: 100
          minLength: 1
          title: Name
          description: Human-readable name for this covariate in analytics results
          examples:
            - Sender Domain
            - Word Count
            - Visual Cluster
        binning_strategy:
          anyOf:
            - type: string
              enum:
                - quartiles
                - deciles
                - custom
            - type: 'null'
          title: Binning Strategy
          description: >-
            How to bin numeric values for lift analysis (only used for NUMERIC
            type)
          default: quartiles
        clustering_method:
          anyOf:
            - type: string
              enum:
                - kmeans
                - hdbscan
            - type: 'null'
          title: Clustering Method
          description: >-
            Clustering algorithm for embedding analysis (only used for EMBEDDING
            type)
          default: kmeans
        n_clusters:
          anyOf:
            - type: integer
              maximum: 100
              minimum: 2
            - type: 'null'
          title: N Clusters
          description: >-
            Number of clusters for embedding-based predictors (only used for
            EMBEDDING type)
          default: 10
      type: object
      required:
        - field_path
        - covariate_type
        - name
      title: CovariateConfig
      description: >-
        Configuration for a single covariate/predictor variable in step
        analytics.


        Covariates are used to identify which features predict conversion from
        one step to another. The system computes "lift" for each covariate
        value, showing whether it is associated with higher or lower conversion
        likelihood.


        Attributes:
            field_path: Dot-notation path to the field in the document or metadata (e.g., "sender_domain")
            covariate_type: How to analyze this covariate (categorical, numeric, embedding, cluster_id)
            name: Human-readable name for analytics results
            binning_strategy: For NUMERIC types, how to bin values (quartiles, deciles)
            clustering_method: For EMBEDDING types, algorithm to use (kmeans, hdbscan)
            n_clusters: For EMBEDDING types, number of clusters to create

        Examples:
            ```python
            # Analyze sender domains (categorical)
            CovariateConfig(
                field_path="sender_domain",
                covariate_type="categorical",
                name="Email Domain"
            )

            # Analyze email length (numeric with quartile binning)
            CovariateConfig(
                field_path="word_count",
                covariate_type="numeric",
                name="Word Count",
                binning_strategy="quartiles"
            )

            # Analyze visual similarity (embedding clustering)
            CovariateConfig(
                field_path="features.clip_embedding",
                covariate_type="embedding",
                name="Visual Cluster",
                clustering_method="kmeans",
                n_clusters=10
            )
            ```
    CovariateType:
      type: string
      enum:
        - categorical
        - numeric
        - embedding
        - cluster_id
      title: CovariateType
      description: >-
        Type of covariate/predictor variable for conversion analysis.


        Different types enable different analysis strategies:

        - CATEGORICAL: String values, analyzed via grouping (e.g.,
        sender_domain, priority)

        - NUMERIC: Continuous values, binned into quartiles/deciles (e.g.,
        word_count, price)

        - EMBEDDING: Dense vectors, clustered for semantic analysis (e.g., CLIP
        embeddings)

        - CLUSTER_ID: Pre-computed cluster identifiers (e.g., topic_cluster,
        visual_cluster)


        Examples:
            ```python
            # Categorical: Which email domains convert better?
            CovariateConfig(field_path="sender_domain", covariate_type="categorical", name="Sender Domain")

            # Numeric: Does email length affect conversion?
            CovariateConfig(field_path="word_count", covariate_type="numeric", name="Word Count")

            # Embedding: Do visually similar images follow similar paths?
            CovariateConfig(field_path="features.clip_embedding", covariate_type="embedding", name="Visual Cluster")

            # Cluster: Which topic clusters have the highest conversion rate?
            CovariateConfig(field_path="metadata.topic_id", covariate_type="cluster_id", name="Topic Cluster")
            ```

````
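The `CovariateConfig` and `CovariateType` descriptions above say numeric covariates are read from a dot-notation `field_path` and binned into quartiles or deciles, and the `PredictorLift` docs use bin labels like "Q3". The sketch below shows one plausible way to do both. The helper names `get_path` and `quartile_labels`, the `Q1`..`Q4` labels, and the tie-breaking rule at cut points are assumptions, not documented Mixpeek behavior; deciles would use `n=10` instead of `n=4`.

```python
from __future__ import annotations

import bisect
import statistics
from typing import Any


def get_path(doc: dict, field_path: str) -> Any:
    """Resolve a dot-notation path such as 'metadata.word_count' (None if missing)."""
    value: Any = doc
    for key in field_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def quartile_labels(values: list[float]) -> list[str]:
    """Label each value Q1..Q4 using quartile cut points of the whole sample."""
    cuts = statistics.quantiles(values, n=4)  # three cut points; needs >= 2 values
    # Assumption: a value equal to a cut point falls into the lower bin
    return [f"Q{bisect.bisect_left(cuts, v) + 1}" for v in values]


docs = [{"metadata": {"word_count": wc}} for wc in (100, 200, 300, 400, 500, 600, 700, 800)]
word_counts = [get_path(d, "metadata.word_count") for d in docs]
print(quartile_labels(word_counts))
# ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4', 'Q4']
```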