Create benchmark
Create a new benchmark run to compare retriever pipelines. The benchmark will replay historical sessions and measure alignment with observed user behavior.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
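As a minimal sketch of the header format described above, the helper below builds the request headers from a token. The function name `bearer_headers` is hypothetical, and the `Content-Type` value assumes the request body is sent as JSON, which this page does not state explicitly.

```python
def bearer_headers(token: str) -> dict[str, str]:
    """Return HTTP headers carrying the auth token in Bearer form."""
    cleaned = token.strip()
    if not cleaned:
        raise ValueError("auth token must be non-empty")
    return {
        # Documented form: "Bearer <token>"
        "Authorization": f"Bearer {cleaned}",
        # Assumption: the body is JSON-encoded.
        "Content-Type": "application/json",
    }
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client you use.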
Body
Request to create a new benchmark run.
Human-readable name for this benchmark.
Length: 1 - 255 characters.
ID of the baseline retriever pipeline to compare against.
IDs of candidate retriever pipelines to evaluate.
Minimum: 1 ID.
Optional filter criteria for selecting sessions to replay.
Number of sessions to include in the benchmark.
Allowed range: 10 <= x <= 10000
Response
Successful Response
Response containing benchmark details and results.
Unique benchmark identifier.
Human-readable name.
Baseline retriever ID.
Candidate retriever IDs.
Number of sessions in benchmark.
Current benchmark status.
One of: pending, building_sessions, replaying, computing_metrics, completed, failed
Creation timestamp.
Filter criteria used.
Results per pipeline (available when completed).
Statistical comparison (available when completed).
Execution start time.
Completion time.
Error message if failed.
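The documented limits and status values can be checked on the client side before sending a request or while polling. The sketch below is illustrative only: this page does not show the JSON field names, so every key (`name`, `baseline_retriever_id`, `candidate_retriever_ids`, `session_count`, `session_filter`) and both function names are hypothetical placeholders. It also assumes IDs are strings, that the session count is supplied explicitly, and that `completed` and `failed` are the only statuses that end a run.

```python
from typing import Any, Optional

# Status values listed in the response description.
STATUSES = (
    "pending",
    "building_sessions",
    "replaying",
    "computing_metrics",
    "completed",
    "failed",
)
# Assumption: a run stops changing once it reaches one of these.
TERMINAL_STATUSES = {"completed", "failed"}


def build_benchmark_request(
    name: str,
    baseline_id: str,
    candidate_ids: list[str],
    session_count: int,
    session_filter: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Check inputs against the documented limits and return a request body.

    All dictionary keys are hypothetical; substitute the real field names.
    """
    if not 1 <= len(name) <= 255:
        raise ValueError("name must be 1 to 255 characters long")
    if len(candidate_ids) < 1:
        raise ValueError("at least one candidate retriever ID is required")
    if not 10 <= session_count <= 10000:
        raise ValueError("session count must be between 10 and 10000")

    body: dict[str, Any] = {
        "name": name,
        "baseline_retriever_id": baseline_id,
        "candidate_retriever_ids": list(candidate_ids),
        "session_count": session_count,
    }
    # The filter is optional, so it is only included when provided.
    if session_filter is not None:
        body["session_filter"] = session_filter
    return body


def is_finished(status: str) -> bool:
    """Return True when a benchmark status is terminal (by the assumption above)."""
    if status not in STATUSES:
        raise ValueError(f"unknown benchmark status: {status!r}")
    return status in TERMINAL_STATUSES
```

A client would typically poll until `is_finished` returns True, then read the per-pipeline results and statistical comparison on `completed`, or the error message on `failed`.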

