Results
Once an experiment is running or has completed, the Result Details view provides detailed insight into performance and output quality.
1. Scenarios
A granular table of every individual run in the experiment.
Search & Filter: Search by output content or filter by specific variable values.
Status Indicators: Distinguish successful runs from "Invalid Responses" (parsing failures).
Run Details: Click any row to open the Run Details Drawer, showing:
Raw input and output.
Evaluation results.
Retry Failed: Bulk retry only the scenarios that failed or returned invalid data.
Export: Download the entire results table as a CSV.
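The exported CSV can be post-processed outside the app. A minimal sketch, assuming hypothetical column names (`scenario_id`, `status`, `latency_ms`, `output`) that may differ from the actual export schema:

```python
import csv
import io

# Hypothetical sample of an exported results CSV; the column names here
# are illustrative assumptions, not the product's documented schema.
SAMPLE_CSV = """scenario_id,status,latency_ms,output
1,passed,420,"The capital of France is Paris."
2,invalid_response,380,"{truncated json"
3,failed,510,"The capital of France is Lyon."
"""

def load_results(csv_text):
    """Parse the exported CSV into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def retry_candidates(rows):
    """Rows that 'Retry Failed' would re-run: failures and invalid responses."""
    return [r for r in rows if r["status"] in ("failed", "invalid_response")]

rows = load_results(SAMPLE_CSV)
print([r["scenario_id"] for r in retry_candidates(rows)])  # → ['2', '3']
```

The same filter the UI applies for "Retry Failed" is easy to reproduce on the export, which is useful for auditing which scenarios were re-run.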
2. Performance Report
High-level analytics and visualizations.
Key Metrics:
Throughput: Current progress vs. total planned scenarios.
Pass Rate: Percentage of scenarios that met all evaluation criteria.
Avg. Latency: Average response time per run.
Actual vs. Estimated Cost: Compare the real spend against the initial projection.
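How these four metrics relate can be sketched from per-run records. The field names and figures below are illustrative assumptions, not the product's internal data model:

```python
# Hypothetical per-run records; field names are illustrative only.
runs = [
    {"passed": True,  "latency_s": 1.2, "cost_usd": 0.004},
    {"passed": False, "latency_s": 0.9, "cost_usd": 0.003},
    {"passed": True,  "latency_s": 1.5, "cost_usd": 0.005},
    {"passed": True,  "latency_s": 1.1, "cost_usd": 0.004},
]
planned_total = 10    # total planned scenarios
estimated_cost = 0.05 # initial projection, in USD

throughput = f"{len(runs)}/{planned_total}"                      # progress vs. plan
pass_rate = 100 * sum(r["passed"] for r in runs) / len(runs)     # % meeting criteria
avg_latency = sum(r["latency_s"] for r in runs) / len(runs)      # mean response time
actual_cost = sum(r["cost_usd"] for r in runs)                   # real spend so far

print(throughput, f"{pass_rate:.0f}%", f"{avg_latency:.2f}s",
      f"${actual_cost:.3f} vs ${estimated_cost:.2f}")
```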
Visualizations:
Score Distribution: Histogram showing how many results fall into various score brackets.
Success Rate: Pie chart of Passed vs. Failed scenarios.
Latency & Duration: Line chart tracking response times over the course of the experiment.
Token Consumption: Bar charts showing Input vs. Output token usage relative to estimates.
Model Filtering: Use the model selector to view analytics for single models or cumulative "All Models" data.
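The Score Distribution bucketing can be reproduced from raw scores. A minimal sketch, assuming a 0-100 score scale and equal-width brackets (both assumptions; the actual bracket boundaries may differ):

```python
from collections import Counter

# Hypothetical per-scenario scores on an assumed 0-100 scale.
scores = [92, 88, 75, 64, 91, 55, 83, 70, 97, 45]

def score_brackets(scores, width=20):
    """Count scores per bracket, mirroring a Score Distribution histogram.
    A score of exactly 100 is folded into the top bracket."""
    counts = Counter(min(s // width * width, 100 - width) for s in scores)
    return {f"{lo}-{lo + width}": counts.get(lo, 0)
            for lo in range(0, 100, width)}

print(score_brackets(scores))
# → {'0-20': 0, '20-40': 0, '40-60': 2, '60-80': 3, '80-100': 5}
```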
3. Live Execution Controls
Visible only while an experiment is active.
Settings Dialog: Adjust concurrency and RPM limits in real time without stopping the run.
Stop Button: Terminate the current execution (partial results are saved).
Pause/Resume: Suspend in-flight runs and resume the experiment later from where it left off.
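What the concurrency and RPM settings control can be illustrated with a client-side sketch: a semaphore caps runs in flight, and a pacing lock spaces out run starts. This is a minimal illustration of the semantics, not the product's scheduler:

```python
import threading
import time

# Assumed example limits; in the product these are set in the Settings Dialog.
CONCURRENCY = 4   # max runs in flight at once
RPM_LIMIT = 120   # max run starts per minute

_slots = threading.Semaphore(CONCURRENCY)
_min_interval = 60.0 / RPM_LIMIT
_last_start = [0.0]
_start_lock = threading.Lock()

def run_scenario(scenario_fn):
    """Execute one run while honoring both the concurrency and RPM limits."""
    with _slots:                  # concurrency gate: at most CONCURRENCY in flight
        with _start_lock:         # RPM gate: space out consecutive run starts
            wait = _last_start[0] + _min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            _last_start[0] = time.monotonic()
        return scenario_fn()

print(run_scenario(lambda: "done"))  # → done
```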