Results
Once an experiment is running or has completed, the Result Details view provides detailed insight into performance and output quality.
1. Scenarios
A granular table of every individual run in the experiment.
Search & Filter: Search by output content or filter by specific variable values.
Status Indicators: Distinguish successful runs from "Invalid Responses" (parsing failures).
Run Details: Click any row to open the Run Details Drawer, showing:
Raw input and output.
Evaluation results.
Retry Failed: Bulk retry only the scenarios that failed or returned invalid data.
Export: Download the entire results table as a CSV.
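The exported CSV can be post-processed outside the app. A minimal sketch, assuming hypothetical column names (`scenario_id`, `status`, `latency_ms`, `output`) that may differ from the actual export schema:

```python
import csv
import io

# Hypothetical sample of an exported results CSV; the column names here
# are illustrative assumptions, not the product's documented schema.
SAMPLE_CSV = """scenario_id,status,latency_ms,output
1,passed,420,"The capital of France is Paris."
2,invalid_response,380,"{truncated json"
3,failed,510,"The capital of France is Lyon."
"""

def load_results(csv_text):
    """Parse the exported CSV into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def retry_candidates(rows):
    """Rows that 'Retry Failed' would re-run: failures and invalid responses."""
    return [r for r in rows if r["status"] in ("failed", "invalid_response")]

rows = load_results(SAMPLE_CSV)
print([r["scenario_id"] for r in retry_candidates(rows)])  # → ['2', '3']
```

The same filter the UI applies for "Retry Failed" is easy to reproduce on the export, which is useful for auditing which scenarios were re-run.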
2. Performance Report
High-level analytics and visualizations.
Key Metrics:
Throughput: Current progress vs. total planned scenarios.
Pass Rate: Percentage of scenarios that met all evaluation criteria.
Avg. Latency: Average response time per run.
Actual vs. Estimated Cost: Compare the real spend against the initial projection.
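How these four metrics relate can be sketched from per-run records. The field names and figures below are illustrative assumptions, not the product's internal data model:

```python
# Hypothetical per-run records; field names are illustrative only.
runs = [
    {"passed": True,  "latency_s": 1.2, "cost_usd": 0.004},
    {"passed": False, "latency_s": 0.9, "cost_usd": 0.003},
    {"passed": True,  "latency_s": 1.5, "cost_usd": 0.005},
    {"passed": True,  "latency_s": 1.1, "cost_usd": 0.004},
]
planned_total = 10    # total planned scenarios
estimated_cost = 0.05 # initial projection, in USD

throughput = f"{len(runs)}/{planned_total}"                      # progress vs. plan
pass_rate = 100 * sum(r["passed"] for r in runs) / len(runs)     # % meeting criteria
avg_latency = sum(r["latency_s"] for r in runs) / len(runs)      # mean response time
actual_cost = sum(r["cost_usd"] for r in runs)                   # real spend so far

print(throughput, f"{pass_rate:.0f}%", f"{avg_latency:.2f}s",
      f"${actual_cost:.3f} vs ${estimated_cost:.2f}")
```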
Visualizations:
Score Distribution: Histogram showing how many results fall into various score brackets.
Success Rate: Pie chart of Passed vs. Failed scenarios.
Latency & Duration: Line chart tracking response times over the course of the experiment.
Token Consumption: Bar charts showing Input vs. Output token usage relative to estimates.
Model Filtering: Use the model selector to view analytics for single models or cumulative "All Models" data.
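The Score Distribution bucketing can be reproduced from raw scores. A minimal sketch, assuming a 0-100 score scale and equal-width brackets (both assumptions; the actual bracket boundaries may differ):

```python
from collections import Counter

# Hypothetical per-scenario scores on an assumed 0-100 scale.
scores = [92, 88, 75, 64, 91, 55, 83, 70, 97, 45]

def score_brackets(scores, width=20):
    """Count scores per bracket, mirroring a Score Distribution histogram.
    A score of exactly 100 is folded into the top bracket."""
    counts = Counter(min(s // width * width, 100 - width) for s in scores)
    return {f"{lo}-{lo + width}": counts.get(lo, 0)
            for lo in range(0, 100, width)}

print(score_brackets(scores))
# → {'0-20': 0, '20-40': 0, '40-60': 2, '60-80': 3, '80-100': 5}
```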
3. Live Execution Controls
Visible only while an experiment is active.
Settings Dialog: Adjust concurrency and RPM limits in real time without stopping the run.
Stop Button: Terminate the current execution (partial results are saved).
Pause/Resume: Suspend in-flight runs and resume the experiment later from where it left off.
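What the concurrency and RPM settings control can be illustrated with a client-side sketch: a semaphore caps runs in flight, and a pacing lock spaces out run starts. This is a minimal illustration of the semantics, not the product's scheduler:

```python
import threading
import time

# Assumed example limits; in the product these are set in the Settings Dialog.
CONCURRENCY = 4   # max runs in flight at once
RPM_LIMIT = 120   # max run starts per minute

_slots = threading.Semaphore(CONCURRENCY)
_min_interval = 60.0 / RPM_LIMIT
_last_start = [0.0]
_start_lock = threading.Lock()

def run_scenario(scenario_fn):
    """Execute one run while honoring both the concurrency and RPM limits."""
    with _slots:                  # concurrency gate: at most CONCURRENCY in flight
        with _start_lock:         # RPM gate: space out consecutive run starts
            wait = _last_start[0] + _min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            _last_start[0] = time.monotonic()
        return scenario_fn()

print(run_scenario(lambda: "done"))  # → done
```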