Results

Once an experiment is running or has completed, the Result Details view shows the output of every individual run alongside aggregate performance and quality metrics.

1. Scenarios

A granular table of every individual run in the experiment.

  • Search & Filter: Search by output content or filter by specific variable values.

  • Status Indicators: Distinguish successful runs from "Invalid Responses" (runs whose output could not be parsed).

  • Run Details: Click any row to open the Run Details Drawer, showing:

    • Raw Input/Output.

    • Evaluation results.

  • Retry Failed: Bulk retry only the scenarios that failed or returned invalid data.

  • Export: Download the entire results table as a CSV.
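Once exported, the CSV can be searched and filtered offline in the same way as the table. The following is a minimal Python sketch, not a description of the actual export format: the column names (scenario_id, model, status, output) and the status labels ("success", "failed", "invalid_response") are assumptions made for illustration, so adjust them to match the header row of your own export.

```python
import csv
import io

# Hypothetical export: the real CSV's column names and status values may differ.
SAMPLE_CSV = """scenario_id,model,status,output
1,model-a,success,Hello
2,model-a,invalid_response,{not json
3,model-b,failed,
4,model-b,success,Hi there
"""

# Assumed labels for runs that a bulk retry would pick up.
RETRYABLE = {"failed", "invalid_response"}


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rows_to_retry(rows):
    """Keep only rows that failed or returned an invalid response."""
    return [r for r in rows if r["status"] in RETRYABLE]


def search_output(rows, needle):
    """Case-insensitive substring search over the output column."""
    needle = needle.lower()
    return [r for r in rows if needle in r["output"].lower()]


rows = load_rows(SAMPLE_CSV)
retry_ids = [r["scenario_id"] for r in rows_to_retry(rows)]
print(retry_ids)  # ['2', '3']
```

To read a downloaded file instead of the inline sample, open it with open(path, newline="") and pass the file object to csv.DictReader directly.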


2. Performance Report

High-level analytics and visualizations.

  • Key Metrics:

    • Throughput: Number of scenarios completed so far vs. the total number planned.

    • Pass Rate: Percentage of scenarios that met all evaluation criteria.

    • Avg. Latency: Average response time per run.

    • Actual vs. Estimated Cost: Compare the real spend against the initial projection.

  • Visualizations:

    • Score Distribution: Histogram showing how many results fall into each score bracket.

    • Success Rate: Pie chart of Passed vs. Failed scenarios.

    • Latency & Duration: Line chart tracking response times over the course of the experiment.

    • Token Consumption: Bar charts showing Input vs. Output token usage relative to estimates.

  • Model Filtering: Use the model selector to view analytics for single models or cumulative "All Models" data.
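For readers who want to reproduce these figures from their own data, the key metrics reduce to simple arithmetic over per-run records. The sketch below is illustrative only: the Run record, its field names, the 0.0 to 1.0 score range, and the five equal-width brackets are all assumptions, not the product's actual data model or binning.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    # Hypothetical per-run record; field names are assumptions.
    model: str
    passed: bool       # met all evaluation criteria
    latency_s: float   # response time in seconds
    cost_usd: float
    score: float       # assumed to lie in [0.0, 1.0]


def by_model(runs, model):
    """Mimic the model selector: keep runs for one model only."""
    return [r for r in runs if r.model == model]


def summarize(runs, total_planned, estimated_cost_usd):
    """Compute progress, pass rate, average latency, and cost comparison."""
    completed = len(runs)
    return {
        "progress": f"{completed}/{total_planned}",
        "pass_rate_pct": 100.0 * sum(r.passed for r in runs) / completed if completed else 0.0,
        "avg_latency_s": mean(r.latency_s for r in runs) if completed else 0.0,
        "actual_cost_usd": sum(r.cost_usd for r in runs),
        "estimated_cost_usd": estimated_cost_usd,
    }


def score_brackets(runs, n_bins=5):
    """Count runs per equal-width score bracket; a score of 1.0 goes in the last one."""
    counts = [0] * n_bins
    for r in runs:
        counts[min(int(r.score * n_bins), n_bins - 1)] += 1
    return counts


runs = [
    Run("model-a", True, 1.0, 0.25, 0.9),
    Run("model-a", False, 2.0, 0.25, 0.3),
    Run("model-b", True, 3.0, 0.5, 0.7),
    Run("model-b", True, 2.0, 0.5, 1.0),
]

summary = summarize(runs, total_planned=10, estimated_cost_usd=2.0)
print(summary["progress"], summary["pass_rate_pct"], summary["avg_latency_s"])  # 4/10 75.0 2.0
print(score_brackets(runs))  # [0, 1, 0, 1, 2]
```

Passing by_model(runs, "model-b") to summarize gives the single-model view, while passing the full list corresponds to the cumulative "All Models" view.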


3. Live Execution Controls

Visible only while an experiment is active.

  • Settings Dialog: Adjust the concurrency and requests-per-minute (RPM) limits in real time without stopping the run.

  • Stop Button: Terminate the current execution (partial results are saved).

  • Pause/Resume: Suspend the experiment and resume it later from where it left off.
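To illustrate how concurrency and RPM limits can be changed while work is in flight, here is a minimal Python sketch. It is not the product's implementation: the LiveLimits class, its method names, and the sliding 60-second window used for the RPM check are all assumptions chosen to keep the example small. The clock is passed in explicitly so the behavior is easy to follow.

```python
import threading
from collections import deque


class LiveLimits:
    """Hypothetical limiter whose concurrency and RPM caps can change mid-run."""

    def __init__(self, concurrency, rpm):
        self._lock = threading.Lock()
        self.concurrency = concurrency
        self.rpm = rpm
        self._in_flight = 0
        self._starts = deque()  # start times (seconds) within the last minute

    def update(self, concurrency=None, rpm=None):
        """Apply new limits; runs already in flight are not interrupted."""
        with self._lock:
            if concurrency is not None:
                self.concurrency = concurrency
            if rpm is not None:
                self.rpm = rpm

    def try_start(self, now):
        """Reserve a slot and return True if both limits allow a run at time `now`."""
        with self._lock:
            # Forget starts that are at least 60 seconds old.
            while self._starts and now - self._starts[0] >= 60.0:
                self._starts.popleft()
            if self._in_flight >= self.concurrency or len(self._starts) >= self.rpm:
                return False
            self._in_flight += 1
            self._starts.append(now)
            return True

    def finish(self):
        """Release the slot held by a completed run."""
        with self._lock:
            self._in_flight -= 1


limits = LiveLimits(concurrency=2, rpm=3)
first_three = [limits.try_start(0.0), limits.try_start(1.0), limits.try_start(2.0)]
limits.update(concurrency=4)           # raise the cap without stopping anything
after_raise = limits.try_start(3.0)    # allowed now that concurrency is 4
rpm_blocked = limits.try_start(4.0)    # refused: 3 starts already in this minute
limits.finish()
next_minute = limits.try_start(61.5)   # allowed once older starts leave the window
print(first_three, after_raise, rpm_blocked, next_minute)
# [True, True, False] True False True
```

In a real worker loop, now would typically come from time.monotonic(), and a refused try_start would be followed by a short sleep before trying again.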


© 2024-2026 | All Rights Reserved. Hikoky