Experiments

Evaluations

Before entering the configuration view, you must choose a high-level Experiment Type. This choice determines how the AI's output is analyzed:

  • Custom: Manual setup, with or without programmatic Criteria.

    • Evaluation Rules: Create programmatic checks such as Regex, JSON Schema, or Lexical Overlap to automatically grade responses (see the sketch after this list).

  • Selector: The AI chooses the best option from a list of variables.

  • Rater: The AI provides qualitative scores or side-by-side comparisons.
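
The evaluation engine itself isn't exposed as code, but each rule type corresponds to a familiar programmatic check. Here is a minimal Python sketch of what each grader could look like; the function names, the 0–1 scoring convention, and the `jsonschema` dependency are illustrative assumptions, not the product's internals:

```python
import re
import json
import jsonschema  # third-party: pip install jsonschema

def regex_check(response: str, pattern: str) -> bool:
    """Pass if the response matches the pattern anywhere."""
    return re.search(pattern, response) is not None

def json_schema_check(response: str, schema: dict) -> bool:
    """Pass if the response parses as JSON and conforms to the schema."""
    try:
        jsonschema.validate(json.loads(response), schema)
        return True
    except (json.JSONDecodeError, jsonschema.ValidationError):
        return False

def lexical_overlap(response: str, reference: str) -> float:
    """Jaccard overlap between the token sets of response and reference."""
    a, b = set(response.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

# Grade one model response against all three rule types.
response = '{"sentiment": "positive"}'
print(regex_check(response, r'"sentiment"'))                     # True
print(json_schema_check(response, {"required": ["sentiment"]}))  # True
print(lexical_overlap("the cat sat", "a cat sat down"))          # 0.4
```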


1. Prompts

The core logic of your experiment.

  • System Prompt: Defines the persona, constraints, and operational context for the AI.

  • User Prompt: The specific query or instruction.

  • Variable Injection: Surround keys with double curly braces (e.g., {{variable_name}}) to dynamically inject data during execution (see the sketch after this list).

  • Live Preview: A real-time rendering of your prompt with interpolated values from your selected data sources.
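
The double-brace syntax works like Mustache-style templating. As a rough illustration, here is how interpolation (and the Live Preview's rendering) might resolve values; the `render` helper is hypothetical, not the product's API:

```python
import re

TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template: str, values: dict) -> str:
    """Replace each {{key}} with its mapped value; leave unmapped keys visible."""
    return TAG.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)

user_prompt = "Write a tagline for {{business_name}} aimed at {{audience}}."
print(render(user_prompt, {"business_name": "Acme", "audience": "developers"}))
# Write a tagline for Acme aimed at developers.
```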


2. Variables

Manage how data is fed into your prompt.

  • Detection: The system automatically detects {{tags}} in your prompts.

  • Data Sources:

    • Datasets: Map variables to CSV or JSON data rows.

    • Personas: Inject full profiles or specific sections of a Persona.

    • Businesses: Inject Branding, Audience, or Core Profile data from your saved Businesses.

    • Prompt Templates: Use an existing prompt as a variable value.

    If the selected prompt is a Template with its own variables, those variables are automatically "nested" and appear in the Experiment's variable list for configuration (a sketch follows this list). This only works when a single prompt template is selected.

  • Configuration:

    • Full Profile: Toggle to include the entire object as JSON or Markdown.

    • Included Sections: Select specific attributes (e.g., only "Brand Voice" or "Target Audience Segment").

    • Exclusions: Skip specific items within a dataset.
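
To make the detection and nesting behavior concrete, here is a hedged Python sketch: it scans prompts for {{tags}} and, when a variable is mapped to another template, pulls that template's own tags into the list. The `templates` mapping and helper names are invented for illustration:

```python
import re

TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def detect_variables(prompt: str) -> set[str]:
    """Collect every {{tag}} referenced in a prompt."""
    return set(TAG.findall(prompt))

# Hypothetical nested-template case: {{intro}} is mapped to another
# template, so that template's own variables join the list too.
templates = {"intro": "Greet {{persona_name}} in a {{brand_voice}} tone."}
prompt = "{{intro}} Then answer: {{question}}"

variables = detect_variables(prompt)
for var in list(variables):
    if var in templates:                                # maps to a template
        variables |= detect_variables(templates[var])   # pull in nested tags

print(sorted(variables))
# ['brand_voice', 'intro', 'persona_name', 'question']
```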


3. Models

Choose the AI models that will process your prompts.

  • Provider Selection: Enable/disable specific providers (OpenAI, Anthropic, Ollama, etc.).

  • Model Checkboxes: Select one or more models to run in parallel.

  • Execution Config:

    • Parallel Execution: Enable to run multiple scenarios simultaneously.

    • Concurrency: Set the maximum number of simultaneous requests.

    • Rate Limits (RPM): Define requests-per-minute limits to avoid API throttling (a sketch of these controls follows this list).
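
These settings interact: concurrency caps how many requests are in flight at once, while the RPM limit spaces out request starts. Below is a minimal asyncio sketch of how a client might honor both; the `call_model` stub and the model names are placeholders, not the product's SDK:

```python
import asyncio
import time

async def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real provider SDK call (hypothetical)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"{model}: ok"

async def run_all(scenarios, concurrency: int = 4, rpm: int = 60):
    """Run scenarios in parallel, capped by a semaphore and paced to an RPM budget."""
    semaphore = asyncio.Semaphore(concurrency)  # Concurrency setting
    pace_lock = asyncio.Lock()
    min_interval = 60.0 / rpm                   # Rate Limit (RPM) setting
    last_start = 0.0

    async def run_one(model, prompt):
        nonlocal last_start
        async with semaphore:        # cap simultaneous requests
            async with pace_lock:    # space out request starts
                wait = last_start + min_interval - time.monotonic()
                if wait > 0:
                    await asyncio.sleep(wait)
                last_start = time.monotonic()
            return await call_model(model, prompt)

    return await asyncio.gather(*(run_one(m, p) for m, p in scenarios))

scenarios = [("model-a", "Hello"), ("model-b", "Hello")]
print(asyncio.run(run_all(scenarios)))  # ['model-a: ok', 'model-b: ok']
```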


4. Summary

A pre-flight checklist and cost projection.

  • Token Projections: Estimated input/output tokens based on your variable mapping.

  • Cost Estimate: Projected monetary cost for the entire experiment.

  • Scenario Breakdown: The total number of runs that will occur (Variables × Models); see the worked example after this list.

  • Run Button: Triggers the execution and takes you to the Results page.
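
The projection math is straightforward: every variable combination runs once per selected model. A worked example with made-up token counts and rates (the real figures come from your mapping and provider pricing):

```python
# Hypothetical numbers for illustration only.
dataset_rows = 25              # variable combinations
models = 3                     # selected models
est_input_tokens = 800         # per run, after variable interpolation
est_output_tokens = 300        # per run
price_in = 2.50 / 1_000_000    # $ per input token (example rate)
price_out = 10.00 / 1_000_000  # $ per output token (example rate)

runs = dataset_rows * models   # Variables x Models
cost = runs * (est_input_tokens * price_in + est_output_tokens * price_out)
print(f"{runs} runs, ~${cost:.2f} projected")  # 75 runs, ~$0.38 projected
```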

