Experiments
Evaluations
Before entering the configuration view, you must choose a high-level Experiment Type. This choice determines how the AI's output is analyzed:
Custom: Manual setup with or without programmatic Criteria.
Evaluation Rules: Create programmatic checks like Regex, JSON Schema, or Lexical Overlap to automatically grade responses.
Selector: AI chooses the best option from a list of variables.
Rater: AI provides qualitative scores or side-by-side comparisons.
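Two of the programmatic checks named above (Regex and Lexical Overlap) can be sketched with the standard library alone; `regex_rule` and `lexical_overlap` are hypothetical names for illustration, not the tool's internals, and a real JSON Schema rule would additionally need a schema validator:

```python
import re

def regex_rule(response: str, pattern: str) -> bool:
    # Pass if the pattern matches anywhere in the response.
    return re.search(pattern, response) is not None

def lexical_overlap(response: str, reference: str) -> float:
    # Fraction of reference words that also appear in the response.
    ref = set(reference.lower().split())
    resp = set(response.lower().split())
    return len(ref & resp) / len(ref) if ref else 0.0

response = "Order #12345 has shipped and will arrive Friday."
print(regex_rule(response, r"#\d{5}"))                 # True
print(lexical_overlap(response, "order shipped friday"))
```

A grading pipeline would run each enabled rule over every response and record pass/fail or the numeric score alongside the run.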
1. Prompts
The core logic of your experiment.
System Prompt: Defines the persona, constraints, and operational context for the AI.
User Prompt: The specific query or instruction.
Variable Injection: Surround keys with double curly braces (e.g., {{variable_name}}) to dynamically inject data during execution.
Live Preview: A real-time rendering of your prompt with values interpolated from your selected data sources.
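The injection step can be sketched in a few lines; `render_prompt` is a hypothetical helper (not the tool's actual implementation) that substitutes known keys and leaves unknown placeholders intact:

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    # Replace {{key}} placeholders; unknown keys are left as-is.
    def sub(match):
        key = match.group(1)
        return str(variables.get(key, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

prompt = "Summarize {{doc_title}} for a {{audience}} reader."
print(render_prompt(prompt, {"doc_title": "Q3 Report", "audience": "technical"}))
# → Summarize Q3 Report for a technical reader.
```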
2. Variables
Manage how data is fed into your prompt.
Detection: The system automatically detects {{tags}} from your prompts.
Data Sources:
Datasets: Map variables to CSV or JSON data rows.
Personas: Inject full profiles or specific sections of a Persona.
Businesses: Inject Branding, Audience, or Core Profile data from your saved Businesses.
Prompt Templates: Use an existing prompt as a variable value.
If the selected prompt is a Template with its own variables, those variables are automatically "nested" and appear in the Experiment's variable list for configuration. Nesting works only when a single prompt template is selected.
Configuration:
Full Profile: Toggle to include the entire object as JSON or Markdown.
Included Sections: Select specific attributes (e.g., only "Brand Voice" or "Target Audience Segment").
Exclusions: Skip specific items within a dataset.
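Tag detection with nested Prompt Templates can be sketched as a recursive scan: collect the {{tags}} in a prompt, and when a tag maps to another template, fold that template's own variables into the list. The function name and the `prompt_templates` mapping are assumptions for illustration:

```python
import re

TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def detect_variables(template: str, prompt_templates: dict, _seen=None) -> set:
    # Collect {{tags}}; if a tag maps to another prompt template,
    # recurse so its variables are "nested" into the result.
    # _seen guards against cyclic template references.
    _seen = _seen if _seen is not None else set()
    found = set(TAG.findall(template))
    for key in list(found):
        if key in prompt_templates and key not in _seen:
            _seen.add(key)
            found |= detect_variables(prompt_templates[key], prompt_templates, _seen)
    return found

templates = {"greeting": "Hello {{name}}, welcome to {{company}}."}
print(sorted(detect_variables("{{greeting}} Then ask about {{topic}}.", templates)))
# → ['company', 'greeting', 'name', 'topic']
```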
3. Models
Choose the AI models that will process your prompts.
Provider Selection: Enable/disable specific providers (OpenAI, Anthropic, Ollama, etc.).
Model Checkboxes: Select one or more models to run in parallel.
Execution Config:
Parallel Execution: Enable to run multiple scenarios simultaneously.
Concurrency: Set the maximum number of simultaneous requests.
Rate Limits (RPM): Define requests-per-minute limits to avoid API throttling.
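The interplay of Concurrency and RPM can be modeled as a semaphore (caps simultaneous requests) plus minimum spacing between request starts (caps requests per minute). This is a minimal sketch; `call_model` stands in for whatever async client a provider exposes:

```python
import asyncio
import time

async def run_scenarios(scenarios, call_model, concurrency=4, rpm=60):
    # Semaphore caps in-flight requests; `interval` spaces out
    # request starts so at most `rpm` begin per minute.
    sem = asyncio.Semaphore(concurrency)
    interval = 60.0 / rpm
    next_start = time.monotonic()

    async def one(scenario):
        nonlocal next_start
        async with sem:
            delay = next_start - time.monotonic()
            next_start = max(next_start, time.monotonic()) + interval
            if delay > 0:
                await asyncio.sleep(delay)
            return await call_model(scenario)

    return await asyncio.gather(*(one(s) for s in scenarios))
```

`asyncio.gather` preserves input order, so results line up with the scenario list even though completion order varies.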
4. Summary
A pre-flight checklist and cost projection.
Token Projections: Estimated input/output tokens based on your variable mapping.
Cost Estimate: Projected monetary cost for the entire experiment.
Scenario Breakdown: The total number of runs that will occur (Variables × Models).
Run Button: Triggers the execution and takes you to the Results page.
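The projections follow directly from the configuration: scenario count is data rows times selected models, and cost sums per-model token prices over those runs. The model names and per-1K-token prices below are illustrative assumptions, not real pricing:

```python
# Hypothetical (input, output) prices per 1K tokens.
PRICES = {"model-a": (0.003, 0.015), "model-b": (0.0005, 0.0015)}

def project(rows: int, models: list, in_tokens: int, out_tokens: int):
    # Total runs = data rows x models; cost sums each model's
    # input/output token prices across its runs.
    runs = rows * len(models)
    cost = sum(
        rows * (in_tokens / 1000 * PRICES[m][0] + out_tokens / 1000 * PRICES[m][1])
        for m in models
    )
    return runs, round(cost, 4)

print(project(50, ["model-a", "model-b"], 800, 200))
# → (100, 0.305)
```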