Experiments
Evaluations
Before entering the configuration view, you must choose a high-level Experiment Type. This choice determines how the AI's output is analyzed:
Custom: Manual setup with or without programmatic Criteria.
Evaluation Rules: Create programmatic checks like Regex, JSON Schema, or Lexical Overlap to automatically grade responses.
Selector: AI chooses the best option from a list of variables.
Rater: AI provides qualitative scores or side-by-side comparisons.
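Two of the programmatic checks named above (Regex and Lexical Overlap) can be sketched with the standard library alone; `regex_rule` and `lexical_overlap` are hypothetical names for illustration, not the tool's internals, and a real JSON Schema rule would additionally need a schema validator:

```python
import re

def regex_rule(response: str, pattern: str) -> bool:
    # Pass if the pattern matches anywhere in the response.
    return re.search(pattern, response) is not None

def lexical_overlap(response: str, reference: str) -> float:
    # Fraction of reference words that also appear in the response.
    ref = set(reference.lower().split())
    resp = set(response.lower().split())
    return len(ref & resp) / len(ref) if ref else 0.0

response = "Order #12345 has shipped and will arrive Friday."
print(regex_rule(response, r"#\d{5}"))                 # True
print(lexical_overlap(response, "order shipped friday"))
```

A grading pipeline would run each enabled rule over every response and record pass/fail or the numeric score alongside the run.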
1. Prompts
The core logic of your experiment.
System Prompt: Defines the persona, constraints, and operational context for the AI.
User Prompt: The specific query or instruction.
Variable Injection: Surround keys with double curly braces (e.g., {{variable_name}}) to dynamically inject data during execution.
Live Preview: A real-time rendering of your prompt with values interpolated from your selected data sources.
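The injection step can be sketched in a few lines; `render_prompt` is a hypothetical helper (not the tool's actual implementation) that substitutes known keys and leaves unknown placeholders intact:

```python
import re

def render_prompt(template: str, variables: dict) -> str:
    # Replace {{key}} placeholders; unknown keys are left as-is.
    def sub(match):
        key = match.group(1)
        return str(variables.get(key, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)

prompt = "Summarize {{doc_title}} for a {{audience}} reader."
print(render_prompt(prompt, {"doc_title": "Q3 Report", "audience": "technical"}))
# → Summarize Q3 Report for a technical reader.
```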
2. Variables
Manage how data is fed into your prompt.
Detection: The system automatically detects {{tags}} from your prompts.
Data Sources:
Datasets: Map variables to CSV or JSON data rows.
Personas: Inject full profiles or specific sections of a Persona.
Businesses: Inject Branding, Audience, or Core Profile data from your saved Businesses.
Prompt Templates: Use an existing prompt as a variable value.
If the selected prompt is a Template with its own variables, those variables are automatically "nested" and appear in the Experiment's variable list for configuration. Nesting works only when a single prompt template is selected.
Configuration:
Full Profile: Toggle to include the entire object as JSON or Markdown.
Included Sections: Select specific attributes (e.g., only "Brand Voice" or "Target Audience Segment").
Exclusions: Skip specific items within a dataset.
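Tag detection with nested Prompt Templates can be sketched as a recursive scan: collect the {{tags}} in a prompt, and when a tag maps to another template, fold that template's own variables into the list. The function name and the `prompt_templates` mapping are assumptions for illustration:

```python
import re

TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def detect_variables(template: str, prompt_templates: dict, _seen=None) -> set:
    # Collect {{tags}}; if a tag maps to another prompt template,
    # recurse so its variables are "nested" into the result.
    # _seen guards against cyclic template references.
    _seen = _seen if _seen is not None else set()
    found = set(TAG.findall(template))
    for key in list(found):
        if key in prompt_templates and key not in _seen:
            _seen.add(key)
            found |= detect_variables(prompt_templates[key], prompt_templates, _seen)
    return found

templates = {"greeting": "Hello {{name}}, welcome to {{company}}."}
print(sorted(detect_variables("{{greeting}} Then ask about {{topic}}.", templates)))
# → ['company', 'greeting', 'name', 'topic']
```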
3. Models
Choose the AI models that will process your prompts.
Provider Selection: Enable/disable specific providers (OpenAI, Anthropic, Ollama, etc.).
Model Checkboxes: Select one or more models to run in parallel.
Execution Config:
Parallel Execution: Enable to run multiple scenarios simultaneously.
Concurrency: Set the maximum number of simultaneous requests.
Rate Limits (RPM): Define requests-per-minute limits to avoid API throttling.
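The interplay of Concurrency and RPM can be modeled as a semaphore (caps simultaneous requests) plus minimum spacing between request starts (caps requests per minute). This is a minimal sketch; `call_model` stands in for whatever async client a provider exposes:

```python
import asyncio
import time

async def run_scenarios(scenarios, call_model, concurrency=4, rpm=60):
    # Semaphore caps in-flight requests; `interval` spaces out
    # request starts so at most `rpm` begin per minute.
    sem = asyncio.Semaphore(concurrency)
    interval = 60.0 / rpm
    next_start = time.monotonic()

    async def one(scenario):
        nonlocal next_start
        async with sem:
            delay = next_start - time.monotonic()
            next_start = max(next_start, time.monotonic()) + interval
            if delay > 0:
                await asyncio.sleep(delay)
            return await call_model(scenario)

    return await asyncio.gather(*(one(s) for s in scenarios))
```

`asyncio.gather` preserves input order, so results line up with the scenario list even though completion order varies.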
4. Summary
A pre-flight checklist and cost projection.
Token Projections: Estimated input/output tokens based on your variable mapping.
Cost Estimate: Projected monetary cost for the entire experiment.
Scenario Breakdown: The total number of runs that will occur (Variables × Models).
Run Button: Triggers the execution and takes you to the Results page.
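The projections follow directly from the configuration: scenario count is data rows times selected models, and cost sums per-model token prices over those runs. The model names and per-1K-token prices below are illustrative assumptions, not real pricing:

```python
# Hypothetical (input, output) prices per 1K tokens.
PRICES = {"model-a": (0.003, 0.015), "model-b": (0.0005, 0.0015)}

def project(rows: int, models: list, in_tokens: int, out_tokens: int):
    # Total runs = data rows x models; cost sums each model's
    # input/output token prices across its runs.
    runs = rows * len(models)
    cost = sum(
        rows * (in_tokens / 1000 * PRICES[m][0] + out_tokens / 1000 * PRICES[m][1])
        for m in models
    )
    return runs, round(cost, 4)

print(project(50, ["model-a", "model-b"], 800, 200))
# → (100, 0.305)
```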