Evaluation: Selector
The Selector evaluation is a specialized tool for forced-choice testing. It presents the AI with a list of options and requires it to choose the one that best satisfies the prompt's criteria.
How it Works
Source Binding: You select a Dataset or Folder (Persona/Business) as the source of options.
Permutation: The system takes N items from that source (e.g., 3 options) and injects them into the User Prompt.
Choice Extraction: The AI is instructed (via an automatic system suffix) to return its choice in a specific format (JSON or a clear label like "Option A").
Leaderboard Tracking: The system tracks which specific items from your source are selected most often across all scenarios and models.
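The leaderboard step above amounts to a tally of wins per source item across runs. A minimal sketch, assuming a hypothetical list of run results (the field names here are illustrative, not the tool's actual schema):

```python
from collections import Counter

# Hypothetical run results: which source item each run selected.
runs = [
    {"model": "model-a", "selected": "Headline 3"},
    {"model": "model-a", "selected": "Headline 1"},
    {"model": "model-b", "selected": "Headline 3"},
    {"model": "model-b", "selected": "Headline 3"},
]

# Leaderboard: how often each item wins across all scenarios and models.
leaderboard = Counter(run["selected"] for run in runs)

for item, wins in leaderboard.most_common():
    print(f"{item}: {wins} wins")
```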
The Permutation Algorithm
The Selector experiment doesn't just shuffle items; it uses a systematic multi-pass permutation algorithm to ensure rigorous testing.
Selection (nPk): The system takes your source items (n) and generates every possible subset of size k (the "Options per Run").
Order Sensitivity: Unlike a "Combination," the Selector uses Ordered Permutations. This means the set [Option A, Option B] and [Option B, Option A] are treated as two distinct scenarios.
Position Bias Detection: By presenting the same options in different positions across multiple runs, the system can detect if a model has a "Position Bias" (e.g., a tendency to always pick the first option regardless of content).
Combinatorial Growth: Note that as n and k increase, the number of runs grows rapidly following the formula:
P(n, k) = n! / (n - k)!
Example: 5 items with 3 options per run results in 60 unique scenarios per model.
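The generation scheme above maps directly onto ordered permutations. A short sketch using Python's standard library, confirming the P(5, 3) = 60 count and the symmetry that makes position-bias detection possible:

```python
from collections import Counter
from itertools import permutations
from math import factorial

items = ["A", "B", "C", "D", "E"]  # n = 5 source items
k = 3                               # options per run

# Ordered permutations: (A, B, C) and (B, A, C) are distinct scenarios.
scenarios = list(permutations(items, k))

# P(n, k) = n! / (n - k)!
assert len(scenarios) == factorial(len(items)) // factorial(len(items) - k)
print(len(scenarios))  # 60 unique scenarios per model

# Every item leads the list equally often (12 times each here), so any
# skew in a model's picks toward slot 1 reflects position bias, not content.
first_slot = Counter(s[0] for s in scenarios)
```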
Configuration
Options per Run
Range: 2 to 10.
Function: Controls how many items from your source are shown to the AI in a single request.
Tip: High counts (e.g., 10) test the AI's ability to handle long context, while low counts (e.g., 2) are better for simple A/B testing.
Listing Style
Letters (A, B, C): Conventional choice labels.
Best For: Numeric data, currency, or dates. Using "Option 1: 500" can confuse the AI's internal tokenization. "Option A: 500" provides a clear semantic boundary between the Label and the Value.
Numbers (1, 2, 3): Useful for very long lists.
Best For: Long text descriptions, names, or abstract concepts. Numbers provide a familiar sequential structure for categorical data that doesn't contain digits.
Auto-Formatting: The system automatically generates the markdown list for you, so you don't need to manually map variables like {{option_a}}.
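To make the auto-formatting concrete, here is a hypothetical re-implementation of the list the system generates; the exact injected text may differ from this sketch:

```python
from string import ascii_uppercase

def format_options(options, style="letters"):
    """Render an option list in either listing style.

    Illustrative only; the tool's actual prompt template is not shown
    in the docs.
    """
    lines = []
    for i, text in enumerate(options):
        label = ascii_uppercase[i] if style == "letters" else str(i + 1)
        lines.append(f"Option {label}: {text}")
    return "\n".join(lines)

print(format_options(["500", "750"], style="letters"))
# Option A: 500
# Option B: 750
```

Note how the letter labels give numeric values like "500" a clear semantic boundary, as described above.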
Data Configuration (Folders)
If your source is a Folder (e.g., a Persona group):
Full Profile: Injects the entire record for each option.
Section Selection: Injects only specific fields (e.g., just the "Bio" and "Pain Points"), keeping the prompt focused and saving tokens.
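Section Selection is effectively a field filter over each record. A minimal sketch with a hypothetical persona record (field names are illustrative):

```python
# Hypothetical persona record; field names are illustrative.
record = {
    "name": "Alex",
    "bio": "Skeptical first-time buyer.",
    "pain_points": "Distrusts marketing claims.",
    "purchase_history": ["widget", "gadget"],
}

def select_sections(record, sections):
    """Keep only the chosen fields, mirroring Section Selection."""
    return {key: record[key] for key in sections if key in record}

# Inject only Bio and Pain Points, keeping the prompt focused.
focused = select_sections(record, ["bio", "pain_points"])
```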
Extraction Logic
The system uses a multi-stage approach to find the AI's choice:
JSON Search: Looks for a JSON object with an "option" or "choice" key.
Label Match: Scans for patterns like "Option A", "Choice: 1", or simply "A".
Direct Match: If the AI output is just a single character/digit, it is matched directly to the option index.
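The three stages can be sketched as a fallback chain. This is an illustrative reconstruction, not the tool's actual parser; the regexes and helper name are assumptions:

```python
import json
import re

def extract_choice(output: str, num_options: int):
    """Multi-stage choice extraction: JSON, then label, then direct match."""
    labels = [chr(ord("A") + i) for i in range(num_options)]

    # Stage 1: JSON search for an "option" or "choice" key.
    match = re.search(r"\{.*\}", output, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group())
            value = data.get("option") or data.get("choice")
            if value is not None:
                return str(value).strip().upper()
        except json.JSONDecodeError:
            pass

    # Stage 2: label match, e.g. "Option A" or "Choice: 1".
    m = re.search(r"(?:Option|Choice)\s*:?\s*([A-Za-z]|\d+)", output)
    if m:
        return m.group(1).upper()

    # Stage 3: direct match when the output is a bare label or digit.
    stripped = output.strip().upper()
    if stripped in labels or stripped.isdigit():
        return stripped

    return None  # no parsable choice found
```

Ordering the stages from most to least structured keeps the parser robust: a verbose answer that happens to contain the word "option" still resolves correctly if it also emits JSON.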
Use Cases
📈 Marketing & Business
Headline A/B Testing: Present 5 variations of a landing page headline to see which one a "Skeptic" persona finds most trustworthy.
Product Recommendation: Give the AI a customer's purchase history and ask it to pick the best next product from a catalog.
Brand Voice Consistency: Show 4 variations of a tweet and ask which one best adheres to the specific brand voice defined in a Business profile.
Pricing Strategy: Present various price points and ask a "Frugal Persona" which one feels most like a "steal" vs "fair value."
Target Audience Segmenting: Provide 3 audience profiles and ask which one would most likely respond to a specific discount offer.
🛡️ Content & Quality
Content Moderation: Present 3 user comments and ask the AI to select the one that violates a specific safety policy.
Sentiment Labeling: Give 5 reviews and ask the AI to pick the most "Positive" or "Urgent" one.
Summary Quality: Paste a long article and provide 3 AI-generated summaries; ask a "Professional Editor" persona to pick the most accurate one.
Legal Document Classification: Present 4 contract clauses and ask the AI to categorize them into "High Risk" vs "Low Risk."
Code Review Assistant: Show 3 ways to fix a bug and ask the AI to pick the most efficient/performant solution.
🎓 Specialized Domains
Medical Triage Simulation: Present 3 patient descriptions and ask the AI to prioritize the most critical case for a Roleplay scenario.
Educational Level Assessment: Provide 3 explanations of Quantum Physics and ask which one is most appropriate for a "Middle School Student" level.
Conflict Resolution: Present 4 potential responses to a workplace conflict and ask which one is most likely to de-escalate the situation.
Financial Risk Assessment: Provide 3 loan applications and ask the AI to identify the one with the highest potential for default based on specific heuristics.
Creative Writing Plot Choice: Provide 3 narrative "hooks" and ask a "Fantasy Fan" persona which one they would be most excited to read further.