Evaluation: Programmatic
Programmatic evaluations are the bedrock of automated testing in PrxmptStudix. They allow you to define objective, code-based criteria that determine if an AI's response "passes" or "fails."
Programmatic rules are configured in the Evaluations tab of a Custom Experiment. You can chain multiple rules together; an experiment scenario is typically considered "Passed" only if all active rules are satisfied.
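The pass/fail logic for chained rules can be sketched as follows. This is a minimal illustration, not PrxmptStudix's actual implementation: the `Rule` class and the `scenario_passed` function are hypothetical names, and treating a scenario with no active rules as passed is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical stand-in for one programmatic rule."""
    name: str
    check: Callable[[str], bool]  # returns True when the response satisfies the rule
    active: bool = True           # mirrors the Active toggle


def scenario_passed(response: str, rules: List[Rule]) -> bool:
    """Pass only if every active rule is satisfied; inactive rules are skipped.

    Assumption: with no active rules, the scenario counts as passed.
    """
    return all(rule.check(response) for rule in rules if rule.active)


rules = [
    Rule("mentions YES", lambda r: "YES" in r),
    Rule("short answer", lambda r: len(r) <= 10),
    Rule("disabled rule", lambda r: False, active=False),
]
```

Here `scenario_passed("YES", rules)` succeeds because the only failing rule is inactive, while a longer response would fail the length rule.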
Metric Types
Exact Match: The simplest evaluation. It checks whether the AI's response contains a specific string, so text surrounding the expected string does not cause a failure.
Config:
Expected String: The text to look for.
Case Sensitive: (Toggle) If enabled, "Hello" will NOT match "hello".
Best For: Fixed-choice outputs (e.g., "YES", "NO") or specific formatting tokens.
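The Exact Match check described above behaves roughly like the sketch below. The function name `exact_match` and its parameters are hypothetical; the substring semantics follow the description ("contains a specific string"), and using `casefold()` for case-insensitive comparison is an assumption.

```python
def exact_match(response: str, expected: str, case_sensitive: bool = True) -> bool:
    """Return True if `expected` appears anywhere in `response`."""
    if not case_sensitive:
        # Assumption: case-insensitive mode normalizes both sides before comparing.
        response, expected = response.casefold(), expected.casefold()
    return expected in response
```

With Case Sensitive enabled, `exact_match("hello world", "Hello")` fails; with it disabled, the same call passes.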
Regular Expression (Regex): Powerful pattern matching for structured text.
Config:
Pattern: A standard Regex pattern (e.g., \d{3}-\d{2}-\d{4} for SSNs).
Best For: Validating formats like phone numbers, email addresses, dates, or ensuring a specific Markdown structure.
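A regex rule of this kind might be approximated in Python as shown below. The function name `regex_match` is hypothetical, and searching anywhere in the response (rather than requiring the whole response to match) is an assumption; the regex dialect PrxmptStudix uses may also differ from Python's `re` module.

```python
import re


def regex_match(response: str, pattern: str) -> bool:
    """Return True if `pattern` matches anywhere in `response`."""
    # Assumption: a partial match is enough; anchor with ^ and $ to match the whole response.
    return re.search(pattern, response) is not None


SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"
```

For example, `regex_match("SSN: 123-45-6789", SSN_PATTERN)` passes, while a string with the digits grouped differently does not.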
JSON Schema Validation: Ensures the AI output is valid JSON and conforms to a strict structure.
Config:
Schema: A JSON object defining required keys and types.
Validation Logic: The system automatically extracts JSON from Markdown code blocks before validating. It checks for the existence of all keys defined in your schema.
Best For: Programmatic integrations where the AI output must be parsed by other software.
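The validation logic above (extract JSON from a Markdown code block, then check that every schema key exists) can be sketched as follows. The function names, the flat key-to-type schema shape, and the top-level-only key check are all assumptions made for illustration; the real schema format and extraction rules may differ.

```python
import json
import re

# Built at runtime so the Markdown fence marker does not appear literally in this example.
FENCE_MARK = "`" * 3
FENCE_RE = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def extract_json(response: str):
    """Parse JSON from the first fenced block, or from the raw response if none is found."""
    match = FENCE_RE.search(response)
    candidate = match.group(1) if match else response
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None  # not valid JSON


def validate_keys(response: str, schema: dict) -> bool:
    """Pass if the output is a JSON object containing every key named in `schema`.

    Assumption: only top-level key existence is checked; value types are not verified here.
    """
    data = extract_json(response)
    if not isinstance(data, dict):
        return False
    return all(key in data for key in schema)
```

A response that wraps `{"name": "Ada", "age": 36}` in a fenced block would pass a schema listing `name` and `age`, and fail one that also lists `email`.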
Length Count: Checks the verbosity of the AI.
Config:
Min/Max: Set the allowed range.
Mode: Measure by Characters or Words.
Best For: Ensuring summaries are concise or that long-form content meets a minimum requirement.
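A length rule along these lines could look like the sketch below. The function name `length_in_range` is hypothetical; treating the Min/Max bounds as inclusive and counting words by splitting on whitespace are assumptions.

```python
from typing import Optional


def length_in_range(
    response: str,
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
    mode: str = "words",
) -> bool:
    """Return True if the response length falls within [min_len, max_len]."""
    if mode == "words":
        count = len(response.split())  # assumption: words are whitespace-separated
    elif mode == "characters":
        count = len(response)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if min_len is not None and count < min_len:
        return False
    if max_len is not None and count > max_len:
        return False
    return True
```

The same response can pass in one mode and fail in another: "one two three" is 3 words but 13 characters.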
Lexical Overlap: Checks for the presence of specific keywords.
Config:
Keywords: A comma-separated list of words.
Must Contain All: (Toggle) If enabled, every keyword must be present. If disabled, just one is enough.
Best For: Brand monitoring (ensuring specific brand names are used) or safety checks (blocking restricted terms).
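The keyword check, including the Must Contain All toggle, can be approximated as below. The function name `keyword_check` is hypothetical, and case-insensitive substring matching is an assumption; the product may match whole words or respect case.

```python
def keyword_check(response: str, keywords: str, must_contain_all: bool = True) -> bool:
    """Check a response against a comma-separated keyword list."""
    # Split the comma-separated list and drop empty entries.
    terms = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    text = response.lower()  # assumption: matching ignores case
    hits = [term in text for term in terms]
    # All keywords required when the toggle is on; otherwise one is enough.
    return all(hits) if must_contain_all else any(hits)
```

With the toggle enabled, "Acme ships today" fails against the list `acme, rocket`; with it disabled, the single hit on "acme" is enough to pass.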
Status and Management
Active Toggle: You can temporarily disable a rule without deleting it. This is useful for debugging which specific criteria are causing failures.
Deduplication: PrxmptStudix's execution engine ensures that rules are run efficiently across all model/variable combinations.