Evaluation: Programmatic

Programmatic evaluations are the bedrock of automated testing in PrxmptStudix. They let you define objective, code-based criteria that determine whether an AI response "passes" or "fails."

Programmatic rules are configured in the Evaluations tab of a Custom Experiment. You can chain multiple rules together; an experiment scenario is typically considered "Passed" only if all active rules are satisfied.
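The pass/fail roll-up described above can be sketched in Python. This is a hypothetical model, not PrxmptStudix's actual engine: the Rule record, its active flag (standing in for the Active Toggle), and the scenario_passed function are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    # Hypothetical rule record: a label, a pass/fail check, and the Active toggle.
    name: str
    check: Callable[[str], bool]
    active: bool = True


def scenario_passed(response: str, rules: list[Rule]) -> bool:
    """Return True only if every active rule passes; inactive rules are skipped."""
    return all(rule.check(response) for rule in rules if rule.active)


rules = [
    Rule("says YES", lambda r: "YES" in r),
    Rule("under 20 characters", lambda r: len(r) < 20),
    Rule("always fails (disabled)", lambda r: False, active=False),
]
print(scenario_passed("YES", rules))  # True: the failing rule is inactive
```

One consequence of using all() here is that a scenario with no active rules passes vacuously. The article says scenarios are "typically" judged this way, so treat the aggregation rule itself as an assumption.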

Metric Types

Exact Match: The simplest evaluation. It checks if the AI's response contains a specific string.

Config:

Expected String: The text to look for.
Case Sensitive: (Toggle) If enabled, "Hello" will NOT match "hello".

Best For: Fixed-choice outputs (e.g., "YES", "NO") or specific formatting tokens.
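As described, Exact Match is a containment check rather than a whole-response comparison. A minimal Python sketch, assuming a plain substring test and using casefold() for the case-insensitive path (the exact_match function and its parameters are hypothetical):

```python
def exact_match(response: str, expected: str, *, case_sensitive: bool) -> bool:
    """Pass if `expected` appears anywhere in `response` (a substring check)."""
    if case_sensitive:
        return expected in response
    # casefold() is a stronger form of lower() intended for caseless comparison.
    return expected.casefold() in response.casefold()


print(exact_match("Hello there", "hello", case_sensitive=True))   # False
print(exact_match("Hello there", "hello", case_sensitive=False))  # True
```

If the product behaves as described, a substring check has a catch for fixed-choice outputs: an expected string of "NO" also matches inside "NOT sure", so a Regex rule anchored with ^ and $ may be the stricter choice.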

Regular Expression (Regex): Powerful pattern matching for structured text.

Config:

Pattern: A standard Regex pattern (e.g., \d{3}-\d{2}-\d{4} for SSNs).

Best For: Validating formats like phone numbers, email addresses, dates, or ensuring a specific Markdown structure.
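A sketch of a Regex rule, using Python's re module as a stand-in for whatever engine PrxmptStudix uses (regex dialects differ slightly between languages). It assumes the pattern may match anywhere in the response, so anchors are needed to validate the whole output; regex_match is a hypothetical name.

```python
import re


def regex_match(response: str, pattern: str) -> bool:
    # Assumption: the pattern may match anywhere in the response (re.search).
    # Anchor it with ^ and $ to reject surrounding text.
    return re.search(pattern, response) is not None


SSN = r"\d{3}-\d{2}-\d{4}"

print(regex_match("SSN: 123-45-6789", SSN))          # True: found inside the text
print(regex_match("SSN: 123-45-6789", rf"^{SSN}$"))  # False: extra text around it
print(regex_match("123-45-6789", rf"^{SSN}$"))       # True
```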

JSON Schema Validation: Ensures the AI output is valid JSON and conforms to a strict structure.

Config:

Schema: A JSON object defining required keys and types.
Validation Logic: The system automatically extracts JSON from Markdown code blocks before validating. It checks for the existence of all keys defined in your schema.

Best For: Programmatic integrations where the AI output must be parsed by other software.
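The two-step logic above (pull JSON out of a Markdown code block, then confirm the keys) might look like the following Python sketch. The extraction regex, the function names, and the schema shape (a flat object whose top-level keys name the required fields, e.g. {"name": "string", "age": "number"}) are all assumptions. Because the described logic checks only key existence, the type values are not enforced here.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built at runtime so they don't end this code block

# Hypothetical extraction: the body of the first fenced block, skipping its
# info string (e.g. "json"); if there is no fence, use the whole response.
FENCED_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_json_text(response: str) -> str:
    match = FENCED_BLOCK.search(response)
    return match.group(1) if match else response


def json_schema_check(response: str, schema: dict) -> bool:
    try:
        data = json.loads(extract_json_text(response))
    except json.JSONDecodeError:
        return False  # not valid JSON
    if not isinstance(data, dict):
        return False  # keys can only be checked on a JSON object
    # Key existence only: extra keys are allowed and value types are not checked.
    return all(key in data for key in schema)


reply = "Here you go:\n" + FENCE + "json\n" + '{"name": "Ada", "age": 36}' + "\n" + FENCE
schema = {"name": "string", "age": "number"}
print(json_schema_check(reply, schema))              # True
print(json_schema_check('{"name": "Ada"}', schema))  # False: "age" is missing
```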

Length Count: Checks the length of the AI's response.

Config:

Min/Max: Set the allowed range.
Mode: Measure by Characters or Words.

Best For: Ensuring summaries are concise or that long-form content meets a minimum requirement.
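A Python sketch of the Length Count rule. How PrxmptStudix counts words, and whether its bounds are inclusive, isn't specified here, so this version assumes whitespace-separated words and inclusive Min/Max values; length_check and its mode strings are hypothetical.

```python
def length_check(response: str, min_len: int, max_len: int, mode: str) -> bool:
    if mode == "characters":
        size = len(response)
    elif mode == "words":
        # Assumption: a word is any run of non-whitespace characters.
        size = len(response.split())
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: both bounds are inclusive.
    return min_len <= size <= max_len


summary = "Revenue rose 12% on strong cloud demand."
print(length_check(summary, 1, 10, "words"))       # True: 7 words
print(length_check(summary, 1, 30, "characters"))  # False: 40 characters
```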

Lexical Overlap: Checks for the presence of specific keywords.

Config:

Keywords: A comma-separated list of words.
Must Contain All: (Toggle) If enabled, every keyword must be present. If disabled, just one is enough.

Best For: Brand monitoring (ensuring specific brand names are used) or safety checks (blocking restricted terms).
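A sketch of Lexical Overlap in Python, assuming keywords are matched case-insensitively as whole words (so "cat" does not match inside "category"). The product's actual matching rules, and the lexical_overlap name, are assumptions.

```python
import re


def lexical_overlap(response: str, keywords_csv: str, *, must_contain_all: bool) -> bool:
    # Split the comma-separated config and drop blank entries.
    keywords = [k.strip() for k in keywords_csv.split(",") if k.strip()]

    def present(keyword: str) -> bool:
        # Whole-word, case-insensitive match; re.escape treats the keyword literally.
        return re.search(rf"\b{re.escape(keyword)}\b", response, re.IGNORECASE) is not None

    hits = [present(k) for k in keywords]
    return all(hits) if must_contain_all else any(hits)


reply = "Try Acme Cloud today."
print(lexical_overlap(reply, "Acme, Widget", must_contain_all=False))  # True
print(lexical_overlap(reply, "Acme, Widget", must_contain_all=True))   # False
```

For the safety-check use case, a True result here means a restricted term was found, so the sketch's result would need to be negated to act as a block; whether PrxmptStudix inverts it for you isn't stated here.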

Status and Management

Active Toggle: You can temporarily disable a rule without deleting it. This is useful for debugging which specific criteria are causing failures.
Deduplication: PrxmptStudix's execution engine ensures that rules are run efficiently across all model/variable combinations.
