Evaluation: Programmatic

Programmatic evaluations are the bedrock of automated testing in PrxmptStudix. They allow you to define objective, code-based criteria that determine if an AI's response "passes" or "fails."

Programmatic rules are configured in the Evaluations tab of a Custom Experiment. You can chain multiple rules together; an experiment scenario is typically considered "Passed" only if all active rules are satisfied.
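
The "all active rules must pass" behavior can be sketched in a few lines of Python. This is an illustration only, not PrxmptStudix's implementation: the Rule class, its fields, and scenario_passes are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    # Hypothetical stand-in for one programmatic rule.
    name: str
    check: Callable[[str], bool]  # returns True when the response satisfies the rule
    active: bool = True           # mirrors the Active toggle


def scenario_passes(response: str, rules: List[Rule]) -> bool:
    """Pass only if every active rule is satisfied; inactive rules are skipped."""
    return all(rule.check(response) for rule in rules if rule.active)


rules = [
    Rule("contains YES", lambda r: "YES" in r),
    Rule("at most 5 words", lambda r: len(r.split()) <= 5),
    Rule("mentions 2024", lambda r: "2024" in r, active=False),  # disabled, so ignored
]

print(scenario_passes("YES, approved.", rules))
```

Because the third rule is inactive, it does not affect the result, which is how disabling a rule helps isolate the criterion that causes a failure.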


Metric Types

Exact Match: The simplest evaluation. It checks if the AI's response contains a specific string.

Config:

  • Expected String: The text to look for.

  • Case Sensitive: (Toggle) If enabled, "Hello" will NOT match "hello".

Best For: Fixed-choice outputs (e.g., "YES", "NO") or specific formatting tokens.
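
As a rough sketch of the check described above, assuming a plain substring test with an optional case-insensitive comparison (the function name and parameters are hypothetical, not the product's API):

```python
def exact_match(response: str, expected: str, case_sensitive: bool = True) -> bool:
    """Return True when `expected` appears in `response`."""
    if not case_sensitive:
        # casefold() is a more aggressive lower() intended for caseless comparison
        return expected.casefold() in response.casefold()
    return expected in response


print(exact_match("Hello there", "hello"))                        # case-sensitive
print(exact_match("Hello there", "hello", case_sensitive=False))  # case-insensitive
```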

Regular Expression (Regex): Powerful pattern matching for structured text.

Config:

  • Pattern: A standard Regex pattern (e.g., \d{3}-\d{2}-\d{4} for SSNs).

Best For: Validating formats like phone numbers, email addresses, dates, or ensuring a specific Markdown structure.
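
A regex rule of this kind might behave like the sketch below. It assumes the pattern may match anywhere in the response (Python's re.search); whether PrxmptStudix requires a full-string match instead is not stated here, and regex_match is a hypothetical name.

```python
import re


def regex_match(response: str, pattern: str) -> bool:
    """Return True when `pattern` matches anywhere in `response`."""
    return re.search(pattern, response) is not None


SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"  # the example pattern from the Config above

print(regex_match("SSN: 123-45-6789", SSN_PATTERN))
print(regex_match("SSN: 123-456-789", SSN_PATTERN))
```

To require that the whole response match the pattern, re.fullmatch could be used in place of re.search.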

JSON Schema Validation: Ensures the AI output is valid JSON and conforms to a strict structure.

Config:

  • Schema: A JSON object defining required keys and types.

  • Validation Logic: The system automatically extracts JSON from Markdown code blocks before validating. It checks for the existence of all keys defined in your schema.

Best For: Programmatic integrations where the AI output must be parsed by other software.
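
The extract-then-validate flow described above could look roughly like this. It is a minimal sketch under stated assumptions: the schema is treated as a flat mapping of key names, only key existence is checked (as in the Validation Logic description), and extract_json and validate_keys are hypothetical helper names.

```python
import json
import re
from typing import Any, Dict, Optional

TICKS = "`" * 3  # a Markdown code fence, built here to avoid writing it literally
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)" + TICKS, re.DOTALL)


def extract_json(response: str) -> Optional[Any]:
    """Parse JSON from the first fenced block, or from the raw response if none."""
    match = FENCE.search(response)
    candidate = match.group(1) if match else response
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


def validate_keys(response: str, schema: Dict[str, Any]) -> bool:
    """Return True when the output is a JSON object containing every schema key."""
    data = extract_json(response)
    if not isinstance(data, dict):
        return False
    return all(key in data for key in schema)


schema = {"name": "string", "age": "number"}  # illustrative schema shape
reply = "Here you go:\n" + TICKS + 'json\n{"name": "Ada", "age": 36}\n' + TICKS
print(validate_keys(reply, schema))
```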

Length Count: Checks the length of the AI's response.

Config:

  • Min/Max: Set the allowed range.

  • Mode: Measure by Characters or Words.

Best For: Ensuring summaries are concise or that long-form content meets a minimum requirement.
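
A length rule with both modes can be sketched as follows. The assumptions are that words are split on whitespace and that both bounds are inclusive and optional; length_in_range and its parameters are hypothetical.

```python
from typing import Optional


def length_in_range(
    response: str,
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
    mode: str = "words",
) -> bool:
    """Return True when the response length falls inside [min_len, max_len]."""
    if mode == "words":
        count = len(response.split())  # whitespace-separated tokens
    elif mode == "characters":
        count = len(response)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if min_len is not None and count < min_len:
        return False
    if max_len is not None and count > max_len:
        return False
    return True


print(length_in_range("one two three", min_len=2, max_len=5, mode="words"))
print(length_in_range("one two three", max_len=5, mode="characters"))
```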

Lexical Overlap: Checks for the presence of specific keywords.

Config:

  • Keywords: A comma-separated list of words.

  • Must Contain All: (Toggle) If enabled, every keyword must be present. If disabled, just one is enough.

Best For: Brand monitoring (ensuring specific brand names are used) or safety checks (blocking restricted terms).
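
The keyword check and its Must Contain All toggle can be approximated like this. The sketch assumes case-insensitive substring matching, which may differ from how PrxmptStudix tokenizes text; lexical_overlap is a hypothetical name.

```python
def lexical_overlap(response: str, keywords: str, must_contain_all: bool = True) -> bool:
    """Check a response against a comma-separated keyword list."""
    terms = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    text = response.lower()
    hits = [term in text for term in terms]
    # all(): every keyword must appear; any(): a single keyword is enough
    return all(hits) if must_contain_all else any(hits)


print(lexical_overlap("Try Acme Cloud today", "Acme, Cloud"))
print(lexical_overlap("Try Acme today", "Acme, Cloud"))
print(lexical_overlap("Try Acme today", "Acme, Cloud", must_contain_all=False))
```

For a safety check that blocks restricted terms, the result with Must Contain All disabled would need to be inverted, since a hit then signals a failure.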


Status and Management

  • Active Toggle: You can temporarily disable a rule without deleting it. This is useful for debugging which specific criteria are causing failures.

  • Deduplication: PrxmptStudix's execution engine ensures that rules are run efficiently across all model/variable combinations.


© 2024-2026 | All Rights Reserved. Hikoky