Datasets

Datasets are the primary source of variable data for your experiments in PrxmptStudix. They allow you to define collections of inputs (rows) that can be injected into system prompts to test model performance across a wide range of scenarios.
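
As a rough illustration of what injecting rows into a system prompt means, the sketch below renders one prompt per row using Python's standard `string.Template`. The `$row` placeholder name and the example prompt text are assumptions made for this example; they are not PrxmptStudix's actual placeholder syntax.

```python
from string import Template

# Hypothetical system prompt; the "$row" placeholder name is illustrative only.
system_prompt = Template("You are a support assistant. Answer this question: $row")

# Example dataset rows (made up for this sketch).
rows = [
    "How do I reset my password?",
    "Where can I download my invoice?",
]

# Render one prompt per dataset row.
prompts = [system_prompt.substitute(row=text) for text in rows]

for prompt in prompts:
    print(prompt)
```

Each row yields its own rendered prompt, so a dataset with N rows produces N test cases for the same system prompt.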

Overview

A Dataset consists of:

Name: A descriptive title for the collection.
Description: Optional context or metadata about the source of the data.
Rows: A linear list of text strings. Each row represents a single unique input for an experiment.
Organization: Datasets can be nested in folders and manually ordered.
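
For readers who think in terms of data structures, the parts listed above can be pictured as a small record. This is only a mental model: the class and field names below (including `folder`) are hypothetical and do not describe how PrxmptStudix stores datasets internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical in-memory picture of a dataset; all names are illustrative."""

    name: str                                       # descriptive title for the collection
    description: str = ""                           # optional context about the data's source
    rows: List[str] = field(default_factory=list)   # ordered list of text inputs
    folder: Optional[str] = None                    # assumed: folder the dataset is nested in


support_questions = Dataset(
    name="Support questions",
    description="Hand-written examples for illustration",
    rows=["How do I reset my password?", "Where can I download my invoice?"],
)

print(support_questions.name, len(support_questions.rows))
```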

Managing Rows

The Dataset editor is designed for managing large volumes of text data:

Row-Based Editing

Fluid Entry: Simply type in a row to edit it. A new empty row is automatically added at the bottom as you type.
Keyboard Navigation: Press Ctrl + Enter while editing a row to move the focus to the next row, which speeds up data entry.
Autosizing: Text areas expand vertically to accommodate multi-line content without losing context.

Reordering & Deletion

Drag-and-Drop: Use the Grip Handle (⠿) on the left of any row to drag it to a new position.
Deletion: Hover over a row to reveal the Trash Icon, then click it to delete the row.
Undo Support: Accidentally deleted a row? Use Ctrl + Z or the Undo button in the toast notification to restore it immediately.

Duplicate Detection

PrxmptStudix automatically scans your dataset for duplicate rows:

Visual Callouts: Duplicate rows are bolded and highlighted, with a distinct color for each group of matching rows, so you can identify matching groups at a glance.
Multiplicity Labels: Each duplicate row displays a count (e.g., x3) indicating how many times that specific string appears in the dataset.
Summary Header: The editor header shows the total number of duplicate items found across the entire dataset.

Use Duplicate Detection to ensure your experiment results aren't skewed by redundant test cases. A clean dataset leads to more reliable model evaluations.
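
The counting behind the multiplicity labels can be sketched in a few lines of Python. This assumes duplicates are matched by exact string equality, and it assumes the header total counts every row that belongs to a duplicate group; neither detail is confirmed here, and the function names are hypothetical.

```python
from collections import Counter


def find_duplicates(rows):
    """Return {row_text: count} for every string that appears more than once."""
    counts = Counter(rows)
    return {text: n for text, n in counts.items() if n > 1}


def duplicate_item_total(rows):
    """Count rows that belong to any duplicate group (an assumed definition)."""
    return sum(find_duplicates(rows).values())


rows = ["hello", "goodbye", "hello", "thanks", "hello", "goodbye"]

for text, n in find_duplicates(rows).items():
    print(f"{text} x{n}")         # prints "hello x3", then "goodbye x2"

print(duplicate_item_total(rows))  # prints 5
```

Exact matching means rows that differ only in whitespace or letter case would not be grouped in this sketch.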

AI-Assisted Workflows

Leverage AI to create or augment your data:

AI Generation: Create a brand new dataset from a seed prompt or description.
AI Autofill: In an open dataset, click the magic wand icon and ask the AI to generate more rows based on the dataset's current content and description. This is useful for expanding edge cases or increasing sample sizes for testing.