# Advanced Testing Using Evaluations

{% embed url="<https://youtu.be/Htoi0PBYHek>" %}

The Evaluations feature in MindStudio allows you to test AI workflows at scale using autogenerated or manually defined test cases. This is especially helpful for validating workflows like moderation filters, where consistent logic must be applied across many inputs.

### Why Use Evaluations?

Manually testing workflows via the preview debugger becomes inefficient as the number of test cases grows. Evaluations allow you to:

* Autogenerate test cases with AI
* Specify expected outputs
* Run tests in bulk
* Compare actual vs. expected results
* Use fuzzy matching for flexible validation

### Sample Use Case: Spam Detection

In this example, an AI workflow is designed to detect spam comments and flag violations based on defined community guidelines. The workflow takes in a comment via a launch variable and outputs:

* A boolean indicating whether it's spam
* An array of flags indicating types of violations
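As a rough illustration of that output, the sketch below models it as a small Python data structure. The class name `SpamCheckResult` is hypothetical; the field names `is_spam` and `flags` follow the expected-result examples used later on this page, and the actual shape depends on how your workflow is built.

```python
from dataclasses import dataclass, field


@dataclass
class SpamCheckResult:
    """Hypothetical model of the workflow output described above."""

    # True when the comment is judged to be spam.
    is_spam: bool
    # Types of guideline violations found; empty when there are none.
    flags: list[str] = field(default_factory=list)


# A violating comment and a clean comment, expressed in this shape.
violating = SpamCheckResult(is_spam=True, flags=["hateful", "off-topic"])
clean = SpamCheckResult(is_spam=False)
```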

### Creating and Running Test Cases

#### Step 1: Access the Evaluations Tab

* Navigate to the top-level **Evaluations** tab in your project.
* Click **New Test Case** to manually add a test or use **Autogenerate** to let AI create test cases for you.

#### Step 2: Autogenerate Violating Test Cases

* Enter guidance such as “generate five test cases that are in violation of our guidelines.”
* The AI produces sample comments that match the workflow's input structure.
* Add the expected results for each case (e.g., `"is_spam": true`, `"flags": ["hateful", "off-topic"]`).

#### Step 3: Run Test Cases

* Click **Run All** to test all cases in parallel.
* MindStudio will show which tests pass or fail based on comparison with expected results.
* Each test can be inspected in the debugger.
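Conceptually, a bulk run compares each actual output with its expected result and reports pass or fail. The sketch below mimics that idea outside MindStudio and is not how the product is implemented. `fake_workflow`, `run_case`, `run_all`, and the `"spam"` flag value are all hypothetical stand-ins; a real evaluation runs your AI workflow instead of a keyword check.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_workflow(comment: str) -> dict:
    """Hypothetical stand-in for the AI workflow: a naive keyword check."""
    spam_words = ("buy now", "free money")
    is_spam = any(word in comment.lower() for word in spam_words)
    return {"is_spam": is_spam, "flags": ["spam"] if is_spam else []}


# Each test case pairs an input comment with the expected output.
test_cases = [
    {"input": "Buy now and get FREE MONEY!!!",
     "expected": {"is_spam": True, "flags": ["spam"]}},
    {"input": "Great article, thanks for sharing.",
     "expected": {"is_spam": False, "flags": []}},
    {"input": "Click my profile for a surprise",
     "expected": {"is_spam": True, "flags": ["spam"]}},
]


def run_case(case: dict) -> dict:
    """Run one case and record whether actual equals expected exactly."""
    actual = fake_workflow(case["input"])
    return {"input": case["input"], "actual": actual,
            "passed": actual == case["expected"]}


def run_all(cases: list[dict]) -> list[dict]:
    """Run every case in parallel; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_case, cases))


results = run_all(test_cases)
failed = [r for r in results if not r["passed"]]
for r in results:
    print("PASS" if r["passed"] else "FAIL", "-", r["input"])
```

The third case fails because the stand-in misses spam that contains none of its keywords. That is the kind of result you would open individually to see why actual and expected differ.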

#### Step 4: Autogenerate Non-Violating Test Cases

* Repeat the process with a new prompt: “generate five comments not in violation.”
* Provide expected results (e.g., `"is_spam": false`, `"flags": []`).
* Run the new set and verify accuracy.

### Matching Methods

MindStudio supports two types of result matching:

* **Literal Match**: Requires the actual output to exactly match the expected value.
* **Fuzzy Match**: Tolerates minor differences or variations in phrasing. Useful when the output contains AI-generated wording that can vary between runs.
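To make the distinction concrete, here is a minimal sketch of the two ideas. This page does not describe how MindStudio implements fuzzy matching, so the character-similarity approach below (Python's standard `difflib`) and the `0.8` threshold are assumptions chosen purely for illustration. The function names `literal_match` and `fuzzy_match` are hypothetical.

```python
from difflib import SequenceMatcher


def literal_match(actual: str, expected: str) -> bool:
    """Pass only when the two strings are exactly identical."""
    return actual == expected


def fuzzy_match(actual: str, expected: str, threshold: float = 0.8) -> bool:
    """Pass when the strings are similar enough (illustrative assumption).

    Case and extra whitespace are normalized first, then a character-level
    similarity ratio between 0 and 1 is compared with the threshold.
    """
    a = " ".join(actual.lower().split())
    b = " ".join(expected.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold


# Small wording differences fail a literal match but pass a fuzzy one.
print(literal_match("This comment is spam.", "this comment is spam"))
print(fuzzy_match("This comment is spam.", "this comment is spam"))
```

Under this sketch, a literal match suits structured values such as booleans or fixed flag names, while a fuzzy match suits free-text explanations.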

### Benefits of Evaluations

* Run many test cases at once
* Easily edit and rerun failing cases
* Debug individual results
* Improve the reliability of your AI workflows

***

Evaluations are a key tool for ensuring your AI behaves as expected at scale. Whether you're building content filters, classifiers, or other workflows with well-defined expected outputs, this feature helps you validate them with confidence.

