Copilot Studio Evaluations Failing with "Error" on all test cases

Posted on by CM-19061902-0

I'm running into an issue with Copilot Studio Evaluations and am trying to determine whether it is a service issue. It is happening in multiple client environments. Is anyone else experiencing issues with Copilot Studio Evaluations?

Tested:

Large and small evaluation datasets
A single "hello" -> "hello" test case
Multiple agents
An empty agent with no knowledge sources, topics, or actions

The agents work correctly in Test Chat, but Evaluations always return Error for both General Quality and Compare Meaning.

I'm seeing the same behavior in multiple client environments/tenants, which makes me wonder whether this is a broader Copilot Studio issue.

Best,

Carl Mesias

Categories:

Calling actions from Copilot Studio

Suggested answer

Sunil Kumar Pashikanti 2,211 Moderator on at

Hi @CM-19061902-0,

I tested this scenario in my tenant to validate whether this is a broader issue.

I ran evaluations on an unpublished agent using a very simple test case (“hello → hello”) and tried both:

Single response

Conversation (preview)

Both evaluations completed successfully and returned 100% pass.

Based on this, evaluations are working correctly in at least some environments, which suggests this is not a global platform outage.

Given your results:

Failing even on a simple test case

Reproducing across multiple agents and datasets

Test Chat working fine

This points more towards an environment- or tenant-specific issue than a problem with the agent configuration itself.

A few things worth checking:

Try in a different region/environment (if available)

Have another user in the same tenant run the same test

Compare across tenants where you’re seeing the issue

Check whether any service health advisories were posted during the timeframe of the failures

If the issue is consistent in specific tenants but not others, it is likely a backend problem scoped to those tenants rather than a global one, and a support ticket with the Session ID and timestamps would help Microsoft narrow it down.
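For anyone tracking these checks across several tenants, here is a minimal Python sketch for tabulating the results by hand. It does not call Copilot Studio or any Microsoft API; you run the "hello" test manually and record whether it returned Error. The names `Observation` and `classify_scope`, the tenant labels, and the verdict wording are all hypothetical and only illustrate the isolation logic described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One manual run of the minimal evaluation (hypothetical record shape)."""
    tenant: str        # your own label for the tenant, e.g. "client-a"
    environment: str   # environment name or region you tested in
    user: str          # who ran the evaluation
    passed: bool       # True if the run completed without "Error"


def classify_scope(observations):
    """Group results by tenant and summarize where failures are concentrated."""
    by_tenant = defaultdict(list)
    for obs in observations:
        by_tenant[obs.tenant].append(obs.passed)

    # A tenant is "failing" if no run passed, "working" if every run passed.
    failing = sorted(t for t, r in by_tenant.items() if not any(r))
    working = sorted(t for t, r in by_tenant.items() if all(r))
    mixed = sorted(t for t in by_tenant if t not in failing and t not in working)

    if failing and not working and not mixed:
        verdict = "all observed tenants fail: possibly broader than one tenant"
    elif failing:
        verdict = "tenant-scoped: some tenants fail consistently, others do not"
    elif mixed:
        verdict = "varies within a tenant: compare environments and users"
    else:
        verdict = "no failures observed"
    return {"failing": failing, "working": working, "mixed": mixed, "verdict": verdict}
```

Feeding in one `Observation` per manual run (different user, different environment, different tenant) gives a quick summary to paste into a support ticket. The verdicts are only a rough heuristic, not a diagnosis.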

In my case, the same minimal test worked fine, so it may help you isolate whether this is tenant-specific vs broader.

[Screenshots: Single response and Conversation evaluation results]

Sunil Kumar Pashikanti, Moderator
Blog: https://sunilpashikanti.com/posts/

Suggested answer

11manish 3,052 on at

Based on your testing, this looks more like a Copilot Studio Evaluations service issue than an agent configuration problem.

Review Microsoft Service Health for Copilot Studio/AI services.

Capture:

Environment IDs

Evaluation run IDs

Correlation IDs (if available)

Timestamps of failures

Open a Microsoft Support ticket if the issue persists, as you'll have a strong reproducible case demonstrating that the problem occurs across multiple tenants and with minimal test data.
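To keep the captured details consistent across failures, something like the following Python sketch may help. It is only a local note-taking helper: the `FailureRecord` fields mirror the items listed above, but the class, the function names, and the JSON layout are assumptions for illustration, not a format that Microsoft Support requires. The IDs themselves have to be copied by hand from wherever your environment exposes them.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FailureRecord:
    """Details of one failed evaluation run (hypothetical record shape)."""
    environment_id: str
    evaluation_run_id: str
    failed_at_utc: str                    # ISO 8601 timestamp in UTC
    correlation_id: Optional[str] = None  # only if one is available
    session_id: Optional[str] = None
    notes: str = ""


def utc_now_iso():
    """Return the current time as an ISO 8601 UTC string, to the second."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def to_ticket_json(records):
    """Serialize the records as indented JSON for attaching to a ticket."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

Recording timestamps in UTC avoids ambiguity when the affected tenants are in different time zones.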
