Has anyone found a reliable pattern to prevent “silent failures” in Power Automate?

Posted by Sunil Kumar Pashikanti (2,111, Moderator)

One challenge I’ve repeatedly seen in larger or enterprise flows is that the overall flow reports “Succeeded” even though one or more internal actions fail, are skipped unexpectedly, or never surface clearly in the final status.

This becomes harder to diagnose when dealing with:

Nested or child flows
Multiple connectors and integrations
Conditional execution paths
Long-running automation scenarios

Some common issues I’ve experienced:

Failure states not propagating to the final flow outcome
Difficulty identifying the actual root cause from run history
Limited centralized visibility across executions

To improve observability, I’ve been experimenting with a structured pattern using:

TRY / CATCH / FINALLY scopes
Controlled failure propagation
Centralized telemetry logging
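
As a rough illustration of how these three pieces fit together, here is a Python analogy. It is not a Power Automate flow definition, and the names `run_flow`, `FlowFailed`, and `telemetry` are hypothetical. The TRY block runs the actions, CATCH records the failure in a flag instead of swallowing it, FINALLY always writes a log entry, and the run is then explicitly failed if the flag is set.

```python
class FlowFailed(Exception):
    """Hypothetical stand-in for explicitly terminating a run as Failed."""


def run_flow(actions, telemetry):
    """Run (name, callable) pairs with TRY / CATCH / FINALLY semantics.

    `telemetry` is any list-like sink; in a real flow this would be a
    central log store rather than an in-memory list (assumption).
    """
    has_failure = False
    error_detail = None
    current = None
    try:  # TRY scope: the actual work
        for current, action in actions:
            action()
    except Exception as exc:  # CATCH scope: record, do not silently absorb
        has_failure = True
        error_detail = f"{current}: {exc}"
    finally:  # FINALLY scope: always log the outcome
        telemetry.append({
            "status": "Failed" if has_failure else "Succeeded",
            "error": error_detail,
        })
    if has_failure:
        # Controlled propagation: the caller sees a failed run, not a green one.
        raise FlowFailed(error_detail)
    return "Succeeded"
```

Without the final `raise`, the function would return normally after a caught error, which is the same shape as a flow that ends "Succeeded" because a catch scope handled the failure and nothing re-threw it.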

I recently put together a reusable implementation based on this approach:

https://github.com/spashikanti/flowarmor-core

Would be interested to hear how others are approaching this in production environments.

Do you rely on built-in monitoring features?
Do you use custom logging or telemetry?
How do you surface failures across child flows and production environments?
Any patterns that have worked well at scale?

Categories:

General topics

Building flows

Using flows

All responses (2)

Suggested answer

Assisted by AI

sannavajjala87 (282, Super User 2026 Season 1)

Hi,

Nice work on FlowArmor. You've diagnosed it exactly right: the silent green is almost always a CATCH scope absorbing the failure with nothing re-throwing, so the run ends Succeeded. The thing you've baked in that most people miss, making the parent the source of truth and explicitly terminating as Failed based on a varHasFailure flag, is the real fix. Here's how I've handled the same problems in production.

Built-in monitoring: useful for triage, but you outgrow it fast. Run history is fine for inspecting a single run, but retention is short and there's no cross-flow view. Analytics and the CoE Starter Kit give you macro failure rates, not transaction-level root cause. So I rely on custom telemetry for the rest, which is the gap your framework fills.

Custom telemetry: yes. The biggest lever at scale is environment-level Application Insights. It captures action-level telemetry across every flow without instrumenting each one, with correlation, KQL, and alerting built in. I pair it with a normalized in-flow log (basically what you're doing) for the business events App Insights won't see, joined on the correlation ID. That's essentially your Tier 3, and it's why framing FlowArmor as the generator that feeds App Insights rather than a replacement is the right call.

Surfacing failures across child flows: same as yours, child always returns a structured response, parent inspects it and sets the final status. A child failing without the parent checking is the other big source of false greens, and one correlation ID through the chain is what makes a nested run reconstructable later.
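
To make the parent/child contract concrete, here is a minimal Python sketch. It is an analogy only, not flow JSON, and `ChildResponse`, `run_child`, and `run_parent` are hypothetical names. The child never fails silently: it always returns a structured response, and the parent decides the final status from those responses. One correlation ID is generated by the parent and passed to every child.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ChildResponse:
    correlation_id: str
    child_name: str
    succeeded: bool
    error: Optional[str] = None


def run_child(child_name: str, work: Callable[[], None],
              correlation_id: str) -> ChildResponse:
    """Child always answers with a structured response, even on failure."""
    try:
        work()
        return ChildResponse(correlation_id, child_name, True)
    except Exception as exc:
        return ChildResponse(correlation_id, child_name, False, str(exc))


def run_parent(children: List[Tuple[str, Callable[[], None]]]):
    """Parent is the source of truth: it inspects every child response."""
    correlation_id = str(uuid.uuid4())  # one ID for the whole chain
    responses = [run_child(name, work, correlation_id)
                 for name, work in children]
    failed = [r for r in responses if not r.succeeded]
    status = "Failed" if failed else "Succeeded"
    return status, responses
```

The false green appears when the parent skips the inspection step and reports success regardless of what `responses` contains.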

Scale patterns that held up for me:

Standardize the scaffold in a template or solution every flow inherits, so error handling isn't reinvented per flow.

Decouple logging asynchronously (as you do) and alert from the central store with thresholds, not per-run emails. The alert-storm-during-an-outage problem you mention is very real.

Add retry policies and idempotency on connector-heavy and long-running flows, so transient blips don't become false negatives or duplicate side effects.
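
For the retry and idempotency point, a small Python sketch of the idea follows. In Power Automate the retry policy is configured on the action itself; this only shows the logic. `TransientError`, `call_with_retry`, and `IdempotentSink` are hypothetical names, and the backoff values are arbitrary assumptions.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, throttling)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry `operation` on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: surface the failure instead of hiding it
            sleep(base_delay * 2 ** (attempt - 1))


class IdempotentSink:
    """Ignores a write whose key was already processed, so retries are safe."""

    def __init__(self):
        self._seen = {}

    def write(self, key, payload):
        if key in self._seen:
            return False  # duplicate: no second side effect
        self._seen[key] = payload
        return True
```

Retries without an idempotency key can turn one transient blip into two invoices or two emails, which is why the two belong together.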

One friendly note: the SharePoint sink is fine for Tier 1, but it's the piece I'd watch as volume climbs (list throttling and the 5,000-item list view threshold hit sooner than expected), so your roadmap to Dataverse and then App Insights is the right move. And the planned "multiple logs sharing one CorrelationId" step is what turns this from "which run failed" into "which action in which child failed and why."

Good contribution. Curious where v2 and the KQL cookbook land.

Sunil Kumar Pashikanti (2,111, Moderator)

Hi @sannavajjala87,

Thanks for the thoughtful feedback. I really appreciate you taking the time to break this down from a production perspective.

You captured the intent behind FlowArmor perfectly. It is not trying to replace App Insights at all; it is meant to complement it. While App Insights gives strong platform-level telemetry, the specific business context is usually the missing piece. Bringing that in through structured in-flow logs and tying it all together with a shared Correlation ID is where things start to become highly usable at scale.

I also agree with your warning about SharePoint, though I think it sometimes gets written off a bit too quickly. In practice, many enterprise environments still rely heavily on it due to licensing considerations, existing system-of-record alignment, or simply because teams are deeply invested in it operationally.

When it is treated strictly as an operational point-lookup store rather than an analytics layer, and paired with smart indexing and folder-based partitioning, it can actually hold up reasonably well for low to medium-scale workloads. That said, you are absolutely right that once you move into higher concurrency and deeper observability, App Insights and Dataverse are the definitive long-term directions.

The alert-storm callout resonated as well. Decoupled logging with centralized, threshold-based alerting is really the only model that scales cleanly when things go sideways.
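
A minimal sketch of what threshold-based alerting from a central store could look like, in Python for illustration. The function name `flows_to_alert`, the 15-minute window, and the threshold of 5 are all assumptions, and in practice this would more likely be a scheduled query over the log store than application code.

```python
from collections import Counter
from datetime import datetime, timedelta


def flows_to_alert(failures, now, window=timedelta(minutes=15), threshold=5):
    """Return flow names with at least `threshold` failures inside the window.

    `failures` is an iterable of (flow_name, timestamp) rows read from the
    central log, so one outage yields one alert per flow, not one per run.
    """
    cutoff = now - window
    counts = Counter(name for name, ts in failures if cutoff <= ts <= now)
    return sorted(name for name, n in counts.items() if n >= threshold)
```

Because the check runs against the aggregated log, a burst of 200 failed runs during an outage still produces a single notification per affected flow.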

For the next iteration, I am focusing on tightening parent-child flow interaction using that single Correlation ID, and making it easier to plug directly into App Insights for deeper querying. I actually just put together a deeper dive on how far SharePoint can be pushed when designed around these specific threshold behaviors and Power Platform constraints if you want to check it out:

Rethinking SharePoint at Scale: Designing for Millions Beyond the 5,000 Item Threshold

Appreciate the great insights again. This kind of real-world perspective is exactly what helps refine the framework!
