Has anyone found a reliable pattern to prevent “silent failures” in Power Automate?

Posted by Sunil Kumar Pashikanti (2,111, Moderator)
One challenge I’ve repeatedly seen in larger or enterprise flows is situations where the overall flow reports “Succeeded”, but one or more internal actions fail, are skipped unexpectedly, or never surface clearly in the final status.
 
This becomes harder to diagnose when dealing with:
  • Nested or child flows
  • Multiple connectors and integrations
  • Conditional execution paths
  • Long-running automation scenarios
Some common issues I’ve experienced:
  • Failure states not propagating to the final flow outcome
  • Difficulty identifying the actual root cause from run history
  • Limited centralized visibility across executions
To improve observability, I’ve been experimenting with a structured pattern using:
  • TRY / CATCH / FINALLY scopes
  • Controlled failure propagation
  • Centralized telemetry logging
I recently put together a reusable implementation based on this approach:
 
Would be interested to hear how others are approaching this in production environments.
  • Do you rely on built-in monitoring features?
  • Do you use custom logging or telemetry?
  • How do you surface failures across child flows and production environments?
  • Any patterns that have worked well at scale?
  • sannavajjala87 (282, Super User 2026 Season 1)
    Suggested answer, assisted by AI
    Hi,
     
    Nice work on FlowArmor. You've diagnosed it exactly right: the silent green is almost always a CATCH scope absorbing the failure with nothing re-throwing, so the run ends Succeeded. The thing you've baked in that most people miss, making the parent the source of truth and explicitly terminating Failed off a varHasFailure flag, is the real fix. Here's how I've handled the same problems in production.
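To make the flag pattern concrete, here is a minimal sketch in Python rather than in flow actions. It only mirrors the shape of the TRY / CATCH / FINALLY scopes described above; `run_flow`, `RunResult`, and `has_failure` (standing in for varHasFailure) are hypothetical names for illustration, not part of FlowArmor or Power Automate.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    status: str = "Succeeded"
    errors: list = field(default_factory=list)
    log: list = field(default_factory=list)


def run_flow(actions):
    """Run (name, callable) pairs in a TRY scope and decide the final status explicitly."""
    result = RunResult()
    has_failure = False  # stands in for the varHasFailure flag (assumed name)
    current = None
    try:
        # TRY scope: the business actions
        for name, action in actions:
            current = name
            action()
            result.log.append(f"{name}: ok")
    except Exception as exc:
        # CATCH scope: recording the error without setting the flag
        # is what would leave the run looking green.
        has_failure = True
        result.errors.append(f"{current}: {type(exc).__name__}: {exc}")
    finally:
        # FINALLY scope: runs whether or not the TRY scope failed
        result.log.append("finally: telemetry flushed")
    if has_failure:
        # Equivalent of a Terminate action with status Failed
        result.status = "Failed"
    return result
```

If the `has_failure` check were dropped, the function would still return "Succeeded" after a caught error, which is the silent green in miniature.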
    Built-in monitoring: useful for triage but you outgrow it fast. Run history is fine for a single run but retention is short and there's no cross-flow view, and Analytics and the CoE Kit give you macro failure rates, not transaction-level root cause. So I rely on custom telemetry for the rest, which is the gap your framework fills.

    Custom telemetry: yes. The biggest lever at scale is environment-level Application Insights. It captures action-level telemetry across every flow without instrumenting each one, with correlation, KQL, and alerting built in. I pair it with a normalized in-flow log (basically what you're doing) for the business events App Insights won't see, joined on the correlation ID. That's essentially your Tier 3, and it's why framing FlowArmor as the generator that feeds App Insights rather than a replacement is the right call.
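As a rough illustration of the join on the correlation ID, the sketch below groups two event lists by a shared key. The record shape and the `correlation_id` field name are assumptions made for the example; real App Insights rows and in-flow log entries will have their own schemas, and in practice the join would more likely be done in KQL.

```python
from collections import defaultdict


def join_on_correlation_id(platform_events, business_events):
    """Group platform telemetry and business log entries by correlation ID."""
    merged = defaultdict(lambda: {"platform": [], "business": []})
    for event in platform_events:
        merged[event["correlation_id"]]["platform"].append(event)
    for event in business_events:
        merged[event["correlation_id"]]["business"].append(event)
    return dict(merged)


# Hypothetical sample data: one failed action plus the business context for it
platform = [{"correlation_id": "abc", "action": "Send_email", "status": "Failed"}]
business = [{"correlation_id": "abc", "event": "InvoiceApproved", "invoice": "INV-1"}]
joined = join_on_correlation_id(platform, business)
```

Each key in `joined` then holds both views of the same transaction, which is what lets a failed action be read alongside the business event it belonged to.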
     
    Surfacing failures across child flows: same as yours, child always returns a structured response, parent inspects it and sets the final status. A child failing without the parent checking is the other big source of false greens, and one correlation ID through the chain is what makes a nested run reconstructable later.
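A minimal sketch of that parent/child contract, written in Python for readability. The response fields (`status`, `correlationId`, `error`) and the function names are hypothetical; the point is only that the child never fails silently and the parent derives its own status from what the children return.

```python
import uuid


def child_flow(order_id, correlation_id):
    """Always return a structured response, even when the work fails."""
    try:
        if order_id < 0:
            raise ValueError("order_id must be non-negative")
        return {"status": "Succeeded", "correlationId": correlation_id, "error": None}
    except Exception as exc:
        return {"status": "Failed", "correlationId": correlation_id, "error": str(exc)}


def parent_flow(order_ids):
    """Pass one correlation ID down the chain and set the final status from the children."""
    correlation_id = str(uuid.uuid4())
    responses = [child_flow(order_id, correlation_id) for order_id in order_ids]
    failures = [r for r in responses if r["status"] != "Succeeded"]
    return {
        "status": "Failed" if failures else "Succeeded",
        "correlationId": correlation_id,
        "failures": failures,
    }
```

Because every child response carries the same correlation ID, the failed child can be traced back to the parent run later without relying on run history.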
    Scale patterns that held up for me:
    • Standardize the scaffold in a template or solution every flow inherits, so error handling isn't reinvented per flow.
    • Decouple logging asynchronously (as you do) and alert from the central store with thresholds, not per-run emails. The alert-storm-during-an-outage problem you mention is very real.
    • Add retry policies and idempotency on connector-heavy and long-running flows, so transient blips don't become false negatives or duplicate side effects.
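The retry and idempotency point can be sketched as follows. This is a simplified Python illustration, not how connector retry policies are configured in Power Automate; `TransientError`, `call_with_retry`, and the in-memory `completed` store are all assumed names for the example.

```python
import time


class TransientError(Exception):
    """Stands in for a throttling or timeout response from a connector."""


def call_with_retry(operation, idempotency_key, completed, max_attempts=3, base_delay=0.5):
    """Retry transient failures with exponential backoff; skip work already done for this key."""
    if idempotency_key in completed:
        # A previous run already produced this side effect, so do not repeat it.
        return completed[idempotency_key]
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: let the failure propagate
            time.sleep(base_delay * 2 ** (attempt - 1))
        else:
            completed[idempotency_key] = result
            return result
```

The retry loop keeps a transient blip from becoming a failed run, and the key check keeps a re-run from sending the same email or creating the same record twice.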
    One friendly note: the SharePoint sink is fine for Tier 1, but it's the piece I'd watch as volume climbs (list throttling and the 5,000-item list view threshold hit sooner than expected), so your roadmap to Dataverse then App Insights is the right move. And the planned "multiple logs sharing one CorrelationId" step is what turns this from "which run failed" into "which action in which child failed and why."
    Good contribution. Curious where v2 and the KQL cookbook land.
     
  • Sunil Kumar Pashikanti (2,111, Moderator)
     
    Thanks for the thoughtful feedback. I really appreciate you taking the time to break this down from a production perspective.
     
    You captured the intent behind FlowArmor perfectly. It is not trying to replace App Insights at all; it is meant to complement it. While App Insights gives strong platform-level telemetry, the specific business context is usually the missing piece. Bringing that in through structured in-flow logs and tying it all together with a shared Correlation ID is where things start to become highly usable at scale.
     
    I also agree with your warning about SharePoint, though I think it sometimes gets written off a bit too quickly. In practice, many enterprise environments still rely heavily on it due to licensing considerations, existing system-of-record alignment, or simply because teams are deeply invested in it operationally.
     
    When it is treated strictly as an operational point-lookup store rather than an analytics layer, and paired with smart indexing and folder-based partitioning, it can actually hold up reasonably well for low to medium-scale workloads. That said, you are absolutely right that once you move into higher concurrency and deeper observability, App Insights and Dataverse are the definitive long-term directions.
     
    The alert-storm callout resonated as well. Decoupled logging with centralized, threshold-based alerting is really the only model that scales cleanly when things go sideways.
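One way to picture threshold-based alerting is a sliding window over failure timestamps that fires once when the threshold is crossed and stays quiet until the failure rate drops again. The sketch below is an assumption-laden illustration in Python; `ThresholdAlerter` and its parameters are hypothetical, and a real setup would more likely use alert rules on the central store.

```python
from collections import deque


class ThresholdAlerter:
    """Alert once when failures in a sliding window reach a threshold."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = deque()
        self._alert_active = False

    def record_failure(self, timestamp):
        """Record one failure; return True only when a new alert should be sent."""
        self._failures.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop failures that have aged out of the window
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        over = len(self._failures) >= self.threshold
        should_alert = over and not self._alert_active
        self._alert_active = over
        return should_alert
```

During an outage this produces one alert per incident instead of one email per failed run, which is the difference between a signal and an alert storm.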
     
    For the next iteration, I am focusing on tightening parent-child flow interaction using that single Correlation ID, and making it easier to plug directly into App Insights for deeper querying. I actually just put together a deeper dive on how far SharePoint can be pushed when designed around these specific threshold behaviors and Power Platform constraints if you want to check it out:
     
     
    Appreciate the great insights again. This kind of real-world perspective is exactly what helps refine the framework!
     
