Resolved -
By 05:15 UTC, immediate mitigation was complete: scaling up Orchestrate service resources stabilized the system and stopped further degradation. Additional safeguards, including payload size limits and an upcoming SDK fix, were prepared to prevent recurrence.
Apr 9, 05:15 UTC
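As a rough illustration of the payload size limit safeguard mentioned above, the sketch below enforces a client-side cap before a job is ever submitted. All names here (`check_payload_size`, `MAX_PAYLOAD_BYTES`, the 1 MiB threshold) are hypothetical for illustration and are not the actual Orchestrate SDK API or the limit the service enforces.

```python
import json

# Illustrative threshold only; the incident states that ~6 MB payloads
# exceeded Kubernetes API server limits, but the exact enforced limit
# is not part of this report.
MAX_PAYLOAD_BYTES = 1 * 1024 * 1024  # 1 MiB


class PayloadTooLargeError(ValueError):
    """Raised before submission so the job never enters the retry queue."""


def check_payload_size(payload: dict) -> bytes:
    """Serialize the payload and reject it client-side if it is too large."""
    encoded = json.dumps(payload).encode("utf-8")
    if len(encoded) > MAX_PAYLOAD_BYTES:
        raise PayloadTooLargeError(
            f"payload is {len(encoded)} bytes, limit is {MAX_PAYLOAD_BYTES}"
        )
    return encoded
```

Rejecting an oversized payload at enqueue time makes it fail fast with a clear error instead of entering the retry loop described in the Identified update below.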
Identified -
At 04:00 UTC on April 9, 2026, degraded performance was observed in the Orchestrate service due to increased system load. The root cause was a retry mechanism repeatedly reprocessing failed jobs: oversized payloads (~6 MB) generated by custom actions exceeded Kubernetes API server limits, so the affected jobs failed on every attempt, were continuously retried, and added strain on the system.
Apr 9, 04:00 UTC
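A minimal sketch of the failure mode, assuming a generic job runner; the exception classes and retry parameters are illustrative, not Orchestrate internals. A retry loop that treats every failure as transient re-queues jobs whose oversized payloads can never succeed, which matches the behavior described above; one plausible shape for the upcoming SDK fix is to classify such rejections as permanent.

```python
import time


# Hypothetical error taxonomy for illustration only.
class TransientError(Exception):
    """Timeouts or brief outages: worth retrying."""


class PayloadTooLargeError(Exception):
    """Payload rejected by the API server as too large: retrying cannot succeed."""


def run_with_retries(job, max_attempts=5, base_delay=1.0):
    """Retry transient failures with backoff, but fail fast on permanent ones.

    If the PayloadTooLargeError branch were missing and all failures were
    retried, jobs with ~6 MB payloads would fail on every attempt and be
    reprocessed continuously, adding load with no chance of success.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except PayloadTooLargeError:
            raise  # permanent failure: surface immediately, do not re-queue
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```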