Function executions are currently degraded
Resolved
Mar 05 at 05:15pm CET
Functions recovered. Post mortem:
Post-Incident Summary
Date: 6 March 2026
Impact: Degraded function execution for actions and webhooks
Status: Resolved
Summary
A bug in the task scheduling system disabled per-environment concurrency limits, allowing a single tenant to generate an unbounded burst of invocations. At the same time, the tenant’s functions ran for 120–150 seconds each, significantly longer than typical workloads, which kept execution environments occupied for extended periods.
As the burst of tasks exceeded the rate at which execution environments became available, a backlog formed in the asynchronous invocation queue. This backlog increased the age of queued events and introduced elevated action latency. Eventually, queued tasks began exceeding the expiration limits enforced by the task system and expired before execution.
Monitoring detected the issue through elevated latency metrics, after which the workload was identified and mitigated.
Timeline (CET)
- Issue began: 16:00
- Detected by monitoring / on-call paged: 16:20
- Mitigated: 17:15
Root Cause
A bug in the task system disabled per-environment concurrency limits, allowing a single tenant environment to generate an unbounded burst of invocations. The tenant’s functions also ran for 120–150 seconds each, significantly longer than typical workloads, so execution environments remained occupied for extended periods and did not recycle quickly.
Because provisioned concurrency was configured with a low maximum, most of the burst traffic was handled by on-demand capacity. While the function runtime continued scaling additional execution environments, the combination of burst traffic and long-running executions caused capacity to ramp more slowly than the incoming workload required.
This created a backlog in the asynchronous invocation queue, which increased async event age and action latency. As the backlog grew, queued tasks eventually exceeded the expiration limits enforced by the task system and expired before they could be executed.
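The backlog dynamics described above can be sketched with a small simulation. This is an illustration only: the arrival rate, pool size, and expiration limit are hypothetical numbers, not values from the incident, and the pool is held fixed for simplicity, whereas the real runtime kept scaling. Only the 120-second duration is taken from the report.

```python
from collections import deque


def required_concurrency(arrival_per_s, duration_s):
    """Little's law: environments needed to keep up with a steady arrival rate."""
    return arrival_per_s * duration_s


def simulate_backlog(arrival_per_s, duration_s, environments, expiry_s, horizon_s):
    """Second-by-second FIFO queue served by a fixed pool of environments.

    All parameters are hypothetical inputs chosen for illustration.
    """
    queue = deque()      # enqueue times of tasks still waiting
    busy_until = []      # finish times of executions in flight
    executed = expired = max_queue_age_s = 0

    for now in range(horizon_s):
        # Release environments whose execution has finished.
        busy_until = [t for t in busy_until if t > now]
        # Enqueue this second's arrivals.
        queue.extend([now] * arrival_per_s)
        # Expire tasks that waited longer than the limit.
        while queue and now - queue[0] > expiry_s:
            queue.popleft()
            expired += 1
        # Start as many queued tasks as free capacity allows.
        while queue and len(busy_until) < environments:
            queue.popleft()
            busy_until.append(now + duration_s)
            executed += 1
        # Track the age of the oldest task still waiting.
        if queue:
            max_queue_age_s = max(max_queue_age_s, now - queue[0])

    return {
        "executed": executed,
        "expired": expired,
        "max_queue_age_s": max_queue_age_s,
    }


if __name__ == "__main__":
    print(required_concurrency(10, 120))
    print(simulate_backlog(10, 120, environments=600, expiry_s=300, horizon_s=900))
```

With a pool half the size that Little's law calls for, the queue age climbs steadily until tasks start expiring, which mirrors the sequence in the report: backlog first, then latency, then expirations.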
Detection was delayed because alerting relies on latency averaged over 15-minute windows.
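A rough way to see why a 15-minute average reacts slowly: the sketch below finds the first minute at which a trailing-window mean crosses a threshold. The latency samples and the 10-second threshold are made up for illustration and are not the actual alert configuration.

```python
def first_alert_minute(samples, window, threshold):
    """Return the 1-based minute at which the trailing-window mean first
    exceeds the threshold, or None if it never does."""
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1): i + 1]
        if sum(recent) / len(recent) > threshold:
            return i + 1
    return None


# Hypothetical per-minute latency in seconds: 30 healthy minutes, then degradation.
ONSET_MINUTE = 30
samples = [2.0] * ONSET_MINUTE + [20.0] * 30

if __name__ == "__main__":
    for window in (1, 5, 15):
        fired = first_alert_minute(samples, window, threshold=10.0)
        print(f"window={window:>2} min -> alert {fired - ONSET_MINUTE} min after onset")
```

The wider the window, the more healthy samples dilute the degraded ones, so the alert fires later even though the underlying latency changed at the same moment.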
Resolution
- The tenant generating the burst workload was identified and its incoming traffic was halted.
- Task system throttling logic was fixed to restore per-environment concurrency limits.
- After the workload was stopped and the backlog had drained, function processing returned to normal.
All systems were fully operational by 17:15 CET.
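For readers unfamiliar with the safeguard that was restored, a per-environment concurrency limit can be pictured roughly as below. This is a minimal sketch, not the actual task system code: the class name, method names, and limit value are hypothetical.

```python
import threading
from collections import defaultdict


class PerEnvironmentLimiter:
    """Caps the number of in-flight tasks for each environment independently."""

    def __init__(self, max_concurrent):
        self._max = max_concurrent
        self._in_flight = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, environment_id):
        """Reserve a slot; return False if the environment is already at its cap."""
        with self._lock:
            if self._in_flight[environment_id] >= self._max:
                return False
            self._in_flight[environment_id] += 1
            return True

    def release(self, environment_id):
        """Free a slot once a task finishes, never dropping below zero."""
        with self._lock:
            if self._in_flight[environment_id] > 0:
                self._in_flight[environment_id] -= 1


if __name__ == "__main__":
    limiter = PerEnvironmentLimiter(max_concurrent=2)
    print([limiter.try_acquire("env-a") for _ in range(3)])  # third attempt is refused
    print(limiter.try_acquire("env-b"))                      # other environments unaffected
```

Because the counter is keyed by environment, one tenant hitting its cap does not reduce the slots available to others, which is the isolation that was lost while the limits were disabled.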
Follow-Up Actions
System safeguards
- Fix the task system throttling bug to ensure per-environment concurrency limits are always enforced (completed).
Capacity & scaling
- Increase provisioned concurrency autoscaling limits to better absorb bursts and reduce cold-start spillover.
Monitoring & alerting
- Add alerts for AsyncEventAge to detect queue backlogs earlier.
- Add alerts for ProvisionedConcurrencySpilloverInvocations.
- Add anomaly detection on actions executed per minute to catch throughput drops.
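The metric names above match AWS Lambda's CloudWatch metrics. Assuming that is the runtime in use (the report does not say so explicitly), an alarm on AsyncEventAge might be described roughly as follows. The function name, threshold, and evaluation settings are placeholder assumptions. The dictionary keys follow the parameters of CloudWatch's PutMetricAlarm API, and the API call itself is left out so the sketch runs without AWS credentials.

```python
def async_event_age_alarm(function_name, threshold_ms, period_s=60, evaluation_periods=3):
    """Build PutMetricAlarm parameters for a queue-age alarm.

    All argument values are hypothetical; AsyncEventAge is reported in milliseconds.
    """
    return {
        "AlarmName": f"{function_name}-async-event-age",
        "Namespace": "AWS/Lambda",
        "MetricName": "AsyncEventAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": period_s,                    # evaluate every minute, not every 15
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",    # no async traffic is not an incident
    }


if __name__ == "__main__":
    # "example-function" and the 60-second threshold are placeholders.
    params = async_event_age_alarm("example-function", threshold_ms=60_000)
    for key, value in params.items():
        print(f"{key}: {value}")
```

Alarming on the maximum over short periods is one possible choice here: it trades some noise for earlier detection than a long averaged window would give.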
Affected services
Updated
Mar 05 at 05:13pm CET
We're implementing a fix that will take effect shortly. Still investigating the root cause.
Created
Mar 05 at 04:05pm CET
We are experiencing an issue executing functions on Nango (actions, syncs, and webhooks). We are currently investigating and will provide updates here.