"One task retried itself 319 times"
A rate limit reset a task to ready with no ceiling. The same task spawned.
And spawned. And spawned — 319 times before anyone noticed the ID in the log.
The retry primitive had no monotonic counter and no terminal state, so "fail" meant "reset to ready," not "stop trying." One upstream outage away from a six-figure invoice. You've shipped this bug too — in a cron job that re-enqueues on error, or a pod that crash-loops at full speed.
The rule: every retry needs a hard ceiling and a real terminal state. Rate-limit errors, auth errors, and logic errors are three different classes — don't share a retry path between them.
Monday checklist → … (full issue in your inbox)