One of my projects at work uses the Python package Celery with Redis to manage executing background tasks. And we ran into some odd behavior that we didn't see explained anywhere else, so I figure I'll capture it here for the next poor soul running into these issues.
First, if you care about this subject, you should read this post over at Instawork which is a good discussion of the risks involved in using
eta. It helps set the stage.
We're using Celery with a Redis broker as part of a Django application. We apply one of 3 priorities to each of our tasks: Low, Medium, and High. High-priority tasks represent things that a human user is waiting on and need to be completed as soon as possible. Low-priority tasks are things that need to happen eventually, but we don't really care when. And anything else gets configured as medium priority.
This set up worked in our validation testing. We saw the queues get loaded up in Redis and the workers execute tasks in priority order as expected.
The Wrong Queue
After a large-scale data-processing task we noticed that high-priority user tasks were not executing.
When I inspected the queues in Redis I found that the high-priority queue was full of low-priority tasks. So the workers were extremely busy (correctly) processing the queue, but the tasks they were running were low priority. And the human's task was stuck behind them all.
How did this happen?
The first part of the puzzle is how Celery handles
countdown allows you to say "execute the task 5 minutes from now" while
eta allows you to say "execute the task no earlier than March 10, 2023 at 10:08AM."
countdown is purely syntactic sugar for
eta so that you don't have to calculate actual times yourself, so when you call
apply_async with a
countdown parameter Celery converts it to an
eta parameter. Since internally Celery only concerns itself with
eta values we'll only talk in terms of
eta from this point on.
eta task is generated it gets put into the appropriate priority queue. But, it doesn't stay in the queue until its
eta passes. Instead, any worker checking the queue will reserve the task immediately and hold it internally until the
During my investigation, while the workers were idle, I scheduled a few hundred tasks with an
eta and, as a result of the above behavior, the priority queues in Redis were empty. Workers will continue reserving
eta tasks from queues until they have a task that needs to actually execute now. Once the workers are busy,
eta tasks will stay in their appropriate queues until a worker is freed up and comes looking again.
Processing Reserved Tasks
Alright, so our workers have reserved all of our
eta tasks with varying priorities and now the tasks are starting to pass their
etas and need to be executed. At this point the worker completely ignores the priorities on the tasks. It begins executing whichever reserved task it happens upon first in its internal data structure (this is probably an internal queue, but I don't know for sure).
So once a worker reserves a task its priority is no longer respected. If you schedule a few hundred
eta tasks with mixed priorities (as I did) you see them executed in what appears to be an arbitrary order (I suspect they're actually executed in order they were reserved, but I haven't verified that because it's not relevant to my concerns).
This is not good and reason enough to avoid using
eta tasks for anything but high-priority tasks. But, it doesn't explain how we ended up with low-priority tasks in our high-priority queue in Redis.
Death of a Celery Worker
We have Celery configured to replace each worker after completing 10 tasks. This was an attempt to work around an issue where workers would stop pulling tasks from the queue and everything would stall out. We had a hypothesis that the issue was unclosed connections to Redis and so replacing the workers would force unclosed connections to get cleaned up. We haven't yet verified what was actually happening or if replacing the workers fixed anything though. It's a very intermittent problem and we haven't identified a sure trigger. (Though we did solidly identify that if Redis isn't ready to serve connection when Celery starts then Celery will not reconnect properly and workers will only execute a single task before hanging forever.)
Anyway, the point is that we replace our workers every so often. Well, what happens to all those
eta tasks the worker had reserved? They go back in the queue so another worker can get them. But, they all go into the default queue instead of going back to their appropriate priority queues. It happens that the default queue is the high-priority queue. So each time a worker was getting cycled, all the
eta tasks it held were pushed into the high-priority queue.
The perfect storm
So here's the scenario. Our workers are sitting idle waiting for work to do. Our large-scale data-processing task schedules a bunch of low-priority jobs with an
eta. The workers eagerly snap up all these tasks and reserve them for future execution. As soon as the earliest
etas pass each worker begins executing and stops reserving more
eta tasks from the low-priority queue.
Each worker completes 10 tasks and gets replaced. As each worker is replaced it returns the remainder of its eagerly-reserved
eta tasks to the high-priority queue. The new workers being spawned now begin processing the high-priority queue since it's full of tasks.
A user comes along and engages in an action backed by a high-priority task. But the user's high-priority task is now stuck behind several thousand low-priority tasks that have been misplaced in the high-priority queue.
Moral of the story
We had run across the
countdown parameter to
apply_async and thought it would be a good way to avoid some unnecessary work by pairing some high-churn jobs with a flag so they'd only be scheduled again if they weren't already scheduled (this flag was managed outside of the Celery world).
We will be rolling back that change so as to avoid this situation in the future.