English for BullMQ Job Queues
Learn the English vocabulary for BullMQ, the Redis-backed job queue library for Node.js: producers, workers, concurrency, and backoff strategies.
Job-queue bugs are usually reported vaguely — “the job didn’t run” or “it ran twice” — when the real cause is a specific, nameable concept: a missing worker, a retry with no backoff, or a lock that expired mid-processing. Naming these precisely turns a confusing bug report into an actionable one.
Key Vocabulary
Producer — the part of the application that adds jobs to a queue, specifying the job’s name, data payload, and any options like delay or priority.
“The producer adds a job every time an order is placed, but it wasn’t setting a jobId, so retried requests were creating duplicate jobs.”
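As a rough illustration of the duplicate-job example above, here is a minimal sketch of a producer deriving a deterministic job ID. The helper `buildOrderJob`, the job name `process-order`, and the `order-` prefix are hypothetical; the local `JobOptions` type is only a stand-in for the `jobId`, `delay`, and `priority` options that BullMQ's `queue.add` accepts.

```typescript
// Local stand-in for the subset of BullMQ job options mentioned above.
interface JobOptions {
  jobId?: string;    // custom ID; an add() with an ID that already exists is ignored
  delay?: number;    // milliseconds to wait before the job can be processed
  priority?: number; // relative priority among waiting jobs
}

interface OrderJob {
  name: string;
  data: { orderId: string };
  opts: JobOptions;
}

// Hypothetical helper: derive the job ID from the order ID so that a retried
// request produces the same ID instead of a second job.
function buildOrderJob(orderId: string): OrderJob {
  return {
    name: "process-order",
    data: { orderId },
    opts: { jobId: `order-${orderId}` },
  };
}

const first = buildOrderJob("1042");
const retried = buildOrderJob("1042");
console.log(first.opts.jobId === retried.opts.jobId); // true

// With a real queue this would be passed as:
//   await queue.add(first.name, first.data, first.opts);
```

Because both calls yield the same `jobId`, the second add would be treated as a duplicate while the first job still exists in the queue.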
Worker — a Worker instance, often running in its own process, that pulls jobs off a queue and executes the processing function.
“We had zero workers running in staging, which is why jobs were queuing up but nothing was actually processing them.”
Concurrency — the number of jobs a single worker instance will process in parallel, configured to balance throughput against downstream resource limits.
“We capped concurrency at 5 because the downstream API was rate-limiting us when the worker ran 20 jobs in parallel.”
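The trade-off between throughput and downstream limits can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a simplified model in which every job makes one downstream call; the rate limit (10 calls per second) and the average job time (0.5 seconds) are hypothetical numbers, not values from any real system.

```typescript
// Rough model (an assumption): each job makes one downstream call and takes
// avgJobSeconds, so `concurrency` parallel jobs produce about
// concurrency / avgJobSeconds calls per second.
function peakCallsPerSecond(concurrency: number, avgJobSeconds: number): number {
  return concurrency / avgJobSeconds;
}

// Largest concurrency that stays at or under the downstream rate limit.
function maxConcurrencyFor(limitPerSecond: number, avgJobSeconds: number): number {
  return Math.max(1, Math.floor(limitPerSecond * avgJobSeconds));
}

// Hypothetical numbers: 10 calls/s allowed, jobs average 0.5 s each.
console.log(peakCallsPerSecond(20, 0.5)); // 40
console.log(maxConcurrencyFor(10, 0.5));  // 5

// The result is the value that would go in the worker options,
// for example { concurrency: 5 }.
```

Under these assumed numbers, 20 parallel jobs would send roughly four times the allowed rate, while 5 stays within it.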
Backoff strategy — the rule that determines how long to wait before retrying a failed job, typically fixed or exponential, used to avoid hammering a struggling downstream service.
“We switched from a fixed 1-second backoff to exponential backoff, since retrying every second during an outage was just adding more load to a service that was already failing.”
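To see how the two strategies differ, the following sketch computes the wait before each retry. It uses the common doubling formula for exponential backoff (base delay times 2 to the power of the retry number minus one); BullMQ's exact calculation may differ between versions, so treat `backoffDelay` as an illustrative helper rather than the library's implementation.

```typescript
type Backoff = { type: "fixed" | "exponential"; delay: number };

// Delay in milliseconds before retry number `attemptsMade` (1 = first retry).
function backoffDelay(backoff: Backoff, attemptsMade: number): number {
  if (backoff.type === "fixed") return backoff.delay;
  return backoff.delay * Math.pow(2, attemptsMade - 1);
}

const retries = [1, 2, 3, 4];

console.log(retries.map((n) => backoffDelay({ type: "fixed", delay: 1000 }, n)));
// [ 1000, 1000, 1000, 1000 ]

console.log(retries.map((n) => backoffDelay({ type: "exponential", delay: 1000 }, n)));
// [ 1000, 2000, 4000, 8000 ]
```

With a fixed strategy the failing service is hit at a constant pace, whereas the exponential schedule gives it progressively more time to recover.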
Stalled job — a job whose worker stopped renewing its lock (often due to a crash or an event-loop block), causing BullMQ to consider it abandoned and eligible for reprocessing.
“The job wasn’t actually failing — the worker was blocking the event loop long enough that BullMQ marked it stalled and handed it to another worker, so it ran twice.”
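The idea behind stalling can be sketched as a simple timestamp comparison. This is a simplified model and not BullMQ's internal logic: the `ActiveJob` type, the `isStalled` helper, and the 30-second lock duration are all assumptions made for illustration.

```typescript
interface ActiveJob {
  id: string;
  lockRenewedAt: number; // ms timestamp of the last successful lock renewal
}

// Simplified model, not BullMQ's internals: a job counts as stalled once its
// lock has gone unrenewed for longer than lockDurationMs.
function isStalled(job: ActiveJob, nowMs: number, lockDurationMs: number): boolean {
  return nowMs - job.lockRenewedAt > lockDurationMs;
}

const lockDurationMs = 30000; // assumed value for this example
const job: ActiveJob = { id: "order-1042", lockRenewedAt: 0 };

console.log(isStalled(job, 20000, lockDurationMs)); // false: lock is still valid
console.log(isStalled(job, 45000, lockDurationMs)); // true: renewal was blocked past the lock duration
```

Note that nothing in this check looks at whether the job's own logic succeeded, which is why a stalled job points to the worker or the lock rather than to the job code.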
Common Phrases
- “Is this a producer-side bug, or is the job stuck because no worker is consuming the queue?”
- “What’s the concurrency set to here — is that why the downstream service is getting rate-limited?”
- “Are we using exponential backoff on retries, or hitting the failing service immediately every time?”
- “Was this job actually failing, or did it just stall because the worker’s lock expired?”
- “Is a jobId set here to prevent duplicate jobs, or could a retry create a second one?”
Example Sentences
Debugging a duplicate-processing report: “The job ran twice because the worker’s lock renewal was delayed by a long synchronous operation — BullMQ considered it stalled and reassigned it to another worker before the first one finished.”
Explaining a scaling decision: “We’re increasing worker concurrency from 5 to 15 now that the downstream API supports higher throughput, which should clear the backlog within an hour.”
Describing a retry policy in a design doc: “Failed jobs retry with exponential backoff up to 5 attempts, then move to the dead-letter queue for manual review rather than retrying indefinitely.”
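The retry policy in the last sentence might map onto job options along these lines. The `RetryOptions` type is a local stand-in for BullMQ's `attempts` and `backoff` job options, the 1000 ms base delay is an assumption, and `isExhausted` is a hypothetical helper. BullMQ itself leaves exhausted jobs in its failed set, so the dead-letter queue is assumed here to be application code built on top of that.

```typescript
// Local stand-in for the retry-related job options.
interface RetryOptions {
  attempts: number; // total tries, including the first one
  backoff: { type: "fixed" | "exponential"; delay: number };
}

const retryPolicy: RetryOptions = {
  attempts: 5,
  backoff: { type: "exponential", delay: 1000 }, // 1000 ms base is an assumption
};

// Hypothetical helper: has this job used up its attempts, so that it should
// go to manual review instead of being retried again?
function isExhausted(attemptsMade: number, policy: RetryOptions): boolean {
  return attemptsMade >= policy.attempts;
}

console.log(isExhausted(4, retryPolicy)); // false
console.log(isExhausted(5, retryPolicy)); // true
```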
Professional Tips
- Distinguish producer and worker issues explicitly — “the queue isn’t working” could mean either, and they require completely different fixes.
- State the concurrency setting when discussing throughput or downstream rate-limiting — it’s usually the actual lever being adjusted.
- Name the backoff strategy rather than saying “it retries” — fixed versus exponential backoff has very different effects during an incident.
- Use stalled precisely, not as a synonym for “failed” — a stalled job points to a worker or lock problem, not a bug in the job’s logic itself.
Practice Exercise
- Explain the difference between a producer and a worker in one sentence.
- Describe why exponential backoff is often preferred over fixed backoff for retries.
- Write a sentence explaining what causes a job to become “stalled.”