English for Ray Distributed Compute

Learn the English vocabulary for Ray: tasks, actors, the object store, and cluster autoscaling for distributed Python.

Ray discussions require precise vocabulary to distinguish stateless parallelism (tasks) from stateful distributed objects (actors). Using the two terms interchangeably in a design conversation hides a real architectural decision: where state lives.

Key Vocabulary

Task — a stateless Python function, decorated with @ray.remote and invoked with .remote(), that Ray schedules onto any available worker without keeping state from previous calls. “Make this a plain task, not an actor: each call is independent and doesn’t need to remember anything from the last one.”

Actor — an instance of a Python class decorated with @ray.remote. The actor runs in its own dedicated worker process and keeps its state across method calls, so something expensive, like a loaded model or an open connection, is set up once and then reused. “We wrapped the model in an actor so it loads once per actor, instead of every task reloading the weights from scratch.”

Object store — Ray’s distributed shared-memory store, with one instance per node, that holds the immutable objects passed between tasks and actors. An object is serialized once when it is stored. Workers on the same node then read it from shared memory (zero-copy for NumPy arrays) instead of receiving a fresh serialized copy with every call. “Put that large array in the object store once with ray.put() and pass the reference around, instead of serializing and re-sending it with every task call.”

Object ref (ObjectRef) — a future-like handle that a .remote() call returns immediately. It represents a result that may not have been computed yet, and you resolve it later with ray.get(). “Don’t call ray.get() right after submitting each task: collect all the object refs first, then resolve them together so the tasks actually run in parallel.”

Cluster autoscaling — Ray’s mechanism for adding or removing worker nodes based on resource demand. Demand here means the CPUs, GPUs, and other resources requested by pending tasks and actors, not measured utilization. This lets a workload scale out during a burst of tasks and scale back down once nodes sit idle. “With autoscaling enabled, this batch job will spin up extra workers during the peak and release them once the queue drains, instead of us provisioning a fixed cluster size.”

Common Phrases

  • “Does this need to be an actor, or is a stateless task enough here?”
  • “Is this large object going through the object store, or is it being re-serialized on every call?”
  • “Are we calling ray.get() too early and accidentally forcing these tasks to run one at a time?”
  • “Is the cluster autoscaling correctly, or is it stuck at minimum capacity under load?”
  • “Which node is this actor pinned to, and does that create a bottleneck?”

Example Sentences

Explaining a design choice: “We used an actor here specifically because the model weights are expensive to load — a task would reload them on every single call, which actors avoid by keeping state resident.”

Diagnosing a performance issue: “This loop is accidentally sequential because we’re calling ray.get() inside it right after each .remote() call — we need to submit all the tasks first and gather the results afterward.”

Reviewing a scaling incident: “The job stalled because cluster autoscaling didn’t add nodes fast enough for the traffic spike, so we’re raising the upscaling speed and keeping a small buffer of pre-warmed workers.”

Professional Tips

  • Distinguish task from actor explicitly in design discussions — the choice determines whether state lives across calls, and conflating them leads to confusing debugging later.
  • Name the object store when discussing data-passing performance — “it’s slow to pass this array around” is vague, while “it’s being re-serialized instead of using the object store” points at the fix.
  • When reviewing code that should run in parallel but doesn’t, flag premature ray.get() calls specifically and name the ObjectRefs being resolved too early. This is one of the most common Ray anti-patterns.
  • Mention the autoscaling settings explicitly when discussing burst workloads, such as the minimum and maximum worker counts and the upscaling speed. Silent under-scaling is easy to misdiagnose as “Ray is slow” when it is actually a scaling-policy issue.

Practice Exercise

  1. Explain the difference between a task and an actor in one sentence.
  2. Describe what the object store avoids compared with re-serializing and re-sending an object on every call.
  3. Write a sentence explaining why calling ray.get() too early can force otherwise-parallel work to run sequentially.