English for Weights & Biases Developers
Learn the English vocabulary for Weights & Biases (wandb): runs, sweeps, artifacts, and experiment comparison.
Weights & Biases conversations use a specific vocabulary for tracking experiments (run, sweep, artifact) that is easy to blur with generic ML terms. Calling everything you log a “run”, whether it is a single execution, a hyperparameter search, or a saved checkpoint, makes it harder to describe exactly what is being compared or reproduced.
Key Vocabulary
Run — a single execution of a training or evaluation script, logged with its configuration, metrics, and outputs, forming the basic unit that everything else in W&B is organized around. “Compare this run against last week’s baseline run — the config diff should show exactly what changed between them.”
Sweep — an automated search over hyperparameter combinations, where W&B launches multiple runs according to a defined strategy like grid, random, or Bayesian search. “We’re using a Bayesian sweep to tune learning rate and batch size together instead of guessing values manually.”
Artifact — a versioned collection of files, such as a dataset, model checkpoint, or evaluation output, tracked with lineage so you can trace exactly which run produced or consumed it. “Log the trained model as an artifact so the next run can reference this exact version instead of a loosely named checkpoint file.”
Panel / dashboard — a panel is a single customizable visualization, such as a metric chart or table; panels are arranged into a workspace or dashboard that lets a team compare multiple runs side by side. “Add a panel comparing validation loss across all three runs so we can see the overfitting point without opening each run individually.”
Lineage — the tracked chain of dependencies between artifacts and runs, showing which dataset produced which model and which model was used in which evaluation. “Lineage shows this model was trained on last month’s dataset version, which explains why it’s missing the newly added examples.”
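The terms above map fairly directly onto the wandb Python API. The sketch below is a minimal illustration, not a recommended project layout. The project name `vocab-demo`, the artifact names `training-data` and `classifier`, the file `model.pt`, and the logged metric values are all hypothetical. It also assumes that `wandb` is installed, that you are logged in, and that a `training-data` artifact already exists in the project.

```python
# Hypothetical names and values, used only for illustration.
PROJECT = "vocab-demo"
CONFIG = {"learning_rate": 3e-4, "batch_size": 32, "epochs": 3}


def train_and_log(model_path="model.pt"):
    """Start one run, consume a dataset artifact, and log a model artifact."""
    import wandb  # imported lazily so this module loads without wandb installed

    # A run: one execution of the script, logged together with its config.
    run = wandb.init(project=PROJECT, config=CONFIG, job_type="train")

    # Declaring the input artifact records the dataset -> run edge of the lineage.
    run.use_artifact("training-data:latest")

    for epoch in range(CONFIG["epochs"]):
        # Placeholder metric; a real script would log measured values.
        run.log({"epoch": epoch, "val_loss": 1.0 / (epoch + 1)})

    # An artifact: a versioned set of files.
    # Logging it from this run records the run -> model edge of the lineage.
    model = wandb.Artifact("classifier", type="model")
    model.add_file(model_path)  # the file must already exist on disk
    run.log_artifact(model)

    run.finish()
```

Calling `train_and_log()` would create one run. In vocabulary terms, that run consumes the dataset artifact and produces the model artifact, and those two links are what a teammate means by "the lineage" of the model.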
Common Phrases
- “Is this comparison across runs from the same sweep, or are the configs actually different?”
- “Did we log the checkpoint as an artifact, or is it just sitting in a local folder somewhere?”
- “What does the lineage show for this model — which dataset version was it actually trained on?”
- “Should we run a sweep here, or do we already know roughly where the good hyperparameters are?”
- “Can we add a panel that overlays training and validation loss for these three runs?”
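The first phrase asks whether two configs "are actually different". As a rough illustration of what a config diff means, the plain-Python sketch below compares two configuration dictionaries. The helper `diff_configs` and the example values are hypothetical and are not part of the wandb API; W&B has its own run comparison views.

```python
def diff_configs(baseline, candidate):
    """Return {key: (baseline_value, candidate_value)} for keys that differ."""
    missing = object()  # sentinel for keys present in only one config
    changed = {}
    for key in sorted(set(baseline) | set(candidate)):
        old = baseline.get(key, missing)
        new = candidate.get(key, missing)
        if old != new:
            # Report a key that is absent from one side as None.
            changed[key] = (
                None if old is missing else old,
                None if new is missing else new,
            )
    return changed


baseline = {"learning_rate": 1e-3, "batch_size": 64, "augment": False}
candidate = {"learning_rate": 1e-3, "batch_size": 32, "augment": True}

print(diff_configs(baseline, candidate))
# {'augment': (False, True), 'batch_size': (64, 32)}
```

An empty result means the two runs shared a config, so any difference in their metrics comes from something else, such as the random seed or the data version.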
Example Sentences
Reporting a hyperparameter search result: “The sweep found that a smaller batch size with a higher learning rate outperformed our manual baseline run by about three points on validation accuracy.”
Explaining a reproducibility fix: “We started logging every checkpoint as a versioned artifact, so now we can trace lineage back from a deployed model to the exact dataset and code version that produced it.”
Reviewing an experiment dashboard: “This panel makes it obvious — the run with data augmentation enabled has a much smaller gap between training and validation loss than the others.”
Professional Tips
- Say run consistently for a single execution instead of switching between “experiment” and “job”. The precision makes it easier to reference a specific one later.
- Distinguish a sweep from a batch of manually launched runs — a sweep implies an automated search strategy, which matters when discussing how thorough the hyperparameter search actually was.
- Say artifact rather than “file” for any output you have logged to W&B. The versioning and lineage tracking are the point, and naming it correctly signals that reproducibility was considered.
- Reference lineage directly when debugging a model behavior regression — it’s often faster to trace the artifact chain than to re-run experiments from scratch.
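To make the sweep tip concrete: what separates a sweep from a batch of manual runs is that the search strategy is written down in a sweep configuration. The sketch below assumes `wandb` is installed and you are logged in. The project name `vocab-demo`, the parameter ranges, and the placeholder metric inside `train` are hypothetical.

```python
PROJECT = "vocab-demo"  # hypothetical project name

SWEEP_CONFIG = {
    "method": "bayes",  # the search strategy; "grid" and "random" are the others
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2,
        },
        "batch_size": {"values": [16, 32, 64]},
    },
}


def train():
    """One run of the sweep; the hyperparameters arrive through run.config."""
    import wandb  # imported lazily so this module loads without wandb installed

    run = wandb.init()
    lr = run.config["learning_rate"]
    # Placeholder metric so the sketch stays short; a real script would train here.
    run.log({"val_loss": abs(lr - 1e-3)})
    run.finish()


def launch_sweep(count=10):
    """Register the sweep, then let an agent launch `count` runs from it."""
    import wandb

    sweep_id = wandb.sweep(sweep=SWEEP_CONFIG, project=PROJECT)
    wandb.agent(sweep_id, function=train, project=PROJECT, count=count)
```

When someone says "a Bayesian sweep over learning rate and batch size", the `method` and `parameters` entries are what they are describing, and `count` is how many runs the search was allowed.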
Practice Exercise
- Explain the difference between a run and a sweep in one sentence.
- Describe what makes an artifact different from just saving a file to disk.
- Write a sentence explaining how lineage would help debug an unexpected model regression.