Build fluency in the vocabulary of iteratively stepping a model's parameters opposite the loss function's gradient.
0 / 5 completed
1 / 5
At standup, a dev mentions repeatedly nudging a model's parameters a small step in the direction that most reduces a loss function, scaled by a learning rate, until the loss stops improving. What is this algorithm called?
Gradient descent is exactly this: gradient descent computes the gradient, meaning the direction of steepest increase, of a loss function with respect to a model's parameters, then repeatedly steps every parameter a small amount in the opposite direction, scaled by a learning rate, until the loss stops meaningfully improving. A hash collision is an unrelated hash-table concept about two keys sharing a bucket. This repeated step-opposite-the-gradient approach is exactly why gradient descent can train models with millions of parameters where no closed-form solution exists.
2 / 5
During a design review, the team picks gradient descent to train a model with millions of parameters, specifically because no closed-form formula exists for the optimal parameters, but the loss function's gradient can still be computed cheaply. Which capability does this provide?
Gradient descent here provides Scalable optimization without a closed-form solution, since following the gradient downhill one small step at a time works for any differentiable loss function, no matter how many parameters the model has or how complex the loss landscape is. An approach that requires solving a closed-form equation directly becomes intractable the moment the model has too many parameters or the loss is too complex to solve analytically. This gradient-following behavior is exactly why gradient descent is the standard way to train large models like neural networks.
3 / 5
In a code review, a dev notices a model-tuning feature searches for good parameters by randomly perturbing them and keeping whichever random change happens to reduce the loss, instead of computing the loss function's gradient and stepping opposite it. What does this represent?
This is a missed gradient-descent opportunity, since computing the gradient and stepping directly opposite it would reduce the loss far more reliably and quickly than randomly perturbing parameters and hoping for an improvement. A cache eviction policy is an unrelated concept about discarded cache entries. This random-perturbation pattern is exactly the kind of inefficiency a reviewer flags once the loss function is confirmed differentiable and its gradient can be computed directly.
4 / 5
An incident report shows a model-training job took far longer to converge than expected, because it searched for better parameters by randomly perturbing them and keeping improvements, instead of computing the gradient and stepping opposite it. What practice would prevent this?
Switching to gradient descent replaces blind random perturbation with a directed, computed step toward lower loss. Continuing to randomly perturb parameters and keep whichever change reduces the loss regardless of how large the parameter count grows is exactly what caused the issue described in this incident. This gradient-following approach is the standard fix once the loss function is confirmed differentiable.
5 / 5
During a PR review, a teammate asks why the team reaches for gradient descent instead of solving the closed-form normal equation used for simple linear regression, given that the normal equation gives an exact answer in one step. What is the reasoning?
Gradient descent scales to models with far more parameters than a closed-form equation can practically handle, at the cost of needing a learning rate and multiple iterations to converge, while the normal equation solves small, simple problems exactly in a single step but becomes computationally impractical, since it typically requires inverting a matrix that grows with the parameter count, once the parameter count grows large. This is exactly why gradient descent is favored for large models, while the closed-form equation remains useful for small linear-regression problems.