Advanced Vocabulary #data-science-ml #backend #developer-tools

Gradient Descent Vocabulary

Build fluency in the vocabulary of iteratively stepping a model's parameters opposite the loss function's gradient.
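As a rough illustration of that update rule, here is a minimal sketch in plain Python. The one-parameter loss `(w - 3)^2`, the function names, and the default learning rate and step count are all assumptions chosen for the example, not something this quiz prescribes.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step w opposite the gradient, scaled by the learning rate."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Hypothetical loss for illustration: loss(w) = (w - 3)^2,
# whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
def grad_loss(w):
    return 2.0 * (w - 3.0)


w_final = gradient_descent(grad_loss, w0=0.0)
print(w_final)  # very close to 3.0
```

With this loss and learning rate, each step shrinks the distance to the minimum by a constant factor, so the parameter settles near 3. A learning rate that is too large for a given loss can overshoot instead.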

1 / 5

At standup, a dev mentions repeatedly nudging a model's parameters in the direction that most reduces a loss function, with each step scaled by a learning rate, until the loss stops improving. What is this algorithm called?

2 / 5

During a design review, the team picks gradient descent to train a model with millions of parameters, specifically because no closed-form formula exists for the optimal parameters, but the loss function's gradient can still be computed cheaply. Which capability does this provide?

3 / 5

In a code review, a dev notices a model-tuning feature searches for good parameters by randomly perturbing them and keeping whichever random change happens to reduce the loss, instead of computing the loss function's gradient and stepping opposite it. What does this represent?

4 / 5

An incident report shows a model-training job took far longer to converge than expected, because it searched for better parameters by randomly perturbing them and keeping improvements, instead of computing the gradient and stepping opposite it. What practice would prevent this?
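The two search strategies described in this scenario can be put side by side in a small sketch. Everything here is an assumption made for illustration: the 10-parameter quadratic loss, the perturbation size `sigma`, the fixed seed, and the step budget.

```python
import random


def loss(w, target):
    # Hypothetical loss: squared distance from a known target vector.
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def grad(w, target):
    # Gradient of the loss above with respect to each parameter.
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


def random_search(w, target, steps, sigma=0.1, seed=0):
    """Perturb the parameters at random and keep only changes that lower the loss."""
    rng = random.Random(seed)
    best = loss(w, target)
    for _ in range(steps):
        candidate = [wi + rng.gauss(0.0, sigma) for wi in w]
        cand_loss = loss(candidate, target)
        if cand_loss < best:
            w, best = candidate, cand_loss
    return w


def gradient_steps(w, target, steps, learning_rate=0.1):
    """Step every parameter opposite its gradient component."""
    for _ in range(steps):
        g = grad(w, target)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


target = [1.0] * 10
start = [0.0] * 10
initial_loss = loss(start, target)
random_loss = loss(random_search(start, target, steps=200), target)
gradient_loss = loss(gradient_steps(start, target, steps=200), target)
print(initial_loss, random_loss, gradient_loss)
```

Both runs get the same number of loss-related evaluations per step on this toy problem, so printing the three losses gives a direct feel for how much progress each strategy makes within the same budget.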

5 / 5

During a PR review, a teammate asks why the team reaches for gradient descent instead of solving the closed-form normal equation used for simple linear regression, given that the normal equation gives an exact answer in one step. What is the reasoning?
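Both approaches named in this question can be sketched on a tiny dataset. This is a minimal sketch under stated assumptions: a single feature, made-up data lying on `y = 2x + 1`, and a hand-picked learning rate and step count. For one feature, the normal equation reduces to the familiar slope and intercept formulas used in `fit_closed_form`.

```python
def fit_closed_form(xs, ys):
    """Closed-form least squares for one feature (the normal equation in 1-D)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Minimize mean squared error by stepping opposite its gradient."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data for illustration: points on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
exact = fit_closed_form(xs, ys)
approx = fit_gradient_descent(xs, ys)
print(exact, approx)
```

On a problem this small the two fits agree closely. The sketch only shows the mechanics of each method and says nothing about how either behaves as the number of parameters grows.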