IntermediateVocabulary#ml-language#algorithms#backend

Reinforcement Learning Vocabulary

Learn the vocabulary of an agent improving its policy purely from reward signals received through interaction with an environment.

0 / 5 completed
1 / 5
A teammate explains that a training approach has an agent take actions in an environment and learn purely from reward signals it receives afterward, gradually improving its policy to maximize cumulative future reward, rather than learning from a fixed labeled dataset. What is this approach called?