Jun 06, 2025
Let A be a known set of actions, and let R_a be the distribution of rewards for a given action a. At each timestep t, the agent selects an action a from A and receives a reward R_t ~ R_a. The goal is to maximize the cumulative reward, i.e. the sum of R_t over all timesteps.
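To make the setup concrete, here is a minimal simulation sketch. It is not part of the original note, and it relies on several assumptions: each R_a is taken to be a Gaussian with unit variance, the means are made-up values, and the agent uses an epsilon-greedy rule (one common choice for this kind of problem, not necessarily the method intended here). The function name `run_bandit` and all of its parameters are hypothetical.

```python
import random


def run_bandit(means, steps, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a bandit with Gaussian rewards.

    Assumption: R_a is Normal(means[a], 1) for every action a.
    Returns the cumulative reward and how often each action was selected.
    """
    rng = random.Random(seed)
    n_actions = len(means)
    counts = [0] * n_actions       # number of times each action was selected
    estimates = [0.0] * n_actions  # running average reward per action
    total = 0.0                    # cumulative reward, the quantity to maximize

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            a = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimated reward.
            a = max(range(n_actions), key=lambda i: estimates[i])

        reward = rng.gauss(means[a], 1.0)  # R_t ~ R_a
        counts[a] += 1
        # Incremental update of the sample mean for action a.
        estimates[a] += (reward - estimates[a]) / counts[a]
        total += reward

    return total, counts


if __name__ == "__main__":
    # Hypothetical three-action problem; the means are illustrative only.
    total, counts = run_bandit(means=[0.1, 0.5, 0.9], steps=1000)
    print("cumulative reward:", total)
    print("selections per action:", counts)
```

The agent never sees `means` directly. It only observes sampled rewards, so it has to balance trying actions to learn about them against repeating the action that currently looks best.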