Multi-Armed Bandit

Let A be a known set of actions, and let R_a be the distribution of rewards for action a in A. At each timestep t, an agent selects an action a and receives a reward R_t ~ R_a. The goal is to maximize the cumulative reward, i.e. the sum of R_t over all timesteps.
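To make the interaction loop concrete, here is a minimal Python sketch. It rests on several assumptions that are not part of the definition above: each R_a is taken to be a Bernoulli distribution with a hypothetical success probability, and the agent uses an epsilon-greedy rule (one common strategy among many) to pick actions. The names `run_bandit`, `success_probs`, and `epsilon` are illustrative only.

```python
import random


def run_bandit(success_probs, steps, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    Assumption: R_a is Bernoulli(success_probs[a]); the agent never reads
    success_probs directly, it only observes sampled rewards.
    Returns (cumulative reward, pull counts per action, mean estimates).
    """
    rng = random.Random(seed)
    n_actions = len(success_probs)
    counts = [0] * n_actions        # how often each action was selected
    estimates = [0.0] * n_actions   # running mean reward per action
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            a = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimate
            # (ties go to the lowest index).
            a = max(range(n_actions), key=lambda i: estimates[i])

        # Sample R_t ~ R_a.
        reward = 1.0 if rng.random() < success_probs[a] else 0.0

        # Incremental update of the running mean for action a.
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]
        total_reward += reward

    return total_reward, counts, estimates
```

For example, `run_bandit([0.2, 0.5, 0.8], steps=1000)` returns the cumulative reward together with how often each action was tried. Setting `epsilon=0.0` makes the agent purely greedy, so it can lock onto whichever action it tried first; that is a quick way to see why some exploration is needed.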

📝Clint's Notes
