step_size_epsilon_vs_reward_distribution

In this notebook I will:

Create a bandit algorithm
Help you understand the effect of epsilon on exploration and learn about the exploration/exploitation trade-off
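Epsilon controls exploration through the action-selection rule. The sketch below is not the notebook's code. It assumes standard epsilon-greedy selection: with probability epsilon pick any arm uniformly at random, otherwise pick the arm with the highest current estimate, breaking ties at random. The function name `epsilon_greedy_action` and the example estimates are hypothetical.

```python
import numpy as np


def epsilon_greedy_action(q_estimates, epsilon, rng):
    """Hypothetical helper: explore with probability epsilon, else exploit.

    Ties among greedy arms are broken at random so that no arm is
    favoured just because of its index.
    """
    q_estimates = np.asarray(q_estimates, dtype=float)
    if rng.random() < epsilon:
        # Explore: any arm, possibly including the current greedy one.
        return int(rng.integers(len(q_estimates)))
    # Exploit: choose among the arms with the highest current estimate.
    best = np.flatnonzero(q_estimates == q_estimates.max())
    return int(rng.choice(best))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = np.array([0.2, 1.5, -0.3, 0.9])  # hypothetical estimates; arm 1 is greedy
    for eps in [0.0, 0.01, 0.1, 0.5]:
        picks = np.array([epsilon_greedy_action(q, eps, rng) for _ in range(10_000)])
        non_greedy = np.mean(picks != 1)
        print(f"epsilon={eps}: fraction of non-greedy pulls = {non_greedy:.3f}")
```

With k arms, the fraction of non-greedy pulls should come out close to epsilon * (k - 1) / k rather than epsilon, because a random exploratory pull sometimes lands on the greedy arm anyway. Larger epsilon therefore means more exploration but more reward given up in the short term.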

Results for different step sizes with a constant epsilon of 0.1

step_sizes = [0.01, 0.1, 0.5, 1.0, '1/N(A)']
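A comparison like this can be reproduced with the incremental update Q(A) <- Q(A) + alpha * [R - Q(A)]. Here alpha is either a constant from the list or 1/N(A), which turns Q(A) into the plain sample average of that arm's rewards. This is a minimal sketch, not the notebook's implementation. It assumes a 10-armed Gaussian testbed (true values q*(a) ~ N(0, 1), rewards ~ N(q*(a), 1)), 1000 steps per run, and 20 runs per setting. The function `run_bandit` and all of those parameters are hypothetical.

```python
import numpy as np


def run_bandit(step_size, epsilon=0.1, k=10, steps=1000, seed=0):
    """Hypothetical: one epsilon-greedy run on an assumed k-armed Gaussian testbed.

    step_size is a float alpha (constant step size) or the string '1/N(A)'
    (sample-average update). Returns the reward received at each step.
    """
    rng = np.random.default_rng(seed)
    true_values = rng.normal(0.0, 1.0, size=k)  # assumed: q*(a) ~ N(0, 1)
    q = np.zeros(k)                              # value estimates Q(a)
    n = np.zeros(k)                              # pull counts N(a)
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(k))             # explore uniformly
        else:
            a = int(rng.choice(np.flatnonzero(q == q.max())))  # exploit, random tie-break
        r = rng.normal(true_values[a], 1.0)      # assumed reward noise: N(q*(a), 1)
        n[a] += 1
        alpha = 1.0 / n[a] if step_size == '1/N(A)' else step_size
        q[a] += alpha * (r - q[a])               # Q(A) <- Q(A) + alpha * [R - Q(A)]
        rewards[t] = r
    return rewards


step_sizes = [0.01, 0.1, 0.5, 1.0, '1/N(A)']
for s in step_sizes:
    # Average over independent runs to smooth out run-to-run noise.
    avg = np.mean([run_bandit(s, seed=i).mean() for i in range(20)])
    print(f"step size {s}: mean reward per step = {avg:.3f}")
```

With alpha = 1.0 each estimate simply equals the last reward seen from that arm, and with alpha = 0.01 the estimates move very slowly away from their initial value of 0. A constant alpha weights recent rewards more heavily than 1/N(A) does, which is mainly an advantage when the reward distributions drift over time.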
