step_size_epsilon_vs_reward_distribution

In this notebook I will:
- Create a bandit algorithm
- Help you understand the effect of epsilon on exploration and the exploration/exploitation tradeoff

Results for different step sizes with constant epsilon = 0.1

step_sizes = [0.01, 0.1, 0.5, 1.0, '1/N(A)']

Here '1/N(A)' denotes the sample-average step size, where N(A) is the number of times action A has been selected so far.

Effect of randomization on rewards
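The experiment described above can be sketched as an epsilon-greedy agent with an incremental value update, Q(A) <- Q(A) + alpha * (R - Q(A)). This is a minimal sketch, not the notebook's actual code. It assumes a 10-armed Gaussian testbed: true action values are drawn from N(0, 1), and each reward is drawn from N(q*(A), 1). The helpers `make_bandit` and `run_epsilon_greedy` are hypothetical names. A constant `step_size` is used directly as alpha, and the string '1/N(A)' switches to the sample-average rule alpha = 1/N(A).

```python
import random
import statistics


def make_bandit(k=10, seed=None):
    """Hypothetical k-armed testbed: true action values q*(a) ~ N(0, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(k)]


def run_epsilon_greedy(q_true, steps=1000, epsilon=0.1, step_size=0.1, seed=None):
    """Run one epsilon-greedy agent and return (rewards, final estimates).

    step_size is either a float (constant alpha) or the string '1/N(A)'
    for the sample-average update.
    """
    rng = random.Random(seed)
    k = len(q_true)
    q_est = [0.0] * k      # Q(a): current value estimates
    counts = [0] * k       # N(a): how many times each action was chosen
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: pick any action uniformly
        else:
            best = max(q_est)
            # exploit: greedy action, ties broken at random
            action = rng.choice([a for a in range(k) if q_est[a] == best])
        reward = rng.gauss(q_true[action], 1.0)  # noisy reward around q*(a)
        counts[action] += 1
        alpha = 1.0 / counts[action] if step_size == '1/N(A)' else step_size
        # incremental update: Q <- Q + alpha * (R - Q)
        q_est[action] += alpha * (reward - q_est[action])
        rewards.append(reward)
    return rewards, q_est


step_sizes = [0.01, 0.1, 0.5, 1.0, '1/N(A)']
q_true = make_bandit(k=10, seed=0)
for step_size in step_sizes:
    run_means = []
    for run in range(20):  # several seeded runs to average out randomness
        rewards, _ = run_epsilon_greedy(q_true, steps=1000, epsilon=0.1,
                                        step_size=step_size, seed=run)
        run_means.append(statistics.mean(rewards))
    print(f"step_size={step_size!s:>7}: average reward {statistics.mean(run_means):.3f}")
```

Averaging over several seeded runs smooths out the randomization in any single run. Printing `run_means` for one step size instead of its mean shows how much individual runs vary. With alpha = 1/N(A), each estimate is exactly the sample mean of that action's rewards. A constant alpha weights recent rewards more heavily, and alpha = 1.0 keeps only the latest reward.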