Matteo's Notes
I have come across two versions of the NoisyNet algorithm for DQN on arXiv: (v1) where independent noise is sampled for each transition in the minibatch, and (v2) where the same noise is shared across the batch but a separate set of noise is used to perform action selection for the target network. Which one is used in Rainbow, and are they interchangeable?
- v2
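For context, here is a minimal NumPy sketch of the v2 scheme (the class and parameter names are illustrative, not the repository's actual code): a factorised-Gaussian noisy linear layer draws noise once per minibatch and shares it across all transitions, while the target network would call its own `reset_noise()` to draw a separate sample before selecting actions.

```python
import numpy as np

class NoisyLinear:
    """Sketch of a factorised-Gaussian noisy linear layer (illustrative only)."""
    def __init__(self, in_f, out_f, sigma_0=0.5, seed=None):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_f)
        # Learnable means initialised uniformly; noise stds start at sigma_0 / sqrt(fan_in)
        self.w_mu = self.rng.uniform(-bound, bound, (out_f, in_f))
        self.w_sigma = np.full((out_f, in_f), sigma_0 / np.sqrt(in_f))
        self.b_mu = self.rng.uniform(-bound, bound, out_f)
        self.b_sigma = np.full(out_f, sigma_0 / np.sqrt(in_f))
        self.reset_noise()

    @staticmethod
    def _f(x):
        # Noise-scaling function from the NoisyNets paper: f(x) = sgn(x) * sqrt(|x|)
        return np.sign(x) * np.sqrt(np.abs(x))

    def reset_noise(self):
        # Factorised noise: one vector per input unit, one per output unit
        eps_in = self._f(self.rng.standard_normal(self.w_mu.shape[1]))
        eps_out = self._f(self.rng.standard_normal(self.w_mu.shape[0]))
        self.w_eps = np.outer(eps_out, eps_in)
        self.b_eps = eps_out

    def __call__(self, x):
        w = self.w_mu + self.w_sigma * self.w_eps
        b = self.b_mu + self.b_sigma * self.b_eps
        return x @ w.T + b

# v2: one noise draw is shared by every transition in the minibatch.
# (A target network would call its own reset_noise() before action selection.)
layer = NoisyLinear(4, 2, seed=0)
batch = np.zeros((32, 4))
layer.reset_noise()   # single draw for the whole batch
out_a = layer(batch)
out_b = layer(batch)  # same noise -> identical outputs within the batch
```

Under v1 you would instead call `reset_noise()` per transition; v2 trades some noise diversity for a single sampled network per batch, matching the independence between the online and target networks.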
In the original Distributional RL paper, the loss used is the cross-entropy: `-m.log(p(s, a))`. However, the standard KL loss is: `m.(log(m) - log(p(s, a)))`. Which is used for Rainbow (considering the scaling difference affects Prioritised Experience Replay)?
- cross entropy
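The two losses differ only by the entropy of the projected target distribution m, which does not depend on the network parameters, so the gradients are identical; the raw loss values differ, however, which matters when the loss is reused as a PER priority. A small NumPy check (the atom probabilities below are made up for illustration):

```python
import numpy as np

# Projected target distribution m and predicted distribution p(s, a)
# over the distributional atoms (illustrative values, not real outputs).
m = np.array([0.1, 0.6, 0.3])
p = np.array([0.2, 0.5, 0.3])

cross_entropy = -(m * np.log(p)).sum()        # loss used in Rainbow
kl = (m * (np.log(m) - np.log(p))).sum()      # standard KL divergence
entropy_m = -(m * np.log(m)).sum()            # H(m): constant w.r.t. parameters

# cross_entropy = KL(m || p) + H(m), so gradients match but magnitudes differ.
assert np.isclose(cross_entropy, kl + entropy_m)
```

Since H(m) ≥ 0, cross-entropy priorities are systematically larger than KL priorities, which shifts the sampling distribution in Prioritised Experience Replay.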
Are losses summed or averaged across a minibatch?
- averaged
Could you provide more info on the importance of σ_0 and how noise generation differs for you between CPU and GPU?
- It's not critical in many games, but it can affect a handful quite significantly, and median scores can be quite sensitive to games whose performance is close to the median itself. It's something I observed only on some GPUs; placing the noise generation on the CPU seems the safer way to go.
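To make the role of σ_0 concrete: in factorised NoisyNets each learnable noise standard deviation is initialised to σ_0 / √fan_in, so σ_0 directly sets the initial exploration scale. The sketch below (helper name is illustrative) also shows the pattern of drawing ε with a CPU RNG, which would then be transferred to the device:

```python
import numpy as np

# Factorised-NoisyNet initialisation: sigma = sigma_0 / sqrt(fan_in),
# so sigma_0 sets how noisy the network is at the start of training.
def initial_sigma(sigma_0: float, fan_in: int) -> float:
    return sigma_0 / np.sqrt(fan_in)

# With the paper's default sigma_0 = 0.5 and a 64-unit input layer:
print(initial_sigma(0.5, 64))  # -> 0.0625

# Drawing epsilon with a CPU generator (then copying it to the GPU)
# avoids the GPU-RNG differences mentioned in the answer above.
rng = np.random.default_rng(0)
eps = rng.standard_normal(64)  # sampled on CPU; transfer to device afterwards
```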