Matteo's Notes
I have come across two versions of the NoisyNet algorithm for DQN on arXiv: (v1) where independent noise is sampled for each transition in the minibatch, and (v2) where the same noise is shared across the batch but a separate set of noise is used to perform action selection for the target network. Which one is used in Rainbow, and are they interchangeable?
- v2
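For context, here is a minimal NumPy sketch of the v2 scheme (the class and parameter names are illustrative, not the repository's actual code): a factorised-Gaussian noisy linear layer draws noise once per minibatch and shares it across all transitions, while the target network would call its own `reset_noise()` to draw a separate sample before selecting actions.

```python
import numpy as np

class NoisyLinear:
    """Sketch of a factorised-Gaussian noisy linear layer (illustrative only)."""
    def __init__(self, in_f, out_f, sigma_0=0.5, seed=None):
        self.rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(in_f)
        # Learnable means initialised uniformly; noise stds start at sigma_0 / sqrt(fan_in)
        self.w_mu = self.rng.uniform(-bound, bound, (out_f, in_f))
        self.w_sigma = np.full((out_f, in_f), sigma_0 / np.sqrt(in_f))
        self.b_mu = self.rng.uniform(-bound, bound, out_f)
        self.b_sigma = np.full(out_f, sigma_0 / np.sqrt(in_f))
        self.reset_noise()

    @staticmethod
    def _f(x):
        # Noise-scaling function from the NoisyNets paper: f(x) = sgn(x) * sqrt(|x|)
        return np.sign(x) * np.sqrt(np.abs(x))

    def reset_noise(self):
        # Factorised noise: one vector per input unit, one per output unit
        eps_in = self._f(self.rng.standard_normal(self.w_mu.shape[1]))
        eps_out = self._f(self.rng.standard_normal(self.w_mu.shape[0]))
        self.w_eps = np.outer(eps_out, eps_in)
        self.b_eps = eps_out

    def __call__(self, x):
        w = self.w_mu + self.w_sigma * self.w_eps
        b = self.b_mu + self.b_sigma * self.b_eps
        return x @ w.T + b

# v2: one noise draw is shared by every transition in the minibatch.
# (A target network would call its own reset_noise() before action selection.)
layer = NoisyLinear(4, 2, seed=0)
batch = np.zeros((32, 4))
layer.reset_noise()   # single draw for the whole batch
out_a = layer(batch)
out_b = layer(batch)  # same noise -> identical outputs within the batch
```

Under v1 you would instead call `reset_noise()` per transition; v2 trades some noise diversity for a single sampled network per batch, matching the independence between the online and target networks.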
In the original Distributional RL paper, the loss used is the cross-entropy: `-m.log(p(s, a))`. However, the standard KL loss is: `m.(log(m) - log(p(s, a)))`. Which is used for Rainbow (considering the scaling difference affects Prioritised Experience Replay)?
- cross entropy
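The two losses differ only by the entropy of the projected target distribution m, which does not depend on the network parameters, so the gradients are identical; the raw loss values differ, however, which matters when the loss is reused as a PER priority. A small NumPy check (the atom probabilities below are made up for illustration):

```python
import numpy as np

# Projected target distribution m and predicted distribution p(s, a)
# over the distributional atoms (illustrative values, not real outputs).
m = np.array([0.1, 0.6, 0.3])
p = np.array([0.2, 0.5, 0.3])

cross_entropy = -(m * np.log(p)).sum()        # loss used in Rainbow
kl = (m * (np.log(m) - np.log(p))).sum()      # standard KL divergence
entropy_m = -(m * np.log(m)).sum()            # H(m): constant w.r.t. parameters

# cross_entropy = KL(m || p) + H(m), so gradients match but magnitudes differ.
assert np.isclose(cross_entropy, kl + entropy_m)
```

Since H(m) ≥ 0, cross-entropy priorities are systematically larger than KL priorities, which shifts the sampling distribution in Prioritised Experience Replay.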
Are losses summed or averaged across a minibatch?
- averaged
Could you provide more info on the importance of σ_0 and how noise generation differs for you between CPU and GPU?
- It's not critical in many games, but it can affect a handful quite significantly, and median scores can be quite sensitive to games whose performance is close to the median itself. It's something I observed only on some GPUs; placing the noise generation on the CPU seems the safer way to go.
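To make the role of σ_0 concrete: in factorised NoisyNets each learnable noise standard deviation is initialised to σ_0 / √fan_in, so σ_0 directly sets the initial exploration scale. The sketch below (helper name is illustrative) also shows the pattern of drawing ε with a CPU RNG, which would then be transferred to the device:

```python
import numpy as np

# Factorised-NoisyNet initialisation: sigma = sigma_0 / sqrt(fan_in),
# so sigma_0 sets how noisy the network is at the start of training.
def initial_sigma(sigma_0: float, fan_in: int) -> float:
    return sigma_0 / np.sqrt(fan_in)

# With the paper's default sigma_0 = 0.5 and a 64-unit input layer:
print(initial_sigma(0.5, 64))  # -> 0.0625

# Drawing epsilon with a CPU generator (then copying it to the GPU)
# avoids the GPU-RNG differences mentioned in the answer above.
rng = np.random.default_rng(0)
eps = rng.standard_normal(64)  # sampled on CPU; transfer to device afterwards
```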