Reinforcement Learning Agents

I hope that this repository will become my personal framework for Deep Reinforcement Learning in the future. Of course, this remains to be seen.

Current WIP:

Entry points to experiments are in the experiments folder.

To train a pong agent:

$ python experiments/pong.py --train

To continue training from a model:

$ python experiments/pong.py --train -l checkpoints/step-<step_num>

To evaluate a model:

$ python experiments/pong.py --eval -l checkpoints/step-<step_num>

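Use PongDeterministic-v4, since it implements frame skipping (a fixed skip of 4 frames per action).
Make sure each frame is preprocessed correctly into an 84 x 84 grayscale image, and stack the 4 most recent frames as 4 input channels.
The learning rate must be as small as 0.00025 (2.5e-4). I once missed a 0.
Make sure the (1 - done) factor is present in target = rewards + self.gamma * expected_v * (1 - done). It handles terminal states: once the episode has ended there is no next state, so the target must be the reward alone, without adding the next-state value (expected_v).
Make sure that in loss = self.criterion(estimated_q, target.unsqueeze(1)) the target and the estimated Q-values have the same shape, (batch, 1). A mismatch is usually caused by rewards having the wrong shape (for example (batch, 1) instead of (batch,)); broadcasting then gives the target an unintended shape instead of raising an error.
Use Kaiming (He) initialization, since the CNN uses ReLU activations.
Make sure the policy model and the target model do not reference the same object: create the target model with a deep copy, otherwise every update to the policy model also changes the target model.
Make sure to gather along the right dimension in estimated_q = policy_q.gather(1, actions): dimension 1 is the action dimension of policy_q (shape (batch, n_actions)), and actions must be an int64 tensor of shape (batch, 1).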
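The preprocessing and frame-stacking notes can be sketched as follows. This is a minimal NumPy-only illustration, assuming raw RGB observations of shape (210, 160, 3) as produced by the Atari environments; the grayscale weights and the nearest-neighbour resize are stand-ins for whatever resize routine (for example OpenCV) the experiments actually use, and preprocess and FrameStack are hypothetical names.

```python
from collections import deque

import numpy as np


def preprocess(frame: np.ndarray, size: int = 84) -> np.ndarray:
    """Convert an RGB frame (H, W, 3) into a size x size grayscale uint8 image."""
    # Luminance-weighted grayscale conversion (assumed weights).
    gray = frame[..., 0] * 0.299 + frame[..., 1] * 0.587 + frame[..., 2] * 0.114
    # Nearest-neighbour resize by sampling evenly spaced rows and columns.
    rows = np.linspace(0, gray.shape[0] - 1, size).astype(np.int64)
    cols = np.linspace(0, gray.shape[1] - 1, size).astype(np.int64)
    return gray[np.ix_(rows, cols)].astype(np.uint8)


class FrameStack:
    """Keep the k most recent preprocessed frames as a (k, 84, 84) array."""

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # At the start of an episode, fill the stack with copies of the first frame.
        processed = preprocess(first_frame)
        for _ in range(self.k):
            self.frames.append(processed)
        return self.observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        # The deque drops the oldest frame automatically once it holds k frames.
        self.frames.append(preprocess(frame))
        return self.observation()

    def observation(self) -> np.ndarray:
        # Channels-first layout: one channel per frame, as a PyTorch Conv2d expects.
        return np.stack(list(self.frames), axis=0)
```

Each call to step returns a (4, 84, 84) observation whose last channel is the newest frame, so the network sees motion across four consecutive (frame-skipped) steps.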
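The notes on the TD target, tensor shapes and gather fit together in a single loss computation. The sketch below assumes PyTorch (which the gather and unsqueeze calls suggest) and standard DQN, where expected_v is the target network's maximum Q-value for the next state; dqn_loss and the batch layout are hypothetical, not the repository's actual code.

```python
import torch
import torch.nn as nn


def dqn_loss(policy_net, target_net, batch, gamma, criterion):
    """One-step DQN loss for a batch of transitions.

    Assumed shapes: states/next_states (B, ...), actions (B, 1) int64,
    rewards (B,) float, dones (B,) float with 1.0 for terminal transitions.
    """
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for every action: (B, n_actions).
    policy_q = policy_net(states)
    # Pick the Q-value of the action actually taken, along dim 1 (actions): (B, 1).
    estimated_q = policy_q.gather(1, actions)

    with torch.no_grad():
        # Value of the next state under the target network: (B,).
        expected_v = target_net(next_states).max(1)[0]

    # For terminal transitions (done == 1) the bootstrap term vanishes,
    # so the target is just the reward. dones must be float: 1 - bool_tensor fails.
    target = rewards + gamma * expected_v * (1 - dones)

    # A (B, 1) rewards tensor would broadcast target to (B, B) here
    # without raising an error, so check the shapes explicitly.
    assert target.shape == rewards.shape == expected_v.shape, "check the shape of rewards"
    # target.unsqueeze(1) is (B, 1), matching estimated_q.
    return criterion(estimated_q, target.unsqueeze(1))
```

criterion can be nn.MSELoss() or nn.SmoothL1Loss(), whichever the experiments use; the shape check is cheap and turns the silent broadcasting bug into an immediate failure.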
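The initialization, deep-copy and learning-rate notes can be illustrated with a small network definition. This sketch assumes PyTorch and borrows the layer sizes of the Nature DQN architecture for 4 stacked 84 x 84 frames; QNetwork is a hypothetical name, and the Adam optimizer is an assumption, since the notes only fix the learning rate.

```python
import copy

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """DQN-style CNN for 4 stacked 84x84 frames (assumed Nature DQN layer sizes)."""

    def __init__(self, n_actions: int, in_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),  # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),           # 20 -> 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),           # 9 -> 7
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module):
        # Kaiming (He) initialization is derived for ReLU activations.
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
            nn.init.zeros_(module.bias)

    def forward(self, x):
        # Scale uint8 pixels to [0, 1] before the convolutions.
        return self.net(x.float() / 255.0)


policy_net = QNetwork(n_actions=6)  # Pong's action space has 6 discrete actions
# Wrong: target_net = policy_net  (both names would point at one object)
target_net = copy.deepcopy(policy_net)  # independent copy with its own parameters
target_net.eval()

# Optimizer choice is an assumption; 0.00025 is the learning rate from the notes.
optimizer = torch.optim.Adam(policy_net.parameters(), lr=0.00025)
```

Because target_net is a separate copy, it stays fixed while the policy is trained; it is typically re-synchronized every N steps with target_net.load_state_dict(policy_net.state_dict()).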