-
Notifications
You must be signed in to change notification settings - Fork 193
[Question] Why env_change[batch_inds] is not considered during _get_samples(*) in RecurrentRolloutBuffer? #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hello, |
To refresh my memory, I've created some graphics (I should probably put them in SB3 doc later). First, we collect Optionally, we split randomly the flattened sequences to add a bit of randomness when sampling for multiple epochs: We need to zero-out the hidden state of LSTM whenever a new sequence starts (so episode starts or env change event). Then we take Note: we use the flattened sequence to compute things like advantage but the last operation will be reverted before the mini-batch is fed to the LSTM (which expects |
After looking at it, it should probably be like:
We should also double check the effect of the split on it... |
This looks like a feature, not a bug. We will cut multiple sub-sequences, but only a few will be actual episode starts. For those that are actual starts we will later reset the hidden state (on a first sequence step). For those that aren't starts we will use stored hidden states instead of resetting them on the first seq step. |
❓ Question
When getting batches from a well-collected RecurrentRolloutBuffer, only episode_starts[batch_inds] will be returned to the sequence data. And this "episode_starts" is important for lstm policy to reset the hidden state during the training.
However, I have a question about the behavior here. As the seq_start_indices are decided together by both episode_starts and env_change, why are only episode_starts returned?
To be more clear, why the line 240 in common.recurrent.buffers is like "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds])" instead of "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds] or env_change[batch_inds])"?
Thank you for the explanation in advance.
Checklist
The text was updated successfully, but these errors were encountered: