
[Question] Why env_change[batch_inds] is not considered during _get_samples(*) in RecurrentRolloutBuffer? #284


Open
Hhannzch opened this issue Mar 11, 2025 · 4 comments · May be fixed by #290
Labels
question (Further information is requested)

Comments

Hhannzch commented Mar 11, 2025

❓ Question

When sampling batches from a filled RecurrentRolloutBuffer, only episode_starts[batch_inds] is returned with the sequence data, and this "episode_starts" is what the LSTM policy uses to reset its hidden state during training.
However, I have a question about this behavior: since the seq_start_indices are determined by both episode_starts and env_change, why is only episode_starts returned?
To be more precise: why is line 240 in common.recurrent.buffers written as "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds])" instead of something like "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds] or env_change[batch_inds])"?

Thank you in advance for the explanation.


Hhannzch added the question label Mar 11, 2025
araffin (Member) commented Mar 13, 2025

Hello,
thanks for pointing that out.
It might be a bug.
I need to dig deeper; this code is overly complex even with all the comments 🙈

araffin (Member) commented Mar 31, 2025

To refresh my memory, I've created some graphics (I should probably add them to the SB3 docs later).

First, we collect n_steps * n_envs transitions and then flatten them to be able to sample from them:

[Image: flattening the collected rollout]
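
For intuition, here is a minimal numpy sketch of that flattening (shapes and variable names are illustrative, not the exact buffer code): swapping the axes before the reshape keeps each env's trajectory contiguous, and env_change flags where the flat view jumps from one env to the next.

```python
import numpy as np

n_steps, n_envs = 4, 2

# Data is stored as (n_steps, n_envs); swapping axes before the reshape
# keeps each env's trajectory contiguous in the flattened view.
data = np.arange(n_steps * n_envs).reshape(n_steps, n_envs)
flat = data.swapaxes(0, 1).reshape(n_steps * n_envs)
print(flat)  # [0 2 4 6 1 3 5 7] -> env 0 first, then env 1

# env_change flags the indices where the flat view crosses from one
# env's trajectory into the next one's.
env_change = np.zeros((n_steps, n_envs))
env_change[0, :] = 1.0
print(env_change.swapaxes(0, 1).reshape(-1))  # [1. 0. 0. 0. 1. 0. 0. 0.]
```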

Optionally, we randomly split the flattened sequences to add a bit of randomness when sampling over multiple epochs:

[Image: random split of the flattened sequences]
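
A sketch of the split trick, under the same illustrative assumptions: pick a random cut point and rotate the flat indices, so the relative order of transitions is preserved while the sampling offset changes from epoch to epoch.

```python
import numpy as np

total = 8  # n_steps * n_envs from the sketch above
indices = np.arange(total)

# Rotate the flat indices around a random cut point: the order of the
# transitions is kept, only the starting offset changes.
split_index = np.random.randint(total)
indices = np.concatenate((indices[split_index:], indices[:split_index]))
```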

We need to zero out the LSTM hidden state whenever a new sequence starts (i.e., on an episode start or an env-change event).
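
In code, that zeroing amounts to a masked multiply; here is a minimal torch sketch in the spirit of what the recurrent policy does (shapes and names are illustrative):

```python
import torch as th

n_layers, n_seq, hidden_dim = 1, 4, 8
hidden = th.ones(n_layers, n_seq, hidden_dim)
# 1.0 marks sequences whose first step begins a new (sub-)sequence
seq_start = th.tensor([1.0, 0.0, 1.0, 0.0])

# Zero the hidden state only where a new sequence starts; the other
# sequences keep their stored state.
hidden = (1.0 - seq_start).view(1, n_seq, 1) * hidden
```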

Then we take batch_size transitions and reshape them by sequence so they can be padded:

[Image: reshaping the sampled batch by sequences and padding]
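
A sketch of that reshaping, assuming a flat mini-batch and a boolean seq_start mask (the values are made up): cut the batch at every sequence start, then pad every piece to the length of the longest one.

```python
import numpy as np

batch = np.arange(8)  # 8 sampled transitions
# New sequences start at indices 0, 3 and 5 (episode start or env change);
# index 0 always opens a sequence.
seq_start = np.array([1, 0, 0, 1, 0, 1, 0, 0], dtype=bool)

starts = np.flatnonzero(seq_start)
pieces = np.split(batch, starts[1:])
max_len = max(len(p) for p in pieces)

# Pad every piece to max_len -> (n_seq, max_len), ready to be fed
# to the LSTM after a final transpose.
padded = np.stack([np.pad(p, (0, max_len - len(p))) for p in pieces])
print(padded)
# [[0 1 2]
#  [3 4 0]
#  [5 6 7]]
```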

Note: we use the flattened sequences to compute things like the advantage, but the last operation is reverted before the mini-batch is fed to the LSTM (which expects (max_length, n_seq, features_dim) as input).

araffin (Member) commented Mar 31, 2025

To be more precise: why is line 240 in common.recurrent.buffers written as "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds])" instead of something like "episode_starts=self.pad_and_flatten(self.episode_starts[batch_inds] or env_change[batch_inds])"?

After looking at it, it should probably be:

seq_start = np.logical_or(episode_starts, env_change).flatten()

We should also double-check the effect of the random split on it...
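
To make the difference concrete, a small made-up example: with the current code, a sub-sequence created by an env change is not flagged as a start, so its hidden state would not be reset; the logical_or version flags it as well.

```python
import numpy as np

episode_starts = np.array([0.0, 0.0, 1.0, 0.0])
env_change = np.array([1.0, 0.0, 0.0, 0.0])

current = episode_starts  # [0. 0. 1. 0.] -> env change at index 0 ignored
proposed = np.logical_or(episode_starts, env_change).astype(np.float32)
print(proposed)           # [1. 0. 1. 0.]
```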

ar-too commented Apr 16, 2025

This looks like a feature, not a bug. We cut multiple sub-sequences, but only a few of them are actual episode starts. For those that are actual starts, we later reset the hidden state (on the first sequence step). For those that aren't, we use the stored hidden states instead of resetting them on the first sequence step.
