In Stable-Baselines3, the mean episode reward (rollout/ep_rew_mean) isn't displayed by default during training when you build a vectorized environment such as ‘SubprocVecEnv’ yourself. ‘SubprocVecEnv’ runs each environment instance in its own subprocess, which makes it more efficient for parallel computation, but it does not record episode statistics such as the mean reward on its own. Those statistics come from the Monitor wrapper, and SB3 does not add that wrapper for you when you pass in an already vectorized environment.
To display the mean reward and other episode statistics during training, wrap your custom environment in Monitor inside the factory function that you pass to ‘SubprocVecEnv’:
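from stable_baselines3.common.monitor import Monitor

def make_env(pid):
    def _init():
        # Env, agent and exe_file are your own environment class and its arguments.
        # Monitor records per-episode reward and length for the training logs.
        env = Monitor(Env(pid=pid, env_id=1, agent=agent, exe_file=exe_file))
        return env
    return _init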
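
For context, here is a rough sketch of why the wrapper matters. It is not SB3's actual code: the class name EpisodeStatsSketch and the sample rewards are made up for illustration, and the window size of 100 is an assumption about the default. The idea is that a Monitor-style wrapper sums the rewards of each episode and attaches the totals to the info dict when the episode ends, and the logger then averages the most recent episodes to get the mean reward.

```python
from collections import deque
from statistics import mean


class EpisodeStatsSketch:
    """Simplified stand-in for what a Monitor-style wrapper tracks (hypothetical)."""

    def __init__(self):
        self.episode_return = 0.0
        self.episode_length = 0

    def record_step(self, reward, done, info):
        # Accumulate the reward and length of the running episode.
        self.episode_return += reward
        self.episode_length += 1
        if done:
            # At episode end, attach the totals to the info dict and reset.
            info["episode"] = {"r": self.episode_return, "l": self.episode_length}
            self.episode_return = 0.0
            self.episode_length = 0
        return info


# Rolling buffer of finished episodes (window size of 100 is an assumption).
ep_info_buffer = deque(maxlen=100)
tracker = EpisodeStatsSketch()

# Two made-up episodes with rewards 1+1+1 and 2+2.
steps = [(1.0, False), (1.0, False), (1.0, True), (2.0, False), (2.0, True)]
for reward, done in steps:
    info = tracker.record_step(reward, done, {})
    if "episode" in info:
        ep_info_buffer.append(info["episode"])

# Mean return over the buffered episodes: (3.0 + 4.0) / 2
ep_rew_mean = mean(ep["r"] for ep in ep_info_buffer)
print(ep_rew_mean)
```

Without the wrapper, no "episode" entry ever reaches the info dict, so there is nothing to average and the mean reward never shows up in the logs. If you would rather not wrap each environment separately, VecMonitor from stable_baselines3.common.vec_env can be applied once to the whole vectorized environment instead.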