In Stable-Baselines3, the mean reward is not displayed by default during training when you use a vectorized environment such as ‘SubprocVecEnv’ for parallel environment management. ‘SubprocVecEnv’ runs each environment instance in its own subprocess, which makes it more efficient for parallel computation, but it does not itself record episode statistics such as the mean reward, so nothing is reported during training unless those statistics are collected some other way.
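As a rough illustration of why the statistic can go missing, here is a minimal sketch in plain Python. It assumes, as Stable-Baselines3's Monitor does, that the totals of a finished episode travel in the step `info` dict under an `episode` key with `r` (return) and `l` (length). The names `EpisodeStatsWrapper`, `ConstantRewardEnv`, and `run_episodes` are hypothetical, the step API is simplified to four return values, and none of this is library code.

```python
from collections import deque
from statistics import mean


class EpisodeStatsWrapper:
    """Hypothetical stand-in for a Monitor-style wrapper (not SB3 code)."""

    def __init__(self, env):
        self.env = env
        self.episode_return = 0.0
        self.episode_length = 0

    def reset(self):
        self.episode_return = 0.0
        self.episode_length = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.episode_return += reward
        self.episode_length += 1
        if done:
            # Attach the finished episode's totals so a logger can pick them up.
            info = dict(info, episode={"r": self.episode_return,
                                       "l": self.episode_length})
        return obs, reward, done, info


class ConstantRewardEnv:
    """Toy environment: reward 1.0 per step, episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= self.horizon, {}


def run_episodes(env, n_episodes, window=100):
    """Collect `episode` entries from info dicts and return the mean return."""
    buffer = deque(maxlen=window)  # keep only the most recent episodes
    for _ in range(n_episodes):
        env.reset()
        done = False
        while not done:
            _, _, done, info = env.step(0)
            if "episode" in info:
                buffer.append(info["episode"])
    # An empty buffer means there is nothing to average, hence no mean reward.
    return mean(ep["r"] for ep in buffer) if buffer else None
```

Running `run_episodes` on the wrapped toy environment yields a mean return, while the unwrapped one yields `None`, which mirrors the missing mean reward line in the training output.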

To display the mean reward and other episode statistics during training, wrap your custom environment in Monitor inside the function that creates it, so that every subprocess gets its own monitored instance:

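from stable_baselines3.common.monitor import Monitor

def make_env(pid):
    # Return a factory so that each subprocess builds its own environment.
    def _init():
        # Monitor records each episode's reward and length for the training logs.
        env = Monitor(Env(pid=pid, env_id=1, agent=agent, exe_file=exe_file))
        return env
    return _init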
