mean reward in SubprocVecEnv

By tensorzen, February 19, 2024

In Stable-Baselines3, when you build a ‘SubprocVecEnv’ yourself to run environments in parallel, the mean episode reward is not displayed during training by default. ‘SubprocVecEnv’ runs each environment instance in its own subprocess, which makes parallel rollouts more efficient, but it does not record episode statistics on its own. The training logger can only report a mean reward when each environment attaches episode information (total reward and length) to its step output, and that is the job of the Monitor wrapper.

To display the mean reward and other episode statistics during training, wrap your custom environment in Monitor inside each environment factory before handing the factories to ‘SubprocVecEnv’.

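from stable_baselines3.common.monitor import Monitor

# Env, agent and exe_file are defined elsewhere (not shown here).
def make_env(pid):
    def _init():
        # Monitor records each episode's reward and length so they can be logged
        env = Monitor(Env(pid=pid, env_id=1, agent=agent, exe_file=exe_file))
        return env
    return _init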
