{"id":128,"date":"2024-02-19T03:45:03","date_gmt":"2024-02-19T03:45:03","guid":{"rendered":"https:\/\/tensor.agenthub.uk\/?p=128"},"modified":"2024-04-24T08:45:07","modified_gmt":"2024-04-24T08:45:07","slug":"mean-rewrad-in-subprocvecenv","status":"publish","type":"post","link":"https:\/\/tensorzen.blog\/?p=128","title":{"rendered":"mean reward in SubprocVecEnv"},"content":{"rendered":"\n<p>In Stable-Baselines3, when you use a vectorized environment such as &#8216;SubprocVecEnv&#8217; to run environments in parallel, the mean episode reward isn&#8217;t displayed by default during training. &#8216;SubprocVecEnv&#8217; runs each environment instance in its own subprocess, which makes parallel computation more efficient, but it does not record episode statistics itself. The training logger can report the mean reward only when the environments supply per-episode statistics, and that is the job of the Monitor wrapper.<\/p>\n\n\n\n<p>To display the mean reward and other episode statistics during training, wrap your custom environment in Monitor inside each environment factory function.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#f6f6f4;--cbp-line-number-width:calc(1 * 0.6 * .875rem);line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#282A36\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span 
role=\"button\" tabindex=\"0\" data-code=\"from stable_baselines3.common.monitor import Monitor\n\ndef make_env(pid):\n    def _init():\n        env = Monitor(Env(pid=pid, env_id=1, agent=agent, exe_file=exe_file))\n        return env\n    return _init\" style=\"color:#f6f6f4;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki dracula-soft\" style=\"background-color: #282A36\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #F286C4\">from<\/span><span style=\"color: #F6F6F4\"> stable_baselines3.common.monitor <\/span><span style=\"color: #F286C4\">import<\/span><span style=\"color: #F6F6F4\"> Monitor<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #F286C4\">def<\/span><span style=\"color: #F6F6F4\"> <\/span><span style=\"color: #62E884\">make_env<\/span><span style=\"color: #F6F6F4\">(<\/span><span style=\"color: #FFB86C; font-style: italic\">pid<\/span><span style=\"color: #F6F6F4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6F6F4\">    <\/span><span style=\"color: #F286C4\">def<\/span><span style=\"color: #F6F6F4\"> <\/span><span style=\"color: #62E884\">_init<\/span><span style=\"color: #F6F6F4\">():<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6F6F4\">        env <\/span><span style=\"color: #F286C4\">=<\/span><span style=\"color: #F6F6F4\"> Monitor(Env(<\/span><span style=\"color: #FFB86C; font-style: italic\">pid<\/span><span style=\"color: #F286C4\">=<\/span><span style=\"color: #F6F6F4\">pid, <\/span><span style=\"color: #FFB86C; font-style: italic\">env_id<\/span><span style=\"color: #F286C4\">=<\/span><span style=\"color: 
#BF9EEE\">1<\/span><span style=\"color: #F6F6F4\">, <\/span><span style=\"color: #FFB86C; font-style: italic\">agent<\/span><span style=\"color: #F286C4\">=<\/span><span style=\"color: #F6F6F4\">agent, <\/span><span style=\"color: #FFB86C; font-style: italic\">exe_file<\/span><span style=\"color: #F286C4\">=<\/span><span style=\"color: #F6F6F4\">exe_file))<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6F6F4\">        <\/span><span style=\"color: #F286C4\">return<\/span><span style=\"color: #F6F6F4\"> env<\/span><\/span>\n<span class=\"line\"><span style=\"color: #F6F6F4\">    <\/span><span style=\"color: #F286C4\">return<\/span><span style=\"color: #F6F6F4\"> _init<\/span><\/span><\/code><\/pre><\/div>\n","protected":false},"excerpt":{"rendered":"<p>In Stable-Baselines3, when using environments like &#8216;SubprocVecEnv&#8217; for parallel environment management, the mean reward isn&#8217;t displayed by default during the training phase. This is because &#8216;SubprocVecEnv&#8217; runs 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,18],"tags":[],"class_list":["post-128","post","type-post","status-publish","format-standard","hentry","category-coding","category-reinforcement-learning"],"_links":{"self":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts\/128","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=128"}],"version-history":[{"count":3,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts\/128\/revisions"}],"predecessor-version":[{"id":323,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts\/128\/revisions\/323"}],"wp:attachment":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=128"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=128"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=128"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}