{"id":280,"date":"2021-12-30T08:11:00","date_gmt":"2021-12-30T08:11:00","guid":{"rendered":"https:\/\/tensor.agenthub.uk\/?p=280"},"modified":"2024-05-22T12:19:40","modified_gmt":"2024-05-22T12:19:40","slug":"%e8%bf%99%e5%a4%a7%e6%a6%82%e6%98%afgbdt%e6%9c%80%e5%88%9d%e7%9a%84%e6%83%b3%e6%b3%95","status":"publish","type":"post","link":"https:\/\/tensorzen.blog\/?p=280","title":{"rendered":"This Is Probably the Original Idea Behind GBDT"},"content":{"rendered":"\n<p>[This article was first published in two parts on my personal WeChat official account (<a href=\"https:\/\/mp.weixin.qq.com\/s?__biz=MzU3NTEzNDY1Mg==&amp;mid=2247483999&amp;idx=1&amp;sn=0f7f96675118320d12825bc5a9bce758&amp;chksm=fd268c8cca51059a7b89ba4e7074df3b70e6e6aa009d335d4651c7847f1d3ba14d5ff2a87adf#rd\" data-type=\"link\" data-id=\"https:\/\/mp.weixin.qq.com\/s?__biz=MzU3NTEzNDY1Mg==&amp;mid=2247483999&amp;idx=1&amp;sn=0f7f96675118320d12825bc5a9bce758&amp;chksm=fd268c8cca51059a7b89ba4e7074df3b70e6e6aa009d335d4651c7847f1d3ba14d5ff2a87adf#rd\">click to view<\/a>). I have reorganized it here and revised the passages that were unclear.]<\/p>\n\n\n\n<p>GBDT (Gradient Boosting Decision Tree): the word Gradient inevitably brings gradient descent (or ascent) to mind, so let's start with gradient descent.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Gradient Descent<\/h2>\n\n\n\n<p>Suppose we have a dataset $x = \\{x_1, x_2, x_3, \\dots, x_n \\}, y = \\{y_1, y_2, y_3, \\dots, y_n\\}$, and assume that $x$ and $y$ are related by<\/p>\n\n\n\n<p>$$y = f(x) + \\epsilon$$ <\/p>\n\n\n\n<p>Then $\\hat{y}_i = 
f(x_i)$, where $\\hat{y}_i$ is the predicted value, and machine learning is the process of determining $f(x)$ from the dataset $[\\textbf{x}, y]$. We use a loss function $L(y, f(x;\\theta))$ to measure the difference between the actual value and the prediction computed by the assumed function $f(x)$, and we take the $f(x)$ that minimizes the loss over the dataset $[\\textbf{x},y]$ to be the most suitable function, denoted $f^*(x)$. Mathematically,<\/p>\n\n\n\n<p>$$F^* = \\arg \\min_{F} E_{y,x} L(y, F(x)) = \\arg \\min_{F}E_{x}\\left[E_{y}\\left(L(y, F(x)) \\mid x\\right)\\right]$$<\/p>\n\n\n\n<p>Here $F(x)$ denotes a function in the general sense, while $f(x)$ stands for one specific function, which makes the later discussion easier. This problem is usually solved with gradient descent. Writing $\\Phi(\\theta) = \\sum_{k=1}^{n} L(y_k, F(x_k;\\theta))$ for the total loss, the search for the optimal parameter $\\theta^{*}$ over $M$ iterations can be written as<\/p>\n\n\n\n<p>$$\\theta^{*} = \\theta_{0} - \\sum_{i=1}^{M} \\alpha_{i}\\nabla_{\\theta} \\Phi(\\theta_{i-1})$$<\/p>\n\n\n\n<p>where $\\theta_{i-1}$ is the parameter after $i-1$ steps, $-\\nabla_{\\theta} \\Phi(\\theta_{i-1})$ is the negative gradient direction, and $\\alpha_i$ is the step size. Written more generally, for an iterative optimization problem the coefficients $\\theta^{*}$ are the sum of the initial guess $\\Delta\\theta_0 = \\theta_0$ and the increments $\\Delta\\theta_i = -\\alpha_{i}\\nabla_{\\theta} \\Phi(\\theta_{i-1})$ made at each step:<\/p>\n\n\n\n<p>$$\\theta^{*} = \\sum_{i=0}^{M}\\Delta\\theta_{i}$$<\/p>\n\n\n\n<h2 
class=\"wp-block-heading\">The GB Part<\/h2>\n\n\n\n<p>In the gradient descent discussed above, the form of the function $f(x)$ and its number of parameters are known in advance, so gradient descent can iterate on the parameters $\\theta$; this is the so-called parametric method. Now suppose we treat the entire function $f(x)$ as the unknown, with both its form and its number of parameters unknown, and use the dataset $[\\textbf{x}, y]$ to find it; this is the so-called non-parametric method. For convenience we denote the whole function by $F(x)$, so our estimate is<\/p>\n\n\n\n<p>$$\\hat{y} = F(x)$$<\/p>\n\n\n\n<p>By analogy with gradient descent, we solve for $F(x)$ iteratively. If the optimal function $F^*(x)$ is obtained after $M$ iterations, it is<\/p>\n\n\n\n<p>$$F^{*}(x) = \\sum_{m=0}^{M}f_{m}(x)$$<\/p>\n\n\n\n<p>where $f_{m}(x)$ is the part added at each iteration. This already has the right flavor: formally, GBDT looks exactly like this. The form above is in fact an additive model; in the notation of ESL (The Elements of Statistical Learning), an additive model can be written as<\/p>\n\n\n\n<p>$$Y = \\alpha + \\sum_{j=1}^{p}f_{j}(X_j) + \\epsilon$$ 
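<\/p>\n\n\n\n<p>The additive form can be made concrete with a minimal Python sketch. It assumes a single numeric feature, regression stumps with $J=2$ leaves as base learners, and a constant shrinkage factor learning_rate standing in for $\\beta_m$ in place of a per-round line search. With loss='ls' each stage fits the residual $y-F_{m-1}(x)$ and uses per-region means as leaf values; with loss='lad' it fits $\\text{sign}(y-F_{m-1}(x))$ and uses per-region medians, as derived in the following sections. The helpers fit_stump, gradient_boost and predict are hypothetical names for illustration, not a library API or the original author's code.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```python
import numpy as np


def fit_stump(x, target):
    # Hypothetical helper: a regression tree with J = 2 leaves on one numeric feature.
    # Every midpoint between consecutive distinct x values is tried as a threshold,
    # and the one with the smallest squared error against target is returned.
    values = np.unique(x)  # sorted distinct values; assumes at least two of them
    candidates = []
    for t in (values[:-1] + values[1:]) / 2.0:
        left = np.less_equal(x, t)
        right = ~left
        sse = (np.square(target[left] - target[left].mean()).sum()
               + np.square(target[right] - target[right].mean()).sum())
        candidates.append((sse, t))
    return min(candidates)[1]


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1, loss='ls'):
    # loss='ls': squared error; loss='lad': least absolute deviation.
    # learning_rate plays the role of the shrinkage factor beta_m (constant here).
    agg = np.mean if loss == 'ls' else np.median
    f0 = float(agg(y))                  # f_0(x): constant initial guess
    F = np.full(len(y), f0)             # current model F_{m-1}(x_i)
    stages = []
    for _ in range(n_rounds):
        residual = y - F
        # Negative gradient -g_m: the residual for LS, sign(residual) for LAD.
        pseudo = residual if loss == 'ls' else np.sign(residual)
        t = fit_stump(x, pseudo)        # regions R_1m, R_2m fitted to -g_m
        left = np.less_equal(x, t)
        # Leaf values gamma_jm: per-region mean (LS) or median (LAD) of the residual.
        gamma_left = float(agg(residual[left]))
        gamma_right = float(agg(residual[~left]))
        F = F + learning_rate * np.where(left, gamma_left, gamma_right)
        stages.append((t, gamma_left, gamma_right))
    return f0, stages, F


def predict(f0, stages, x, learning_rate=0.1):
    # Evaluate F_M(x) = f_0 + sum over m of beta * sum_j gamma_jm 1(x in R_jm).
    F = np.full(len(x), f0)
    for t, gamma_left, gamma_right in stages:
        F = F + learning_rate * np.where(np.less_equal(x, t), gamma_left, gamma_right)
    return F


x = np.arange(20, dtype=float)
y = np.repeat([0.0, 5.0], 10)           # step function: jump between x=9 and x=10
f0, stages, F = gradient_boost(x, y, loss='ls')
print(f0, stages[0][0], np.mean((y - F) ** 2))
```
<\/code><\/pre>\n\n\n\n<p>On this step function the first stump splits exactly at 9.5, the midpoint of the jump. Because the residuals within each region are then constant, every later stage removes a fraction learning_rate of the remaining residual, so the training error shrinks geometrically instead of vanishing in one step; this slower progress is the price paid for the regularizing effect of shrinkage.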
<\/p>\n\n\n\n<p>Additive models are highly expressive: if learning continues without any restriction, the empirical loss can be driven toward its theoretical minimum, which leads to over-fitting. That is why GBDT implementations such as XGBoost and LightGBM devote a lot of work to controlling over-fitting. Following the gradient-descent procedure, the solution $F^{*}(x)$ can be written as<\/p>\n\n\n\n<p>$$F^{*}(x) = f_{0}(x) + \\sum_{m=1}^{M}f_{m}(x)$$<\/p>\n\n\n\n<p>where $f_{0}(x)$ is the initial value, an initial guess chosen after considering the data as a whole; a good initial value often lets the search reach the optimum quickly and reliably.<\/p>\n\n\n\n<p>Following the gradient-descent formulation, each term of the remaining sum $\\sum_{m=1}^{M}f_{m}(x)$ on the right-hand side is split into two parts:<\/p>\n\n\n\n<p>$$f_{m}(x) = -\\rho_{m}g_{m}(x)$$<\/p>\n\n\n\n<p>Here $-g_{m}(x)$ is the negative gradient direction used in the current iteration, and $g_{m}(x)$ is the partial derivative of the loss $\\Phi(F(x))$ with respect to the unknown $F(x)$, since we now treat the whole of $F(x)$ as a single unknown variable. By the definition of the derivative,<\/p>\n\n\n\n<p>$$\\frac{\\partial L}{\\partial F_{m-1}(x)} = \\lim_{f_{m}(x) \\rightarrow 0} \\frac{L(y, F_{m-1}(x) + f_m(x)) - L(y, F_{m-1}(x)) }{f_m(x)}$$<\/p>\n\n\n\n<p>So $g_{m}(x)$ is the derivative of the loss with respect to the unknown $\\textbf{F}(x)$, evaluated at its current value $F_{m-1}(x)$. The loss function used by the Regression 
Tree is the squared error<\/p>\n\n\n\n<p>$$L(y, F(x)) = \\frac{1}{2}||y - F(x)||^2$$<\/p>\n\n\n\n<p>so the gradient works out to $\\hat{y}-y$: $g_m$ is the prediction given by $F_{m-1}(x)$ minus the target value. There are of course many other loss functions, such as MAE and cross entropy; they will be introduced separately in another article.<\/p>\n\n\n\n<p>Once $g_m(x)$ is determined, how should the other parameter, $\\rho_{m}$, be chosen? Let the base learner of round $m$ be $h(x;a_m)$, where $a_m$ denotes its parameters. In GBDT the base learner is a regression tree, so $a_m$ consists of the tree's splits and the mean value in each region. The step size $\\rho_m$ determines how far to move along the negative gradient direction so that the loss after this round is as small as possible. Mathematically,<\/p>\n\n\n\n<p>$$\\rho_m = \\arg \\min_{\\rho} \\sum_{i=1}^{N} L(y_i, F_{m-1}(x_i) + \\rho h(x_i;a_m))$$<\/p>\n\n\n\n<p>This looks complicated, but $F_{m-1}(x_i)$ and $h(x_i;a_m)$ are already known and the squared-error loss is differentiable, so $\\rho_m$ has a closed-form solution: setting the derivative with respect to $\\rho$ to zero gives $\\rho_m = \\frac{\\sum_{i=1}^{N}(y_i - F_{m-1}(x_i))\\,h(x_i;a_m)}{\\sum_{i=1}^{N}h(x_i;a_m)^2}$.<\/p>\n\n\n\n<p>The Gradient Boosting procedure is therefore as follows:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"475\" 
src=\"https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-1024x475.png\" alt=\"\" class=\"wp-image-358\" style=\"width:660px;height:auto\" srcset=\"https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-1024x475.png 1024w, https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-300x139.png 300w, https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-768x356.png 768w, https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image.png 1280w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">From reference 1<\/figcaption><\/figure>\n<\/div>\n\n\n<p>To summarize: to minimize $L(y,F(x))$, Gradient Boosting defines $F(x)$ as<\/p>\n\n\n\n<p>$$F(x) = \\sum_{m=0}^{M}f_{m}(x)$$<\/p>\n\n\n\n<p>$M$ determines how many base learners are chained together. They are learned sequentially, and the target in round $m$ is the negative gradient of $L(y,F(x))$ with respect to $F(x)$, evaluated at the model built up to the previous round:<\/p>\n\n\n\n<p>$$-g_m(x) = - \\left [ \\frac{\\partial L(y,F(x))}{\\partial F(x)} \\right ]_{F(x)=F_{m-1}(x)}$$<\/p>\n\n\n\n<p>When the loss is the squared error, the learning target is the residual $y-\\hat{y}$, i.e. how far the current prediction still has to go to reach $y$; this residual is a vector. The base learner fitted in this round points roughly in the direction of $-g_m$, and the remaining step is a line search: move along the negative gradient by the distance that brings this iteration as close as possible to the target $y$:<\/p>\n\n\n\n<p>$$\\rho_m = \\arg \\min_{\\rho} \\sum_{i=1}^{N} L(y_i, F_{m-1}(x_i) + \\rho h(x_i;a_m))$$<\/p>\n\n\n\n<p>After this round, the output prediction is<\/p>\n\n\n\n<p>$$F_m(x) = F_{m-1}(x) + \\beta_m f_{m}(x)$$<\/p>\n\n\n\n<p>Sharp-eyed readers will notice an extra variable $\\beta_m$: it controls the contribution of each base learner, further constraining each learner's capacity to avoid over-fitting.<\/p>\n\n\n\n<p>Let's take an example. If the loss function is least absolute deviation (LAD),<\/p>\n\n\n\n<p>$$L(y,F) = |y - F|$$<\/p>\n\n\n\n<p>the target fitted by the base learner in round $m$ is<\/p>\n\n\n\n<p>$$-g_m = \\text{sign}(y - F_{m-1})$$<\/p>\n\n\n\n<p>The line search becomes<\/p>\n\n\n\n<p>$$\\rho_m = \\arg \\min_{\\rho} \\sum_{i=1}^{N} \\left|y_i - F_{m-1}(x_i) - \\rho h_m(x_i;a_m)\\right|$$ $$= \\arg \\min_{\\rho} \\sum_{i=1}^{N} \\left|h_m(x_i;a_m)\\right|\\cdot\\left|\\frac{y_i - F_{m-1}(x_i)}{h_m(x_i;a_m)} -\\rho\\right|$$<\/p>\n\n\n\n<p>which has the closed-form solution<\/p>\n\n\n\n<p>$$\\rho_m=\\text{median}_{w} \\left (\\frac{y_i - F_{m-1}(x_i)}{h_m(x_i;a_m)} \\right )_1^N$$<\/p>\n\n\n\n<p>i.e. the weighted median with weights $w_i=|h_m(x_i;a_m)|$.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The DT Part<\/h2>\n\n\n\n<p>DT means that the base learner is a decision tree; GBDT uses regression trees. If the tree has $J$ leaf nodes, it can be written as<\/p>\n\n\n\n<p>$$h(x;\\{b_j, R_j\\}_{1}^{J})=\\sum_{j=1}^{J}b_j1(x \\in 
R_j)$$<\/p>\n\n\n\n<p>The $J$ leaf nodes divide the sample space into $J$ disjoint regions; $R_j$ denotes the $j$-th region, and the indicator $1(\\cdot)$ returns 1 if its condition holds and 0 otherwise. So the formula says: if $x$ falls in region $R_j$, return $b_j$.<\/p>\n\n\n\n<p>After round $m$, the model's prediction is<\/p>\n\n\n\n<p>$$F_m(x) = F_{m-1}(x) + \\rho_m \\sum_{j=1}^{J} b_{jm} 1(x\\in R_{jm})$$<\/p>\n\n\n\n<p>where $R_{jm}$ is the $j$-th region of the tree learned in round $m$. Since the regression tree splits the samples into $J$ disjoint regions, we can treat each region as a separate function:<\/p>\n\n\n\n<p>$$F_m(x) = F_{m-1}(x) + \\sum_{j=1}^{J} \\gamma_{jm}1(x \\in R_{jm})$$<\/p>\n\n\n\n<p>where $\\gamma_{jm} = \\rho_{m}b_{jm}$. The optimization problem above can then be solved independently in each region, and the output of the $j$-th region is<\/p>\n\n\n\n<p>$$\\gamma_{jm} = \\arg \\min _{\\gamma} \\sum_{x_i \\in R_{jm}} L(y_i, F_{m-1}(x_i) + \\gamma)$$<\/p>\n\n\n\n<p>If the loss function is least absolute deviation,<\/p>\n\n\n\n<p>$$\\gamma _{jm} = \\arg \\min_{\\gamma} \\sum_{x_i \\in R_{jm}} |y_i - F_{m-1}(x_i) - \\gamma|$$<\/p>\n\n\n\n<p>As middle-school math tells us, the minimum is reached when $\\gamma_{jm}$ is the median of $y_i - F_{m-1}(x_i)$ over $x_i \\in R_{jm}$.<\/p>\n\n\n\n<p>If the loss function is least squares,<\/p>\n\n\n\n<p>$$\\gamma _{jm} = \\arg \\min_{\\gamma} 
\\sum_{x_i \\in R_{jm}} (y_i - F_{m-1}(x_i) - \\gamma)^2$$<\/p>\n\n\n\n<p>Again by middle-school math, the minimum is now reached when $\\gamma_{jm}$ is the mean of $y_i - F_{m-1}(x_i)$ over $x_i \\in R_{jm}$.<\/p>\n\n\n\n<p>Here is the algorithm for least absolute deviation:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"488\" src=\"https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-1-1024x488.png\" alt=\"\" class=\"wp-image-412\" style=\"width:709px;height:auto\" srcset=\"https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-1-1024x488.png 1024w, https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-1-300x143.png 300w, https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-1-768x366.png 768w, https:\/\/tensorzen.blog\/wp-content\/uploads\/2024\/04\/image-1.png 1250w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">From reference 1<\/figcaption><\/figure>\n<\/div>\n\n\n<p>References:<\/p>\n\n\n\n<p><em>1.&nbsp;Friedman J H. Greedy Function Approximation: A Gradient Boosting Machine[J]. Annals of Statistics, 2001, 29(5): 1189-1232.<\/em><\/p>\n\n\n\n<p><em>2.&nbsp;Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)[J]. Ann. 
Statist., 2000.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>GBDT (Gradient Boosting Decision Tree): the word Gradient inevitably brings gradient descent to mind, so let's start with gradient descent.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,4],"tags":[24],"class_list":["post-280","post","type-post","status-publish","format-standard","hentry","category-base","category-machine-learning","tag-machine-learning"],"_links":{"self":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts\/280","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=280"}],"version-history":[{"count":92,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts\/280\/revisions"}],"predecessor-version":[{"id":712,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=\/wp\/v2\/posts\/280\/revisions\/712"}],"wp:attachment":[{"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=280"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=280"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tensorzen.blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=280"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}