Wednesday, March 30, 2016

RF vs GB

There are two main reasons to use Random Forests (RF) over Gradient Boosted Decision Trees (GBM), and the two are closely related:

  1. RF are much easier to tune than GBM
  2. RF are harder to overfit than GBM
Related to (1), RF essentially have only one hyperparameter to set: the number of features to randomly select at each node. Even for that, the rule of thumb of using the square root of the total number of features works well in most cases[1]. On the other hand, GBMs have several hyperparameters, including the number of trees, the depth (or number of leaves), and the shrinkage (or learning rate).
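
To make the contrast concrete, here is a minimal sketch of what each tuning surface looks like. The post does not name a library, so scikit-learn is an assumption here:

    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    # RF: essentially one knob, the number of features sampled per split;
    # "sqrt" is the rule-of-thumb mentioned above.
    rf = RandomForestClassifier(max_features="sqrt")

    # GBM: number of trees, tree depth, and learning rate all interact.
    gbm = GradientBoostingClassifier(
        n_estimators=500,    # number of trees
        max_depth=3,         # depth of each tree
        learning_rate=0.05,  # shrinkage
    )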

And, regarding (2), while it is not true that RF do not overfit (contrary to what many are led to believe by Breiman's strong assertions[2]), it is true that they are more robust to overfitting and require less tuning to avoid it.
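
One way to see this empirically (a sketch on synthetic data, not a benchmark from this post): grow both ensembles with more and more trees and watch the held-out score. RF typically plateaus, while an untuned GBM can start to degrade:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for n in (50, 200, 800):
        rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
        gbm = GradientBoostingClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
        print(n, rf.score(X_te, y_te), gbm.score(X_te, y_te))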

In some sense, RF is a tree ensemble that is more "plug'n'play" than GBM. That said, a well-tuned GBM will generally outperform an RF.

Also, as Tianqi Chen mentioned, RF have traditionally been easier to parallelize. However, that is no longer a strong argument, since there are now efficient ways to parallelize GBMs as well.
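
For instance, in scikit-learn parallel RF training is a one-argument change, while boosted-tree libraries such as XGBoost (Tianqi Chen's library, assumed here as the example) parallelize the work inside each boosting round:

    from sklearn.ensemble import RandomForestClassifier

    # RF trees are independent, so they can be built on all cores at once.
    rf = RandomForestClassifier(n_estimators=500, n_jobs=-1)

    # Assuming the xgboost package is installed; the boosting rounds are
    # still sequential, but the split search within each round is parallel.
    # import xgboost as xgb
    # gbm = xgb.XGBClassifier(n_estimators=500, n_jobs=-1)
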
Both are ensemble learning methods that predict (for regression or classification) by combining the outputs of individual trees. They differ in how the trees are built (and in what order) and in how their results are combined.

Random Forests train each tree independently, using a random sample of the data. This randomness helps make the model more robust than a single decision tree and less likely to overfit the training data. There are typically two parameters in RF: the number of trees and the number of features to be selected at each node.
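
A rough sketch of the idea (bootstrap sampling plus independently trained trees; binary 0/1 labels are assumed for the majority vote):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_forest(X, y, n_trees=100, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)              # bootstrap sample
            tree = DecisionTreeClassifier(max_features="sqrt")
            trees.append(tree.fit(X[idx], y[idx]))        # trained independently
        return trees

    def predict_forest(trees, X):
        votes = np.stack([t.predict(X) for t in trees])
        return (votes.mean(axis=0) >= 0.5).astype(int)    # majority vote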

GBTs build trees one at a time, where each new tree helps to correct errors made by the previously trained trees. With each tree added, the model becomes more expressive. There are typically three parameters: the number of trees, the depth of the trees, and the learning rate; each tree is generally kept shallow.
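
A minimal sketch of this sequential idea, for regression with squared loss (so the "errors" each new tree corrects are simply the residuals):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gbt(X, y, n_trees=100, depth=3, lr=0.1):
        base = y.mean()
        pred = np.full(len(y), base)       # start from a constant prediction
        trees = []
        for _ in range(n_trees):
            residual = y - pred            # errors of the ensemble so far
            tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
            pred += lr * tree.predict(X)   # each new tree corrects the rest
            trees.append(tree)
        return base, trees

    def predict_gbt(base, trees, X, lr=0.1):
        return base + lr * sum(t.predict(X) for t in trees)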

GBDT training generally takes longer because the trees are built sequentially. However, benchmark results have shown that GBDTs are often better learners than Random Forests.
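
A quick way to observe the training-time difference yourself (a sketch; timings will vary with the machine and the settings):

    import time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

    for model in (RandomForestClassifier(n_estimators=300, n_jobs=-1),
                  GradientBoostingClassifier(n_estimators=300)):
        start = time.perf_counter()
        model.fit(X, y)
        print(type(model).__name__, f"{time.perf_counter() - start:.2f}s")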

An overview of the differences and some benchmark results, in terms of error rate and training time, is given in the link below:
