Saturday, January 14, 2017

What are the differences between bagged trees and random forests?

The fundamental difference is that in a random forest, only a random subset of the features is considered at each node, and the best split feature from that subset is used to split the node. In bagging, all features are considered when splitting a node.

Bagging has a single parameter: the number of trees. Every tree is a fully grown (unpruned) binary tree, and at each node one searches over all features to find the feature that best splits the data at that node.
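As a rough illustration of what "searching over all features" means at a single node, here is a minimal sketch in plain Python. The helper names (`gini`, `best_split`), the use of Gini impurity as the split criterion, the midpoint thresholds, and the toy data are all assumptions made for this example; they are not taken from any particular library, and the bootstrap resampling and tree-growing loop are left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, feature_indices):
    """Return (feature, threshold, weighted_gini) of the best binary split.

    Only the features listed in feature_indices are searched.
    """
    best = None
    n = len(y)
    for j in feature_indices:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Toy data, made up for illustration: feature 1 separates the two classes,
# feature 0 does not.
X = [[1.0, 0.1], [2.0, 0.9], [3.0, 0.2], [4.0, 0.8]]
y = ["a", "b", "a", "b"]

D = len(X[0])
# Bagging: the node searches all D features.
bagging_split = best_split(X, y, range(D))
```

In this sketch the only thing that would change for a random forest is the last argument: instead of `range(D)`, the node would be given a random subset of feature indices.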

Random forests have two parameters:

  1. The first parameter is the same as in bagging: the number of trees.
  2. The second parameter (unique to random forests) is mtry, the number of features to search over when looking for the best split feature. It is usually set to D/3 for regression and sqrt(D) for classification, where D is the total number of features. During tree construction, mtry features are chosen at random from all available features at each node, and the one that best splits the data is used.
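The mtry rule of thumb and the per-node feature draw can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `default_mtry` and `candidate_features` are hypothetical, rounding down and using a minimum of 1 are choices made for this sketch, and sampling without replacement is assumed.

```python
import math
import random


def default_mtry(D, task):
    """Rule-of-thumb mtry: D/3 for regression, sqrt(D) for classification.

    Rounding down and the floor of 1 are assumptions of this sketch.
    """
    if task == "regression":
        return max(1, D // 3)
    if task == "classification":
        return max(1, math.isqrt(D))
    raise ValueError("task must be 'regression' or 'classification'")


def candidate_features(D, mtry, rng):
    """Draw mtry distinct feature indices at random; called afresh at every node."""
    return sorted(rng.sample(range(D), mtry))


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
D = 12                  # total number of features (made-up value)

mtry_regression = default_mtry(D, "regression")
mtry_classification = default_mtry(D, "classification")

# Features one node of a classification tree would be allowed to search.
features = candidate_features(D, mtry_classification, rng)
```

Setting mtry equal to D makes every node search all features, which corresponds to the bagging case described above.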
