Tuesday, August 2, 2016

Understanding RandomForest Parameters

max_features = None (all features) turns a random forest into bagged trees

Random forests provide an improvement over bagged trees by way of a small tweak that decorrelates the trees. As in bagging, we build a number of decision trees on bootstrapped training samples. But when building these decision trees, each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors.
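A minimal sketch of this idea (assuming scikit-learn): when max_features is set so that every predictor is considered at each split (m = p), a RandomForestClassifier reduces to plain bagged trees; a smaller m is what decorrelates the trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# m = sqrt(p): only a random subset of predictors is tried at each split,
# which is the decorrelating tweak that defines a random forest
forest = RandomForestClassifier(n_estimators=50, max_features="sqrt",
                                random_state=0).fit(X, y)

# m = p: every split considers all predictors, i.e. this is bagged trees
bagged = RandomForestClassifier(n_estimators=50, max_features=None,
                                random_state=0).fit(X, y)

print(forest.score(X, y), bagged.score(X, y))
```

Both ensembles still bootstrap the training samples; only the per-split feature sampling differs.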

Empirically good default values are max_features=n_features for regression problems and max_features=sqrt(n_features) for classification tasks.

For the number of tried attributes, the default is the square root of the total number of attributes, yet the forest is usually not very sensitive to the value of this parameter -- in fact it is rarely optimized, especially because the stochastic nature of a random forest can introduce larger variations than the parameter itself.
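To see this low sensitivity in practice, here is a hedged sketch (scikit-learn assumed, synthetic data) that compares out-of-bag scores across a few max_features settings; the scores typically land in a narrow band.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Try several numbers of attributes sampled per split, including all 20
for m in [2, 4, 8, 20]:
    clf = RandomForestClassifier(n_estimators=100, max_features=m,
                                 oob_score=True,  # out-of-bag estimate,
                                 random_state=0   # no held-out set needed
                                 ).fit(X, y)
    print(m, round(clf.oob_score_, 3))
```

The out-of-bag score gives a free generalization estimate from the bootstrap, which makes this kind of quick sweep cheap compared to a full cross-validated grid search.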
