https://www.kaggle.com/general/4092
Random Forest is just a bagged version of decision trees, except that at each split we only consider 'm' randomly chosen attributes.
Random forest achieves a lower test error solely through variance reduction, so increasing the number of trees in the ensemble won't have any effect on the bias of your model; a higher number of trees will only reduce its variance. Moreover, you can achieve a greater variance reduction by reducing the correlation between trees in the ensemble. This is why we randomly select 'm' attributes at each split: it introduces randomness into the ensemble and reduces the correlation between trees. Hence 'm' is the main parameter to tune in a random forest ensemble.
In general the best 'm' is obtained by cross validation. Some of the factors affecting 'm': 1) a small value of m will reduce the variance of the ensemble but will also increase the bias of each individual tree in the ensemble; 2) the value of m also depends on the ratio of noisy variables to important variables in your data set. If you have a lot of noisy variables, a small 'm' will decrease the probability of choosing an important variable at a split, hurting your model.
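A minimal sketch of picking 'm' (max_features) by cross-validation with scikit-learn, assuming scikit-learn is the library in use; the dataset and candidate grid values are placeholders, not from the original post:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# toy data: 20 features, only 5 of them informative
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=300, random_state=0),
    param_grid={"max_features": [2, 4, 6, 8, 12, 20]},  # candidate values of m
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_)  # e.g. {'max_features': 4}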
Hope this helps
========================================================
max_features = None/auto (i.e. consider all features at every split) reduces the forest to plain bagged trees.
Random forests provide an improvement over bagged trees by way of a small tweak that decorrelates the trees. As in bagging, we build a number of decision trees on bootstrapped training samples. But when building these decision trees, each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors.
Empirically good default values are max_features=n_features for regression problems and max_features=sqrt(n_features) for classification tasks.
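A quick illustration of these max_features settings, assuming scikit-learn; the constructor values shown here are my own, not part of the quoted text:

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# max_features=None: every split sees all features, i.e. plain bagged trees.
bagged_like = RandomForestClassifier(n_estimators=100, max_features=None)

# Common defaults: sqrt(n_features) for classification, all features for regression.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
reg = RandomForestRegressor(n_estimators=100, max_features=1.0)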
For the number of attributes tried at each split, the default is the square root of the total number of attributes, yet usually the forest is not very sensitive to the value of this parameter -- in fact it is rarely optimized, especially because the stochastic nature of RF may introduce larger variations.
Increasing max_features generally improves the performance of the model, since at each node there are more candidate features to consider. However, this is not guaranteed, because it also decreases the diversity of the individual trees, which is the unique selling point of random forest.
If you have built a decision tree before, you can appreciate the importance of minimum sample leaf size. A leaf is the end node of a decision tree. A smaller leaf makes the model more prone to capturing noise in the training data. Generally I prefer a minimum leaf size of more than 50. However, you should try multiple leaf sizes to find the best one for your use case.
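A rough sketch of comparing leaf sizes as suggested above, assuming scikit-learn; the dataset and the leaf-size values are illustrative placeholders:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)

for leaf in (1, 10, 50, 100):
    rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=leaf,
                               random_state=0)
    score = cross_val_score(rf, X, y, cv=5).mean()  # default scoring: R^2
    print(f"min_samples_leaf={leaf}: R^2={score:.3f}")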
=====================================================
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#overview
Each tree is grown as follows:
- If the number of cases in the training set is N, sample N cases at random - but with replacement, from the original data. This sample will be the training set for growing the tree.
- If there are M input variables, a number m<<M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing.
- Each tree is grown to the largest extent possible. There is no pruning.
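A minimal sketch of this procedure built from scikit-learn and NumPy pieces, assuming integer class labels; the function names (grow_forest, predict) are my own, not Breiman's:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=100, m=None, random_state=0):
    rng = np.random.RandomState(random_state)
    N, M = X.shape
    m = m or int(np.sqrt(M))          # m << M variables tried at each node
    trees = []
    for _ in range(n_trees):
        idx = rng.randint(0, N, size=N)   # sample N cases with replacement
        tree = DecisionTreeClassifier(
            max_features=m,               # random subset of variables per split
            max_depth=None,               # grow to the largest extent, no pruning
            random_state=rng.randint(2**31 - 1),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict(trees, X):
    # majority vote across trees (assumes non-negative integer labels)
    votes = np.stack([t.predict(X) for t in trees])
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)

# Usage: forest = grow_forest(X, y); preds = predict(forest, X)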
In the original paper on random forests, it was shown that the forest error rate depends on two things:
- The correlation between any two trees in the forest. Increasing the correlation increases the forest error rate.
- The strength of each individual tree in the forest. A tree with a low error rate is a strong classifier. Increasing the strength of the individual trees decreases the forest error rate.
Reducing m reduces both the correlation and the strength. Increasing it increases both. Somewhere in between is an "optimal" range of m - usually quite wide. Using the oob error rate (see below) a value of m in the range can quickly be found. This is the only adjustable parameter to which random forests is somewhat sensitive.
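A sketch of that OOB-based search for m, assuming scikit-learn; the dataset and the candidate values of m are placeholders:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=25, n_informative=6,
                           random_state=0)

for m in (2, 5, 10, 15, 25):
    rf = RandomForestClassifier(n_estimators=500, max_features=m,
                                bootstrap=True, oob_score=True, random_state=0)
    rf.fit(X, y)
    print(f"m={m}: OOB accuracy={rf.oob_score_:.3f}")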
=====================================================
You are doing it wrong -- the essential part of RF is that it basically only requires making the number of trees large enough to converge, and that's it (this becomes obvious once one starts doing proper tuning, i.e. nested cross-validation to check how robust the selection of parameters really is). If the performance is bad, it is better to fix the features or look for another method.
Pruning works nicely for single decision trees because it removes noise, but doing it within RF undermines bagging, which relies on having uncorrelated members during voting.
Max depth is usually only a technical parameter to avoid recursion overflows, while min samples per leaf is mainly for smoothing votes in regression -- the spirit of the method is that "each tree is grown to the largest extent possible."
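A sketch of the "just make the forest big enough" advice, assuming scikit-learn: OOB accuracy flattens out as trees are added while each tree is grown fully. The dataset and the n_estimators values are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for n in (25, 50, 100, 300, 1000):
    rf = RandomForestClassifier(n_estimators=n,
                                max_depth=None,        # grow each tree fully
                                min_samples_leaf=1,    # no smoothing of leaves
                                oob_score=True, random_state=0)
    rf.fit(X, y)
    print(f"n_estimators={n}: OOB accuracy={rf.oob_score_:.3f}")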