Thursday, May 26, 2016

Factor Level Limit for R

The randomForest package in R has a hard limit on the number of levels a categorical variable can have: 32 levels (newer releases of the package raise this to 53). If you want to use randomForest in R, you need to think about how to reduce the number of levels in any categorical variable that exceeds the limit. For example, you could create dummy variables out of such categorical variables and/or drop or merge infrequently occurring levels.
Alternatively, you could switch to scikit-learn in Python, which (I think) does not have such a limit, although its random forests expect categorical variables to be encoded as numbers first.
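
One way to handle infrequently occurring levels is to keep the most common ones and lump the rest into a single catch-all level. Here is a minimal sketch of that idea in plain Python. The function name `lump_rare_levels`, the `max_levels` default of 32, and the "Other" label are my own choices for illustration, not part of randomForest or scikit-learn.

```python
from collections import Counter

def lump_rare_levels(values, max_levels=32, other_label="Other"):
    """Return a copy of values with at most max_levels distinct levels.

    The most frequent levels are kept as-is; all remaining levels are
    replaced by other_label. (Hypothetical helper for illustration.)
    """
    values = list(values)
    counts = Counter(values)
    if len(counts) <= max_levels:
        return values  # already within the limit, nothing to do
    # Reserve one slot for the catch-all label.
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other_label for v in values]

# Toy example: four levels reduced to three.
cities = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
reduced = lump_rare_levels(cities, max_levels=3)
```

In this toy example the two rare levels "c" and "d" are merged into "Other", while "a" and "b" are left alone. The same approach should carry over to an R factor, but I have not shown that here.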
