Three Types:
http://stats.stackexchange.com/questions/70553/how-to-verify-a-distribution-is-normalized/70555#70555
Centered: X - mean
Standardized: (X - mean) / sd ("standardize features by removing the mean and scaling to unit variance")
Normalized: (X - mean) / (max - min)
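A minimal numpy sketch of the three transforms on made-up data (note np.std defaults to the population standard deviation):

import numpy as np

X = np.array([1.0, 4.0, 5.0, 10.0])  # toy feature column

centered = X - X.mean()                             # mean becomes 0
standardized = (X - X.mean()) / X.std()             # mean 0, sd 1
normalized = (X - X.mean()) / (X.max() - X.min())   # range becomes 1

print(standardized.mean(), standardized.std())      # ~0.0, 1.0
print(normalized.max() - normalized.min())          # 1.0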
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and make the estimator unable to learn from the other features correctly.
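A quick sketch of typical StandardScaler usage, on a toy matrix whose second column has a much larger variance:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])        # second feature dwarfs the first

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns per-feature mean/sd, then transforms

print(scaler.mean_)                 # [  2. 300.]
print(X_scaled.mean(axis=0))        # ~[0. 0.]
print(X_scaled.std(axis=0))         # [1. 1.]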
https://en.wikipedia.org/wiki/Feature_scaling
Since the range of values of raw data varies widely, the objective functions of some machine learning algorithms do not work properly without normalization. For example, many classifiers compute the distance between two points as the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by that particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.
Another reason feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.
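To illustrate the distance point with hypothetical numbers: one broad-range feature swamps the Euclidean distance until both features are scaled to a comparable range.

import numpy as np

# Toy points: feature 1 ranges over [0, 1], feature 2 over [0, 10000]
a = np.array([0.0, 3000.0])
b = np.array([1.0, 3050.0])
print(np.linalg.norm(a - b))      # ~50.01 -- feature 2 dominates

# Min-max scale both features to [0, 1]; each now contributes proportionately
lo = np.array([0.0, 0.0])
hi = np.array([1.0, 10000.0])
a_s = (a - lo) / (hi - lo)
b_s = (b - lo) / (hi - lo)
print(np.linalg.norm(a_s - b_s))  # ~1.00 -- feature 1 matters again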
http://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia
http://stats.stackexchange.com/questions/86434/is-standardisation-before-lasso-really-necessary
Lasso regression puts constraints on the size of the coefficients associated with each variable. However, the size of a coefficient depends on the magnitude of its variable: a large-scale variable needs only a small coefficient, so the same L1 penalty treats it more leniently. It is therefore necessary to center and scale, i.e. standardize, the variables.
Centering the variables also means there is no longer an intercept to estimate. This applies equally to ridge regression, by the way.
Another good explanation is this post: Need for centering and standardizing data in regression
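A small sketch on fabricated data of why scale matters for the lasso: both features carry the same signal, but the large-scale one needs only a tiny coefficient, so the L1 penalty shrinks the two very unevenly until the variables are standardized.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
x1 = rng.randn(200)            # unit-scale feature
x2 = rng.randn(200) * 1000.0   # same kind of signal, huge scale
y = 2.0 * x1 + 0.002 * x2 + 0.1 * rng.randn(200)  # equal true contributions
X = np.column_stack([x1, x2])

# Raw variables: the unit-scale coefficient is shrunk hard, while the
# large-scale one (true value 0.002) is barely penalized
print(Lasso(alpha=1.0).fit(X, y).coef_)

# Standardized variables: both coefficients are penalized on the same footing
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
print(pipe.named_steps["lasso"].coef_)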