Three Types:
http://stats.stackexchange.com/questions/70553/how-to-verify-a-distribution-is-normalized/70555#70555
Centered: X - mean
Standardized: (X - mean) / sd ("standardize features by removing the mean and scaling to unit variance")
Normalized: (X - mean) / (max - min)
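A minimal numpy sketch of the three transforms on made-up data (note np.std defaults to the population standard deviation):

import numpy as np

X = np.array([1.0, 4.0, 5.0, 10.0])  # toy feature column

centered = X - X.mean()                             # mean becomes 0
standardized = (X - X.mean()) / X.std()             # mean 0, sd 1
normalized = (X - X.mean()) / (X.max() - X.min())   # range becomes 1

print(standardized.mean(), standardized.std())      # ~0.0, 1.0
print(normalized.max() - normalized.min())          # 1.0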
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance of the same order. If a feature has a variance that is orders of magnitude larger than the others, it might dominate the objective function and make the estimator unable to learn from the other features correctly.
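A quick sketch of typical StandardScaler usage, on a toy matrix whose second column has a much larger variance:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])        # second feature dwarfs the first

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learns per-feature mean/sd, then transforms

print(scaler.mean_)                 # [  2. 300.]
print(X_scaled.mean(axis=0))        # ~[0. 0.]
print(X_scaled.std(axis=0))         # [1. 1.]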
https://en.wikipedia.org/wiki/Feature_scaling
Since the range of values of raw data varies widely, the objective functions of some machine learning algorithms do not work properly without normalization. For example, many classifiers compute the distance between two points as the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by that particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.
Another reason feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.
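To illustrate the distance point with hypothetical numbers: one broad-range feature swamps the Euclidean distance until both features are scaled to a comparable range.

import numpy as np

# Toy points: feature 1 ranges over [0, 1], feature 2 over [0, 10000]
a = np.array([0.0, 3000.0])
b = np.array([1.0, 3050.0])
print(np.linalg.norm(a - b))      # ~50.01 -- feature 2 dominates

# Min-max scale both features to [0, 1]; each now contributes proportionately
lo = np.array([0.0, 0.0])
hi = np.array([1.0, 10000.0])
a_s = (a - lo) / (hi - lo)
b_s = (b - lo) / (hi - lo)
print(np.linalg.norm(a_s - b_s))  # ~1.00 -- feature 1 matters again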
http://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia
http://stats.stackexchange.com/questions/86434/is-standardisation-before-lasso-really-necessary
Lasso regression puts constraints on the size of the coefficients associated with each variable. However, the size of a coefficient depends on the magnitude of its variable: a large-scale variable needs only a small coefficient, so the same L1 penalty treats it more leniently. It is therefore necessary to center and scale, i.e. standardize, the variables.
Centering the variables also means there is no longer an intercept to estimate. This applies equally to ridge regression, by the way.
Another good explanation is this post: Need for centering and standardizing data in regression
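A small sketch on fabricated data of why scale matters for the lasso: both features carry the same signal, but the large-scale one needs only a tiny coefficient, so the L1 penalty shrinks the two very unevenly until the variables are standardized.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
x1 = rng.randn(200)            # unit-scale feature
x2 = rng.randn(200) * 1000.0   # same kind of signal, huge scale
y = 2.0 * x1 + 0.002 * x2 + 0.1 * rng.randn(200)  # equal true contributions
X = np.column_stack([x1, x2])

# Raw variables: the unit-scale coefficient is shrunk hard, while the
# large-scale one (true value 0.002) is barely penalized
print(Lasso(alpha=1.0).fit(X, y).coef_)

# Standardized variables: both coefficients are penalized on the same footing
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
print(pipe.named_steps["lasso"].coef_)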