Data Science Guy: What is Bagging?

Monday, January 30, 2017

What is Bagging?

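Bootstrap aggregating, also called bagging, is an ensemble method that fits several models on resampled versions of a training set and combines their predictions.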

Given a standard training set D of size n, bagging generates m new training sets D_i, each of size n′, by sampling from D uniformly and with replacement. Because the sampling is with replacement, some observations may be repeated in each D_i. If n′ = n, then for large n each D_i is expected to contain the fraction 1 - 1/e (≈ 63.2%) of the unique examples of D, the rest being duplicates.^[1] This follows because a given example is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e as n grows. This kind of sample is known as a bootstrap sample. The m models are fitted using the m bootstrap samples and combined by averaging their outputs (for regression) or by voting (for classification).
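The 63.2% figure is easy to check empirically. The following is a minimal sketch using only the Python standard library; the helper names `bootstrap_sample` and `unique_fraction`, the seed, and the choice of n = 10,000 are illustrative assumptions, not part of the original description.

```python
import math
import random


def bootstrap_sample(data, size=None, rng=random):
    """Draw `size` items from `data` uniformly and with replacement."""
    if size is None:
        size = len(data)  # the n' = n case
    return [data[rng.randrange(len(data))] for _ in range(size)]


def unique_fraction(n, rng=random):
    """Fraction of the n original examples that appear in one bootstrap sample."""
    sample = bootstrap_sample(range(n), rng=rng)
    return len(set(sample)) / n


rng = random.Random(0)  # fixed seed, only so the run is reproducible
n = 10_000
fractions = [unique_fraction(n, rng) for _ in range(20)]
mean_fraction = sum(fractions) / len(fractions)
print(f"observed: {mean_fraction:.3f}, 1 - 1/e = {1 - 1 / math.e:.3f}")
```

For n this large, the observed fraction should land close to 1 - 1/e. With a small n the gap is more visible, since the exact expectation is 1 - (1 - 1/n)^n.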

Bagging leads to "improvements for unstable procedures" (Breiman, 1996), which include, for example, artificial neural networks, classification and regression trees, and subset selection in linear regression (Breiman, 1994). Bagging has also been shown to improve preimage learning.^[2]^[3] On the other hand, it can mildly degrade the performance of stable methods such as k-nearest neighbors (Breiman, 1996).
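To make the fit-and-combine procedure concrete, here is a minimal sketch that bags a very small tree-like learner. It assumes a one-split regression stump as the base model; the stump, the function names (`fit_stump`, `bagged_regressor`, `majority_vote`), and the toy step-function data are all illustrative choices, not taken from Breiman's papers.

```python
import random
from collections import Counter
from statistics import mean


def fit_stump(points):
    """Fit a one-split regression stump to (x, y) pairs by least squares."""
    points = sorted(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    best = None
    for i in range(1, len(points)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between equal x values
        left, right = ys[:i], ys[i:]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (xs[i - 1] + xs[i]) / 2, lm, rm)
    if best is None:  # all x values equal: fall back to the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def bagged_regressor(points, m, rng):
    """Fit m stumps on m bootstrap samples; predict by averaging their outputs."""
    n = len(points)
    models = []
    for _ in range(m):
        sample = [points[rng.randrange(n)] for _ in range(n)]  # n' = n
        models.append(fit_stump(sample))
    return lambda x: mean(model(x) for model in models)


def majority_vote(labels):
    """Combine class labels by voting (the classification counterpart)."""
    return Counter(labels).most_common(1)[0][0]


rng = random.Random(1)  # fixed seed, only so the run is reproducible
# Toy data: a noisy step from 0 to 1 at x = 2.5.
points = [(x / 10, (1.0 if x >= 25 else 0.0) + rng.gauss(0, 0.1)) for x in range(50)]
predict = bagged_regressor(points, m=25, rng=rng)
print(predict(1.0), predict(4.0))
```

Each stump sees a different bootstrap sample, so the split thresholds vary slightly from model to model, and averaging smooths the prediction near the step. A real application would more likely use a library implementation of the base learner. The stump is written out here only to keep the sketch self-contained.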
