Sunday, July 31, 2016

Feature Engineering

You can combine features, for example by taking their sum (feat_1 + feat_2 + feat_3 + ...) or their product. You can also transform a feature with a log, exponential, or sigmoid function, or discretize a numeric feature into a categorical one. The space of possibilities is effectively infinite.
Keep any combination or transformation that improves your cross-validation or test set performance.
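
As a rough sketch of the ideas above, here is how those combinations and transformations might look in pandas. The toy data and the derived column names (feat_sum, feat_prod, and so on) are made up for illustration; only feat_1, feat_2, and feat_3 come from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; feat_1..feat_3 stand in for real numeric features.
df = pd.DataFrame({
    "feat_1": [1.0, 2.0, 3.0, 4.0],
    "feat_2": [10.0, 20.0, 30.0, 40.0],
    "feat_3": [0.5, 1.5, 2.5, 3.5],
})
cols = ["feat_1", "feat_2", "feat_3"]

# Combinations: row-wise sum and product of the features.
df["feat_sum"] = df[cols].sum(axis=1)
df["feat_prod"] = df[cols].prod(axis=1)

# Transformations: log (log1p stays finite at zero) and sigmoid.
df["feat_1_log"] = np.log1p(df["feat_1"])
df["feat_3_sigmoid"] = 1.0 / (1.0 + np.exp(-df["feat_3"]))

# Discretization: cut a numeric feature into two equal-width bins.
df["feat_2_bin"] = pd.cut(df["feat_2"], bins=2, labels=["low", "high"])

print(df)
```

Whether any of these new columns is worth keeping is an empirical question: compare model scores with and without it.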

How to Tune Gradient Boosting in Python


Saturday, July 30, 2016

Thursday, June 9, 2016

Saturday, June 4, 2016

Use a List of Values to Select Rows from a Pandas DataFrame

In [4]: from pandas import DataFrame

In [5]: df = DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

In [6]: df
Out[6]:
   A  B
0  5  1
1  6  2
2  3  3
3  4  5

In [7]: df[df['A'].isin([3, 6])]
Out[7]:
   A  B
1  6  2
2  3  3

http://www.unknownerror.org/opensource/pydata/pandas/q/stackoverflow/12096252/use-a-list-of-values-to-select-rows-from-a-pandas-dataframe
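
The same selection can be written as a standalone script, and the mask can be negated with ~ to keep the rows whose value is not in the list. This is a small sketch reusing the data from the session above; the variable names wanted, selected, and excluded are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 3, 4], "B": [1, 2, 3, 5]})
wanted = [3, 6]

# Boolean mask: True where column A holds one of the wanted values.
mask = df["A"].isin(wanted)

selected = df[mask]    # rows where A is in the list
excluded = df[~mask]   # rows where A is not in the list

print(selected)
print(excluded)
```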

Difference Between Groupby and Pivot_table for Pandas

Both pivot_table and groupby are used to aggregate your dataframe. The only difference is the shape of the result.

If you want SQL-style aggregation, groupby is the way to go: it returns one row per group, with the group keys in the index. pivot_table instead spreads the values of one key across the columns, like a spreadsheet pivot.
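
To make the difference in shape concrete, here is a minimal sketch that computes the same sums both ways. The sales data and its column names (region, item, amount) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sales data, made up for illustration.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "item": ["a", "b", "a", "b"],
    "amount": [10, 20, 30, 40],
})

# groupby: long result, one row per (region, item) pair, like SQL GROUP BY.
long_result = sales.groupby(["region", "item"])["amount"].sum()

# pivot_table: the same sums, with the item values spread across columns.
wide_result = sales.pivot_table(index="region", columns="item",
                                values="amount", aggfunc="sum")

print(long_result)
print(wide_result)

# Unstacking the groupby result reshapes it into the wide layout.
print(long_result.unstack("item"))
```

The numbers are identical in both results; groupby keeps them in a single column indexed by both keys, while pivot_table lays them out as a region-by-item grid.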