Thursday, June 9, 2016

Dataframe vs. Nested List vs. Dictionary for Storing info in Python

DataFrame

DecisionTree  Param  Score

Saturday, June 4, 2016

Use a List of Values to Select Rows from a Pandas DataFrame

In [4]: from pandas import DataFrame

In [5]: df = DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

In [6]: df
Out[6]:
   A  B
0  5  1
1  6  2
2  3  3
3  4  5

In [7]: df[df['A'].isin([3, 6])]
Out[7]:
   A  B
1  6  2
2  3  3

http://www.unknownerror.org/opensource/pydata/pandas/q/stackoverflow/12096252/use-a-list-of-values-to-select-rows-from-a-pandas-dataframe

Difference Between Groupby and Pivot_table for Pandas

Both pivot_table and groupby are used to aggregate your dataframe. The difference is only with regard to the shape of the result.

If you want SQL-style aggregation, groupby is the way to go.
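
To make the shape difference concrete, here is a minimal sketch. The data and the column names (region, product, sales) are made up for illustration and are not from the original post. Both calls compute the same sums; groupby returns them in long form, one row per group, while pivot_table spreads one of the keys across the columns.

```python
import pandas as pd

# Hypothetical sales data, invented for this illustration
df = pd.DataFrame({
    'region':  ['East', 'East', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B'],
    'sales':   [10, 20, 30, 40],
})

# groupby: long result, one row per (region, product) pair, like SQL GROUP BY
long_result = df.groupby(['region', 'product'])['sales'].sum()

# pivot_table: the same sums in a wide layout, one column per product
wide_result = df.pivot_table(index='region', columns='product',
                             values='sales', aggfunc='sum')

print(long_result)
print(wide_result)
```

Calling long_result.unstack() turns the long result into the same wide layout, which is one way to see that only the shape differs.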

Friday, June 3, 2016

An Example of Converting SQL Aggregate Function into Python

select mykey, sum(Field1) as Field1, avg(Field1) as avg_field1, min(field2) as min_field2
from df
group by mykey

# Map each column to the aggregate(s) the SQL above applies to it
f = {'Field1': ['sum', 'mean'],   # sum(Field1), avg(Field1)
     'Field2': 'min'              # min(field2)
 }

grouped = df.groupby('mykey').agg(f)
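
Passing a dict that contains lists to agg produces hierarchical column names such as (Field1, sum). If flat names matching the SQL aliases are wanted, named aggregation may be a closer fit. The sketch below assumes pandas 0.25 or later and uses a small made-up dataframe; only mykey, Field1, Field2 and the aliases come from the query above.

```python
import pandas as pd

# Made-up sample data; only the column names come from the SQL above
df = pd.DataFrame({
    'mykey':  ['a', 'a', 'b'],
    'Field1': [1, 3, 10],
    'Field2': [5, 2, 7],
})

# Each keyword is an output column: alias=(source column, aggregate function)
grouped = df.groupby('mykey').agg(
    Field1=('Field1', 'sum'),        # sum(Field1) as Field1
    avg_field1=('Field1', 'mean'),   # avg(Field1) as avg_field1
    min_field2=('Field2', 'min'),    # min(field2) as min_field2
).reset_index()                      # turn mykey back into a regular column

print(grouped)
```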

Thursday, June 2, 2016

Passing Query Parameters in Pandas


import pandas as pd
import pyodbc
conn = pyodbc.connect(dsn="hive", autocommit=True)

beg_dt = '2016-05-01'
end_dt = '2016-06-01'

mq = """
select
  local_date,
  hotel_id        as exp_id,
  sum(xclick)     as xclick,
  sum(pclick)     as pclick,
  sum(pcost_usd)  as pcost,
  sum(trx)        as trx,
  sum(gp_usd)     as gp,
  sum(bid_gp_usd) as bid_gp
                     
 from embid.bid_unit_kpi_agg
where local_date between ? and ?
  and etl_processed_type = 'HOTEL'
  and partner_org in ('TRIPADVISOR')  
  and partner_pos = 'US'
  and brand = 'ORBITZ'
  and device_type = 'MOBILE'
                     
group by local_date, hotel_id
order by local_date, hotel_id
"""
ta_orb_mob_perf = pd.read_sql(mq, conn, params=[beg_dt, end_dt])
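
The same pattern can be tried without a Hive DSN. The sketch below swaps in an in-memory SQLite database, which also uses ? placeholders; the kpi table and its rows are made up for illustration. Other drivers may expect a different placeholder style, so check the driver's documentation before reusing the query text.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Hive connection (assumption for this demo)
conn = sqlite3.connect(':memory:')
conn.execute("create table kpi (local_date text, hotel_id integer, xclick integer)")
conn.executemany("insert into kpi values (?, ?, ?)", [
    ('2016-04-30', 1, 5),
    ('2016-05-01', 1, 7),
    ('2016-05-15', 2, 3),
    ('2016-06-02', 2, 9),
])

query = """
select local_date, hotel_id, sum(xclick) as xclick
from kpi
where local_date between ? and ?
group by local_date, hotel_id
order by local_date, hotel_id
"""

# params are bound to the ? placeholders in order, not pasted into the SQL text
result = pd.read_sql(query, conn, params=['2016-05-01', '2016-06-01'])
print(result)
conn.close()
```

Binding values through params rather than string formatting leaves quoting and escaping to the driver.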

Access a Column of Pandas Dataframe


The Python and NumPy indexing operators [] and attribute operator . provide quick and easy access to pandas data structures across a wide range of use cases.
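
A short sketch of both access styles, using a made-up dataframe (the 'my col' column is there only to show a name that attribute access cannot reach):

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5], 'my col': [0, 0, 1, 1]})

col_a = df['A']            # [] with one label returns a Series
col_a_attr = df.A          # attribute access returns the same Series
two_cols = df[['A', 'B']]  # [] with a list of labels returns a DataFrame
spaced = df['my col']      # names with spaces need [], not attribute access

print(col_a)
print(two_cols)
```

Attribute access is convenient for quick work, but [] is the safer default: it works for any column name, including names that clash with DataFrame methods such as count or sum.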