
Pandas Pivot Or Groupby For Dynamically Generated Columns

I have a dataframe with sales information in a supermarket. Each row in the dataframe represents an item, with several characteristics as columns. The original DataFrame is somethi

Solution 1:

One possible way is to use groupby to collect each ticket's items into a list, which can then be expanded into columns:

In [24]: res = df.groupby(['ticket_number', 'ticket_price'])['item'].apply(list).apply(pd.Series)

In [25]: res
Out[25]:
                                 0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork

Then, after cleaning up this result a bit:

In [27]: res.columns = ['item' + str(i + 1) for i in res.columns]

In [29]: res.reset_index()
Out[29]:
  ticket_number  ticket_price   item1   item2 item3
0           001            21  tomato   candy  soup
1           002            12    soup    cola   NaN
2           003            56    beef  tomato  pork
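The steps above can be put together as a self-contained sketch. The sample DataFrame is reconstructed from the outputs shown in this answer, so its exact dtypes are an assumption (ticket_number is kept as a string to preserve the leading zeros). The lists are expanded with the DataFrame constructor rather than .apply(pd.Series); that is a substitution, not the original answer's call, and it should give the same layout.

```python
import pandas as pd

# Sample data reconstructed from the outputs above (assumed, not the
# asker's original frame).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# One list of items per (ticket_number, ticket_price) group.
lists = df.groupby(["ticket_number", "ticket_price"])["item"].apply(list)

# Expand the lists into columns; shorter lists are padded with missing values.
wide = pd.DataFrame(lists.tolist(), index=lists.index)

# Rename 0, 1, 2, ... to item1, item2, item3, ... and flatten the index.
wide.columns = [f"item{i + 1}" for i in wide.columns]
wide = wide.reset_index()
print(wide)
```

The number of item columns is driven by the longest list, so it adapts to whatever the largest ticket contains.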

Another possible way is to create a new column that numbers the items within each group with groupby.cumcount:

In [38]: df['item_number'] = df.groupby('ticket_number').cumcount()

In [39]: df
Out[39]:
     item ticket_number  ticket_price  item_number
0  tomato           001            21            0
1   candy           001            21            1
2    soup           001            21            2
3    soup           002            12            0
4    cola           002            12            1
5    beef           003            56            0
6  tomato           003            56            1
7    pork           003            56            2

And then do some reshaping:

In [40]: df.set_index(['ticket_number', 'ticket_price', 'item_number']).unstack(-1)
Out[40]:
                              item
item_number                      0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork

From here, with some cleaning of the column names, you can achieve the same result as above.
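A minimal sketch of the cumcount route, including the column cleanup, might look like this. The sample data is again reconstructed from the outputs in this answer (an assumption). Selecting the item column before unstack is a small variation on the call above: it avoids the extra "item" level in the column index, which makes the renaming simpler.

```python
import pandas as pd

# Sample data reconstructed from the outputs above (assumed).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# Number the items within each ticket: 0, 1, 2, ...
df["item_number"] = df.groupby("ticket_number").cumcount()

# Move item_number from the row index into the columns.
wide = (
    df.set_index(["ticket_number", "ticket_price", "item_number"])["item"]
      .unstack("item_number")
)

# Clean up the column names and flatten the index.
wide.columns = [f"item{i + 1}" for i in wide.columns]
wide = wide.reset_index()
print(wide)
```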

The reshaping step with set_index and unstack could also be done with pivot_table: df.pivot_table(columns=['item_number'], index=['ticket_number', 'ticket_price'], values='item', aggfunc='first')
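As a rough illustration of the pivot_table variant, the sketch below rebuilds the sample data (assumed from the outputs in this answer) and applies the same call. aggfunc='first' is needed because the values are strings, and each (ticket, item_number) cell holds exactly one item, so taking the first value loses nothing.

```python
import pandas as pd

# Sample data reconstructed from the outputs above (assumed).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})
df["item_number"] = df.groupby("ticket_number").cumcount()

# Pivot item_number into columns; 'first' picks the single item in each cell.
wide = df.pivot_table(
    index=["ticket_number", "ticket_price"],
    columns="item_number",
    values="item",
    aggfunc="first",
)

# Same cleanup as before: item1, item2, ... and a flat index.
wide.columns = [f"item{i + 1}" for i in wide.columns]
wide = wide.reset_index()
print(wide)
```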
