Pandas Pivot Or Groupby For Dynamically Generated Columns
I have a dataframe with sales information in a supermarket. Each row in the dataframe represents an item, with several characteristics as columns. The original DataFrame is somethi
Solution 1:
One possible way is to use groupby to collect the items of each ticket into lists, which can then be expanded into columns:
In [24]: res = df.groupby(['ticket_number', 'ticket_price'])['item'].apply(list).apply(pd.Series)
In [25]: res
Out[25]:
                                 0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork
Then, after cleaning up this result a bit:
In [27]: res.columns = ['item' + str(i + 1) for i in res.columns]
In [29]: res.reset_index()
Out[29]:
  ticket_number  ticket_price   item1   item2 item3
0           001            21  tomato   candy  soup
1           002            12    soup    cola   NaN
2           003            56    beef  tomato  pork
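For reference, here is a self-contained sketch of the list-based approach. The sample DataFrame is an assumption reconstructed from the outputs shown above (the original frame from the question is not reproduced here), and the lists are expanded with the DataFrame constructor rather than .apply(pd.Series), which is a swapped-in but equivalent step for this data:

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# Collect the items of each ticket into one list per group.
lists = df.groupby(["ticket_number", "ticket_price"])["item"].apply(list)

# Expand the lists into columns; shorter lists are padded with missing values.
res = pd.DataFrame(lists.tolist(), index=lists.index)

# Rename the positional columns 0, 1, 2 to item1, item2, item3.
res.columns = [f"item{i + 1}" for i in res.columns]
res = res.reset_index()
print(res)
```

The number of item columns is determined by the longest ticket, so nothing has to be hard-coded when tickets contain different numbers of items.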
Another possible way is to create a new column that numbers the items within each group, using groupby.cumcount:
In [38]: df['item_number'] = df.groupby('ticket_number').cumcount()
In [39]: df
Out[39]:
     item ticket_number  ticket_price  item_number
0  tomato           001            21            0
1   candy           001            21            1
2    soup           001            21            2
3    soup           002            12            0
4    cola           002            12            1
5    beef           003            56            0
6  tomato           003            56            1
7    pork           003            56            2
And then do some reshaping:
In [40]: df.set_index(['ticket_number', 'ticket_price', 'item_number']).unstack(-1)
Out[40]:
                              item
item_number                      0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork
From here, with some cleaning of the column names, you can achieve the same result as above.
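A minimal sketch of the cumcount route, including the column cleanup, might look like this. The sample data is again an assumption reconstructed from the outputs above, and the name wide is only illustrative:

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# Number the items within each ticket: 0, 1, 2, ...
df["item_number"] = df.groupby("ticket_number").cumcount()

# Move item_number from the row index into the columns.
wide = df.set_index(["ticket_number", "ticket_price", "item_number"]).unstack(-1)

# The columns are now a MultiIndex of ('item', item_number) pairs;
# flatten them to item1, item2, item3.
wide.columns = [f"item{number + 1}" for _, number in wide.columns]
wide = wide.reset_index()
print(wide)
```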
The reshaping step with set_index and unstack could also be done with pivot_table:
df.pivot_table(columns=['item_number'], index=['ticket_number', 'ticket_price'], values='item', aggfunc='first')
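The pivot_table variant can be sketched end to end as follows, under the same assumed sample data. aggfunc='first' is needed because the default aggregation (mean) does not apply to strings; since cumcount gives every item a unique number within its ticket, each cell holds exactly one value and 'first' simply passes it through. The names wide_pivot and flat are illustrative:

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown in this answer.
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})
df["item_number"] = df.groupby("ticket_number").cumcount()

# One row per ticket, one column per item position.
wide_pivot = df.pivot_table(
    index=["ticket_number", "ticket_price"],
    columns="item_number",
    values="item",
    aggfunc="first",
)

# Rename 0, 1, 2 to item1, item2, item3 and restore the ticket columns.
flat = wide_pivot.rename(columns=lambda n: f"item{n + 1}").reset_index()
flat.columns.name = None  # drop the leftover 'item_number' label
print(flat)
```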