How To Keep Only The Most Recent Revised Order For Each Order In Pandas
Say I have a DataFrame that tracks the order number and the revision number for that order in two separate columns, like so:
OrderNum RevNum TotalPrice
0AXL3 0 $5.00
Solution 1:
IIUC:
In [100]: df.groupby('OrderNum', as_index=False).last()
Out[100]:
  OrderNum  RevNum TotalPrice
0    0AXL3       3      $8.00
1    0BDF1       2      $8.50
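Note that the call above only returns the latest revision if the rows of each order already appear in revision order. A minimal sketch of the same idea with an explicit sort, using the sample values from this thread in a deliberately shuffled order (the shuffling is my own assumption, for illustration only):

```python
import pandas as pd

# Sample data from this thread, shuffled so that row order no longer
# matches revision order (the shuffle is an illustrative assumption).
df = pd.DataFrame({
    "OrderNum": ["0AXL3", "0BDF1", "0AXL3", "0BDF1", "0AXL3", "0AXL3", "0BDF1"],
    "RevNum": [3, 0, 1, 2, 0, 2, 1],
    "TotalPrice": ["$8.00", "$3.00", "$4.00", "$8.50", "$5.00", "$7.00", "$2.50"],
})

# Sort by revision first so that "last" really means "highest RevNum"
# within each order, then take the last row of every group.
latest = (
    df.sort_values("RevNum")
      .groupby("OrderNum", as_index=False)
      .last()
)
print(latest)
```

Without the sort_values step, last() would simply pick whichever row happens to come last for each order.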
UPDATE:
If there were other columns in the data frame, would this keep those as well?
In [116]: df['new'] = np.arange(len(df))
In [117]: df
Out[117]:
  OrderNum  RevNum TotalPrice  new
0    0AXL3       0      $5.00    0
1    0AXL3       1      $4.00    1
2    0AXL3       2      $7.00    2
3    0AXL3       3      $8.00    3
4    0BDF1       0      $3.00    4
5    0BDF1       1      $2.50    5
6    0BDF1       2      $8.50    6
In [118]: df.groupby('OrderNum', as_index=False).last()
Out[118]:
  OrderNum  RevNum TotalPrice  new
0    0AXL3       3      $8.00    3
1    0BDF1       2      $8.50    6
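One caveat on keeping the other columns: by default, GroupBy.last() takes the last non-null value of each column separately, so if the latest revision has a missing value somewhere, the result can mix values from different revisions. A possible alternative, sketched here under the assumption that the index labels are unique, is to look up the row with the highest RevNum per order via idxmax and select those rows whole:

```python
import numpy as np
import pandas as pd

# Same sample data as in the session above, including the extra "new" column.
df = pd.DataFrame({
    "OrderNum": ["0AXL3"] * 4 + ["0BDF1"] * 3,
    "RevNum": [0, 1, 2, 3, 0, 1, 2],
    "TotalPrice": ["$5.00", "$4.00", "$7.00", "$8.00", "$3.00", "$2.50", "$8.50"],
})
df["new"] = np.arange(len(df))

# For each order, find the index label of the row with the highest RevNum.
# Assumes the index labels are unique, otherwise .loc may return extra rows.
latest_idx = df.groupby("OrderNum")["RevNum"].idxmax()

# Select those rows as they are: every column comes from the same row,
# and the original index labels are preserved.
latest_rows = df.loc[latest_idx]
print(latest_rows)
```

This does not depend on the row order of the frame, since it compares RevNum values directly.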
Solution 2:
One way is to use drop_duplicates. Note that the DataFrame should already be sorted on RevNum from smallest to largest; if it is not, call sort_values first:
df1.drop_duplicates(subset='OrderNum', keep='last')
Output:
  OrderNum  RevNum TotalPrice
3    0AXL3       3      $8.00
6    0BDF1       2      $8.50
OR
df1[~df1.duplicated(subset='OrderNum', keep='last')]
Output:
  OrderNum  RevNum TotalPrice
3    0AXL3       3      $8.00
6    0BDF1       2      $8.50
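For a frame that is not already sorted, the sort_values step mentioned above could look roughly like this. It is only a sketch: the shuffled row order is an assumption made for illustration, and df1 is rebuilt here from the sample values so the snippet runs on its own.

```python
import pandas as pd

# Sample values from this thread in a shuffled order (assumed for illustration).
df1 = pd.DataFrame({
    "OrderNum": ["0AXL3", "0BDF1", "0AXL3", "0BDF1", "0AXL3", "0AXL3", "0BDF1"],
    "RevNum": [3, 0, 1, 2, 0, 2, 1],
    "TotalPrice": ["$8.00", "$3.00", "$4.00", "$8.50", "$5.00", "$7.00", "$2.50"],
})

# Sort so the highest RevNum comes last within each order, then keep
# only the last row seen for every OrderNum.
latest = (
    df1.sort_values(["OrderNum", "RevNum"])
       .drop_duplicates(subset="OrderNum", keep="last")
)
print(latest)
```

The result keeps the original index labels of the surviving rows; chain .reset_index(drop=True) if a fresh 0..n-1 index is preferred.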