Pandas - Sum Previous Rows If Value In Column Meets Condition
Solution 1:
The solution by SIA computes the sum of Points_P1 including the current value of Points_P1, whereas the requirement is to sum only the previous points (for all rows before...).
Assuming that dates within each group are unique (in your sample they are), an idiomatic pandas solution consists of the following steps:
- Sort by Date.
- Group by P1_id, then for each group:
  - Take the Points_P1 column.
  - Compute the cumulative sum.
  - Subtract the current value of Points_P1.
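So the whole code is:
# cumsum() keeps the original index labels, so the subtraction and the
# assignment both align each value with its original row
df['Total_Previous_Points_P1'] = (
    df.sort_values('Date')
      .groupby('P1_id')['Points_P1']
      .cumsum()
    - df['Points_P1']
)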
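A self-contained sketch of the steps above, run on a small hypothetical DataFrame (the values are illustrative, not taken from the question) whose dates are unique within each P1_id but deliberately out of order. It shows that the result lands on the right rows even though the computation ran on the sorted frame.

```python
import pandas as pd

# Hypothetical sample data: dates unique within each P1_id, rows not in date order
df = pd.DataFrame({
    'Date': ['2019-09-21', '2015-10-08', '2016-11-09', '2019-07-10', '2019-12-10'],
    'Points_P1': [7, 5, 3, 12, 10],
    'P1_id': [100, 100, 100, 10000, 10000],
})

# Sort chronologically, take a running total per P1_id, then remove the current row's points.
# The running total keeps df's index labels, so assignment realigns it to the original rows.
df['Total_Previous_Points_P1'] = (
    df.sort_values('Date')
      .groupby('P1_id')['Points_P1']
      .cumsum()
    - df['Points_P1']
)
print(df)
```

Row 0 (2019-09-21, P1_id 100) gets 8, the 5 + 3 points from its two earlier dates, even though it is the first row of the frame.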
Edit
If Date is not unique within a group of rows with the same P1_id, the case is more complicated, as the following source DataFrame shows:
         Date  Points_P1  P1_id
0  2016-11-09          5    100
1  2016-11-09          3    100
2  2015-10-08          5    100
3  2019-09-20         10  10000
4  2019-09-21          7    100
5  2019-07-10         12  10000
6  2019-12-10         12  10000
Note that for P1_id 100 there are two rows for 2016-11-09.
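To see why the first approach breaks on this data, the sketch below applies it to the source DataFrame above. Because cumsum() runs row by row, the second 2016-11-09 row also counts the points of the first one. kind='mergesort' (a stable sort) is used here only to make the order of the tied dates deterministic; with the default sort the tied rows could come out in either order.

```python
import pandas as pd

# The source DataFrame above: two rows for P1_id 100 on 2016-11-09
df = pd.DataFrame({
    'Date': ['2016-11-09', '2016-11-09', '2015-10-08', '2019-09-20',
             '2019-09-21', '2019-07-10', '2019-12-10'],
    'Points_P1': [5, 3, 5, 10, 7, 12, 12],
    'P1_id': [100, 100, 100, 10000, 100, 10000, 10000],
})

# Row-by-row approach for unique dates, applied to data where dates repeat
naive = (
    df.sort_values('Date', kind='mergesort')
      .groupby('P1_id')['Points_P1']
      .cumsum()
    - df['Points_P1']
)
print(naive.sort_index())
```

Row 1 gets 10 instead of 5: it wrongly includes the 5 points that row 0 scored on the same date.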
In this case, start by computing the sums of previous points for each (P1_id, Date) group:
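# Sum the points per (P1_id, Date); groupby sorts the keys, so dates are chronological
# within each P1_id. Shifting the running total down by one date gives each date the
# sum of all strictly earlier dates. transform() keeps the (P1_id, Date) index; since
# pandas 2.0, apply() here would prepend an extra P1_id level and break the merge.
sumPrev = df.groupby(['P1_id', 'Date']).Points_P1.sum()\
    .groupby(level=0).transform(lambda gr: gr.shift(fill_value=0).cumsum())\
    .rename('Total_Previous_Points_P1')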
The result is:
P1_id  Date
100    2015-10-08     0
       2016-11-09     5
       2019-09-21    13
10000  2019-07-10     0
       2019-09-20    12
       2019-12-10    22
Name: Total_Previous_Points_P1, dtype: int64
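As a variant of the sumPrev step above (not the answer's original code), the shift can be replaced by a vectorized subtraction: the running total per P1_id minus each date's own sum is the sum of strictly earlier dates. A sketch under the same data, which should reproduce the result listed above:

```python
import pandas as pd

# The source DataFrame from the table above
df = pd.DataFrame({
    'Date': ['2016-11-09', '2016-11-09', '2015-10-08', '2019-09-20',
             '2019-09-21', '2019-07-10', '2019-12-10'],
    'Points_P1': [5, 3, 5, 10, 7, 12, 12],
    'P1_id': [100, 100, 100, 10000, 100, 10000, 10000],
})

# Points per (P1_id, Date); the grouped keys come out sorted, so dates are chronological
day_sums = df.groupby(['P1_id', 'Date'])['Points_P1'].sum()

# Inclusive running total per P1_id minus the current date's sum = earlier dates only
sumPrev = (
    day_sums.groupby(level='P1_id').cumsum() - day_sums
).rename('Total_Previous_Points_P1')
print(sumPrev)
```

Both operands share the same index, so the subtraction needs no alignment and no Python-level lambda runs per group.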
Then merge df with sumPrev on P1_id and Date (which sumPrev holds in its index):
df = pd.merge(df, sumPrev, left_on=['P1_id', 'Date'], right_index=True)
To show the result, it is more instructive to also sort df by ['P1_id', 'Date']:
         Date  Points_P1  P1_id  Total_Previous_Points_P1
2  2015-10-08          5    100                         0
0  2016-11-09          5    100                         5
1  2016-11-09          3    100                         5
4  2019-09-21          7    100                        13
5  2019-07-10         12  10000                         0
3  2019-09-20         10  10000                        12
6  2019-12-10         12  10000                        22
As you can see:
- The first sum for each P1_id is 0 (no points from previous dates).
- For example, for both rows with Date == 2016-11-09, the sum of previous points is 5, which comes from the row with Date == 2015-10-08.
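Putting the steps for non-unique dates together, here is an end-to-end sketch on the source DataFrame above. It uses transform() for the per-P1_id shift so that sumPrev keeps its (P1_id, Date) index for the merge; rows that share a P1_id and Date receive the same total.

```python
import pandas as pd

# The source DataFrame from the table above
df = pd.DataFrame({
    'Date': ['2016-11-09', '2016-11-09', '2015-10-08', '2019-09-20',
             '2019-09-21', '2019-07-10', '2019-12-10'],
    'Points_P1': [5, 3, 5, 10, 7, 12, 12],
    'P1_id': [100, 100, 100, 10000, 100, 10000, 10000],
})

# Sum of points from strictly earlier dates, indexed by (P1_id, Date)
sumPrev = (
    df.groupby(['P1_id', 'Date'])['Points_P1'].sum()
      .groupby(level=0)
      .transform(lambda gr: gr.shift(fill_value=0).cumsum())
      .rename('Total_Previous_Points_P1')
)

# Attach the per-date totals to every original row
df = pd.merge(df, sumPrev, left_on=['P1_id', 'Date'], right_index=True)
print(df.sort_values(['P1_id', 'Date']))
```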
Solution 2:
Try:
df['Total_Previous_Points_P1'] = df.groupby(['P1_id'])['Points_P1'].cumsum()
How It Works
First, it groups the data by the P1_id column. Then it selects the Points_P1 values of the grouped DataFrame and applies the cumulative sum function cumsum(), which returns, for each group, the sum of points up to and including the current row.
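The sketch below, on hypothetical rows already in date order within each P1_id (cumsum() follows the existing row order and does not sort by Date), shows the inclusive totals this code produces, and that subtracting each row's own points leaves only the earlier rows, which is the difference Solution 1 points out.

```python
import pandas as pd

# Hypothetical rows, already in chronological order within each P1_id
df = pd.DataFrame({
    'Date': ['2015-10-08', '2016-11-09', '2019-09-21', '2019-07-10', '2019-12-10'],
    'Points_P1': [5, 3, 7, 12, 10],
    'P1_id': [100, 100, 100, 10000, 10000],
})

# Running total per P1_id that includes the current row
inclusive = df.groupby('P1_id')['Points_P1'].cumsum()

# Removing the current row's points leaves only the previous rows
previous = inclusive - df['Points_P1']
print(pd.DataFrame({'inclusive': inclusive, 'previous': previous}))
```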