
Pandas - Sum Previous Rows If Value In Column Meets Condition

I have a dataframe that is of the following type. I have all the columns except the final column, 'Total Previous Points P1', which I am hoping to create: The data is sorted by the

Solution 1:

The solution by SIA computes the sum of Points_P1 including the current value of Points_P1, whereas the requirement is to sum only the previous points (for all rows before...).

Assuming that dates within each group are unique (as they are in your sample), an idiomatic pandas solution consists of the following steps:

  • Sort by Date.
  • Group by P1_id, then for each group:
      • Take the Points_P1 column.
      • Compute the cumulative sum.
      • Subtract the current value of Points_P1.

So the whole code should be:

df['Total_Previous_Points_P1'] = df.sort_values('Date')\
    .groupby(['P1_id']).Points_P1.cumsum() - df.Points_P1
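A minimal, self-contained run of the code above, on a small hypothetical frame (the values are made up, with unique dates within each P1_id). Because cumsum on the sorted frame keeps the original row labels, the subtraction and the column assignment land on the right rows even though df itself is not reordered.

```python
import pandas as pd

# Hypothetical sample data: dates are unique within each P1_id group.
df = pd.DataFrame({
    'Date': ['2016-11-09', '2015-10-08', '2019-09-21', '2019-07-10', '2019-09-20'],
    'Points_P1': [5, 5, 7, 12, 10],
    'P1_id': [100, 100, 100, 10000, 10000],
})

# Running total per player in date order. The result keeps df's row labels,
# so subtracting df['Points_P1'] aligns each row with its own points.
running = df.sort_values('Date').groupby('P1_id')['Points_P1'].cumsum()
df['Total_Previous_Points_P1'] = running - df['Points_P1']

print(df.sort_values(['P1_id', 'Date']))
```

ISO-formatted date strings sort correctly as plain strings; for other formats, converting with pd.to_datetime first would be safer.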

Edit

If Date is not unique within a group of rows sharing the same P1_id, the case is more complicated, as the following source DataFrame shows:

         Date  Points_P1  P1_id
0  2016-11-09          5    100
1  2016-11-09          3    100
2  2015-10-08          5    100
3  2019-09-20         10  10000
4  2019-09-21          7    100
5  2019-07-10         12  10000
6  2019-12-10         12  10000

Note that for P1_id == 100 there are two rows dated 2016-11-09.

In this case, start by computing the "group" sums of previous points for each P1_id and Date:

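# group_keys=False keeps the (P1_id, Date) index unchanged; pandas 2.0+
# would otherwise prepend an extra P1_id level to the apply result.
sumPrev = df.groupby(['P1_id', 'Date']).Points_P1.sum()\
    .groupby(level=0, group_keys=False)\
    .apply(lambda gr: gr.shift(fill_value=0).cumsum())\
    .rename('Total_Previous_Points_P1')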
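A self-contained sketch that rebuilds the sample frame above and computes sumPrev. It passes group_keys=False (an addition, not part of the original answer's code) so the result keeps its two-level (P1_id, Date) index. It also shows an alternative that avoids apply: the per-player running total minus each date's own total. The names daily and alt are illustrative only.

```python
import pandas as pd

# The sample frame from above: P1_id 100 has two rows dated 2016-11-09.
df = pd.DataFrame({
    'Date': ['2016-11-09', '2016-11-09', '2015-10-08', '2019-09-20',
             '2019-09-21', '2019-07-10', '2019-12-10'],
    'Points_P1': [5, 3, 5, 10, 7, 12, 12],
    'P1_id': [100, 100, 100, 10000, 100, 10000, 10000],
})

# One total per (P1_id, Date); groupby sorts the keys, so dates are in order.
daily = df.groupby(['P1_id', 'Date'])['Points_P1'].sum()

# Shift each player's daily totals down by one date, then accumulate,
# so each date receives the sum of all strictly earlier dates.
sumPrev = (daily.groupby(level=0, group_keys=False)
                .apply(lambda gr: gr.shift(fill_value=0).cumsum())
                .rename('Total_Previous_Points_P1'))

# Alternative without apply: inclusive running total minus the date's own total.
alt = daily.groupby(level=0).cumsum() - daily

print(sumPrev)
```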

The result is:

P1_id  Date
100    2015-10-08     0
       2016-11-09     5
       2019-09-21    13
10000  2019-07-10     0
       2019-09-20    12
       2019-12-10    22
Name: Total_Previous_Points_P1, dtype: int64

Then merge df with sumPrev on P1_id and Date (which in sumPrev are the index levels):

df = pd.merge(df, sumPrev, left_on=['P1_id', 'Date'], right_index=True)
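An end-to-end sketch of the merge step on the same sample data. The default how='inner' is fine here because every (P1_id, Date) pair in df also appears in sumPrev. The variable names daily and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2016-11-09', '2016-11-09', '2015-10-08', '2019-09-20',
             '2019-09-21', '2019-07-10', '2019-12-10'],
    'Points_P1': [5, 3, 5, 10, 7, 12, 12],
    'P1_id': [100, 100, 100, 10000, 100, 10000, 10000],
})

daily = df.groupby(['P1_id', 'Date'])['Points_P1'].sum()
sumPrev = (daily.groupby(level=0, group_keys=False)
                .apply(lambda gr: gr.shift(fill_value=0).cumsum())
                .rename('Total_Previous_Points_P1'))

# left_on names df's columns; right_index=True matches them, in order,
# against sumPrev's (P1_id, Date) index levels.
df = pd.merge(df, sumPrev, left_on=['P1_id', 'Date'], right_index=True)

result = df.sort_values(['P1_id', 'Date'])
print(result)
```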

To show the result, it is more instructive to sort df on ['P1_id', 'Date'] as well:

         Date  Points_P1  P1_id  Total_Previous_Points_P1
2  2015-10-08          5    100                         0
0  2016-11-09          5    100                         5
1  2016-11-09          3    100                         5
4  2019-09-21          7    100                        13
5  2019-07-10         12  10000                         0
3  2019-09-20         10  10000                        12
6  2019-12-10         12  10000                        22

As you can see:

  • The first sum for each P1_id is 0 (no points from previous dates).
  • For example, both rows with Date == 2016-11-09 get a previous-points sum of 5, which comes from the row with Date == 2015-10-08.
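To see why the unique-date formula is not enough for this data, the sketch below applies it to the same sample frame. It sorts with kind='mergesort' (a stable sort, chosen here only so the order of the tied rows is deterministic); the name naive is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2016-11-09', '2016-11-09', '2015-10-08', '2019-09-20',
             '2019-09-21', '2019-07-10', '2019-12-10'],
    'Points_P1': [5, 3, 5, 10, 7, 12, 12],
    'P1_id': [100, 100, 100, 10000, 100, 10000, 10000],
})

# The unique-date formula: running total in date order minus the current points.
naive = (df.sort_values('Date', kind='mergesort')
           .groupby('P1_id')['Points_P1'].cumsum() - df['Points_P1']).sort_index()

# Row 1 (the second 2016-11-09 row) also counts row 0's points from the same
# date, so it gets 10 instead of the expected 5.
print(naive)
```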

Solution 2:

Try:

df['Total_Previous_Points_P1'] = df.groupby(['P1_id'])['Points_P1'].cumsum()

How It Works

First, it groups the data by the P1_id column.

Then it selects the Points_P1 column of the grouped DataFrame and applies the cumulative sum function cumsum(), which returns, for each group, the sum of points up to and including the current row.
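A small hypothetical frame (already in date order, since cumsum follows row order) comparing the inclusive cumsum from this answer with the previous-only version obtained by subtracting each row's own points. The names inclusive and previous_only are illustrative.

```python
import pandas as pd

# Hypothetical data, already sorted by Date.
df = pd.DataFrame({
    'Date': ['2015-10-08', '2016-11-09', '2019-07-10', '2019-09-21'],
    'Points_P1': [5, 5, 12, 7],
    'P1_id': [100, 100, 10000, 100],
})

# Inclusive running total per P1_id (includes the current row).
inclusive = df.groupby('P1_id')['Points_P1'].cumsum()

# Subtracting the current row's points leaves only the earlier rows' sum.
previous_only = inclusive - df['Points_P1']

print(pd.DataFrame({'inclusive': inclusive, 'previous_only': previous_only}))
```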
