
Splitting Time Series Data Into Groups Based On Changes In State Of A Column In A Python Pandas Dataframe

I need to group some data in a pandas dataframe but the standard grouping method does not quite work how I need it to. It must group so that each change in 'loc' and/or each change

Solution 1:

This is not really a job for groupby because the order of the rows matters. Instead, compare consecutive rows by using shift.

In [37]: cols = ['name', 'loc']

In [38]: change = (x[cols] != x[cols].shift(-1)).any(axis=1).shift(1).fillna(True)

In [39]: groups = x[change].copy()

In [40]: groups.columns = ['name', 'loc', 'first']

In [41]: groups['last'] = (groups['first'].shift(-1) -1).fillna(len(x))

In [42]: groups
Out[42]:
   name  loc  first  last
0  john  abc      1     3
3  john  xyz      4     5
5  john  abc      6     7
7  matt  abc      8     8

[4 rows x 4 columns]
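
The shift comparison can also be turned into a run number for every row by taking a cumulative sum of the change flags. The following is a minimal sketch under that assumption, using the sample frame from Solution 2. The names `run_id` and `runs` are illustrative and do not appear in the original answer, and the first and last `time` of each run are read directly rather than derived with `shift(-1) - 1`.

```python
import pandas as pd

# Sample data copied from Solution 2 on this page.
x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)

cols = ['name', 'loc']

# True wherever a row differs from the previous row in any of `cols`.
# The first row is compared against missing values, so it is flagged too.
change = (x[cols] != x[cols].shift()).any(axis=1)

# Running count of the flags: every consecutive run gets its own number.
run_id = change.cumsum()

# One row per run, with the first and last `time` seen in that run.
grouped = x.groupby(run_id)
runs = grouped[cols].first()
runs['first'] = grouped['time'].first()
runs['last'] = grouped['time'].last()

print(runs)
```

Because `run_id` is aligned with the rows of `x`, the same key can be passed to `x.groupby(run_id)` for any other per-run aggregation.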

Solution 2:

You can use a function in the groupby:

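import pandas as pd

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']

last_group = None
c = 0

def f(y):
    # y is an index label; with the default integer index it is also the row position
    global c, last_group
    g = x.iloc[y]['name'], x.iloc[y]['loc']
    if last_group != g:
        # (name, loc) differs from the previous row, so a new group starts here
        c += 1
        last_group = g
    return c

# relies on groupby calling f once per row, in row order
print(x.groupby(f).head())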
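
The same row-by-row labelling can be done without global state by computing the labels up front and passing them to groupby. This is only a sketch of that variation: it swaps the counter function for `itertools.groupby` from the standard library, and the names `labels`, `run_labels` and `times_by_run` are made up for illustration.

```python
from itertools import groupby

import pandas as pd

# Same sample data as above.
x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)

# itertools.groupby only merges *adjacent* equal keys, which is exactly
# the "new group on every change" behaviour wanted here.
labels = []
for run_number, (_, run) in enumerate(groupby(zip(x['name'], x['loc'])), start=1):
    labels.extend(run_number for _ in run)

# Align the labels with the frame's index so pandas treats them as one key.
run_labels = pd.Series(labels, index=x.index)

# Collect the `time` values of each run.
times_by_run = {n: frame['time'].tolist() for n, frame in x.groupby(run_labels)}
print(times_by_run)
```

Since the labels are built before groupby is called, the result does not depend on how many times or in which order pandas would invoke a grouping function.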
