Splitting Time Series Data Into Groups Based On Changes In State In A Column In A Python Pandas Dataframe
I need to group some data in a pandas DataFrame, but the standard grouping method does not quite work the way I need it to. It must group so that each change in 'loc' and/or each change
Solution 1:
This is not really a job for groupby because the order of the rows matters. Instead, compare consecutive rows by using shift.
In [37]: cols = ['name', 'loc']
In [38]: change = (x[cols] != x[cols].shift()).any(axis=1)
In [39]: groups = x[change].copy()
In [40]: groups.columns = ['name', 'loc', 'first']
In [41]: groups['last'] = (groups['first'].shift(-1) -1).fillna(len(x))
In [42]: groups
Out[42]:
   name  loc  first  last
0  john  abc      1     3
3  john  xyz      4     5
5  john  abc      6     7
7  matt  abc      8     8

[4 rows x 4 columns]
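A related variant, offered only as a sketch and not as part of the original answer: the same shift comparison can be turned into a run number with a cumulative sum, and that number can then be used as a grouping key. The names run_id and runs are made up for illustration, and the sample frame is the one from the question.

```python
import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)
cols = ['name', 'loc']

# True wherever a row differs from the previous row. The first row is
# compared against a missing value, so it is flagged as a change too.
change = (x[cols] != x[cols].shift()).any(axis=1)

# The cumulative sum of the mask numbers each consecutive run.
run_id = change.cumsum()

# One row per run, with the first and last 'time' value in that run.
runs = (
    x.assign(run_id=run_id)
     .groupby(['run_id', 'name', 'loc'], sort=False)['time']
     .agg(['first', 'last'])
     .reset_index()
)
print(runs)
```

Because run_id increases every time 'name' or 'loc' changes, the two separate john/abc stretches end up in different groups, which a plain groupby on ['name', 'loc'] would merge.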
Solution 2:
You can use a function in the groupby:
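import pandas as pd

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']
last_group = None
c = 0
def f(y):
    # groupby calls f once per index label, in row order.
    # This relies on the default integer index, so label == position.
    global c, last_group
    g = x.iloc[y]['name'], x.iloc[y]['loc']
    if last_group != g:
        # (name, loc) changed, so start a new group number
        c += 1
        last_group = g
    return c
print(x.groupby(f).head())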
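For comparison, here is a sketch of the same counter idea without global state, assuming it is acceptable to iterate over the rows in plain Python. It swaps in itertools.groupby from the standard library, which starts a new group whenever the key changes between adjacent items. The name summary is made up for illustration.

```python
from itertools import groupby

import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)

# itertools.groupby only merges adjacent rows with equal keys, so
# repeated (name, loc) pairs that are not adjacent stay separate.
summary = []
for (name, loc), run in groupby(x.itertuples(index=False),
                                key=lambda r: (r.name, r.loc)):
    times = [r.time for r in run]
    summary.append((name, loc, times[0], times[-1]))

print(summary)
```

This loops in Python rather than in pandas, so it is likely to be slower on large frames than a vectorized comparison.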