Pandas Better Way For Sorting, Grouping, Summing
New to Pandas so wondering if there is a more Pandithic (coining it!) way to sort some data, group it, and then sum part of it. The problem is to find the 3 largest values in a se
Solution 1:
I think you can use head
with sum
first with groupby
and then nlargest
:
df = census_cp.groupby('STNAME')
.apply(lambda x: x.head(3).sum(numeric_only=True))
.reset_index()
.nlargest(3, 'CENSUS2010POP')
Sample:
census_cp = pd.DataFrame({'STNAME':list('abscscbcdbcsscae'),
'CENSUS2010POP':[4,5,6,5,6,2,3,4,5,6,4,5,4,3,6,5]})
print (census_cp)
CENSUS2010POP STNAME
0 4 a
1 5 b
2 6 s
3 5 c
4 6 s
5 2 c
6 3 b
7 4 c
8 5 d
9 6 b
10 4 c
11 5 s
12 4 s
13 3 c
14 6 a
15 5 e
df = census_cp.groupby('STNAME') \
.apply(lambda x: x.head(3).sum(numeric_only=True)) \
.reset_index() \
.nlargest(3, 'CENSUS2010POP')
print (df)
STNAME CENSUS2010POP
5 s 17
1 b 14
2 c 11
If need double top 3
nlargest
per groups and then nlargest
of summed values use:
df1 = census_cp.groupby('STNAME')['CENSUS2010POP']
.apply(lambda x: x.nlargest(3).sum())
.nlargest(3)
.reset_index()
print (df1)
STNAME CENSUS2010POP
0 s 171 b 142 c 13
Or:
df1 = census_cp.groupby('STNAME')['CENSUS2010POP'].nlargest(3)
.groupby(level=0)
.sum()
.nlargest(3)
.reset_index()
print (df1)
STNAME CENSUS2010POP
0 s 17
1 b 14
2 c 13
Post a Comment for "Pandas Better Way For Sorting, Grouping, Summing"