Pandas Groupby Count Non-null Values As Percentage
Given this dataset, I would like to count missing, NaN, values: df = pd.DataFrame({'A' : [1, np.nan, 2 , 55, 6, np.nan, -17, np.nan], 'Team' : ['one', 'one', 'tw
Solution 1:
You can take the mean
of the notnull
Boolean DataFrame:
In [11]: df.notnull()
Out[11]:
A C D Team
0TrueTrueFalseTrue1FalseTrueFalseTrue2TrueTrueTrueTrue3TrueTrueTrueTrue4TrueTrueTrueTrue5FalseTrueTrueTrue6TrueFalseFalseTrue7FalseTrueFalseTrueIn [12]: df.notnull().mean()
Out[12]:
A 0.625
C 0.875
D 0.500
Team 1.000
dtype: float64
and with the groupby:
In [13]: df.groupby("Team").apply(lambda x: x.notnull().mean())
Out[13]:
A C D Team
Team
one 0.6666670.6666670.01.0
three 0.5000001.0000000.51.0
two 0.6666671.0000001.01.0
It may be faster to do this without an apply using set_index
first:
In [14]: df.set_index("Team").notnull().groupby(level=0).mean()
Out[14]:
A C D
Team
one 0.6666670.6666670.0
three 0.5000001.0000000.5
two 0.6666671.0000001.0
Solution 2:
Base on your own code add div(df.groupby('Team').size(),0)
df.groupby('Team').count().div(df.groupby('Team').size(),0)
Out[190]:
A C D
Team
one 0.6666670.6666670.0
three 0.5000001.0000000.5
two 0.6666671.0000001.0
Post a Comment for "Pandas Groupby Count Non-null Values As Percentage"