Remove Level And All Of Its Rows From Pandas Dataframe If One Row Meets Condition
Solution 1:
You could use groupby/filter to remove groups based on a condition:
import numpy as np
import pandas as pd
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['yr', 'visit'])
columns = pd.MultiIndex.from_product([['hr', 'temp']], names=['metric'])
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]], index=index, columns=columns)
print(data.groupby(level='yr').filter(lambda x: (x['temp']>=37).all()))
yields
metric     hr  temp
yr   visit
2013 1     96    38
     2     98    38
Since the rows you wish to remove are grouped by yr, and yr is a level of the index, use groupby(level='yr'). For each group, the lambda function is called with x set to the sub-DataFrame for that group. The group is kept when (x['temp']>=37).all() is True, that is, when every row in the group satisfies the condition.
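To make the per-group check concrete, here is a minimal sketch that evaluates the same predicate one group at a time and then applies the filter. It assumes flat column labels (a plain Index named 'metric') so that data['temp'] is a Series; the name keep is only illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['yr', 'visit'])
# Assumption: flat columns, so data['temp'] is a plain Series
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]],
                    index=index, columns=pd.Index(['hr', 'temp'], name='metric'))

# Evaluate the predicate the lambda uses, once per 'yr' group
keep = {int(yr): bool((group['temp'] >= 37).all())
        for yr, group in data.groupby(level='yr')}
print(keep)  # {2013: True, 2014: False}

# filter() keeps only the groups for which the predicate is True
filtered = data.groupby(level='yr').filter(lambda x: (x['temp'] >= 37).all())
print(filtered)
```

The 2014 group is dropped as a whole because one of its rows (temp 36) fails the condition, even though the other row (temp 43) passes.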
Note that Wen's suggestion,
data.loc[(data['temp']>=37).groupby(level='yr').transform('all')]
is faster, particularly for large DataFrames. There, data['temp']>=37 computes the criterion in a vectorized way for the entire column, whereas in my solution above, (x['temp']>=37).all() computes the criterion piecemeal, once for each sub-DataFrame. Generally, a vectorized operation is faster when applied once to a large array or NDFrame than when applied in a loop to many smaller pieces.
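A small sketch of what the transform-based mask looks like, and a check that it selects the same rows as groupby/filter. As an assumption, it uses flat column labels so that data['temp'] is a Series; the names mask, via_mask and via_filter are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['yr', 'visit'])
# Assumption: flat columns, so data['temp'] is a plain Series
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]],
                    index=index, columns=pd.Index(['hr', 'temp'], name='metric'))

# Vectorized criterion for the whole column, then broadcast each
# group's all() result back onto every row of that group
mask = (data['temp'] >= 37).groupby(level='yr').transform('all')
print(mask)

via_mask = data.loc[mask]
via_filter = data.groupby(level='yr').filter(lambda x: (x['temp'] >= 37).all())

# Both approaches should keep the same rows
print(via_mask.index.equals(via_filter.index))
```

The mask has one boolean per row, aligned with the original index, so it can be passed straight to .loc.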
Here is an example showing the difference in speed for a 1000-row DataFrame:
In [70]: df = pd.DataFrame(np.random.randint(100, size=(1000, 4)), columns=list('ABCD')).set_index(['A','B'])
In [71]: %timeit df.groupby(level='A').filter(lambda x: (x['C']>=5).all())
10 loops, best of 3: 46.3 ms per loop
In [72]: %timeit df.loc[(df['C']>=37).groupby(level='A').transform('all')]
100 loops, best of 3: 18.9 ms per loop
Solution 2:
Using .loc:
import pandas as pd
index = pd.MultiIndex.from_product(
    [[2013, 2014], [1, 2]], names=['yr', 'visit'])
columns = pd.MultiIndex.from_product([['hr', 'temp']], names=['metric'])
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]],
                    index=index, columns=columns)
# Select every row whose 'yr' level is 2013
print(data.loc[[2013]])
Gives:
metric     hr  temp
yr   visit
2013 1     96    38
     2     98    38
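The selection above names the year 2013 explicitly. If the years to keep should instead follow from the condition, one possible sketch is to compute them first and pass the resulting list to .loc. This again assumes flat column labels so that data['temp'] is a Series; ok and good_years are illustrative names.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[2013, 2014], [1, 2]], names=['yr', 'visit'])
# Assumption: flat columns, so data['temp'] is a plain Series
data = pd.DataFrame([[96, 38], [98, 38], [85, 36], [84, 43]],
                    index=index, columns=pd.Index(['hr', 'temp'], name='metric'))

# One boolean per year: True when every visit in that year has temp >= 37
ok = (data['temp'] >= 37).groupby(level='yr').all()

# Years that pass the check, as a plain list of labels
good_years = ok[ok].index.tolist()

# Select all rows belonging to those years
result = data.loc[good_years]
print(result)
```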