Fastest Way To Eliminate Specific Dates From Pandas Dataframe
I'm working with a large DataFrame and I'm struggling to find an efficient way to eliminate specific dates. Note that I want to drop every measurement that falls on any of those dates, regardless of its time of day.
Solution 1:
You can create a boolean mask using a list comprehension that is True for every row whose date is not in removelist, then index the DataFrame with it.
>>> df[[d.date() not in pd.to_datetime(removelist) for d in df.index]]
                        values
2016-04-21 15:03:49  28.059520
2016-04-23 08:13:42 -22.376577
2016-04-23 11:23:41  40.350252
2016-04-23 14:08:41  14.557856
2016-04-25 06:48:33  -0.271976
2016-04-25 21:48:31  20.156240
2016-04-26 13:58:28  -3.225795
2016-04-27 01:58:26  51.991293
2016-04-27 02:53:26  -0.867753
2016-04-27 15:33:23  31.585201
2016-04-27 18:08:23  11.639641
2016-04-27 20:48:22  42.968156
2016-04-27 21:18:22  27.335995
2016-04-27 23:13:22  13.120088
2016-04-28 12:08:20  53.730511
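One possible refinement, offered as a sketch rather than as the answer's own code: the comprehension above calls pd.to_datetime(removelist) again for every row. Converting the list once into a set of datetime.date objects avoids that repeated work and makes each membership test a plain set lookup. The sample frame and the contents of removelist below are made up for illustration, since the question's data is not shown in full.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
idx = pd.to_datetime([
    "2016-04-21 15:03:49",
    "2016-04-22 09:00:00",
    "2016-04-23 08:13:42",
    "2016-04-24 10:30:00",
])
df = pd.DataFrame({"values": [28.06, 1.0, -22.38, 2.0]}, index=idx)

# Assumed format: a list of date strings to drop.
removelist = ["2016-04-22", "2016-04-24"]

# Convert once, outside the comprehension, into a set of datetime.date objects.
remove_dates = set(pd.to_datetime(removelist).date)

# Keep a row only if its calendar date is not in the set.
mask = [d.date() not in remove_dates for d in df.index]
result = df[mask]
```

This is still a Python-level loop over the index, so it should not be expected to beat a vectorized mask on a large frame.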
Solution 2:
Same idea as @Alexander, but using properties of the DatetimeIndex and numpy.in1d:
mask = ~np.in1d(df.index.date, pd.to_datetime(removelist).date)
df = df.loc[mask, :]
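If you would rather stay within pandas, a similar mask can be sketched with DatetimeIndex.normalize and Index.isin, which compares midnight-normalized timestamps instead of arrays of datetime.date objects. This is an alternative I'm suggesting, not something that was timed here. Also note that NumPy 2.0 deprecates np.in1d in favor of np.isin, so on a recent NumPy the snippet above would use np.isin instead. The sample data and removelist contents below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
idx = pd.to_datetime([
    "2016-04-21 15:03:49",
    "2016-04-22 09:00:00",
    "2016-04-23 08:13:42",
    "2016-04-24 10:30:00",
])
df = pd.DataFrame({"values": [28.06, 1.0, -22.38, 2.0]}, index=idx)

# Assumed format: a list of date strings to drop.
removelist = ["2016-04-22", "2016-04-24"]

# normalize() resets each timestamp to midnight, so rows are compared by calendar day.
# The removal list is normalized too, in case an entry carries a time component.
remove_days = pd.to_datetime(removelist).normalize()

# isin() returns a boolean NumPy array; ~ inverts it to keep the non-matching rows.
mask = ~df.index.normalize().isin(remove_days)
result = df.loc[mask]
```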
Timings:
%timeit df.loc[~np.in1d(df.index.date, pd.to_datetime(removelist).date), :]
1000 loops, best of 3: 1.42 ms per loop
%timeit df[[d.date() not in pd.to_datetime(removelist) for d in df.index]]
100 loops, best of 3: 3.25 ms per loop
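To repeat a comparison like this outside IPython, here is a rough sketch using the standard timeit module. The data is synthetic, the function names are hypothetical, and the comprehension variant converts removelist once up front, so the numbers will not match the figures above. It uses np.isin, the current name for what np.in1d does.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 1,000 hourly timestamps starting 2016-04-01.
idx = pd.date_range("2016-04-01", periods=1000, freq="60min")
df = pd.DataFrame({"values": np.arange(1000.0)}, index=idx)
removelist = ["2016-04-03", "2016-04-10"]  # assumed format: date strings


def with_isin():
    # Vectorized membership test on arrays of datetime.date objects.
    remove = pd.to_datetime(removelist).date
    return df.loc[~np.isin(df.index.date, remove)]


def with_comprehension():
    # Python-level loop, with the removal dates converted once into a set.
    remove = set(pd.to_datetime(removelist).date)
    return df[[d.date() not in remove for d in df.index]]


# Both approaches should select exactly the same rows.
same = with_isin().equals(with_comprehension())

# Total seconds for 20 calls of each function.
t_isin = timeit.timeit(with_isin, number=20)
t_comp = timeit.timeit(with_comprehension, number=20)
print(f"isin: {t_isin:.4f}s  comprehension: {t_comp:.4f}s  same result: {same}")
```

Results depend heavily on the size of the frame, the length of removelist, and the pandas and NumPy versions, so measure on your own data before drawing conclusions.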