
How To Correctly Use Pandas Agg Function When Running Groupby On A Column Of Type Timestamp/datetime/datetime64?

I'm trying to understand why calling count() directly on a group returns the correct answer (in this example, 2 rows in that group), but calling count via a lambda in the agg() fun

Solution 1:

Here x is the original frame from above (not your groupby). Passing a UDF, e.g. the lambda, to agg() calls it on each Series in the group, so this is what the function returns:

In [35]: x.count()
Out[35]:
time    2
dtype: int64

That result is then coerced back to the original dtype of the Series, which here is datetime64. The integer 2 is read as 2 nanoseconds since the epoch, so the result is:

In [36]: Timestamp(2)
Out[36]: Timestamp('1970-01-01 00:00:00.000000002')

which is exactly what you are seeing. The point of the coercion to the original dtype is to preserve it if at all possible. Not doing this would be even more magic on the groupby results.
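To get an integer count on a datetime column, one option is to use the built-in count (directly or by name) instead of a lambda, so no UDF result has to be coerced. The sketch below is illustrative only: the frame `df` and the column names `key` and `time` are assumptions, not the data from the question. Whether the lambda path still coerces to a Timestamp depends on the pandas version, so the sketch does not exercise it.

```python
import pandas as pd

# Hypothetical data: two rows in group "a", one row in group "b".
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "time": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"]),
})

grouped = df.groupby("key")["time"]

# Built-in count: returns integer counts of non-null values per group.
counts_direct = grouped.count()

# The same aggregation requested by name through agg(), not via a lambda.
counts_named = grouped.agg("count")

# size() counts rows per group, including rows where the value is NaT.
sizes = grouped.size()

# What the coercion described above amounts to: an integer interpreted
# as nanoseconds since 1970-01-01.
coerced = pd.Timestamp(2)

print(counts_direct)
print(coerced)
```

count() skips missing values while size() does not, so the two can differ when the datetime column contains NaT.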
