
Pandas Merge On `datetime` Or `datetime` In `datetimeindex`

Currently I have two data frames representing Excel spreadsheets. I wish to join the data where the dates are equal. This is a one-to-many join, as one spreadsheet has a date then

Solution 1:

So here's the option with merging:

Assume you have two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'], 
                    'data': ['E', 'F', 'G']})

Now do some cleaning: split each range in df2 into start and end columns, and make sure every date column is a datetime.

df1['date'] = pd.to_datetime(df1.date)

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['start'] = pd.to_datetime(df2.start)
df2['end'] = pd.to_datetime(df2.end)
# No need for this anymore
df2 = df2.drop(columns='date')

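Now merge it all together with a cross join on a dummy key. The result has len(df1) * len(df2) rows: 9 for the sample frames above, or 99x10K with your data.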

df = df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy').drop(columns='dummy')
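
On pandas 1.2 or later, the same Cartesian product can be written without the helper column by passing how='cross' to merge. The following is a minimal sketch under that version assumption; it rebuilds the already-cleaned df1 and df2 inline so that it runs by itself.

```python
import pandas as pd

# Already-cleaned versions of the two sample frames
df1 = pd.DataFrame({'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03']),
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'data': ['E', 'F', 'G'],
                    'start': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02']),
                    'end': pd.to_datetime(['2015-01-02', '2015-01-02', '2015-01-03'])})

# how='cross' pairs every row of df1 with every row of df2 (requires pandas >= 1.2)
df = df1.merge(df2, how='cross')
print(df.shape)
```

The overlapping 'data' column still receives the _x and _y suffixes, so the filtering step works the same way on this frame.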

And subset to the dates that fall in between the ranges:

df[(df.date >= df.start) & (df.date <= df.end)]
#         date data_x data_y      start        end
# 0 2015-01-01      A      E 2015-01-01 2015-01-02
# 1 2015-01-01      A      F 2015-01-01 2015-01-02
# 3 2015-01-02      B      E 2015-01-01 2015-01-02
# 4 2015-01-02      B      F 2015-01-01 2015-01-02
# 5 2015-01-02      B      G 2015-01-02 2015-01-03
# 8 2015-01-03      C      G 2015-01-02 2015-01-03

If some entries in df2 hold a single date rather than a range, .str.split finds no ' to ' separator and leaves the second column as None. Use .loc to copy the start date into end for those rows:

df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03',
                             '2015-01-03'], 
                    'data': ['E', 'F', 'G', 'H']})

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2.loc[df2.end.isnull(), 'end'] = df2.loc[df2.end.isnull(), 'start']
# After converting to datetime and dropping 'date' as before:
#   data      start        end
# 0    E 2015-01-01 2015-01-02
# 1    F 2015-01-01 2015-01-02
# 2    G 2015-01-02 2015-01-03
# 3    H 2015-01-03 2015-01-03

The rest (datetime conversion, the merge, and the date filter) follows unchanged.
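
For reference, the whole of this solution, including the single-date case, can be put together in one runnable sketch. It uses fillna in place of the .loc assignment, which should be equivalent here; the name result is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02',
                             '2015-01-02 to 2015-01-03', '2015-01-03'],
                    'data': ['E', 'F', 'G', 'H']})

df1['date'] = pd.to_datetime(df1['date'])

# Split each range; rows holding a single date end up with a missing 'end'
df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['end'] = df2['end'].fillna(df2['start'])  # single date: range of one day
df2['start'] = pd.to_datetime(df2['start'])
df2['end'] = pd.to_datetime(df2['end'])
df2 = df2.drop(columns='date')

# Cross join via a dummy key, then keep rows whose date lies inside the range
df = df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy').drop(columns='dummy')
result = df[(df['date'] >= df['start']) & (df['date'] <= df['end'])]
print(result)
```

Note that the intermediate frame grows with the product of the two lengths, so this approach is best suited to inputs where that product fits comfortably in memory.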

Solution 2:

Let's use this numpy method by @piRSquared:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'], 
                    'data': ['E', 'F', 'G']})

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['start'] = pd.to_datetime(df2.start)
df2['end'] = pd.to_datetime(df2.end)
df1['date'] = pd.to_datetime(df1['date'])

a = df1['date'].values
bh = df2['end'].values
bl = df2['start'].values

i, j = np.where((a[:, None] >= bl) & (a[:, None] <= bh))

pd.DataFrame(np.column_stack([df1.values[i], df2.values[j]]),
             columns=df1.columns.append(df2.columns))

Output:

                  date data                      date data                start                  end
0  2015-01-01 00:00:00    A  2015-01-01 to 2015-01-02    E  2015-01-01 00:00:00  2015-01-02 00:00:00
1  2015-01-01 00:00:00    A  2015-01-01 to 2015-01-02    F  2015-01-01 00:00:00  2015-01-02 00:00:00
2  2015-01-02 00:00:00    B  2015-01-01 to 2015-01-02    E  2015-01-01 00:00:00  2015-01-02 00:00:00
3  2015-01-02 00:00:00    B  2015-01-01 to 2015-01-02    F  2015-01-01 00:00:00  2015-01-02 00:00:00
4  2015-01-02 00:00:00    B  2015-01-02 to 2015-01-03    G  2015-01-02 00:00:00  2015-01-03 00:00:00
5  2015-01-03 00:00:00    C  2015-01-02 to 2015-01-03    G  2015-01-02 00:00:00  2015-01-03 00:00:00
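
The comparison works by broadcasting: a[:, None] turns the dates into a column, so each date is tested against every start and end at once. Stacking .values also mixes datetimes and strings into one object array and repeats the column names. The sketch below is one possible variation, not part of the original method: it keeps the same mask but picks rows with .iloc so the dtypes survive. The names mask, left, right, matched and the data_y rename are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03']),
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'data': ['E', 'F', 'G'],
                    'start': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02']),
                    'end': pd.to_datetime(['2015-01-02', '2015-01-02', '2015-01-03'])})

a = df1['date'].values
bl = df2['start'].values
bh = df2['end'].values

# a[:, None] has shape (3, 1); comparing it with the (3,) arrays
# broadcasts to a (3, 3) boolean mask of (df1 row, df2 row) pairs
mask = (a[:, None] >= bl) & (a[:, None] <= bh)
i, j = np.where(mask)

# Select the matching rows by position and place them side by side
left = df1.iloc[i].reset_index(drop=True)
right = df2.iloc[j].reset_index(drop=True).rename(columns={'data': 'data_y'})
matched = pd.concat([left, right], axis=1)
print(matched)
```

As with the merge approach, the mask holds len(df1) * len(df2) entries, but these are single booleans rather than full rows, which is typically much lighter.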
