
Merging Dataframes On An Index Is More Efficient In Pandas

Why is merging dataframes in Pandas on an index more efficient (faster) than on a column? import pandas as pd # Dataframes share the ID column df = pd.DataFrame({'ID': [0, 1, 2,

Solution 1:

The reason is that a DataFrame's index is backed by a hash table, while a regular column is not.

To merge two sets, we need to find, for each element of the first, the corresponding element in the second (if it exists). That search is significantly faster when it is supported by a hash table: searching an unsorted list is O(N), while a lookup through a hash function is ~O(1).
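The difference between the two kinds of search can be sketched in plain Python. This is only an illustration of the idea, not how Pandas implements it: the key lists and the function names `match_by_scan` and `match_by_hash` are hypothetical, and the keys on the right-hand side are assumed to be unique.

```python
# Hypothetical keys, for illustration only (right-hand keys assumed unique).
left_ids = [3, 1, 4]
right_ids = [4, 3, 9, 7]


def match_by_scan(left, right):
    """For each left key, scan the right list: O(len(left) * len(right))."""
    matches = []
    for key in left:
        for pos, other in enumerate(right):
            if key == other:
                matches.append((key, pos))
                break
    return matches


def match_by_hash(left, right):
    """Build a key -> position dict once, then do ~O(1) lookups per key."""
    position = {key: pos for pos, key in enumerate(right)}
    return [(key, position[key]) for key in left if key in position]


scan_result = match_by_scan(left_ids, right_ids)
hash_result = match_by_hash(left_ids, right_ids)
print(scan_result == hash_result)
```

Both functions return the same matches; only the cost of finding each one differs.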

One strategy that could make merging on columns faster is to first build a hash table for the smaller of the two. Even so, the merge will still be slower by the time it takes to build that hash table.
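That strategy is usually called a hash join. A minimal sketch follows, assuming rows are plain dicts and using a hypothetical helper named `hash_join`; it is not Pandas' actual merge code, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Inner join of two lists of dicts; the hash table is built on the smaller input."""
    if len(left_rows) <= len(right_rows):
        build, probe = left_rows, right_rows
    else:
        build, probe = right_rows, left_rows

    # Build phase: this is the extra cost a merge on a plain column has to pay.
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: one ~O(1) lookup per row of the larger input.
    joined = []
    for row in probe:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data.
left = [{"ID": 0, "a": "x"}, {"ID": 1, "a": "y"}, {"ID": 2, "a": "z"}]
right = [{"ID": 2, "b": 20}, {"ID": 0, "b": 0}]
result = hash_join(left, right, "ID")
print(result)
```

Building on the smaller input keeps the hash table, and the time spent constructing it, as small as possible.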
