Drop Unordered Duplicates Across Separate Columns

I am trying to return a df where duplicate values have been removed. I have tried to use drop_duplicates(), but the values in the columns which have been subset aren't ordered. As i

Solution 1:

You'll need to sort the columns along the horizontal axis, then get a mask to subset the original frame. Here's how you can use np.sort and df.duplicated to do that:

df[~pd.DataFrame(np.sort(df[['Item_X', 'Item_Y']], axis=1)).duplicated()]

  Item_X Item_Y  Value
0    Foo    Bar      1
2    Bot    Foo      3
3    Bot    Bot      4
4    Bar    Bar      5
5    Foo    Foo      6
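For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption: the question's data isn't shown here, so row 1 (Bar/Foo/2) is a hypothetical reversed copy of row 0, chosen to match the output above. Passing index=df.index is also an addition of mine, not part of the original one-liner; it keeps the mask aligned if df doesn't have a default 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Hypothetical input: row 1 is assumed to be the reversed pair of row 0.
df = pd.DataFrame({
    "Item_X": ["Foo", "Bar", "Bot", "Bot", "Bar", "Foo"],
    "Item_Y": ["Bar", "Foo", "Foo", "Bot", "Bar", "Foo"],
    "Value": [1, 2, 3, 4, 5, 6],
})

# Sort each row's pair so ("Foo", "Bar") and ("Bar", "Foo") compare equal.
pairs = np.sort(df[["Item_X", "Item_Y"]].to_numpy(), axis=1)

# Reuse df's index so the boolean mask lines up with df's rows.
mask = pd.DataFrame(pairs, index=df.index).duplicated()

# Keep the first occurrence of each unordered pair.
result = df[~mask]
print(result)
```

Note that duplicated() keeps the first occurrence by default; if you'd rather keep the last one, pass keep='last'.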

Solution 2:

IIUC, use:

m=pd.DataFrame(np.sort(df[['Item_X','Item_Y']])).duplicated()
df[~m]

  Item_X Item_Y  Value
0    Foo    Bar      1
2    Bot    Foo      3
3    Bot    Bot      4
4    Bar    Bar      5
5    Foo    Foo      6