
Pivot Duplicates Rows Into New Columns Pandas

I have a data frame like this and I'm trying to reshape it using pivot from pandas in a way that I can keep some values from the original rows while making the duplicates r

Solution 1:

Use cumcount to number the rows within each group, add that counter to a MultiIndex with set_index, unstack it, and finally flatten the resulting column levels:

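# 1-based position of each row within its (ID, Agent, OV) group
g = df.groupby(["ID", "Agent", "OV"]).cumcount().add(1)
# move the counter into the columns; missing cells become 0
df = df.set_index(["ID", "Agent", "OV", g]).unstack(fill_value=0).sort_index(axis=1, level=1)
# flatten (name, counter) pairs into names such as Zone1, Value1, PTC1
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.reset_index()
print(df)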
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100     0       0     0     0       0     0
1   2   26.5   8.0    M2      50    95    M1       6     5     0       0     0
2   3    4.5   6.0    M3       4    40    M4       6    60     0       0     0
3   4    1.2   0.8    M1       8   100     0       0     0     0       0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4
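
The input frame from the question is not shown here, so the following is a minimal, self-contained sketch that rebuilds it from the printed result above and runs the same steps. The long-format rows and the variable name `wide` are assumptions made for illustration. The order of the columns within each numbered group may differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Assumed input, reconstructed from the printed result above:
# one row per (ID, Zone) pair, with ID/Agent/OV repeated on duplicate rows.
df = pd.DataFrame({
    "ID":    [1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Agent": [10.0, 26.5, 26.5, 4.5, 4.5, 1.2, 2.0, 2.0, 2.0],
    "OV":    [26.0, 8.0, 8.0, 6.0, 6.0, 0.8, 0.4, 0.4, 0.4],
    "Zone":  ["M1", "M2", "M1", "M3", "M4", "M1", "M1", "M2", "M4"],
    "Value": [10, 50, 6, 4, 6, 8, 6, 41, 2],
    "PTC":   [100, 95, 5, 40, 60, 100, 10, 86, 4],
})

# 1-based position of each row within its (ID, Agent, OV) group.
g = df.groupby(["ID", "Agent", "OV"]).cumcount().add(1)

# Put the counter in the index, then pivot it into the columns.
wide = (
    df.set_index(["ID", "Agent", "OV", g])
      .unstack(fill_value=0)
      .sort_index(axis=1, level=1)
)

# Flatten ("Zone", 1) -> "Zone1", and so on.
wide.columns = ["{}{}".format(a, b) for a, b in wide.columns]
wide = wide.reset_index()
print(wide)
```

Each ID ends up on a single row, and the second and third duplicates land in the columns numbered 2 and 3.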

If you want to replace missing values with 0 only in the numeric columns:

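# count rows within the same keys that are used for the index below
g = df.groupby(["ID", "Agent", "OV"]).cumcount().add(1)
# no fill_value here, so missing cells stay NaN for now
df = df.set_index(["ID", "Agent", "OV", g]).unstack().sort_index(axis=1, level=1)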

idx = pd.IndexSlice
df.loc[:, idx[['Value','PTC']]] = df.loc[:, idx[['Value','PTC']]].fillna(0).astype(int)
df.columns = ["{}{}".format(a, b) for a, b in df.columns]

df = df.fillna('').reset_index()
print(df)
   ID  Agent    OV Zone1  Value1  PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
0   1   10.0  26.0    M1      10   100             0     0             0     0
1   2   26.5   8.0    M2      50    95    M1       6     5             0     0
2   3    4.5   6.0    M3       4    40    M4       6    60             0     0
3   4    1.2   0.8    M1       8   100             0     0             0     0
4   5    2.0   0.4    M1       6    10    M2      41    86    M4       2     4
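
As a hedged alternative to the IndexSlice assignment, the same result can be sketched by filling after the columns are flattened and selecting them by name prefix. This is not the answer's own method. The input rows are reconstructed from the printed result, and the names `wide`, `num_cols` and `zone_cols` are assumptions made for illustration.

```python
import pandas as pd

# Assumed input, reconstructed from the printed result above.
df = pd.DataFrame({
    "ID":    [1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Agent": [10.0, 26.5, 26.5, 4.5, 4.5, 1.2, 2.0, 2.0, 2.0],
    "OV":    [26.0, 8.0, 8.0, 6.0, 6.0, 0.8, 0.4, 0.4, 0.4],
    "Zone":  ["M1", "M2", "M1", "M3", "M4", "M1", "M1", "M2", "M4"],
    "Value": [10, 50, 6, 4, 6, 8, 6, 41, 2],
    "PTC":   [100, 95, 5, 40, 60, 100, 10, 86, 4],
})

g = df.groupby(["ID", "Agent", "OV"]).cumcount().add(1)
wide = df.set_index(["ID", "Agent", "OV", g]).unstack().sort_index(axis=1, level=1)
wide.columns = ["{}{}".format(a, b) for a, b in wide.columns]
wide = wide.reset_index()

# Pick columns by the prefix of their flattened name.
num_cols = [c for c in wide.columns if c.startswith(("Value", "PTC"))]
zone_cols = [c for c in wide.columns if c.startswith("Zone")]

# Numeric gaps become 0 and are cast back to integers;
# Zone gaps become empty strings instead of 0.
wide[num_cols] = wide[num_cols].fillna(0)
wide = wide.astype({c: "int64" for c in num_cols})
wide[zone_cols] = wide[zone_cols].fillna("")
print(wide)
```

Casting with an explicit `astype` mapping keeps the integer conversion independent of how a given pandas version handles dtypes during `.loc` assignment.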

Solution 2:

You can use cumcount to create a helper key, then unstack on it and flatten the resulting MultiIndex columns. (PS: you can add fillna(0) at the end; I did not, because I do not think 0 is a correct value for Zone.)

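# helper key: 1-based position of each row within its (ID, Agent, OV) group
df['New'] = df.groupby(['ID', 'Agent', 'OV']).cumcount() + 1
# pivot the helper key into the columns
new_df = df.set_index(['ID', 'Agent', 'OV', 'New']).unstack('New').sort_index(axis=1, level=1)
# each column label is a tuple such as ('Zone', 1); join it into 'Zone1'
new_df.columns = new_df.columns.map('{0[0]}{0[1]}'.format)
new_df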
Out[40]: 
              Zone1  Value1   PTC1 Zone2  Value2  PTC2 Zone3  Value3  PTC3
ID Agent OV                                                               
1  10.0  26.0    M1    10.0  100.0  None     NaN   NaN  None     NaN   NaN
2  26.5  8.0     M2    50.0   95.0    M1     6.0   5.0  None     NaN   NaN
3  4.5   6.0     M3     4.0   40.0    M4     6.0  60.0  None     NaN   NaN
4  1.2   0.8     M1     8.0  100.0  None     NaN   NaN  None     NaN   NaN
5  2.0   0.4     M1     6.0   10.0    M2    41.0  86.0    M4     2.0   4.0
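
The two solutions flatten the column MultiIndex in slightly different ways. The short sketch below shows that both give the same names. The small `cols` index is a made-up stand-in for the unstacked columns.

```python
import pandas as pd

# Hypothetical two-level columns, like those produced by unstack():
# level 0 is the original column name, level 1 is the cumcount key.
cols = pd.MultiIndex.from_product([["Zone", "Value", "PTC"], [1, 2]])

# Solution 2 style: map() passes each label tuple as the single
# argument, so {0[0]} and {0[1]} pick the tuple's two elements.
flat_map = list(cols.map("{0[0]}{0[1]}".format))

# Solution 1 style: unpack each tuple in a list comprehension.
flat_list = ["{}{}".format(a, b) for a, b in cols]

print(flat_map)
print(flat_list)
```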
