
Groupby Issues Of Not Recognizing Numeric Column Pandas Python

I have Excel data that I read in with pd.read_excel: Block Concentration Name Replicate 1 Array Marker 1 Array

Solution 1:

Instead of cumcount() + 1, you can use a rolling count with a moving window of 3:

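# group by Block and Name, then take a rolling count over column Block
# (pd.rolling_count has been removed from pandas; Series.rolling(...).count() replaces it)
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(lambda s: s.rolling(window=3, min_periods=1).count())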

The formatting looks very strange. If that is not just an artifact of copying the data into the question, you can repair it by casting column Concentration to float and stripping leading and trailing whitespace from column Name.

Block  Concentration  Name            Replicate
1      Array Marker
1      Array Marker
1      100.0          Man5GlcNAc2
1      33.0           Man5GlcNAc2
1      10.0           Man5GlcNAc2
1      100.0          Man6GlcNAc2
1      33.0           Man6GlcNAc2
1      10.0           Man6GlcNAc2
1      100.0          Man7GlcNAc2 D1
1      33.0           Man7GlcNAc2 D1
1      10.0           Man7GlcNAc2 D1
1      100.0          Man7GlcNAc2 D3
1      33.0           Man7GlcNAc2 D3
1      10.0           Man7GlcNAc2 D3

# convert column Concentration to float
data['Concentration'] = data['Concentration'].astype(float)
# strip leading and trailing whitespace
data['Name'] = data['Name'].str.strip()

# group by Block and Name, then take a rolling count over column Block
# (Series.rolling(...).count() replaces the removed pd.rolling_count)
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(lambda s: s.rolling(window=3, min_periods=1).count())

    Block  Concentration  Name            Replicate
0   1      Array Marker                   1
1   1      Array Marker                   2
2   1      100            Man5GlcNAc2     1
3   1      33             Man5GlcNAc2     2
4   1      10             Man5GlcNAc2     3
5   1      100            Man6GlcNAc2     1
6   1      33             Man6GlcNAc2     2
7   1      10             Man6GlcNAc2     3
8   1      100            Man7GlcNAc2 D1  1
9   1      33             Man7GlcNAc2 D1  2
10  1      10             Man7GlcNAc2 D1  3
11  1      100            Man7GlcNAc2 D3  1
12  1      33             Man7GlcNAc2 D3  2
13  1      10             Man7GlcNAc2 D3  3
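
For readers on a current pandas release, here is a minimal, self-contained sketch of the same clean-up and rolling count. The sample frame is made up for illustration (it is not the asker's spreadsheet), and pd.to_numeric with errors="coerce" is swapped in for astype(float) on the assumption that some rows may hold non-numeric text.

```python
import pandas as pd

# Hypothetical sample data: Concentration stored as text, Name with stray spaces.
data = pd.DataFrame({
    "Block": [1, 1, 1, 1, 1, 1],
    "Concentration": ["100.0", "33.0", "10.0", "100.0", "33.0", "10.0"],
    "Name": ["Man5GlcNAc2  ", "Man5GlcNAc2", " Man5GlcNAc2",
             "Man6GlcNAc2", "Man6GlcNAc2  ", "Man6GlcNAc2"],
})

# Convert text to numbers; anything that cannot be parsed becomes NaN.
data["Concentration"] = pd.to_numeric(data["Concentration"], errors="coerce")

# Remove leading and trailing whitespace so equal names compare equal.
data["Name"] = data["Name"].str.strip()

# Rolling count over a window of 3 rows inside each (Block, Name) group.
data["Replicate"] = (
    data.groupby(["Block", "Name"])["Block"]
    .transform(lambda s: s.rolling(window=3, min_periods=1).count())
    .astype(int)
)

print(data)
```

Note that a rolling count with window=3 never exceeds 3, so this only numbers replicates correctly when each group has at most three rows; cumcount() + 1 has no such limit.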

Solution 2:

If you remove 'Concentration' from your group you will get the expected output.

data["Replicate"] = data.groupby(["Block", "Name"]).cumcount()+1
>>> data

   Block  Concentration  Name           Replicate
0  1      ''             Array.Marker   1
1  1      ''             Array.Marker   2
2  1      100.0          Man5GlcNAc2    1
3  1      33.0           Man5GlcNAc2    2
4  1      10.0           Man5GlcNAc2    3
5  1      100.0          Man6GlcNAc2    1
6  1      33.0           Man6GlcNAc2    2
7  1      10.0           Man6GlcNAc2    3
8  1      100.0          Man7GlcNAc2D1  1
9  1      33.0           Man7GlcNAc2D1  2
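
A small sketch of why the choice of grouping keys matters. The frame below is invented for illustration, and the three-key grouping is only an assumption about what the question originally used.

```python
import pandas as pd

# Hypothetical sample: three concentrations per Name within one Block.
data = pd.DataFrame({
    "Block": [1] * 6,
    "Concentration": [100.0, 33.0, 10.0, 100.0, 33.0, 10.0],
    "Name": ["Man5GlcNAc2"] * 3 + ["Man6GlcNAc2"] * 3,
})

# With Concentration among the keys, every (Block, Concentration, Name)
# combination is unique, so each group holds a single row.
with_conc = data.groupby(["Block", "Concentration", "Name"]).cumcount() + 1

# Without it, rows sharing Block and Name are numbered in order of appearance.
without_conc = data.groupby(["Block", "Name"]).cumcount() + 1

data["Replicate"] = without_conc
print(with_conc.tolist())
print(data)
```

Because cumcount() numbers rows within each group, including a column whose values differ on every row of a group leaves every counter at 1.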
