Mapping Matching Word Count On A Column Using Pandas In Python
I have a df:
Name Step Description
Ram 1 Ram is oNe of the good cricketer
Ram 2 gopal one
Sri 1 Sri is one of the member
Sri 2 ra
Solution 1:
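Create a new mask and apply it:
import re

my_list = ["one", "good"]
# True only for the first row of each Name whose Description contains any listed word (case-insensitive)
mask = (df["Description"].str.contains("|".join(my_list), na=False, flags=re.IGNORECASE)
        & (df.groupby('Name').cumcount() == 0))
print(mask)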
0      True
1     False
2      True
3     False
4     False
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
dtype: bool
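# list of every matched word per row
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
# fill the new columns only where the mask is True; other rows stay NaN
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print(df)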
       Name  Step                       Description      keys  count
0       Ram     1  Ram is oNe of the good cricketer  oNe,good    2.0
1       Ram     2                         gopal one       NaN    NaN
2       Sri     1          Sri is one of the member       one    1.0
3       Sri     2                         ravi good       NaN    NaN
4     Kumar     1                 Kumar is a keeper       NaN    NaN
5     Madhu     1                          good boy      good    1.0
6   Vignesh     1                        oNe little       oNe    1.0
7     Pechi     1                          one book       one    1.0
8     mario     1                      good randokm      good    1.0
9     Roger     1                   one milita good  one,good    2.0
10     bala     1                        looks good      good    1.0
11      raj     1                          more one       one    1.0
12     venk     1                        likes good      good    1.0
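The snippet above assumes df already exists. As a minimal self-contained sketch, the version below rebuilds the first five rows from the printed output (an assumption about the original data) and runs the same steps. The use of re.escape is an addition, not part of the original answer; it only matters if a search word contains regex metacharacters.

```python
import re
import pandas as pd

# Sample rows copied from the printed output; assumed to match the original df
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri", "Kumar"],
    "Step": [1, 2, 1, 2, 1],
    "Description": [
        "Ram is oNe of the good cricketer",
        "gopal one",
        "Sri is one of the member",
        "ravi good",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good"]

# Escape each word so characters such as "." or "+" are matched literally
pattern = "|".join(re.escape(w) for w in my_list)

# First row of each Name group
first_in_group = df.groupby("Name").cumcount() == 0
# Rows whose Description contains any of the words, ignoring case
has_match = df["Description"].str.contains(pattern, na=False, flags=re.IGNORECASE)
mask = has_match & first_in_group

# All matched words per row, then write them only where the mask is True
extracted = df["Description"].str.findall(pattern, flags=re.IGNORECASE)
df.loc[mask, "keys"] = extracted.str.join(",")
df.loc[mask, "count"] = extracted.str.len()
print(df)
```

Note that the pattern matches substrings, so "one" would also be found inside a longer word such as "someone". Wrapping the alternation in \b word boundaries is one way to restrict it to whole words if that is what you need.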
EDIT:
# join all descriptions per Name; transform keeps the result the same length as the original df
s = df.groupby('Name')['Description'].transform(','.join)
print(s)
0     Ram is oNe of the good cricketer,gopal one
1     Ram is oNe of the good cricketer,gopal one
2             Sri is one of the member,ravi good
3             Sri is one of the member,ravi good
4                              Kumar is a keeper
5                                       good boy
6                                     oNe little
7                                       one book
8                              good randokm good
9                                one milita good
10                                    looks good
11                                      more one
12                                    likes good
Name: Description, dtype: object
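To see why transform is used here rather than agg, the following sketch compares the two on a small sample (rows copied from the output above, assumed to match the original data). agg returns one value per group, while transform repeats the group's value on every row so the result lines up with df.

```python
import pandas as pd

# Sample rows copied from the printed output; assumed to match the original df
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri", "Kumar"],
    "Description": [
        "Ram is oNe of the good cricketer",
        "gopal one",
        "Sri is one of the member",
        "ravi good",
        "Kumar is a keeper",
    ],
})

# agg: one joined string per Name, indexed by Name (3 rows)
per_group = df.groupby("Name")["Description"].agg(",".join)

# transform: the same joined string broadcast to every row (5 rows, original index)
s = df.groupby("Name")["Description"].transform(",".join)

print(per_group)
print(s)
```

Because s shares the index of df, it can be combined directly with the cumcount mask and used with df.loc.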
# build the mask from the joined Series s instead of the single-row Description
mask = (s.str.contains("|".join(my_list), na=False, flags=re.IGNORECASE)
        & (df.groupby('Name').cumcount() == 0))
print(mask)
0      True
1     False
2      True
3     False
4     False
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
dtype: bool
# extract from the joined Series s; set() drops exact duplicates (case variants such as oNe and one stay distinct)
extracted = s.str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE).apply(set)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print(df)
       Name  Step                       Description          keys  count
0       Ram     1  Ram is oNe of the good cricketer  good,oNe,one    3.0
1       Ram     2                         gopal one           NaN    NaN
2       Sri     1          Sri is one of the member      good,one    2.0
3       Sri     2                         ravi good           NaN    NaN
4     Kumar     1                 Kumar is a keeper           NaN    NaN
5     Madhu     1                          good boy          good    1.0
6   Vignesh     1                        oNe little           oNe    1.0
7     Pechi     1                          one book           one    1.0
8     mario     1                 good randokm good          good    1.0
9     Roger     1                   one milita good      good,one    2.0
10     bala     1                        looks good          good    1.0
11      raj     1                          more one           one    1.0
12     venk     1                        likes good          good    1.0
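In the output above, Ram gets a count of 3 because oNe and one are different strings inside the set, and the order of words in keys depends on set iteration order. If you would rather count each word once regardless of case and get a stable order, one possible variation (not part of the original answer) is to lowercase the matches and sort them. The helper name unique_keys is hypothetical, and the sample strings are copied from the joined Series printed earlier.

```python
import re
import pandas as pd

# Joined descriptions copied from the printed Series s; assumed sample data
s = pd.Series([
    "Ram is oNe of the good cricketer,gopal one",
    "Sri is one of the member,ravi good",
    "Kumar is a keeper",
])
my_list = ["one", "good"]
pattern = "|".join(re.escape(w) for w in my_list)


def unique_keys(text):
    """Return the distinct matched words, lowercased and sorted (hypothetical helper)."""
    found = re.findall(pattern, text, flags=re.IGNORECASE)
    return sorted({w.lower() for w in found})


extracted = s.apply(unique_keys)
keys = extracted.str.join(",")   # comma-separated distinct words per row
counts = extracted.str.len()     # number of distinct words per row
print(keys)
print(counts)
```

With this variation the first row yields good,one with a count of 2 instead of 3. Whether that is preferable depends on whether case variants should be treated as the same word in your data.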