
Mapping Matching Word Count On A Column Using Pandas In Python

I have a df,

Name  Step  Description
Ram   1     Ram is oNe of the good cricketer
Ram   2     gopal one
Sri   1     Sri is one of the member
Sri   2     ra

Solution 1:

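Create a boolean mask that is True only for the first row of each Name group whose Description contains one of the keywords, then assign the new columns through it: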

import re

my_list = ["one", "good"]

# keyword match (case-insensitive) AND first row of each Name group
mask = df["Description"].str.contains("|".join(my_list), na=False, flags=re.IGNORECASE) & \
       (df.groupby('Name').cumcount() == 0)
print (mask)
0      True
1     False
2      True
3     False
4     False
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
dtype: bool

extracted = df['Description'].str.findall('('+'|'.join(my_list) +')', flags=re.IGNORECASE)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print (df)
       Name  Step                       Description      keys  count
0       Ram     1  Ram is oNe of the good cricketer  oNe,good    2.0
1       Ram     2                         gopal one       NaN    NaN
2       Sri     1          Sri is one of the member       one    1.0
3       Sri     2                         ravi good       NaN    NaN
4     Kumar     1                 Kumar is a keeper       NaN    NaN
5     Madhu     1                          good boy      good    1.0
6   Vignesh     1                        oNe little       oNe    1.0
7     Pechi     1                          one book       one    1.0
8     mario     1                      good randokm      good    1.0
9     Roger     1                   one milita good  one,good    2.0
10     bala     1                        looks good      good    1.0
11      raj     1                          more one       one    1.0
12     venk     1                        likes good      good    1.0
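
The steps above can be tried end to end with the following minimal sketch. The five-row DataFrame is a hypothetical sample modeled on the rows shown in the question, not the asker's full data, and the use of `re.escape` is an added precaution (an assumption, not part of the original answer) for keywords that might contain regex metacharacters.

```python
import re
import pandas as pd

# Hypothetical sample modeled on the question's rows
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri", "Kumar"],
    "Step": [1, 2, 1, 2, 1],
    "Description": [
        "Ram is oNe of the good cricketer",
        "gopal one",
        "Sri is one of the member",
        "ravi good",
        "Kumar is a keeper",
    ],
})

my_list = ["one", "good"]
# Escape each keyword so characters like "." or "+" are matched literally
pattern = "|".join(re.escape(word) for word in my_list)

# True for rows whose Description contains any keyword (case-insensitive)
has_match = df["Description"].str.contains(pattern, na=False, flags=re.IGNORECASE)
# True for the first row of each Name group
first_in_group = df.groupby("Name").cumcount() == 0
mask = has_match & first_in_group

# All keyword occurrences per row, kept in their original casing
extracted = df["Description"].str.findall(pattern, flags=re.IGNORECASE)
df.loc[mask, "keys"] = extracted.str.join(",")
df.loc[mask, "count"] = extracted.str.len()
print(df)
```

Rows outside the mask keep NaN in both new columns, which is why `count` ends up as a float column.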

EDIT:

#join all descriptions per Name; transform returns a Series the same size as the original
s = df.groupby('Name')['Description'].transform(','.join)
print (s)
0     Ram is oNe of the good cricketer,gopal one
1     Ram is oNe of the good cricketer,gopal one
2             Sri is one of the member,ravi good
3             Sri is one of the member,ravi good
4                              Kumar is a keeper
5                                       good boy
6                                     oNe little
7                                       one book
8                              good randokm good
9                                one milita good
10                                    looks good
11                                      more one
12                                    likes good
Name: Description, dtype: object

#build the mask from the new Series s
mask=s.str.contains("|".join(my_list),na=False,flags=re.IGNORECASE ) & \
     (df.groupby('Name').cumcount() == 0)
print (mask)
0      True
1     False
2      True
3     False
4     False
5      True
6      True
7      True
8      True
9      True
10     True
11     True
12     True
dtype: bool

#extract from the new Series s; set() keeps only unique matches
extracted = s.str.findall('('+'|'.join(my_list) +')', flags=re.IGNORECASE).apply(set)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print (df)
       Name  Step                       Description          keys  count
0       Ram     1  Ram is oNe of the good cricketer  good,oNe,one    3.0
1       Ram     2                         gopal one           NaN    NaN
2       Sri     1          Sri is one of the member      good,one    2.0
3       Sri     2                         ravi good           NaN    NaN
4     Kumar     1                 Kumar is a keeper           NaN    NaN
5     Madhu     1                          good boy          good    1.0
6   Vignesh     1                        oNe little           oNe    1.0
7     Pechi     1                          one book           one    1.0
8     mario     1                 good randokm good          good    1.0
9     Roger     1                   one milita good      good,one    2.0
10     bala     1                        looks good          good    1.0
11      raj     1                          more one           one    1.0
12     venk     1                        likes good          good    1.0
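
Note that `set` compares strings case-sensitively, so for Ram the matches `oNe` and `one` are kept as two distinct keys (count 3.0), and the order of items in a set is not guaranteed. If differently cased matches should count as one keyword, a possible variation (an assumption about the desired result, not something stated in the question) is to lowercase the matches before deduplicating and sort them for a stable order. The helper name `unique_lower` and the four-row sample data are hypothetical.

```python
import re
import pandas as pd

# Hypothetical sample modeled on the question's rows
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri"],
    "Step": [1, 2, 1, 2],
    "Description": [
        "Ram is oNe of the good cricketer",
        "gopal one",
        "Sri is one of the member",
        "ravi good",
    ],
})

my_list = ["one", "good"]
pattern = "|".join(re.escape(word) for word in my_list)

# Join all descriptions of a Name, broadcast back to the original length
s = df.groupby("Name")["Description"].transform(",".join)

mask = s.str.contains(pattern, na=False, flags=re.IGNORECASE) & \
       (df.groupby("Name").cumcount() == 0)

def unique_lower(matches):
    # Hypothetical helper: lowercase, deduplicate, sort for a stable order
    return sorted({m.lower() for m in matches})

extracted = s.str.findall(pattern, flags=re.IGNORECASE).apply(unique_lower)
df.loc[mask, "keys"] = extracted.str.join(",")
df.loc[mask, "count"] = extracted.str.len()
print(df)
```

With this variation the Ram group yields `good,one` with a count of 2 instead of three case-distinct keys.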
