Update Null Values In Country_code Column In A Data Frame By Matching Substring Of Country Name Using Python
I have two data frames: Disaster and CountryInfo. Disaster has a column Country_code which has some null values, for example: Disaster: 1. **Country** - **Country_code**
Solution 1:
This should do it. First, use rename so that both dataframes have the same column names. Then use the difflib module and its get_close_matches function to fuzzy-match the Country names and replace them with the closest match. After that, it is a simple matter of merging the dataframes.
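As a quick illustration of the fuzzy-matching step, here is a minimal sketch of how difflib.get_close_matches behaves on its own. The candidate list and the variable names are made up for this example; n and cutoff are the function's optional parameters (defaults 3 and 0.6).

```python
import difflib

# Hypothetical candidate names, standing in for a reference country list.
candidates = ['India', 'Afghanistan', 'United States']

# Returns a list of the best matches, best first. n limits the list length
# and cutoff (between 0 and 1) is the minimum similarity a candidate must reach.
closest = difflib.get_close_matches('United States of America', candidates, n=1, cutoff=0.6)
print(closest)

# When nothing scores above the cutoff the list is empty, so indexing [0]
# on the result would raise IndexError.
no_match = difflib.get_close_matches('Zzz', candidates, n=1, cutoff=0.6)
print(no_match)
```

Because the result can be empty, it is worth checking the list before taking its first element when the input names are not guaranteed to have a counterpart.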
import pandas as pd
import numpy as np
import difflib
df1 = pd.DataFrame({'Country' : ['India', 'Afghanistan', 'India', 'United States of America'],
'Country_code' : ['Null', 'AFD', 'IND', 'Null']})
df1
Country Country_code
0 India Null
1 Afghanistan AFD
2 India IND
3 United States of America Null
df2 = pd.DataFrame({'Country' : ['India', 'Afghanistan', 'India', 'United States'],
'ISO' : ['IND', 'AFD', 'IND', 'USA']})
df2
Country ISO
0 India IND
1 Afghanistan AFD
2 India IND
3 United States USA
df2.rename(columns={'ISO' : 'Country_code'}, inplace=True)
df2
Country Country_code
0 India IND
1 Afghanistan AFD
2 India IND
3 United States USA
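The following code replaces each name in the Country column of df1 with the closest matching name from the Country column of df2. This is a way of performing a kind of "fuzzy join" on the substrings. Note that it takes the first match with [0], so it assumes every name in df1 has at least one close match in df2.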
df1['Country'] = df1.Country.map(lambda x: difflib.get_close_matches(x, df2.Country)[0])
df1
Country Country_code
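0 India Null
1 Afghanistan AFD
2 India IND
3 United States Null
Now you can merge the dataframes. Because this is a right merge on both Country and Country_code, the result keeps the rows of df2, so every country ends up with the code from df2 rather than df1 being edited in place.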
df1.merge(df2, how='right', on=['Country', 'Country_code'])
Country Country_code
0 Afghanistan AFD
1 India IND
2 India IND
3 United States USA
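If the goal is instead to keep every row of df1 and fill only the missing codes, one alternative is to build a name-to-code lookup from df2 and apply it just to the null rows. This is a sketch, not the answer's original method: it assumes nulls are stored as the string 'Null', as in the sample data above, and the closest_name helper and the lookup, names, and missing variables are hypothetical names introduced for illustration.

```python
import difflib
import pandas as pd

df1 = pd.DataFrame({'Country': ['India', 'Afghanistan', 'India', 'United States of America'],
                    'Country_code': ['Null', 'AFD', 'IND', 'Null']})
df2 = pd.DataFrame({'Country': ['India', 'Afghanistan', 'India', 'United States'],
                    'ISO': ['IND', 'AFD', 'IND', 'USA']})

# Name -> code lookup built from the reference frame (duplicate names dropped first).
lookup = df2.drop_duplicates('Country').set_index('Country')['ISO'].to_dict()
names = list(lookup)

def closest_name(name, candidates):
    """Return the closest candidate, or None when nothing is similar enough."""
    matches = difflib.get_close_matches(name, candidates, n=1)
    return matches[0] if matches else None

# Fill only the rows whose code is the 'Null' placeholder; rows without a
# close match keep the placeholder instead of raising an error.
missing = df1['Country_code'] == 'Null'
df1.loc[missing, 'Country_code'] = df1.loc[missing, 'Country'].map(
    lambda name: lookup.get(closest_name(name, names), 'Null')
)
print(df1)
```

This leaves the original Country spellings in df1 untouched and never overwrites a code that was already present.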