Python Regex: Replacing st, nd, th Etc. In An Address With A Single Sub
I have many addresses like 'East 19th Street' or 'West 141st Street', and I would like to remove the 'th' and the 'st' in a single call to re.sub. re.sub('(\d+)st|(\d+)nd|(\d+)rd|(\d
Solution 1:
Let's try this:
re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", str)
or better, with a lookbehind, so the digits do not have to be captured and put back:
re.sub(r"(?<=\d)(st|nd|rd|th)\b", '', str)
\b prevents things like 21strange from being replaced.
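For reference, here is a minimal runnable sketch of both variants applied to the addresses from the question. The names GROUP, LOOKBEHIND, strip_with_group and strip_with_lookbehind, and the '21strange Road' sample, are made up for illustration.

```python
import re

# Variant 1: capture the digits and write them back with \1.
GROUP = re.compile(r"(\d+)(st|nd|rd|th)\b")
# Variant 2: the lookbehind only asserts a preceding digit, so the
# match is the suffix alone and can be replaced with an empty string.
LOOKBEHIND = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def strip_with_group(address):
    return GROUP.sub(r"\1", address)


def strip_with_lookbehind(address):
    return LOOKBEHIND.sub("", address)


# '21strange Road' is a hypothetical input showing the effect of \b.
for address in ["East 19th Street", "West 141st Street", "21strange Road"]:
    print(strip_with_group(address), "|", strip_with_lookbehind(address))
```

Both functions should give the same result here; the suffix in '21strange' is left alone because there is no word boundary after 'st'.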
To replace only grammatically correct constructs, you can also try:
re.sub(r"(?<=1\d)th\b|(?<=1)st\b|(?<=2)nd\b|(?<=3)rd\b|(?<=[04-9])th\b", r'', str)
This replaces 23rd and 44th but leaves invalid things like 23st intact. Don't know if this is worth the trouble though.
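One gap to be aware of: the lookbehinds for st, nd and rd only look at the last digit, so the pattern above also strips 11st, 12nd and 13rd. If that matters, a possible tightening (a sketch, not part of the original answer) is to put a (?<!1\d) guard in front of those branches. The names STRICT and strip_valid_ordinals are made up for illustration.

```python
import re

STRICT = re.compile(
    # Numbers ending in 10-19 always take 'th' (11th, 112th, 213th).
    r"(?<=1\d)th\b"
    # Otherwise the suffix depends on the last digit, but only when
    # the two preceding characters are not '1' followed by a digit.
    r"|(?<!1\d)(?:(?<=1)st|(?<=2)nd|(?<=3)rd|(?<=[04-9])th)\b"
)


def strip_valid_ordinals(address):
    """Remove ordinal suffixes only where they are grammatically valid."""
    return STRICT.sub("", address)


for address in ["East 19th Street", "West 141st Street", "East 11st Street"]:
    print(strip_valid_ordinals(address))
```

With this variant, 'East 11st Street' should come back unchanged, while 19th and 141st are still stripped.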