
Python Regex: Replacing st, nd, th Etc In An Address With A Single Sub

I have many addresses like 'East 19th Street' or 'West 141st Street' and I would like to remove the 'th' and the 'st' in a single call to re.sub. re.sub('(\d+)st|(\d+)nd|(\d+)rd|(\d

Solution 1:

Let's try this:

re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", str)

or better, with a lookbehind so that only the suffix itself is matched:

re.sub(r"(?<=\d)(st|nd|rd|th)\b", '', str)

\b prevents things like 21strange from being replaced.
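As a quick check, both patterns can be run against a few sample addresses. This is only a sketch: the helper names strip_ordinals_group and strip_ordinals_lookbehind and the sample strings are made up for illustration.

```python
import re

# Variant 1: capture the digits and put them back with \1.
GROUP_PATTERN = re.compile(r"(\d+)(st|nd|rd|th)\b")
# Variant 2: the lookbehind only asserts a preceding digit,
# so the match consists of the suffix alone.
LOOKBEHIND_PATTERN = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def strip_ordinals_group(address):
    """Hypothetical helper: remove ordinal suffixes via a backreference."""
    return GROUP_PATTERN.sub(r"\1", address)


def strip_ordinals_lookbehind(address):
    """Hypothetical helper: remove ordinal suffixes via a lookbehind."""
    return LOOKBEHIND_PATTERN.sub("", address)


if __name__ == "__main__":
    samples = ["East 19th Street", "West 141st Street", "21strange Avenue"]
    for sample in samples:
        # Both variants should agree; "21strange" has no word boundary
        # after "st", so it is left alone.
        print(strip_ordinals_group(sample), "|", strip_ordinals_lookbehind(sample))
```

Both variants give the same result here. The lookbehind version simply avoids the capture group and the \1 replacement.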

To replace only grammatically correct constructs, you can also try:

re.sub(r"(?<=1\d)th\b|(?<=1)st\b|(?<=2)nd\b|(?<=3)rd\b|(?<=[04-9])th\b", r'', str)

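This strips the suffix from 23rd and 44th but leaves invalid forms like 23st intact. It still strips 11st, 12nd and 13rd, though, because those lookbehinds only check the last digit. I don't know if this is worth the trouble.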
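The sketch below runs the strict pattern on a few inputs, including the teens (11 to 13), and tries a variant that adds a negative lookbehind to the st, nd and rd branches. The variant is an assumption layered on top of the answer, not part of it, and the names ANSWER_PATTERN, TEEN_AWARE_PATTERN and strip_valid_ordinals are hypothetical.

```python
import re

# The strict pattern from the answer above, unchanged.
ANSWER_PATTERN = re.compile(
    r"(?<=1\d)th\b|(?<=1)st\b|(?<=2)nd\b|(?<=3)rd\b|(?<=[04-9])th\b"
)

# Assumed variant: same branches, but st/nd/rd are also rejected
# when the number ends in 11, 12 or 13.
TEEN_AWARE_PATTERN = re.compile(
    r"(?<=1\d)th\b"
    r"|(?<=1)(?<!11)st\b"
    r"|(?<=2)(?<!12)nd\b"
    r"|(?<=3)(?<!13)rd\b"
    r"|(?<=[04-9])th\b"
)


def strip_valid_ordinals(address):
    """Hypothetical helper: drop only grammatically valid ordinal suffixes."""
    return TEEN_AWARE_PATTERN.sub("", address)


if __name__ == "__main__":
    samples = [
        "West 23rd Street",
        "East 44th Street",
        "West 23st Street",
        "East 12nd Street",
        "East 12th Street",
    ]
    for sample in samples:
        # Compare the answer's pattern with the teen-aware variant.
        print(ANSWER_PATTERN.sub("", sample), "|", strip_valid_ordinals(sample))
```

The two patterns differ only on inputs like 12nd: the answer's pattern strips the suffix, while the variant leaves it in place. Whether that matters depends on how clean the address data is.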
