
Python Regular Expression (from Quick Book)

I recently learned a bit about Python's re module from 'Python Quick Book' and tried to test some code from the book. Although my code raises no errors, it is not recognizi

Solution 1:

The regexp is wrong for the input data

To fix it, take the following approach

drop into the Python interactive interpreter and import re

$ python
Python 2.7.3 (default, Aug  1 2012, 05:14:39) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re

define a string as one of the input lines

>>> str="Khan, Ahmed Ali : 800-123-4567"

apply the regexp patterns a bit at a time to see what fails

>>> regexp = re.compile(r"(?P<last>[-a-zA-Z]+)")
>>> result=regexp.search(str)
>>> print result.group('last')
Khan

so the first one works, try the first two

>>> regexp = re.compile(r"(?P<last>[-a-zA-Z]+)"
...                     r"(?P<first>[-a-zA-Z]+)")
>>> result=regexp.search(str)
>>> print result.group('last')
Kha
>>> print result.group('first')
n

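Oh dear! Both groups matched, but wrongly: with nothing between them, the greedy first group gives back one character so that the second group can match, which splits Khan into Kha and n. Looking carefully, the str has a comma and a space after Khan, so let's fix that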

>>> regexp = re.compile(r"(?P<last>[-a-zA-Z]+),\s+"
... r"(?P<first>[-a-zA-Z]+)")
>>> result=regexp.search(str)
>>> print result.group('last')
Khan
>>> print result.group('first')
Ahmed
>>> 

Just adjust the regexps like this interactively until they work on one input string. Then copy the working regexps back to your program.
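
The same piece-by-piece check can also be scripted. The following is only a sketch: it assumes Python 3 (the session above is Python 2.7), the helper name trace_fragments is made up for illustration, and the way the pattern is split into fragments is my own choice rather than the book's code. It prints the named groups captured by each growing prefix of the pattern, so a wrong capture such as Kha / n is visible at the step where it first appears.

```python
import re

# Sample line taken from the interactive session above.
line = "Khan, Ahmed Ali : 800-123-4567"

# Pattern fragments in concatenation order (illustrative split).
fragments = [
    r"(?P<last>[-a-zA-Z]+),\s+",
    r"(?P<first>[-a-zA-Z]+)",
    r"(\s+(?P<middle>[-a-zA-Z]+))?\s*",
    r":\s*(?P<phone>(\d{3}-)?\d{3}-\d{4})",
]


def trace_fragments(fragments, text):
    """Yield (pattern_so_far, named groups or None) for each growing prefix."""
    for count in range(1, len(fragments) + 1):
        pattern = "".join(fragments[:count])
        match = re.match(pattern, text)
        yield pattern, (match.groupdict() if match else None)


for pattern, groups in trace_fragments(fragments, line):
    print(pattern)
    print("   ", groups)
```

A step that prints None, or groups that look wrong, points at the fragment added in that step.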


Solution 2:

Your code requires two spaces before the middle name:

r" ( (?P<middle> ([-a-zA-Z]+)))?"
# ^ ^
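
To see why literal spaces are fragile, here is a small sketch. The two patterns are simplified stand-ins invented for illustration, not the code from the question, and Python 3 is assumed. One pattern demands exactly two literal spaces before the middle name; the other accepts any run of whitespace.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# "literal" has two literal spaces before the middle name.
literal = re.compile(r"(?P<first>[-a-zA-Z]+)(  (?P<middle>[-a-zA-Z]+))?$")
# "flexible" accepts one or more whitespace characters instead.
flexible = re.compile(r"(?P<first>[-a-zA-Z]+)(\s+(?P<middle>[-a-zA-Z]+))?$")

one_space = "Ahmed Ali"
two_space = "Ahmed  Ali"

for name, pattern in (("literal", literal), ("flexible", flexible)):
    for text in (one_space, two_space):
        match = pattern.match(text)
        print(name, repr(text), match.groupdict() if match else None)
```

The literal pattern rejects the single-space input outright, while the flexible one captures Ali as the middle name in both cases.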

Instead, you should use the \s character class and * or + quantifiers. Also, explicitly closing files (rather than using a with block), using re.search when you really want re.match, and comparing to None with == are all bad practices. Instead, write your code like this:

import re
regexp = re.compile(r"(?P<last>[-a-zA-Z]+), "
                    r"(?P<first>[-a-zA-Z]+)"
                    r"(\s+(?P<middle>[-a-zA-Z]+))?\s*"
                    r":\s*(?P<phone>(\d{3}-)?\d{3}-\d{4})$"
                    )
with open('dir.txt', 'r') as f:
    for line in f:
        result = regexp.match(line)
        if result is None:
            print ("Oops, I don't think this is a record")
            continue
        lastname = result.group('last')
        firstname = result.group('first')
        middlename = result.group('middle')
        if middlename is None:
            middlename = ''
        phonenumber = result.group('phone')
        print ('Name:', firstname, middlename, lastname, ' Number: ',phonenumber)
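
A quick way to try this pattern without creating dir.txt is to run it on in-memory lines. This is a sketch assuming Python 3: the parse helper is a made-up name, the first sample line comes from Solution 1, and the other two are invented for illustration. The last sample also shows the difference between re.match, which anchors at the start of the line, and re.search, which would accept a record buried after leading junk.

```python
import re

regexp = re.compile(r"(?P<last>[-a-zA-Z]+), "
                    r"(?P<first>[-a-zA-Z]+)"
                    r"(\s+(?P<middle>[-a-zA-Z]+))?\s*"
                    r":\s*(?P<phone>(\d{3}-)?\d{3}-\d{4})$")


def parse(line):
    """Return (first, middle, last, phone), or None if the line is not a record."""
    result = regexp.match(line)
    if result is None:
        return None
    # A missing middle name comes back as None; normalise it to ''.
    return (result.group('first'), result.group('middle') or '',
            result.group('last'), result.group('phone'))


samples = [
    "Khan, Ahmed Ali : 800-123-4567\n",     # from Solution 1
    "Smith, Jane: 555-0199\n",              # invented: no middle name, no area code
    "## Khan, Ahmed Ali : 800-123-4567\n",  # invented: leading junk
]

for line in samples:
    print(repr(line), '->', parse(line))

# match rejects the junk-prefixed line, but search still finds a record in it.
print(regexp.search(samples[2]) is not None)
```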
