Python Regex Of A Date In Some Text
Solution 1:
Here's a way to find all dates matching your pattern:
re.findall(r'\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}', text)
But after WilhelmTell's comment on your question, I'm also wondering whether this is what you were really asking for...
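As a quick check, here is a minimal sketch of that pattern run against a made-up string (the sample text is an assumption, not taken from the question). Note that \d\d requires a two-digit day, so a date written as "5 Mar 2011" is not matched.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Posted 30 Nov 2010, edited 02 Dec 2010, archived 5 Mar 2011."

pattern = r'\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}'
dates = re.findall(pattern, text)
print(dates)  # ['30 Nov 2010', '02 Dec 2010']
```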
Solution 2:
Use the calendar module to give you a little global awareness:
import calendar
import re
date_expr = r"\d{2} (?:%s) \d{4}" % '|'.join(calendar.month_abbr[1:])
print(date_expr)
print(re.findall(date_expr, source_text))
For me, this creates a date_expr like:
"\d{2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}"
But if I change my locale using the locale module:
locale.setlocale(locale.LC_ALL, "fr")
I now search for months in French:
"\d{2} (?:janv.|févr.|mars|avr.|mai|juin|juil.|août|sept.|oct.|nov.|déc.) \d{4}"
Hmm, this is the first time I have tried French month abbreviations; I may need to do some cleanup:
date_expr = r"\d{2} (?:%s) \d{4}" % '|'.join(
m.title().rstrip('.') for m in calendar.month_abbr[1:])
Now I get:
"\d{2} (?:Janv|Févr|Mars|Avr|Mai|Juin|Juil|Août|Sept|Oct|Nov|Déc) \d{4}"
And now my script will run for my Gallic friends as well, with really very little trouble.
(You may wonder why I had to slice the month_abbr list from [1:] - this list begins with an empty string in position 0, so that if you use index() to look up a particular month abbreviation, you will get back a number from 1-12, instead of from 0-11.)
-- Paul
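One caveat with locale-generated abbreviations is that a character such as "." is a regex metacharacter. As an alternative to stripping it, a minimal sketch could escape each abbreviation with re.escape. The helper name build_date_expr and the sample sentence are assumptions for illustration, and the printed result assumes the default locale, where the abbreviations are English.

```python
import calendar
import re

def build_date_expr():
    # build_date_expr is a hypothetical helper name, not part of the answer above.
    # re.escape keeps characters such as "." in an abbreviation literal.
    months = '|'.join(re.escape(m) for m in calendar.month_abbr[1:])
    return r"\d{2} (?:%s) \d{4}" % months

date_expr = build_date_expr()
found = re.findall(date_expr, "Released 14 Jul 2009 and patched 03 Aug 2009.")
print(found)  # with English month names: ['14 Jul 2009', '03 Aug 2009']
```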
Solution 3:
Here's a slightly more complete example. The regexp will match more than just valid date values. datetime.strptime will fail to parse anything that is not valid and raise a ValueError. If the date is parsed, then you have a full datetime object that gives you access to a lot of functionality.
>>> from datetime import datetime
>>> import re
>>> dates = []
>>> patn = re.compile(r'\d{2} \w{3} \d{4}')
>>> fh = open('inputfile')
>>> for line in fh:
...     for match in patn.findall(line):
...         try:
...             val = datetime.strptime(match, '%d %b %Y')
...             dates.append(val)
...         except ValueError:
...             pass  # ignore, this isn't a date
...
I imagine that this can be collapsed into nice tight code with comprehensions if you are so inclined.
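One possible way to do that collapse is sketched below. Because strptime signals failure by raising, the try/except has to live in a small helper before a comprehension can use it. The names parse_or_none and find_dates and the sample lines are hypothetical, and '%b' assumes English month abbreviations (the default locale).

```python
import re
from datetime import datetime

PATN = re.compile(r'\d{2} \w{3} \d{4}')

def parse_or_none(candidate):
    # Hypothetical helper: returns a datetime, or None if the text is not a real date.
    try:
        return datetime.strptime(candidate, '%d %b %Y')
    except ValueError:
        return None

def find_dates(lines):
    # Hypothetical helper: parse every regex match and keep only the valid dates.
    parsed = (parse_or_none(m) for line in lines for m in PATN.findall(line))
    return [d for d in parsed if d is not None]

# Invented sample input: one real date, one impossible date, one non-date.
sample = ["Due 30 Nov 2010 or 31 Feb 2011", "ref 12 abc 3456"]
dates = find_dates(sample)
print(dates)  # [datetime.datetime(2010, 11, 30, 0, 0)]
```

Passing an open file object instead of the sample list should work the same way, since find_dates only iterates over lines.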
Solution 4:
Try this:
import re
allmatches = re.findall(r'\d\d \w\w\w \d\d\d\d', "string to match")
Solution 5:
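Or, to capture the time as well, you can use this (the [a-z]* after the month group lets it match full month names such as November):
date = re.findall(r'\d\d\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{4}\s\d{2}:\d{2}', text)
print(date)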
['30 November 2010 14:20', '30 November 2010 14:24']
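Strings in the shape shown in that output can be turned into datetime objects with strptime. This is only a sketch: it reuses the two strings from the output above and assumes English month names, which is what '%B' expects in the default locale.

```python
from datetime import datetime

# The two strings from the output above; '%B' expects a full month name.
stamps = ['30 November 2010 14:20', '30 November 2010 14:24']
parsed = [datetime.strptime(s, '%d %B %Y %H:%M') for s in stamps]
print(parsed[1] - parsed[0])  # 0:04:00
```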