Python Regular Expressions - Extract Every Table Cell Content
Possible Duplicate: RegEx match open tags except XHTML self-contained tags
If I have a string that looks something like... '123 234
Solution 1:
If that markup is part of a larger document, you should prefer a tool with an HTML parser. One such tool is BeautifulSoup. Here's one way to find what you need using that tool:
>>> markup = '''"<tr><td>123</td><td>234</td>...<td>697</td></tr>"'''
>>> from bs4 import BeautifulSoup as bs
>>> soup = bs(markup, 'html.parser')  # name the parser explicitly to avoid a warning
>>> for i in soup.find_all('td'):
...     print(i.text)
Result:
123
234
697
Solution 2:
Don't do this with regular expressions. Use a proper HTML parser, and something like XPath to select the elements you want.
A lot of people like lxml. For this task, you will probably want to use the BeautifulSoup backend, or use BeautifulSoup directly, because this is presumably not markup from a source known to generate well-formed, valid documents.
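To make the "use a parser, not a regex" advice concrete, here is a minimal sketch that uses Python's standard-library html.parser as a dependency-free stand-in for lxml or BeautifulSoup. The names CellCollector and extract_cells are hypothetical helpers invented for this illustration, and the sketch assumes the table is not nested inside another table.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []      # finished cell texts, in document order
        self._buffer = None  # list of text chunks while inside a <td>, else None

    def _flush(self):
        # Store the cell currently being read, if there is one.
        if self._buffer is not None:
            self.cells.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._flush()      # an unclosed <td> ends where the next one begins
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._flush()

    def handle_data(self, data):
        # Text outside any <td> is ignored.
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()          # keep a trailing cell that was never closed


def extract_cells(markup):
    """Return the text content of each table cell in markup."""
    parser = CellCollector()
    parser.feed(markup)
    parser.close()
    return parser.cells


print(extract_cells("<tr><td>123</td><td>234</td>...<td>697</td></tr>"))
```

Because the parser tracks tag boundaries itself, a missing closing tag does not derail it the way it would derail a pattern such as <td>(.*?)</td>.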
Solution 3:
When using lxml, an element tree gets created. Each element in the element tree holds information about a tag.
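from lxml import etree
# Parse a small, well-formed XML document into an element tree.
root = etree.XML("<root><a x='123'>aText<b/><c/><b/></a></root>")
# Find every <a> element anywhere below the root.
elements = root.findall(".//a")
tag = elements[0].tag      # the tag name, 'a'
attr = elements[0].attrib  # the attributes, mapping 'x' to '123'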
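The same findall approach can be pointed at the table row from the question. The sketch below uses the standard library's xml.etree.ElementTree, whose basic API lxml.etree follows, so that it runs without extra packages. It assumes the fragment is well-formed XML, which real-world HTML often is not, and cell_texts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def cell_texts(markup):
    """Return the text of every <td> in a well-formed XML fragment."""
    row = ET.fromstring(markup)
    # itertext() also picks up text inside nested tags such as <b>.
    return ["".join(td.itertext()) for td in row.findall(".//td")]


print(cell_texts("<tr><td>123</td><td>234</td><td>697</td></tr>"))
```

If the markup may be malformed, ET.fromstring raises ParseError, which is the case where a lenient HTML parser is the better choice.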