What's The Easiest Way To Extract The Links On A Web Page Using Python Without BeautifulSoup?
I'm using Cygwin and do not have BeautifulSoup installed.
Solution 1:
See these related questions:
Getting the value of href attributes in all <a> tags on a html file with Python
python, regex to find anchor link html
Regular expression to extract URL from an HTML link
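The answers to the questions listed above are not reproduced here. One approach that fits the "no BeautifulSoup" constraint is the standard library's html.parser module; the following is a minimal sketch of that technique, assuming Python 3, and is not necessarily what those answers propose. The names LinkCollector and extract_anchor_hrefs are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_anchor_hrefs(html):
    """Return the href values of all <a> tags in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical sample markup, used only for illustration.
    page = ('<a href="http://example.com/">Home</a>'
            '<link href="style.css">'
            '<a name="top">Top</a>'
            "<A HREF='/about'>About</A>")
    print(extract_anchor_hrefs(page))
```

Unlike a bare href pattern, this only looks at <a> tags, so the <link> element in the sample is skipped.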
Solution 2:
If you don't care much about performance, you can use regular expressions:
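import re
# Capture the quoted value of every href attribute.
linkre = re.compile(r"""href=["']([^"']+)["']""")
# your_html is a string holding the page's HTML source.
links = linkre.findall(your_html)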
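To make the snippet above runnable on its own, here is a small sketch that applies the same pattern to a sample string. The sample markup and the function name extract_links are assumptions made for illustration; in practice your_html would be the page source you downloaded.

```python
import re

# Hypothetical sample page, used only for illustration.
sample_html = """
<html><body>
<a href="http://example.com/">Example</a>
<a href='/about.html'>About</a>
<a class="nav" href="https://example.org/docs">Docs</a>
</body></html>
"""

# Same pattern as above: capture whatever sits between the quotes after href=.
linkre = re.compile(r"""href=["']([^"']+)["']""")


def extract_links(html):
    """Return every quoted href value found in the given HTML string."""
    return linkre.findall(html)


if __name__ == "__main__":
    print(extract_links(sample_html))
    # ['http://example.com/', '/about.html', 'https://example.org/docs']
```

Note that the pattern matches href on any tag, so values from elements such as <link> are returned along with those from <a>.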
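If you only want absolute http:// links, change the expression to the following. The scheme is kept inside the capturing group so that findall returns complete URLs (use https?:// instead to match https links as well):
linkre = re.compile(r"""href=["'](http://[^"']+)["']""")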
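Or you can make the quotes optional (["']?) in case your HTML has unquoted attribute values. In that case the captured character class should also stop at whitespace and >, since there is no closing quote to end the match.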
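A sketch of the optional-quote variant might look like the following. The exact pattern, the re.IGNORECASE flag, the tolerance for spaces around =, and the name extract_links_loose are assumptions of this example, not part of the original answer.

```python
import re

# Quotes are optional here; an unquoted value ends at whitespace or '>'.
loose_linkre = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def extract_links_loose(html):
    """Return href values whether or not they are wrapped in quotes."""
    return loose_linkre.findall(html)


if __name__ == "__main__":
    # Hypothetical sample markup mixing unquoted and quoted attributes.
    sample = ('<a href=http://example.com/a>A</a> '
              '<A HREF="/b.html">B</A> '
              "<a href = 'c.html'>C</a>")
    print(extract_links_loose(sample))
    # ['http://example.com/a', '/b.html', 'c.html']
```

The trade-off is that a quoted value containing a space is cut off at the space, so this form is best kept for markup known to be sloppy about quoting.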