
What's The Easiest Way To Extract The Links On A Web Page Using Python Without BeautifulSoup?

I'm using Cygwin and do not have BeautifulSoup installed.

Solution 1:

Solution 2:

If you don't care much about performance, you can use regular expressions:

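import re
# Capture the value of every quoted href attribute.
linkre = re.compile(r"""href=["']([^"']+)["']""")
# your_html is a string holding the page source.
links = linkre.findall(your_html)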
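
As a rough illustration of how this pattern behaves, the sketch below runs it on a small hard-coded page. The `sample_html` string and its URLs are made up for the example; in practice `your_html` would hold the page source you downloaded.

```python
import re

# Hypothetical sample page; it stands in for the downloaded HTML.
sample_html = """
<html><body>
<a href="http://example.com/a">A</a>
<a href='/relative/path'>B</a>
<img src="logo.png">
</body></html>
"""

# Same pattern as above: capture whatever sits between the quotes after href=.
linkre = re.compile(r"""href=["']([^"']+)["']""")
links = linkre.findall(sample_html)
print(links)  # ['http://example.com/a', '/relative/path']
```

Only `href` attributes are picked up, so the `src` of the image is ignored, and both single- and double-quoted values match.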

If you only want absolute links that start with http://, change the expression to:

linkre = re.compile(r"""href=["'](http://[^"']+)["']""")

Alternatively, make the quotes optional in the pattern, in case the HTML leaves the link values unquoted.
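
One possible way to write the optional-quote variant is sketched below. The exact pattern is an assumption rather than the only option: it treats the quote as optional and stops the value at the first quote, whitespace character, or `>`. The `sample_html` string is again invented for the example.

```python
import re

# Assumed pattern: an optional opening quote, then the value up to the
# next quote, whitespace character, or '>'.
linkre = re.compile(r"""href=["']?([^"'\s>]+)""")

# Hypothetical sample mixing unquoted, double-quoted and single-quoted values.
sample_html = """<a href=http://example.com/a>A</a>
<a href="/b">B</a>
<a href='c.html' class=x>C</a>"""

links = linkre.findall(sample_html)
print(links)  # ['http://example.com/a', '/b', 'c.html']
```

The trade-off is that a quoted value containing a space is cut off at that space, so this looser pattern is best kept for pages that are known to omit the quotes.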
