
Stuck With Encodings In Python With Beautifulsoup

The page is encoded in UTF-8, and Python's HTMLParser handles it well with no UnicodeDecodeError, but I get an error when I try to parse it with BeautifulSoup. I've tried _*_ codi

Solution 1:

OK, I found the solution on my last try; maybe it will help others with the same problem. The strings need to be encoded, not decoded:

print([e.encode('utf-8', 'ignore') for e in stuff])
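To make the effect of that line concrete, here is a minimal sketch assuming Python 3. The list `stuff` is a hypothetical stand-in for whatever strings were extracted from the page; the original post does not show its contents.

```python
# Hypothetical stand-in for the strings pulled out of the parsed page.
stuff = ["caf\u00e9", "na\u00efve", "plain"]

# encode() turns each text string into UTF-8 bytes. Printing the list then
# shows escaped bytes literals, so no non-ASCII character has to pass
# through the stdout encoder.
encoded = [e.encode("utf-8", "ignore") for e in stuff]
print(encoded)
```

This avoids the error rather than fixing its cause: the output is a list of byte strings, not readable text.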

Solution 2:

You shouldn't be getting UnicodeEncodeError: 'ascii' ... errors when you print. This is often caused by a locale that is broken or set to C, which leaves Python unable to pick an appropriate encoder for the stdout stream.

Run locale and check for errors or warnings.
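The same check can be made from inside Python. The following is a small sketch, assuming Python 3, that reports which encoding the interpreter picked for stdout and which one the locale suggests; a value such as ANSI_X3.4-1968 or ascii would usually point to the C locale.

```python
import locale
import sys

# Encoding Python attached to the stdout stream (may be None if stdout
# has been replaced by an object without an encoding).
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Encoding suggested by the current locale settings, queried without
# changing the locale.
preferred = locale.getpreferredencoding(False)

print("stdout encoding:", stdout_encoding)
print("locale preferred encoding:", preferred)
```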

If you can't fix your locale, you can often override Python's stdout encoder by setting PYTHONIOENCODING in your environment to an encoding that matches your terminal emulator. Often you can get by with:

export PYTHONIOENCODING=UTF-8

or

PYTHONIOENCODING=UTF-8 python my_script.py
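One way to confirm that the variable takes effect is to launch a child interpreter with it set and ask the child what encoding its stdout uses. This is only a sketch, assuming Python 3.7 or later (for `capture_output`); the one-line child script is made up for illustration.

```python
import os
import subprocess
import sys

# Copy the current environment and add the override for the child only.
env = dict(os.environ, PYTHONIOENCODING="UTF-8")

# Run a child interpreter that prints the encoding of its own stdout.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

child_encoding = result.stdout.strip()
print("child stdout encoding:", child_encoding)
```

The exact spelling of the reported name can vary between Python versions, so compare it case-insensitively if you script this check.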
