
How To Correct The Misencoded String?

I used mutagen to read the MP3 metadata. The ID3 tag is read in as unicode, but it is in fact GBK-encoded. How can I correct this in Python?

audio = EasyID3(name)
title = audio['

Solution 1:

It looks like the string has been decoded to unicode using the wrong encoding (latin-1).

You need to encode it to a byte string and then decode it back to unicode using the correct encoding.

title = u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
print title.encode('latin-1').decode('gbk')
当你孤单你会想起谁
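
To see why this round trip is safe, here is a minimal Python 3 sketch that reproduces the problem and then undoes it. It assumes the tag bytes really are GBK and that they were decoded as latin-1; the variable names are only for illustration. Since latin-1 maps every byte 0x00-0xFF to the code point with the same number, encoding the garbled string back to latin-1 gives exactly the original bytes.

```python
# Python 3 sketch: reproduce the mis-decoding, then reverse it.
original = "当你孤单你会想起谁"

raw = original.encode("gbk")       # bytes as they would be stored in the tag
garbled = raw.decode("latin-1")    # what a wrong latin-1 decode produces

# latin-1 is a one-to-one byte <-> code point mapping, so nothing is lost.
assert garbled.encode("latin-1") == raw

repaired = garbled.encode("latin-1").decode("gbk")
print(repaired)
```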

Solution 2:

Looks like it's auto-decoding using latin1. To fix:

>>> title = u'\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad'
>>> print title.encode('latin1').decode('GBK')
当你孤单你会想起谁

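Tested in Python 2.x. The same encode/decode round trip should work in Python 3 as well, but there print is a function, so the call becomes print(title.encode('latin1').decode('GBK')).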
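
If you have many tags to fix, the same idea could be wrapped in a small helper. This is only a sketch for Python 3: fix_misdecoded is a hypothetical name, not part of mutagen, and the default codecs are assumptions taken from this question. It returns the input unchanged when the text cannot be re-encoded or decoded, which is a rough guard against touching strings that were already correct.

```python
def fix_misdecoded(text, wrong="latin-1", right="gbk"):
    """Re-encode text with the wrong codec, then decode with the right one.

    Hypothetical helper: returns text unchanged if either step fails.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Text was not produced by the assumed wrong decode; leave it alone.
        return text


title = "\xb5\xb1\xc4\xe3\xb9\xc2\xb5\xa5\xc4\xe3\xbb\xe1\xcf\xeb\xc6\xf0\xcb\xad"
print(fix_misdecoded(title))
```

Note that the guard is not perfect: a string containing only characters in the latin-1 range whose bytes also happen to form valid GBK would still be converted, so it is best applied only to tags you know are affected.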
