Python: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
I was recently doing some text scrubbing and had difficulty working out how to remove the '†' character from strings.
e.g. I had a string like this:
>>> u'foo †'
u'foo \u2020'
I wanted to get rid of the '†' character and then strip any trailing spaces so I'd end up with the string 'foo'. I tried to do this in one call to 'replace':
>>> u'foo †'.replace(" †", "")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
It took me a while to work out that " †" is a byte string, which Python 2 was implicitly decoding as ASCII rather than UTF-8 before doing the replacement. Let's fix that by making it a unicode literal:
>>> u'foo †'.replace(u' †', "")
u'foo'
I think the following call to unicode, which I've written about before, is equivalent:
>>> u'foo †'.replace(unicode(' †', "utf-8"), "")
u'foo'
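To convince myself of what was going on, here's a rough sketch that does the decode by hand instead of leaving it to replace. It assumes the '†' was entered as UTF-8 (bytes e2 80 a0), and the names dagger_bytes and scrub are made up for this example. It's written so that it should run on Python 2 or 3:

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces by hand the ASCII decode that failed above.
# `dagger_bytes` and `scrub` are hypothetical names used for illustration.

# UTF-8 bytes for a space followed by the dagger character (U+2020).
dagger_bytes = b" \xe2\x80\xa0"

# Decoding those bytes as ASCII fails on the first non-ASCII byte.
try:
    dagger_bytes.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding them as UTF-8 gives the text we actually meant.
dagger_text = dagger_bytes.decode("utf-8")


def scrub(text):
    """Remove every dagger (U+2020) and strip trailing whitespace."""
    return text.replace(u"\u2020", u"").rstrip()


print(scrub(u"foo \u2020") == u"foo")
```

Replacing just the dagger and then calling rstrip means the function doesn't care how many spaces come before the '†'.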
Now back to the scrubbing!