I am trying to work with several documents that all have various encodings - some utf-8, some ISO-8859-2, some ascii etc. Is there a reliable way of decoding to a standard encoding for processing?
I have tried the following:
import chardet
encoding = chardet.detect(text)
text = unicode(text,encoding['encoding']).decode(sys.getdefaultencoding(),'ignore')
With the above code I still get UnicodeEncodeError errors