Hi All,
I am testing chardet in one of my scripts. I wanted to identify the encoding type of a result variable and chardet seems to do fine here.
So this is what I am doing:
# myvar1 gets its value from other functions
myvar2 = chardet.detect(myvar1)  # detect the encoding type of myvar1
Now when I do a print myvar2, I receive the output:
{'confidence': 1.0, 'encoding': 'ascii'}
Question 1: Can someone give me a pointer on how to extract only the encoding value from this, i.e. 'ascii'?
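For reference, a minimal sketch of what I am after. It assumes the result is a plain dict, as the printed output suggests; the literal below is copied from that output rather than produced by a real chardet call, and the name encoding is just for illustration.

```python
# Result dict copied from the printed output above (not a live chardet call).
myvar2 = {'confidence': 1.0, 'encoding': 'ascii'}

# Index the dict by its 'encoding' key to get only the encoding name.
encoding = myvar2['encoding']
print(encoding)
```

As far as I understand, chardet can report None as the encoding when it cannot make a guess, so the value probably needs a check before it is used.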
Edit: The scenario is as follows:
I am using unicode(myvar1) to convert all input to unicode. But as soon as myvar1 contains a byte like 0xab, unicode(myvar1) fails with the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xab in position xxx: ordinal not in range(128)
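The failure can be reproduced in isolation. This is only a sketch: the byte string raw is a made-up input, and the explicit decode('ascii') stands in for the implicit ASCII decoding that unicode() performs on a byte string when no encoding is given.

```python
# Hypothetical input containing the non-ASCII byte 0xab.
raw = b'abc\xab'

try:
    # ASCII only covers bytes 0x00-0x7f, so 0xab cannot be decoded.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```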
Therefore, I am trying to:
- first identify the encoding type of the input which comes in myvar1,
- store the encoding type in myvar2,
- decode the input (myvar1) with this encoding (myvar2) using decode() [?]
- pass it on to unicode.
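The steps above could be sketched roughly as follows. This is an assumption-laden outline, not a tested solution: to_text, detect and fallback are hypothetical names, the detector is passed in as a function so that chardet.detect could be supplied, and falling back to latin-1 is my own guess at a safe default because it maps every byte to a character.

```python
def to_text(raw, detect, fallback='latin-1'):
    """Decode raw bytes using the encoding reported by detect(raw)."""
    result = detect(raw)
    # Use the fallback when the detector reports no encoding (None).
    encoding = result.get('encoding') or fallback
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong guess or unknown codec name: latin-1 accepts any byte.
        return raw.decode(fallback)


# Stand-in detector that returns the dict shown earlier; with chardet
# installed, chardet.detect could be passed in its place.
def fake_detect(raw):
    return {'confidence': 1.0, 'encoding': 'ascii'}


text = to_text(b'abc\xab', fake_detect)
print(repr(text))
```

If I read the docs correctly, decode() on a byte string already returns a unicode object in Python 2, so the final unicode() call may not be needed at all.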
The input coming in is variable and not in my control.
I am sure there are other ways to do this, but I am new to this and open to trying them.
Any pointers, please?
Many Thanks.