Aug-23-2019, 10:06 AM
I am getting a string from scraping a web page, and it is not coming back as valid utf-8
on Linux, although it works fine on Windows, so I'm trying to encode the garbled string into latin-1 and then into valid utf-8.
(Aug-23-2019, 05:41 AM)justinram11 Wrote: Hey Adnanahsan,
I'm not exactly sure what you are trying to do, but I think you may be mixing up the purpose of encoding and decoding.
When you "encode" a string, what you are really doing is changing it to a specific set of 1's and 0's.
So for example, a string '³' when encoded into utf-8 produces:
from bitstring import BitArray
test = '³'
encoded = test.encode('utf-8')
print(BitArray(encoded).bin)
Output (bits grouped in fours for readability):
1100 0010 1011 0011
While if it's encoded into latin-1, it produces:
from bitstring import BitArray
test = '³'
encoded = test.encode('latin-1')
print(BitArray(encoded).bin)
Output (bits grouped in fours for readability):
1011 0011
But when you decode something, what you are doing is taking the 1's and 0's and turning them back into actual letters that python can understand. As shown above, however, the 1's and 0's between the utf-8 and latin-1 are not the same.
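If you don't have the bitstring package installed, the same comparison can be sketched with the standard library only. The helper name `bits` is made up for this illustration; it is not part of any library.

```python
def bits(text, encoding):
    """Encode `text` and return its bytes as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode(encoding))


# '³' is the code point U+00B3.
utf8_bits = bits("³", "utf-8")      # two bytes in utf-8
latin1_bits = bits("³", "latin-1")  # one byte in latin-1

print(utf8_bits)
print(latin1_bits)
```

The utf-8 result has two bytes and the latin-1 result has one, which is the difference described above.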
So what you are doing is taking a string and producing 1's and 0's in the latin-1 format, and then asking python to try and read those 1's and 0's as if they were in the utf-8 format. It can't, however, because the 1's and 0's are not in utf-8 format; they are in latin-1 format.
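Here is a rough sketch of that mismatch, plus the one case where going through latin-1 does help. It assumes the garbled text was produced by utf-8 bytes being decoded as latin-1 somewhere along the way; I can't tell from the post whether that is what is actually happening on the Linux machine, so treat the `garbled` value as a made-up stand-in.

```python
original = "³"

# Case 1: latin-1 bytes read as utf-8. The single byte 0xB3 is not a valid
# utf-8 sequence on its own, so decoding raises UnicodeDecodeError.
latin1_bytes = original.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Case 2 (assumption): the text was utf-8 on the wire but was decoded as
# latin-1, which yields two characters instead of one.
garbled = original.encode("utf-8").decode("latin-1")

# Undo the wrong decode: back to the raw bytes, then decode them as utf-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(decode_failed)
print(len(garbled), repaired == original)
```

If the garbling came from some other encoding (cp1252, for example), this round trip will not recover the text, so it is worth checking what encoding the scraper thinks the page is in before relying on it.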