Python Forum
utf-8 decoding failed every time i try
#3
I am getting the string by scraping a web page. On Linux it does not come back as valid UTF-8, while the same code works fine on Windows.
So I am trying to encode that garbled string into latin-1 and then decode it as valid UTF-8.
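
For reference, the round trip described here can be sketched as below. This is only an illustration under one assumption: that the garbled text came from UTF-8 bytes that were decoded as latin-1 somewhere along the way. The sample word 'café' is hypothetical and stands in for the scraped text.

```python
# Hypothetical sample standing in for the scraped text.
original = 'café'

# Simulate the garbling: UTF-8 bytes wrongly decoded as latin-1.
garbled = original.encode('utf-8').decode('latin-1')

# Reverse it: encoding as latin-1 recovers the raw bytes,
# and those bytes are valid UTF-8 again.
repaired = garbled.encode('latin-1').decode('utf-8')

print(repr(garbled))   # the mojibake form
print(repaired)        # the original text
```

If the garbled string was produced some other way, the same round trip may raise UnicodeDecodeError instead of repairing anything.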

(Aug-23-2019, 05:41 AM)justinram11 Wrote: Hey Adnanahsan,

I'm not exactly sure what you are trying to do, but I think you may be mixing up the purpose of encoding and decoding.

When you "encode" a string, what you are really doing is changing it to a specific set of 1's and 0's.

So for example, a string '³' when encoded into utf-8 produces:

from bitstring import BitArray
test = '³'
encoded = test.encode('utf-8')

print(BitArray(encoded).bin)
# 1100 0010 1011 0011 (spaces added for readability)
While if it's encoded into latin-1, it produces:

from bitstring import BitArray
test = '³'
encoded = test.encode('latin-1')

print(BitArray(encoded).bin)
# 1011 0011 (spaces added for readability)
But when you decode something, what you are doing is taking the 1's and 0's and turning them back into actual letters that Python can understand. As shown above, however, the 1's and 0's for utf-8 and latin-1 are not the same.
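
The same comparison can be sketched with the standard library only, in case bitstring is not installed. The helper name to_bits is made up for this illustration.

```python
def to_bits(data: bytes) -> str:
    """Return the bytes as space-separated groups of eight binary digits."""
    return ' '.join(format(byte, '08b') for byte in data)

text = '³'
utf8_bytes = text.encode('utf-8')      # two bytes
latin1_bytes = text.encode('latin-1')  # one byte

print(to_bits(utf8_bytes))    # 11000010 10110011
print(to_bits(latin1_bytes))  # 10110011

# Decoding with the same codec that produced the bytes gives the text back.
utf8_roundtrip = utf8_bytes.decode('utf-8')
latin1_roundtrip = latin1_bytes.decode('latin-1')
```

Both round trips return '³', because each set of bytes is read with the codec that wrote it.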

So what you are doing is taking a string and producing 1's and 0's in the latin-1 format, and then asking Python to try and read those 1's and 0's as if they were in the utf-8 format. It can't, however, because the 1's and 0's are not in utf-8 format, they are in latin-1 format: the single byte 1011 0011 on its own is not a valid utf-8 sequence.
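
A minimal sketch of that failure, using the same '³' character as above. The variable names are only for illustration.

```python
text = '³'
latin1_bytes = text.encode('latin-1')   # b'\xb3'

# Reading latin-1 bytes as utf-8 fails: 0xB3 is not a valid utf-8 start byte.
try:
    decoded = latin1_bytes.decode('utf-8')
    error_name = None
except UnicodeDecodeError as exc:
    decoded = None
    error_name = type(exc).__name__

print(error_name)   # UnicodeDecodeError

# With errors='replace' the bad byte becomes U+FFFD instead of raising.
replaced = latin1_bytes.decode('utf-8', errors='replace')
print(repr(replaced))
```

Using errors='replace' only hides the problem; the original character is lost, so it is a way to inspect bad input, not a fix.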
Messages In This Thread
RE: utf-8 decoding failed every time i try - by adnanahsan - Aug-23-2019, 10:06 AM
