Python Forum
'utf-8' codec can't decode byte 0xda in position 184: invalid continuation byte
#4
(Sep-06-2019, 02:02 PM)karkas Wrote: This has taken me some time and I haven't found a way to figure it out. I tried this:
Installing chardet also installs the chardetect command-line tool, which you can run from cmd, or from cmder as I do.
C:\code
λ chardetect myfile.txt
myfile.txt: ascii with confidence 1.0
Quote:This excerpt does raise the exact same exception.
It should not, if the file is saved as UTF-8.
# Copy the text from the post and save it as utf-8
λ chardetect test.srt
test.srt: utf-8 with confidence 0.505
Test.
λ python -V
Python 3.7.3

# Code used
E:\div_code
λ cat uni_music.py
with open('test.srt', encoding='utf-8') as f:
    file_list = [i.strip() for i in f]

# Run code interactively 
E:\div_code
λ ptpython -i uni_music.py
>>> from pprint import pprint

>>> pprint(file_list)
['1',
 '00:00:00,066 --> 00:00:01,888',
 'HOLA, SOY <i>JACK RICO,</i>',
 '',
 '2',
 '00:00:01,888 --> 00:00:04,444',
 '<i>Y ESTO ES </i>"TALLER',
 'DEL CONSUMIDOR".',
 '',
 '3',
 '00:00:04,444 --> 00:00:05,530',
 '<i>[MÚSICA]</i>']

# Last element, all is correct
>>> file_list[-1]
'<i>[MÚSICA]</i>'
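For contrast, here is a small sketch of how the exception in the thread title can come about. It assumes the failing file was saved in a single-byte encoding such as Latin-1 (the thread does not confirm this); in that encoding Ú is the single byte 0xda, which is not valid on its own in UTF-8.

```python
# Assumption: the failing file was saved as Latin-1, not UTF-8.
text = '<i>[MÚSICA]</i>'

utf8_bytes = text.encode('utf-8')      # Ú becomes two bytes: \xc3\x9a
latin1_bytes = text.encode('latin-1')  # Ú becomes one byte: \xda

# Bytes that really are UTF-8 decode without trouble.
assert utf8_bytes.decode('utf-8') == text

# Latin-1 bytes are not valid UTF-8: 0xda announces a two-byte sequence,
# but the byte after it ('S') is not a continuation byte.
error = None
try:
    latin1_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the bytes were written in works again.
print(latin1_bytes.decode('latin-1'))
```

If that is what happened, either re-save the file as UTF-8 or pass the file's real encoding to open().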
Quote:How can I pass the raw data in binary format of a read file without "altering" the encoding
You use 'rb', but you still need to decode afterwards, or the non-ASCII characters will show up as escaped bytes like this: M\xc3\x9aSICA.
with open('test.srt', 'rb') as f:
    file_list = [i.strip() for i in f]
Output:
>>> file_list
[b'1', b'00:00:00,066 --> 00:00:01,888', b'HOLA, SOY <i>JACK RICO,</i>', b'', b'2', b'00:00:01,888 --> 00:00:04,444', b'<i>Y ESTO ES </i>"TALLER', b'DEL CONSUMIDOR".', b'', b'3', b'00:00:04,444 --> 00:00:05,530', b'<i>[M\xc3\x9aSICA]</i>']
>>> file_list[-1]
b'<i>[M\xc3\x9aSICA]</i>'

>>> file_list[-1].decode() # Same as decode('utf-8'), which is the default
'<i>[MÚSICA]</i>'
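If you read with 'rb' and do not know the encoding up front, one possible approach is to try UTF-8 first and fall back to another codec. This is only a sketch: decode_with_fallback is a hypothetical helper name, and the choice of Latin-1 as the fallback is an assumption, not something established in this thread.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'latin-1')):
    """Hypothetical helper: return (text, encoding) for the first codec that works."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # Try the next candidate encoding.
    raise ValueError('none of the candidate encodings could decode the data')


# Bytes as they appear when the file is saved as UTF-8.
print(decode_with_fallback(b'<i>[M\xc3\x9aSICA]</i>'))

# Bytes as they would appear if the file were saved as Latin-1 (assumption).
print(decode_with_fallback(b'<i>[M\xdaSICA]</i>'))
```

Latin-1 maps every possible byte to a character, so as a last candidate it never raises, but it can silently give wrong characters if the file is really in some other encoding. With a real file you would pass in the bytes from open('test.srt', 'rb').read().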


Messages In This Thread
RE: 'utf-8' codec can't decode byte 0xda in position 184: invalid continuation byte - by snippsat - Sep-06-2019, 03:33 PM
