(Sep-06-2019, 02:02 PM)karkas Wrote: This has taken me some time and I haven't found a way to figure it out. I tried this:
It also installs chardetect, which can be run from the command line (cmd) or from cmder, which is what I use.
C:\code
λ chardetect myfile.txt
myfile.txt: ascii with confidence 1.0
Quote:This excerpt does raise the exact same exception.
It should not if the file is saved as utf-8.
# Copy the text from the post and save it as utf-8
λ chardetect test.srt
test.srt: utf-8 with confidence 0.505
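If chardetect is not at hand, a rough check can be done with the standard library alone. This is only a minimal sketch: is_valid_utf8 is a helper name I made up for illustration, and it only tells you whether the bytes decode cleanly as utf-8, not which encoding the file really has.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# The same subtitle line saved with two different encodings
sample_utf8 = '<i>[MÚSICA]</i>'.encode('utf-8')
sample_latin1 = '<i>[MÚSICA]</i>'.encode('latin-1')

print(is_valid_utf8(sample_utf8))    # True
print(is_valid_utf8(sample_latin1))  # False
```

A file that fails this check was saved in some other encoding, which is the usual reason open(..., encoding='utf-8') raises UnicodeDecodeError.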
Test.
λ python -V
Python 3.7.3
# Code used
E:\div_code
λ cat uni_music.py
with open('test.srt', encoding='utf-8') as f:
    file_list = [i.strip() for i in f]
# Run code interactively
E:\div_code
λ ptpython -i uni_music.py
>>> from pprint import pprint
>>> pprint(file_list)
['1',
'00:00:00,066 --> 00:00:01,888',
'HOLA, SOY <i>JACK RICO,</i>',
'',
'2',
'00:00:01,888 --> 00:00:04,444',
'<i>Y ESTO ES </i>"TALLER',
'DEL CONSUMIDOR".',
'',
'3',
'00:00:04,444 --> 00:00:05,530',
'<i>[MÚSICA]</i>']
# Last element, all is correct
>>> file_list[-1]
'<i>[MÚSICA]</i>'
Quote:How can I pass the raw data in binary format of a read file without "altering" the encoding
You use 'rb', but you then still need to decode, or the Unicode will look like this: M\xc3\x9aSICA.
with open('test.srt', 'rb') as f:
    file_list = [i.strip() for i in f]
Output:
>>> file_list
[b'1',
b'00:00:00,066 --> 00:00:01,888',
b'HOLA, SOY <i>JACK RICO,</i>',
b'',
b'2',
b'00:00:01,888 --> 00:00:04,444',
b'<i>Y ESTO ES </i>"TALLER',
b'DEL CONSUMIDOR".',
b'',
b'3',
b'00:00:04,444 --> 00:00:05,530',
b'<i>[M\xc3\x9aSICA]</i>']
>>> file_list[-1]
b'<i>[M\xc3\x9aSICA]</i>'
>>> file_list[-1].decode() # Same as decode('utf-8'), which is the default
'<i>[MÚSICA]</i>'
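To tie the two steps together, here is a small self-contained sketch of the same idea: read the file with 'rb' so the bytes are passed along untouched, and decode only at the point where text is needed. The temporary file and the variable names (raw, raw_lines, decoded_lines) are my own stand-ins for illustration, not part of the original script.

```python
import os
import tempfile

# Stand-in for the last subtitle block of test.srt
text = '3\n00:00:04,444 --> 00:00:05,530\n<i>[MÚSICA]</i>\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.srt')

    # Write the sample as utf-8 bytes
    with open(path, 'wb') as f:
        f.write(text.encode('utf-8'))

    # 'rb' returns the bytes exactly as stored, no decoding is done
    with open(path, 'rb') as f:
        raw = f.read()

# The raw bytes can be passed on unchanged, or split into lines
raw_lines = [line.strip() for line in raw.splitlines()]

# Decode only when text is actually needed
decoded_lines = [line.decode('utf-8') for line in raw_lines]

print(raw_lines[-1])      # b'<i>[M\xc3\x9aSICA]</i>'
print(decoded_lines[-1])  # <i>[MÚSICA]</i>
```

The bytes in raw are identical to what is on disk, so nothing is "altered" until decode() is called with the encoding you choose.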