Python Forum

Full Version: Reading in of line not working?
Just posted about my file not working and realised, as soon as I finished, that I already knew the solution.

However, I'd like a more flexible solution.

Some of my files are encoded as UTF-16 LE; others are likely to be UTF-8 or some variant in between.

So that I can read the various files, how might I easily open each file in the correct way, based on its encoding?

For example, my original line of Python read:

file = open(datafile, 'r')
To cater for the UTF-16 LE files, I adapted this to:

file = open(datafile, 'r', encoding='utf-16-le')
Which now means I cannot read in the older files.
(Sep-19-2023, 08:01 AM) garynewport wrote: Which now means I cannot read in the older files.
Hi, I once had that problem, and I remember that there is a module called "chardet"
which tells you the type of encoding. I do not know if it is still maintained to today's standards,
but it is worth a try.
Paul
I would first see whether you can convert the files to UTF-8, e.g. with an online converter (Convert files to UTF-8).
Chardet will detect (or rather, guess) the encoding.
G:\div_code\file_test
λ chardetect file_1.txt
file_1.txt: ascii with confidence 1.0

# The file I test in the code
G:\div_code\file_test
λ chardetect file_2.txt
file_2.txt: utf-8 with confidence 0.99

G:\div_code\file_test
λ chardetect file_le.txt
file_le.txt: UTF-16 with confidence 1.0
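The same guess is available from inside Python through chardet.detect(), which takes raw bytes and returns a dict with 'encoding' and 'confidence' keys. Below is a minimal sketch of that, not a definitive solution: the helper names guess_encoding and read_text_guessing, the sample_size limit and the 'utf-8' default are my own assumptions, and chardet is a third-party package (pip install chardet).

```python
def guess_encoding(path, sample_size=65536, default="utf-8"):
    """Return (encoding, confidence) as guessed by chardet for the file at path.

    Hypothetical helper: sample_size and default are illustrative choices.
    """
    # Third-party package; imported here so the rest of a script still
    # loads on a machine where chardet is not installed.
    import chardet

    # Detection works on bytes, so open in binary mode and read a sample.
    with open(path, "rb") as fp:
        raw = fp.read(sample_size)

    result = chardet.detect(raw)
    # chardet reports None when it cannot decide (for example, an empty file).
    return result["encoding"] or default, result["confidence"]


def read_text_guessing(path):
    """Open the file with whatever encoding chardet guessed and return its text."""
    encoding, _confidence = guess_encoding(path)
    with open(path, encoding=encoding) as fp:
        return fp.read()
```

Keep in mind that this is still a guess: for short files or unusual single-byte encodings the confidence can be low, so it may be worth checking that value before trusting the result.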
Another option is to read the file inside try/except: if no error is raised, use the result; if a decode error (UnicodeDecodeError) is raised, move on and try the next encoding.
try:
    with open("file_2.txt", encoding='utf-16-le') as fp:
        content = fp.read()
        print(content)
except Exception as error:
    print(f'{error}\n')
    try:
        with open("file_2.txt", encoding='cp1252') as fp:
            content = fp.read()
            print(content)
    except Exception as error:
        print(f'{error}\n')
        try:
            with open("file_2.txt", encoding='utf-8', errors='ignore') as fp:
                content = fp.read()
                print(content)
        except Exception as error:
            print(f'{error}\n')
Output:
'utf-16-le' codec can't decode byte 0xa9 in position 46: truncated data

'charmap' codec can't decode byte 0x8d in position 10: character maps to <undefined>

hello 楍牣獯景 ����� résumé
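The nested try/except above gets deeper with every extra encoding, so the same idea can also be written as a loop over candidate encodings. This is only a sketch using the standard library: the function name read_with_fallback and the choice to raise ValueError when nothing works are my own assumptions, and the list of candidates (and its order) is up to you.

```python
def read_with_fallback(path, encodings):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: raises ValueError if every candidate fails.
    """
    failures = []
    for encoding in encodings:
        try:
            # errors='strict' is the default; it is spelled out because the
            # loop relies on bad bytes raising instead of being dropped.
            with open(path, encoding=encoding, errors="strict") as fp:
                return fp.read(), encoding
        except UnicodeError as error:
            # UnicodeDecodeError is a subclass of UnicodeError.
            failures.append(f"{encoding}: {error}")
    raise ValueError(f"could not decode {path!r}: " + "; ".join(failures))
```

It would be called as, for example, read_with_fallback("file_2.txt", ["utf-8", "cp1252"]). The order matters: a decode that raises no error is not proof that the encoding was right. 'utf-16-le' accepts almost any even-length run of bytes and cp1252 accepts most byte values, so permissive encodings are better placed late in the list, with strict ones such as 'utf-8' first.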