Python Forum
Reading in of line not working?
#1
Just posted about my file not working and realised, as soon as I finished, that I already knew the solution.

However, I'd like a more flexible solution.

Some of my files are encoded as UTF-16 LE; others are likely to be UTF-8 or some variant in between.

So that I can read all of these files, how can I easily open each one with the correct encoding?

For example, my original line of Python read:

file = open(datafile, 'r')
To cater for the UTF-16 LE files, I adapted this to:

file = open(datafile, 'r', encoding='utf-16-le')
But this now means I cannot read the older files.
#2
(Sep-19-2023, 08:01 AM)garynewport Wrote: Which now means I cannot read in the older files.
Hi, I once had that problem, and I remember that there is a module called "chardet"
which guesses the encoding of a file. I do not know whether it is still maintained to today's standards,
but it is worth a try.
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#3
I would first see if you can convert the files to UTF-8, e.g. with an online "Convert files to UTF-8" tool.
Chardet will detect (guess) the encoding.
G:\div_code\file_test
λ chardetect file_1.txt
file_1.txt: ascii with confidence 1.0

# The file I test in the code below
G:\div_code\file_test
λ chardetect file_2.txt
file_2.txt: utf-8 with confidence 0.99

G:\div_code\file_test
λ chardetect file_le.txt
file_le.txt: UTF-16 with confidence 1.0
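The same guess can be made from Python rather than the command line. A minimal sketch, assuming chardet is installed (`pip install chardet`); the helper name `guess_encoding`, its `default` and its `sample_size` are made up for illustration, and only `chardet.detect()` comes from the library:

```python
def guess_encoding(path, default="utf-8", sample_size=65536):
    """Guess a file's encoding with chardet, falling back to `default`.

    `guess_encoding`, `default` and `sample_size` are illustrative names,
    not part of chardet.
    """
    try:
        import chardet  # third-party package
    except ImportError:
        # chardet is not available, so no guess can be made.
        return default

    # Detection works on raw bytes, so open the file in binary mode.
    with open(path, "rb") as fp:
        sample = fp.read(sample_size)

    # detect() returns a dict with 'encoding' and 'confidence' keys;
    # 'encoding' is None when chardet cannot decide.
    result = chardet.detect(sample)
    return result["encoding"] or default
```

It could then be used as `open(datafile, 'r', encoding=guess_encoding(datafile))`. The result is still a guess, so checking `confidence` before trusting it is probably wise for short files.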
Another option is to read the file inside try/except: if no error is raised, use the result; if a decode error (UnicodeDecodeError) is raised, go on and try the next encoding.
# Try each encoding in turn; a UnicodeDecodeError moves on to the next one.
try:
    with open("file_2.txt", encoding='utf-16-le') as fp:
        content = fp.read()
        print(content)
except UnicodeDecodeError as error:
    print(f'{error}\n')
    try:
        with open("file_2.txt", encoding='cp1252') as fp:
            content = fp.read()
            print(content)
    except UnicodeDecodeError as error:
        print(f'{error}\n')
        # Last resort: errors='ignore' silently drops undecodable bytes,
        # so this read cannot raise UnicodeDecodeError.
        with open("file_2.txt", encoding='utf-8', errors='ignore') as fp:
            content = fp.read()
            print(content)
Output:
'utf-16-le' codec can't decode byte 0xa9 in position 46: truncated data

'charmap' codec can't decode byte 0x8d in position 10: character maps to <undefined>

hello 楍牣獯景 ����� résumé
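The nested try/except can also be flattened into a loop. The sketch below is one possible arrangement, not the only one: it checks for a byte-order mark first, treats NUL bytes as a hint of BOM-less UTF-16 LE, and only then tries the remaining encodings strictly. The name `read_text`, the `fallbacks` order and the NUL heuristic are my own assumptions; UTF-32 is not handled.

```python
import codecs

# Byte-order marks that identify an encoding unambiguously.
BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def read_text(path, fallbacks=("utf-8", "cp1252")):
    """Return (text, encoding) using the first encoding that decodes cleanly.

    `read_text` and `fallbacks` are illustrative names. Strict UTF-8 comes
    before cp1252 because cp1252 accepts nearly every byte sequence.
    """
    with open(path, "rb") as fp:
        raw = fp.read()

    # 1. A BOM settles the question; these codecs strip it while decoding.
    for bom, encoding in BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding), encoding

    # 2. Assumption: NUL bytes in a text file mean BOM-less UTF-16 LE.
    candidates = tuple(fallbacks)
    if b"\x00" in raw:
        candidates = ("utf-16-le",) + candidates

    # 3. Try the remaining candidates strictly, in order.
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {candidates} could decode {path!r}")
```

Usage would be `content, used = read_text("file_2.txt")`. Because the bytes are decoded directly, Windows line endings stay as `\r\n` instead of being translated the way `open()` in text mode does.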

