Dec-06-2022, 07:09 PM
I encountered an interesting problem: PowerShell under Windows 11 creates new files as UTF-16 rather than UTF-8. PowerShell can be configured to behave otherwise, but if your code is to be used more widely, everyone would have to make that change.
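To see what that looks like from Python's side (a minimal sketch; the file name is made up): a UTF-16 file written this way starts with a byte-order mark (BOM), which you can spot by reading the first two raw bytes.

```python
import codecs

# Simulate a file created by PowerShell redirection: UTF-16 little-endian
# with a byte-order mark (BOM) at the start. The file name is made up.
with open("pwsh_made.txt", "wb") as f:
    f.write(codecs.BOM_UTF16_LE + "hydro 1.0 2.0\n".encode("utf-16-le"))

# Peek at the first two bytes without decoding anything
with open("pwsh_made.txt", "rb") as f:
    bom = f.read(2)

print(bom == codecs.BOM_UTF16_LE)  # True: the file announces itself as UTF-16-LE
```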
I am reading in several files, each with over 30,000 lines of data.
I open the file and then check each line for a keyword. If the keyword exists, I carry out some operation (what this is isn't relevant to this query).
My base code looks like this:

    keyword = "hydro"
    # Create an empty Numpy 2D array of 0 rows and 8 columns, all of the data type float
    file = open(datafile)
    for line in file:
        if keyword.casefold() in line.casefold():

All the solutions I have found read each line inside a try/except statement and, on an error, either convert the line to UTF-8 or re-read it as UTF-16. I do not want to wrap every line read like that; it would add a lot of time.
I could set the encoding to UTF-8 and ignore errors, but then Python does not read the UTF-16 files correctly: the "hydro" keyword is never matched and/or the data comes in wrong.
How would I identify the file's encoding and open accordingly?
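One way to do this (a sketch, not from the thread: `open_text` is a hypothetical helper, and the demo file name is made up) is to sniff the first two bytes for a UTF-16 byte-order mark and choose the codec from that, so the per-line loop itself stays untouched.

```python
import codecs

def open_text(path):
    # Sniff the first two bytes: a UTF-16 byte-order mark means PowerShell
    # (or similar) wrote the file as UTF-16; otherwise assume UTF-8.
    with open(path, "rb") as f:
        head = f.read(2)
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        return open(path, encoding="utf-16")    # the codec consumes the BOM
    return open(path, encoding="utf-8-sig")     # also strips a UTF-8 BOM, if any

# Demo: a UTF-16 file such as PowerShell redirection produces
datafile = "demo_utf16.txt"
with open(datafile, "w", encoding="utf-16") as f:
    f.write("Hydro 1.0 2.0\nother 3.0 4.0\n")

keyword = "hydro"
matches = []
with open_text(datafile) as file:
    for line in file:
        if keyword.casefold() in line.casefold():
            matches.append(line.strip())

print(matches)  # ['Hydro 1.0 2.0']
```

The BOM check happens once per file, not once per line, so the cost over 30,000 lines is negligible.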