Python Forum

Full Version: Text file read issues
Hello again

So the system still isn't letting me post attachments but no matter, I've uploaded one of the offending .txt files to wetransfer if anyone would like to take a look.

https://we.tl/DWy3fBvra9

I'm still unable to get Python to read it and return the lines as they are shown in Notepad. It still gives me:

Output:
b'\xff\xfe2\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x009\x00:\x001\x005\x00:\x000\x000\x00\t\x002\x003\x00.\x001\x009\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x009\x00:\x000\x000\x00:\x000\x000\x00\t\x002\x002\x00.\x009\x008\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x008\x00:\x004\x005\x00:\x000\x000\x00\t\x002\x002\x00.\x007\x006\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x008\x00:\x003\x000\x00:\x000\x000\x00\t\x002\x002\x00.\x006\x002\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x008\x00:\x001\x005\x00:\x000\x000\x00\t\x002\x002\x00.\x004\x001\x00\n'
b'\x00\n'
I'm still convinced this is all because it's in Unicode but I haven't a clue how to solve it.

If anyone can help, I'd be very grateful.

EDIT: BTW, see post #4 for a more detailed explanation of what I've tried and what is happening. Cheers
I managed to print each line using the latin-1 encoding.

Like this:

with open('170825t05L11O44S3.txt', 'r', encoding='latin-1') as f:
    for line in f:
        print(line)
Now you have to parse the info that you want.
Rename the file; there is something strange about its name.
Rename it to bar.txt and run chardet:
C:\code
λ chardetect bar.txt
bar.txt: UTF-16 with confidence 1.0
with open('bar.txt', encoding='UTF-16') as f:
    for line in f:
        print(line.strip())
Output:
25 August 2017 09:15:00 23.19
25 August 2017 09:00:00 22.98
25 August 2017 08:45:00 22.76
25 August 2017 08:30:00 22.62
25 August 2017 08:15:00 22.41
25 August 2017 08:00:00 22.34
25 August 2017 07:45:00 22.27
25 August 2017 07:30:00 22.27
........
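If chardet isn't installed, the first two bytes of the binary output posted earlier already hint at the encoding: b'\xff\xfe' is the UTF-16 little-endian byte order mark. Here is a rough sketch of that check, using only the start of the first line from that output. The helper name guess_utf16 is made up for illustration, and the check is deliberately naive (a UTF-32 LE file begins with the same two bytes, for example), so treat it as a hint rather than proper detection.

```python
import codecs

# Start of the first line from the binary output posted earlier in the thread.
raw = b'\xff\xfe2\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00'

def guess_utf16(data):
    """Hypothetical helper: return 'utf-16' if data starts with a UTF-16 BOM, else None."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    return None

print(guess_utf16(raw))              # utf-16
print(repr(raw.decode('utf-16')))    # '25 August'

# latin-1 maps every byte to one character, so it never fails,
# but the BOM and the NUL bytes stay in the decoded text.
print(repr(raw.decode('latin-1')))
```

That is also why the latin-1 version appears to work when printed: the NUL characters are usually invisible in a console, but they are still in the strings.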
Thanks both

snippsat - that worked like a charm, thank you. I'll have to check out chardet. It looks very handy for figuring out encoding.

Ok, next question (sorry!)

I've tried to put this data into a list so I can work with it but it looks different in list format to printed format:

datapoints = []
with open("C:\\...\\170825  t05 L11O44S3.txt", encoding ='UTF-16') as f:
    for i in f:
        datapoints.append(i)

print(datapoints)
Output:
['25 August 2017 09:15:00\t23.19\n', '25 August 2017 09:00:00\t22.98\n', '25 August 2017 08:45:00\t22.76\n', '25 August 2017 08:30:00\t22.62\n', ....]
So I tried removing the \t and \n from the list entries by modifying the code to:

datapoints = []
with open("C:\\...\\170825  t05 L11O44S3.txt", encoding ='UTF-16') as f:
    for i in f:
        datapoints.append(i)

for data in datapoints:
    data = data.replace('\\t',' ')
    data = data.replace('\\n','')

print(datapoints)
but this results in exactly the same output as before.

What am I doing wrong?
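Use '\t' and '\n', not '\\t' and '\\n'. The doubled backslash makes replace() search for a literal backslash followed by t or n, which the strings don't contain.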
Nice one Buran, that did it.
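One caveat about the loop as posted: data = data.replace(...) only rebinds the loop variable, so the strings stored in datapoints stay unchanged even with the right escape sequences. Below is a minimal sketch that builds a cleaned list instead. It uses two sample entries copied from the output above rather than the real file, and the name cleaned is just for illustration.

```python
# Sample entries copied from the list output earlier in the thread.
datapoints = ['25 August 2017 09:15:00\t23.19\n',
              '25 August 2017 09:00:00\t22.98\n']

# Strings are immutable, so build a new list instead of
# assigning to the loop variable.
cleaned = [line.replace('\t', ' ').replace('\n', '') for line in datapoints]

print(cleaned)
# ['25 August 2017 09:15:00 23.19', '25 August 2017 09:00:00 22.98']
```

Doing the replace before datapoints.append(...) inside the with block would work just as well.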
Let's do it properly:
import csv

with open('170825  t05 L11O44S3.txt', encoding='utf-16') as f:
    rdr = csv.reader(f, delimiter='\t')
    for line in rdr:
        print(line)
Each line becomes a 2-element list. You can join the elements together again or use them separately.

Output:
['25 August 2017 09:15:00', '23.19']
['25 August 2017 09:00:00', '22.98']
['25 August 2017 08:45:00', '22.76']
['25 August 2017 08:30:00', '22.62']
['25 August 2017 08:15:00', '22.41']
['25 August 2017 08:00:00', '22.34']
....
Or, if you prefer a single list:
import csv

with open('170825  t05 L11O44S3.txt', encoding='utf-16') as f:
    rdr = csv.reader(f, delimiter='\t')
    data = [' '.join(line) for line in rdr]
    print(data)
Output:
['25 August 2017 09:15:00 23.19', '25 August 2017 09:00:00 22.98', '25 August 2017 08:45:00 22.76', '25 August 2017 08:30:00 22.62', '25 August 2017 08:15:00 22.41',...]
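If the goal is to work with the values rather than just display them, the two columns could also be converted as they are read. This is only a sketch under a few assumptions: two sample rows from the thread stand in for the real UTF-16 file, the names sample and readings are made up, and the month name is parsed with '%B', which depends on the current locale (English assumed here).

```python
import csv
import io
from datetime import datetime

# Two sample rows from the thread stand in for the real UTF-16 file.
sample = '25 August 2017 09:15:00\t23.19\n25 August 2017 09:00:00\t22.98\n'

readings = []
for stamp, value in csv.reader(io.StringIO(sample), delimiter='\t'):
    # '%B' matches the full month name in the current locale (English assumed).
    when = datetime.strptime(stamp, '%d %B %Y %H:%M:%S')
    readings.append((when, float(value)))

print(readings[0])
# (datetime.datetime(2017, 8, 25, 9, 15), 23.19)
```

With the real file, the io.StringIO(sample) part would be replaced by the file object opened with encoding='utf-16', as in the examples above.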