Python Forum
Text file read issues
#11
Hello again

So the system still isn't letting me post attachments, but no matter: I've uploaded one of the offending .txt files to WeTransfer if anyone would like to take a look.

https://we.tl/DWy3fBvra9

I'm still unable to get Python to read it and return the lines as they are shown in Notepad. It still gives me:

Output:
b'\xff\xfe2\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x009\x00:\x001\x005\x00:\x000\x000\x00\t\x002\x003\x00.\x001\x009\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x009\x00:\x000\x000\x00:\x000\x000\x00\t\x002\x002\x00.\x009\x008\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x008\x00:\x004\x005\x00:\x000\x000\x00\t\x002\x002\x00.\x007\x006\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x008\x00:\x003\x000\x00:\x000\x000\x00\t\x002\x002\x00.\x006\x002\x00\n'
b'\x00\n'
b'\x002\x005\x00 \x00A\x00u\x00g\x00u\x00s\x00t\x00 \x002\x000\x001\x007\x00 \x000\x008\x00:\x001\x005\x00:\x000\x000\x00\t\x002\x002\x00.\x004\x001\x00\n'
b'\x00\n'
I'm still convinced this is all because it's in Unicode, but I haven't a clue how to solve it.

If anyone can help, I'd be very grateful.

EDIT: BTW, see post #4 for a more detailed explanation of what I've tried and what is happening. Cheers
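The leading b'\xff\xfe' in that output looks like the UTF-16 little-endian byte order mark (BOM), and the \x00 after every character is what UTF-16 text looks like when its bytes are read raw. A minimal sketch that reproduces the effect in memory, assuming the file really is UTF-16-LE with a BOM (the two rows are copied from the thread; io.BytesIO stands in for the file opened in binary mode):

```python
import codecs
import io

# Two rows copied from the thread.
text = "25 August 2017 09:15:00\t23.19\n25 August 2017 09:00:00\t22.98\n"

# Assumption: the real file is UTF-16-LE with a BOM, which is what the
# leading b'\xff\xfe' suggests. BytesIO stands in for open(path, 'rb').
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Binary iteration splits after every b'\n' byte, but each UTF-16 character is
# two bytes ('\n' is b'\n\x00'), so the next "line" starts with a stray b'\x00'.
binary_lines = list(io.BytesIO(raw))
for chunk in binary_lines:
    print(chunk)

# Decoding with the right codec (the BOM tells it the byte order) gives clean text.
decoded = raw.decode("utf-16")
for line in decoded.splitlines():
    print(line)
```

Iterating over a binary file only knows about the single byte b'\n', which is why each "line" drags a b'\x00' along with it; decoding with the correct codec avoids the problem entirely.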
#12
I succeeded in printing each line with the latin-1 encoding.

Like this:

with open('170825t05L11O44S3.txt', 'r', encoding='latin-1') as f:
    for line in f:
        print(line)
Now you have to parse the info that you want.
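A caveat on latin-1: it maps every byte value 0-255 to a character, so it can decode any file without raising an error, but for a UTF-16 file the result is not the real text. A short sketch with a hypothetical one-row sample, assumed to be UTF-16-LE with a BOM like the file in this thread:

```python
import codecs

# Hypothetical one-row sample, assumed to be UTF-16-LE with a BOM like the real file.
raw = codecs.BOM_UTF16_LE + "25 August 2017\t23.19\n".encode("utf-16-le")

# latin-1 maps each byte to one character, so decoding never fails...
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1[:6]))  # 'ÿþ2\x005\x00'

# ...but the BOM turns into 'ÿþ' and a NUL character follows every real character.
# Decoding as UTF-16 gives the intended text.
as_utf16 = raw.decode("utf-16")
print(repr(as_utf16))  # '25 August 2017\t23.19\n'
```

So the latin-1 lines print, but they carry the BOM characters and hidden NULs, which will get in the way when parsing.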
#13
There's something strange with the file name, so rename the file to bar.txt and run chardetect (the command-line tool that comes with chardet):
C:\code
λ chardetect bar.txt
bar.txt: UTF-16 with confidence 1.0
with open('bar.txt', encoding='UTF-16') as f:
    for line in f:
        print(line.strip())
Output:
25 August 2017 09:15:00 23.19
25 August 2017 09:00:00 22.98
25 August 2017 08:45:00 22.76
25 August 2017 08:30:00 22.62
25 August 2017 08:15:00 22.41
25 August 2017 08:00:00 22.34
25 August 2017 07:45:00 22.27
25 August 2017 07:30:00 22.27
........
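If installing chardet is not an option, a file that starts with a byte order mark can be identified with the standard library alone. This is a hypothetical helper (`sniff_bom` and `decode_text` are illustrative names, not part of chardet); unlike chardet it does no statistical guessing, so a file without a BOM simply falls back to UTF-8:

```python
import codecs

# Hypothetical BOM table; order matters because the UTF-32-LE BOM
# (b'\xff\xfe\x00\x00') begins with the UTF-16-LE BOM (b'\xff\xfe').
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


def decode_text(data, fallback="utf-8"):
    """Decode bytes using the codec implied by the BOM (the codec strips the BOM)."""
    return data.decode(sniff_bom(data) or fallback)


# Usage with the renamed file from this thread:
# with open("bar.txt", "rb") as f:
#     for line in decode_text(f.read()).splitlines():
#         print(line)
```

This only covers files that carry a BOM; for files without one, a detector such as chardet is still the better tool.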
#14
Thanks both

snippsat - that worked like a charm, thank you. I'll have to check out chardet; it looks very handy for figuring out encodings.

Ok, next question (sorry!)

I've tried to put this data into a list so I can work with it, but it looks different in list form than when printed:

datapoints = []
with open("C:\\...\\170825  t05 L11O44S3.txt", encoding='UTF-16') as f:
    for i in f:
        datapoints.append(i)

print(datapoints)
Output:
['25 August 2017 09:15:00\t23.19\n', '25 August 2017 09:00:00\t22.98\n', '25 August 2017 08:45:00\t22.76\n', '25 August 2017 08:30:00\t22.62\n', ....]
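The data itself is fine here: printing a list shows the repr() of each string, and in a repr a tab and a newline appear as the escapes \t and \n. A quick illustration with one row taken from the output:

```python
# One row as it comes out of the file.
line = "25 August 2017 09:15:00\t23.19\n"

print(line)    # the tab and newline are printed as real whitespace
print([line])  # a list shows repr() of its items: ['25 August 2017 09:15:00\t23.19\n']

# '\t' is one tab character, not a backslash followed by 't'.
print(len("\t"), "\\" in line)  # 1 False
```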
So I tried removing the \t and \n from the list items by modifying the code to:

datapoints = []
with open("C:\\...\\170825  t05 L11O44S3.txt", encoding='UTF-16') as f:
    for i in f:
        datapoints.append(i)

for data in datapoints:
    data = data.replace('\\t',' ')
    data = data.replace('\\n','')

print(datapoints)
but this results in exactly the same output as before.

What am I doing wrong?
#15
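Use '\t' and '\n', not '\\t' and '\\n'. In a normal string literal, '\\t' is a backslash followed by the letter t, while '\t' is the single tab character that is actually in your data.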
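One more thing worth checking in the #14 code: even with '\t' and '\n', `data = data.replace(...)` only rebinds the loop variable, so `datapoints` itself is left unchanged and `print(datapoints)` would still show the original strings. A minimal sketch with two hypothetical rows, building a new list instead:

```python
# Hypothetical sample standing in for the lines read from the file.
datapoints = [
    "25 August 2017 09:15:00\t23.19\n",
    "25 August 2017 09:00:00\t22.98\n",
]

# Reassigning the loop variable only rebinds a name; the list is unchanged.
for data in datapoints:
    data = data.replace("\t", " ").replace("\n", "")
print(datapoints[0] == "25 August 2017 09:15:00\t23.19\n")  # True

# Build a new list instead so the cleaned strings are kept.
cleaned = [d.replace("\t", " ").rstrip("\n") for d in datapoints]
print(cleaned)  # ['25 August 2017 09:15:00 23.19', '25 August 2017 09:00:00 22.98']
```

Assigning back by index (`datapoints[i] = ...`) would also work; the list comprehension is just the shorter form.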
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs
#16
Nice one, Buran, that did it.
#17
Let's do it properly:
import csv

with open('170825  t05 L11O44S3.txt', encoding='utf-16') as f:
    rdr = csv.reader(f, delimiter='\t')
    for line in rdr:
        print(line)
Each line turns into a 2-element list. You can join the elements together again or use them separately.

Output:
['25 August 2017 09:15:00', '23.19']
['25 August 2017 09:00:00', '22.98']
['25 August 2017 08:45:00', '22.76']
['25 August 2017 08:30:00', '22.62']
['25 August 2017 08:15:00', '22.41']
['25 August 2017 08:00:00', '22.34']
....
Or, if you prefer a single list:
import csv

with open('170825  t05 L11O44S3.txt', encoding='utf-16') as f:
    rdr = csv.reader(f, delimiter='\t')
    data = [' '.join(line) for line in rdr]
    print(data)
Output:
['25 August 2017 09:15:00 23.19', '25 August 2017 09:00:00 22.98', '25 August 2017 08:45:00 22.76', '25 August 2017 08:30:00 22.62', '25 August 2017 08:15:00 22.41',...]
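If you use the two fields separately, they can also be converted to real types for calculations. A sketch, assuming every timestamp has the form '25 August 2017 09:15:00' with English month names (io.StringIO stands in for the file opened with encoding='utf-16'):

```python
import csv
import io
from datetime import datetime

# Hypothetical in-memory sample; with the real file, pass the object returned by
# open('170825  t05 L11O44S3.txt', encoding='utf-16') to csv.reader instead.
sample = io.StringIO(
    "25 August 2017 09:15:00\t23.19\n"
    "25 August 2017 09:00:00\t22.98\n"
)

readings = []
for row in csv.reader(sample, delimiter="\t"):
    if not row:  # skip blank lines, if the file has any
        continue
    timestamp, value = row
    # Assumption: English month names, which %B matches in the default C locale.
    when = datetime.strptime(timestamp, "%d %B %Y %H:%M:%S")
    readings.append((when, float(value)))

for when, value in readings:
    print(when.isoformat(), value)  # e.g. 2017-08-25T09:15:00 23.19
```

Keeping (datetime, float) pairs makes sorting by time or computing averages straightforward.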