Python Forum
Reading floats and ints from csv-like file
#1
Hello :)

I have stored numbers in a file from another program, and I'd like to extract the values into lists to use with another Python program I already wrote. Here is what the data file looks like:

��1.12005000,1.11800000,14574
1.11947000,1.11811000,8285
1.12035000,1.11749000,7979
1.11812000,1.11597000,18181
1.13148000,1.11499000,30360
1.13176000,1.12344000,57786
1.12441000,1.11997000,24175
1.12261000,1.12067000,14455
1.12466000,1.12198000,10255
1.12643000,1.12209000,29588
For some unknown reason I have these strange characters at the beginning. I came up with the following code to get the values and get rid of the commas and EOL characters:

f = open('EU-1H.txt', 'r')
a = f.readline()
a = a[2:-1]  # removing the two odd characters from the first line as well as the EOL character
fullarray, t = [], []

for numb in a[:-1].split(','):
    #print numb
    t.append(numb)
#print t
fullarray.append(t)

for i, line in enumerate(f):
    t = []
    for numb in line[:-1].split(','): #just making life easier by removing the EOL
        t.append(numb)
    fullarray.append(t)
    if i == 10: break #I don't need to process the whole thousands of lines to know if the code works

#print fullarray
To describe my issue, we'll just run this with the first two "print", namely "print numb" and "print t". This outputs:

1.12005000
1.11800000
14574
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r']
And if I try to convert to float by replacing t.append(numb) with t.append(float(numb)) I get the following output:
1.12005000
Traceback (most recent call last):
  File "reader.py", line 9, in <module>
    t.append(float(numb))
ValueError: invalid literal for float(): 1
To make my life easier, I wondered if I could convert it back to a string inside the list and do the conversion to float/int later. So I tried changing t.append(float(numb)) to t.append(str(numb)), which yielded as output:
1.12005000
1.11800000
14574
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r']
For some reason, some of this code actually works when run directly in an interpreter in a terminal. Example:

>>> a = '1.12005000,1.11800000,14574' #copypasted from the source file
>>> a
'1.12005000,1.11800000,14574'
>>> a.split(',')
['1.12005000', '1.11800000', '14574']
But if I read the data from the data file:
f = open('EU-1H.txt', 'r')
a = f.readline()
a = a[2:-1]
print a
print a.split(',')
gives when run:
1.12005000,1.11800000,14574
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']
In interpreter:
>>> f = open('EU-1H.txt', 'r')
>>> a = f.readline()
>>> a[2:-1]
'1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00,\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00,\x001\x004\x005\x007\x004\x00\r\x00'
>>> print a[2:-1]
1.12005000,1.11800000,14574
>>> a[2:-1].split(',')
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']
>>> print a[2:-1].split(',')
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']
I have a lot of trouble understanding what's happening here.
My main goal is just to have something working (I expected to take no more than 10 minutes to write the storing algorithm and the data-retrieving one; apparently I couldn't have been more wrong, nothing went right), but obviously I'd also like to understand what is going on (so that I don't get stuck on this kind of issue again).
Any hints?
Thanks in advance.

Python 2.7.9 / Debian Jessie / i686
#2
(Sep-04-2017, 05:06 PM)Krookroo Wrote: [...]code to get the values and get rid of the commas and EOL characters [...] My main goal is to just have something working
I haven't read your whole post, but if I understand correctly, you're over-complicating things.
# -*- coding: latin-1 -*-

import re

sample = """��1.12005000,1.11800000,14574
1.11947000,1.11811000,8285
1.12035000,1.11749000,7979
1.11812000,1.11597000,18181
1.13148000,1.11499000,30360
1.13176000,1.12344000,57786
1.12441000,1.11997000,24175
1.12261000,1.12067000,14455
1.12466000,1.12198000,10255
1.12643000,1.12209000,29588
"""

print(re.findall(r"[\d.]+", sample))
If that solves your problem, and you still have other questions, it would help to break them down.
Output:
$ python testit.py
['1.12005000', '1.11800000', '14574', '1.11947000', '1.11811000', '8285', '1.12035000', '1.11749000', '7979', '1.11812000', '1.11597000', '18181', '1.13148000', '1.11499000', '30360', '1.13176000', '1.12344000', '57786', '1.12441000', '1.11997000', '24175', '1.12261000', '1.12067000', '14455', '1.12466000', '1.12198000', '10255', '1.12643000', '1.12209000', '29588']
#3
try this:
import csv

def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print(f'Error reading {filename}')

if __name__ == '__main__':
    read_csv_data('sampleCSV.csv')
Output:
['1.12005000', '1.11800000', '14574']
['1.11947000', '1.11811000', '8285']
['1.12035000', '1.11749000', '7979']
['1.11812000', '1.11597000', '18181']
['1.13148000', '1.11499000', '30360']
['1.13176000', '1.12344000', '57786']
['1.12441000', '1.11997000', '24175']
['1.12261000', '1.12067000', '14455']
['1.12466000', '1.12198000', '10255']
['1.12643000', '1.12209000', '29588']
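If the goal is numbers rather than strings, each row can be converted as it is read. This is only a minimal sketch, assuming every row has exactly two float columns followed by one int column. parse_rows is a hypothetical helper name, and the sample data is inlined through io.StringIO in place of the real file:

```python
import csv
import io

def parse_rows(lines):
    """Turn 'float,float,int' rows into (float, float, int) tuples."""
    rows = []
    for row in csv.reader(lines, delimiter=','):
        if not row:  # skip blank lines
            continue
        rows.append((float(row[0]), float(row[1]), int(row[2])))
    return rows

# Inline sample standing in for the data file (assumption for illustration).
sample = io.StringIO(u"1.12005000,1.11800000,14574\n1.11947000,1.11811000,8285\n")
data = parse_rows(sample)
print(data)
```

A row with a different column count or a non-numeric field would raise IndexError or ValueError here, which is probably what you want while checking the file.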
#4
Thanks for your input.
Larz60+'s format of output is exactly what I want, unfortunately:
import csv
 
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print('Error reading {filename}')
 
if __name__ == '__main__':
    read_csv_data('EU-1H.txt')

outputs:
Error reading {filename}
My file is written through another program in another language, I decided to output in txt with a code that could be translated as:
loop:
    f.write(str(value1)+","+str(value2)+","+str(value3))
f.close()
------------------------------------------------------------------------------
micseydel's answer:
I have no experience with re at all, but unless I made a big mistake, your code should work when implemented this way (?):
import re
f = open('EU-1H.txt', 'r')
print(re.findall(r"[\d.]+", f))
Unfortunately, it doesn't:
Output:
  File "reader.py", line 4, in <module>
    print(re.findall(r"[\d.]+", f))
  File "/usr/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer
The only way to get an output is this
import re
f = open('EU-1H.txt', 'r')
i = 0
for line in f:
    print(re.findall(r"[\d.]+", line))
    i += 1
    if i ==2: break
which outputs the ugly
Output:
['1', '.', '1', '2', '0', '0', '5', '0', '0', '0', '1', '.', '1', '1', '8', '0', '0', '0', '0', '0', '1', '4', '5', '7', '4']
['1', '.', '1', '1', '9', '4', '7', '0', '0', '0', '1', '.', '1', '1', '8', '1', '1', '0', '0', '0', '8', '2', '8', '5']
I can't find how to attach my input file, sorry about that :(

Whoops, I can't edit anymore, sorry for the double post.
Interestingly, the representation of the data in an array or as a lone element is different, I don't understand at all what is going on:

with open('EU-1H.txt', 'r') as f:
    a = f.readline()
    a = a[2:].split(',')
print a
for elem in a:
    print elem, type(elem)
outputs:
Output:
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']
1.12005000 <type 'str'>
1.11800000 <type 'str'>
14574 <type 'str'>
When trying float(elem) instead of type(elem) I get the following error
Error:
1.12005000
Traceback (most recent call last):
  File "reader.py", line 8, in <module>
    print float(elem)
ValueError: invalid literal for float(): 1
Which led me to googling a bit and made me try the same above code with instead
    print elem, repr(elem)
and led to
Output:
1.12005000 '1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00'
Note that in the representation, there is this curious "1" at the beginning. I tried to remove it with a[3:] in the above code, which led to:
with open('EU-1H.txt', 'r') as f:
    a = f.readline()
    a = a[3:].split(',')
for elem in a:
    print elem
Output:
.12005000
1.11800000
14574
So if I understand correctly, there is a mix of character representations in my file, which makes all methods fail. Is that true, or is there a simpler explanation?
Is there a way to force the character encoding to something, regardless of where the file comes from?

I'm almost ready to alter my writing program to make it write a stupid line at the beginning just so that I can put it aside and then get my data...
#5
It expects the file to be in the same directory. If not, you can turn the filename
into a full path as follows:
  • add
    import os
    to top of script.
  • before you make the call to the read_csv_data function, process the file name with
    filename = os.path.abspath('EU-1H.txt')
  • change the call to read_csv_data to
    read_csv_data(filename)

If that doesn't work, check your filename,
because the code I gave you was tested and works.
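A short sketch of those steps, with an extra existence check so that a wrong path shows up before any parsing starts. resolve_data_file is a hypothetical helper, not part of the code I posted earlier:

```python
import os

def resolve_data_file(name):
    """Return the absolute path for name and whether a file exists there."""
    path = os.path.abspath(name)  # resolved against the current working directory
    return path, os.path.isfile(path)

path, found = resolve_data_file('EU-1H.txt')
print(path, found)
```

If found is False, the problem is the path; if it is True and reading still fails, the problem is inside the file.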
#6
import csv
  
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print('Error reading {filename}')
  
if __name__ == '__main__':
    read_csv_data('EU-1H.txt')
in terminal:

dfuu@pupute:/media/dfuu/Data/mt4 exchange files$ python reader.py 
Error reading {filename}
dfuu@pupute:/media/dfuu/Data/mt4 exchange files$ ls
EU-1H.txt  reader.py  reader.py~
#7
I think your current working directory is not where reader.py is located.
You need to pass the full path.
So I think you need to set
filename  = os.path.abspath('/media/dfuu/Data/mt4/EU-1H.txt')
read_csv_data(filename)
#8
import csv
import os

filename  = os.path.abspath('/media/dfuu/Data/mt4/EU-1H.txt')

def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print('Error reading {filename}')
   
if __name__ == '__main__':
    read_csv_data(filename)
and terminal is:
dfuu@pupute:/media/dfuu/Data/mt4$ python reader.py 
Error reading {filename}
dfuu@pupute:/media/dfuu/Data/mt4$ ls
EU-1H.txt  reader.py  reader.py~
I was expecting it wouldn't change anything, as the directory I'm in with my terminal is generally the working directory. I really suspect a problem with my input file, which is not a true csv. I don't know what a "true csv" really is; I described above how I wrote the file using another language. That language is mql5, with which everything seems to be a pain to do... Unfortunately I don't have access to another language to extract this data. It has an option to write data to files in csv format, but it was doing ridiculous things and I also had that issue of the first few messed-up characters, so I decided to make a txt output formatted as csv.
I think it would be easier to attach the file so that you can see what it is to work with that.
I'm trying atm to work with the other lines and not the first one, but I can't find how to convert a single one to a float. This is getting ridiculous, I didn't expect to be stuck for 10 hours writing numbers to a file and retrieving them with python...

Edit: I just went to ask that other program to write data in "csv" format.
1.12035	1.11749	7979
1.11812	1.11597	18181
1.13148	1.11499	30360
1.13176	1.12344	57786
1.12441	1.11997	24175
1.12261	1.12067	14455
1.12466	1.12198	10255
1.12643	1.12209	29588
1.12887	1.12095	33628
1.12243	1.11729	35058
1.11896	1.11737	12049
1.11971	1.11828	8376
1.11934	1.11714	9451
1.12193	1.11732	26088
Data looks like this. Your above code raises the same error.

Okay, last post in there, and the bottom line:

When I open the data file in emacs, copy/paste the content into leafpad, and save it, all the suggestions above work (mine included).
It appears my data file was encoded in iso435143625642562, which apparently made the character representation unbearable.
Apparently, other users of mql5 have given up trying to open files for writing in utf8 (when I said it had the reputation of making everything a pain...)
I'll try to see if there is a way in python to re-encode this stuff properly. There should be.
Thanks a lot for your time and effort, it was greatly appreciated :)
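
For the record, one way such a re-encoding could look in Python, as a minimal sketch only. It assumes the file is UTF-16 with a byte order mark, which the \x00 after every character in the repr() output suggests but which is not confirmed. reencode_to_utf8 and the demo file names are hypothetical. io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3:

```python
import io
import os
import tempfile

def reencode_to_utf8(src, dst, src_encoding='utf-16'):
    """Decode src with src_encoding and write it back out as UTF-8."""
    # newline='' keeps the line endings exactly as they were in the source.
    with io.open(src, 'r', encoding=src_encoding, newline='') as fin:
        text = fin.read()
    with io.open(dst, 'w', encoding='utf-8', newline='') as fout:
        fout.write(text)

# Demo on a throwaway UTF-16 file (a stand-in for EU-1H.txt).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'demo-utf16.txt')
dst = os.path.join(tmpdir, 'demo-utf8.txt')
with io.open(src, 'wb') as f:
    f.write(u'1.12005000,1.11800000,14574\r\n'.encode('utf-16'))

reencode_to_utf8(src, dst)
with io.open(dst, 'rb') as f:
    converted = f.read()
print(repr(converted))
```

This reads the whole file into memory, which should be fine for a few thousand lines.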
#9
Modify the code to get more detailed output about the error that occurs.
Without a detailed error message, it's hard to find the cause.

import csv
import os
 
filename  = os.path.abspath('/media/dfuu/Data/mt4/EU-1H.txt')
 
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error as e:
        print('Error reading {}'.format(filename))
        print(e)
    
if __name__ == '__main__':
    read_csv_data(filename)
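Maybe you still have a problem parsing the text.
In your first example it looks like a byte order mark (BOM). It is two characters long and your repr() output has a \x00 after every character, which points to UTF-16 rather than UTF-8.

You can work around this problem if you open the file with the right encoding (Python 3; on Python 2.7 use io.open, which takes the same argument):
with open(filename, encoding='utf-16') as csvfile:
But before you do this, first look at which error you get.
Otherwise it's guessing, and guessing is not good in programming.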
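A minimal sketch of reading with an explicit encoding, assuming the file is UTF-16 with a BOM. That is an inference from the \x00 bytes in the repr() output earlier in the thread, not something confirmed. It targets Python 3, where csv.reader works on decoded text. read_rows and the demo file are hypothetical:

```python
import csv
import os
import tempfile

def read_rows(filename, encoding='utf-16'):
    """Read comma-separated rows from a text file with a known encoding."""
    # newline='' is what the csv module expects for files it reads.
    with open(filename, encoding=encoding, newline='') as csvfile:
        return [row for row in csv.reader(csvfile, delimiter=',') if row]

# Demo file written as UTF-16 to mimic the suspected format.
demo = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo, 'wb') as f:
    f.write('1.12005000,1.11800000,14574\r\n1.11947000,1.11811000,8285\r\n'.encode('utf-16'))

rows = read_rows(demo)
print(rows)
```

If the encoding guess is wrong, open() or the first read will usually raise a UnicodeError, which is a more useful message than a generic csv failure.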
#10
Well, just strip() these ... things:

>>> "��1.12005000,1.11800000,14574".strip("�")
'1.12005000,1.11800000,14574'
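One caveat, shown as a sketch under the assumption that the file is UTF-16 little-endian with a BOM (consistent with the \x00 bytes in the repr() output earlier in the thread): removing the leading characters alone leaves a NUL byte after every character, so float() still fails, whereas decoding the bytes first gives clean text. The byte string below is constructed for illustration, not read from the real file:

```python
import codecs

# Bytes as a UTF-16-LE writer with a BOM would produce them (illustrative).
raw = codecs.BOM_UTF16_LE + u'1.12005000,1.11800000,14574\r\n'.encode('utf-16-le')

# Dropping only the two BOM bytes keeps the interleaved NUL bytes.
stripped = raw[2:]
# View each byte as one character, the way a Python 2 str does.
first_field = stripped.split(b',')[0].decode('latin-1')
try:
    float(first_field)
    stripped_ok = True
except ValueError:
    stripped_ok = False

# Decoding handles the BOM and the NUL bytes in one step.
text = raw.decode('utf-16').strip()
values = text.split(',')
print(stripped_ok, values, float(values[0]))
```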