Reading floats and ints from csv-like file

Krookroo · (This post was last modified: Sep-04-2017, 05:06 PM by Krookroo.)

Hello :)

I have stored numbers in a file from another program, and I'd like to extract the values into lists to use with another Python program I already wrote. Here is what the data file looks like:

        
              ��1.12005000,1.11800000,14574
1.11947000,1.11811000,8285
1.12035000,1.11749000,7979
1.11812000,1.11597000,18181
1.13148000,1.11499000,30360
1.13176000,1.12344000,57786
1.12441000,1.11997000,24175
1.12261000,1.12067000,14455
1.12466000,1.12198000,10255
1.12643000,1.12209000,29588

For some unknown reason I have these strange characters at the beginning. I came up with the following code to get the values and get rid of the commas and EOL characters:

        
              f = open('EU-1H.txt', 'r')
a = f.readline()
a = a[2:-1] #removing the two odd characters from the first line as well as the EOL character
fullarray, t = [], []
 
for numb in a[:-1].split(','):
    #print numb
    t.append(numb)
#print t
fullarray.append(t)
 
for i, line in enumerate(f):
    t = []
    for numb in line[:-1].split(','): #just making life easier by removing the EOL
        t.append(numb)
    fullarray.append(t)
    if i == 10: break #I don't need to process the whole thousands of lines to know if the code works
 
#print fullarray

To describe my issue, we'll just run this with the first two prints enabled, namely "print numb" and "print t". This outputs:

        
              1.12005000
1.11800000
14574
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r']

And if I try to convert to float by replacing t.append(numb) with t.append(float(numb)) I get the following output:

        
              1.12005000
Traceback (most recent call last):
  File "reader.py", line 9, in <module>
    t.append(float(numb))
ValueError: invalid literal for float(): 1

To make my life easier, I wondered if I could convert it back to a string inside the list and do the conversion to float/int later. So I tried changing t.append(float(numb)) to t.append(str(numb)), which gave this output:

        
              1.12005000
1.11800000
14574
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r']

For some reason, some of this code actually works when run directly in an interpreter in a terminal. Example:

        
              >>> a = '1.12005000,1.11800000,14574' #copypasted from the source file
>>> a
'1.12005000,1.11800000,14574'
>>> a.split(',')
['1.12005000', '1.11800000', '14574']

But if I read the data from the data file:

        
              f = open('EU-1H.txt', 'r')
a = f.readline()
a = a[2:-1]
print a
print a.split(',')

gives when run:

        
              1.12005000,1.11800000,14574
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']

In interpreter:

        
              >>> f = open('EU-1H.txt', 'r')
>>> a = f.readline()
>>> a[2:-1]
'1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00,\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00,\x001\x004\x005\x007\x004\x00\r\x00'
>>> print a[2:-1]
1.12005000,1.11800000,14574
>>> a[2:-1].split(',')
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']
>>> print a[2:-1].split(',')
['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']

I'm having a lot of trouble understanding what's happening there.
My main goal is just to have something working (I expected to take no more than 10 minutes to write the storing algorithm and the data-retrieving one; apparently I couldn't have been more wrong, nothing went right), but I'd obviously also like to understand what is going on, so that I don't get stuck on this kind of issue again.
Any hints?
Thanks in advance.

Python 2.7.9 / Debian Jessie / i686
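
A possible explanation for the `\x00` bytes, offered as a sketch rather than a confirmed diagnosis of this particular file: that pattern is what ASCII text looks like when it has been encoded as UTF-16 (little-endian) and is then handled byte by byte. The sample string below is copied from the first data line; everything else is illustrative.

```python
# -*- coding: utf-8 -*-
# Illustration only: encode the first data line as UTF-16 little-endian,
# look at the raw bytes, then decode them back.
text = u'1.12005000,1.11800000,14574'

raw = text.encode('utf-16-le')     # every ASCII character becomes 2 bytes
print(repr(raw[:8]))               # the bytes for the characters '1', '.', '1', '2'

# Splitting the raw bytes on a comma leaves the NUL bytes inside each field,
# which is the kind of string float() refuses to parse.
raw_fields = raw.split(b',')
print(repr(raw_fields[0]))

# Decoding first gives clean text that float() and int() accept.
fields = raw.decode('utf-16-le').split(u',')
values = [float(fields[0]), float(fields[1]), int(fields[2])]
print(values)
```

The first field of `raw_fields` has the same shape as the first element printed by `print t` above, which is what suggests the encoding guess.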

***micseydel*** · (This post was last modified: Sep-04-2017, 05:53 PM by micseydel.)

(Sep-04-2017, 05:06 PM)Krookroo Wrote: [...]code to get the values and get rid of the commas and EOL characters [...] My main goal is to just have something working

I haven't read your whole post, but if I understand correctly, you're over-complicating things.

        
              # -*- coding: latin-1 -*-
 
import re
 
sample = """��1.12005000,1.11800000,14574
1.11947000,1.11811000,8285
1.12035000,1.11749000,7979
1.11812000,1.11597000,18181
1.13148000,1.11499000,30360
1.13176000,1.12344000,57786
1.12441000,1.11997000,24175
1.12261000,1.12067000,14455
1.12466000,1.12198000,10255
1.12643000,1.12209000,29588
"""
 
print(re.findall(r"[\d.]+", sample))

If that solves your problem, and you still have other questions, it would help to break them down.

Output:$ python testit.py 
['1.12005000', '1.11800000', '14574', '1.11947000', '1.11811000', '8285', '1.12035000', '1.11749000', '7979', '1.11812000', '1.11597000', '18181', '1.13148000', '1.11499000', '30360', '1.13176000', '1.12344000', '57786', '1.12441000', '1.11997000', '24175', '1.12261000', '1.12067000', '14455', '1.12466000', '1.12198000', '10255', '1.12643000', '1.12209000', '29588']

**Larz60+** · Sep-04-2017, 09:00 PM

try this:

        
              import csv
 
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print(f'Error reading {filename}')
 
if __name__ == '__main__':
    read_csv_data('sampleCSV.csv')

output:

Output:['1.12005000', '1.11800000', '14574']
['1.11947000', '1.11811000', '8285']
['1.12035000', '1.11749000', '7979']
['1.11812000', '1.11597000', '18181']
['1.13148000', '1.11499000', '30360']
['1.13176000', '1.12344000', '57786']
['1.12441000', '1.11997000', '24175']
['1.12261000', '1.12067000', '14455']
['1.12466000', '1.12198000', '10255']
['1.12643000', '1.12209000', '29588']
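
Since the goal is floats and ints rather than strings, rows like the ones printed above could be converted in one extra step. This is a minimal sketch, assuming every row has exactly two float columns followed by one integer column, as in the sample; `parse_row` is a hypothetical helper name, not part of the csv module, and the data is held in memory instead of being read from a file.

```python
import csv
import io

def parse_row(row):
    """Convert ['1.12', '1.11', '14574'] into (1.12, 1.11, 14574)."""
    first, second, count = row
    return float(first), float(second), int(count)

# Two sample lines from the thread, used here in place of a real file.
sample = u"1.12005000,1.11800000,14574\n1.11947000,1.11811000,8285\n"

rows = [parse_row(row) for row in csv.reader(io.StringIO(sample))]
print(rows)
```

A row with a missing or non-numeric field would raise a ValueError here, which is probably the desired behaviour while debugging the input file.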

Krookroo · (This post was last modified: Sep-04-2017, 10:24 PM by Krookroo.)

Thanks for your input.
Larz60+'s format of output is exactly what I want, unfortunately:

        
              import csv
  
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print('Error reading {filename}')
  
if __name__ == '__main__':
    read_csv_data('EU-1H.txt')

outputs:

        
              Error reading {filename}

My file is written through another program in another language, I decided to output in txt with a code that could be translated as:

        
              loop:
    f.write(str(value1)+","+str(value2)+","+str(value3))
f.close()

------------------------------------------------------------------------------
micseydel's answer:
I have no experience with re at all, but unless I made a big mistake, your code should work when implemented this way (?):

        
              import re
f = open('EU-1H.txt', 'r')
print(re.findall(r"[\d.]+", f))

Unfortunately, it doesn't:

Output:  File "reader.py", line 4, in <module>
    print(re.findall(r"[\d.]+", f))
  File "/usr/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

The only way to get an output is this

        
              import re
f = open('EU-1H.txt', 'r')
i = 0
for line in f:
    print(re.findall(r"[\d.]+", line))
    i += 1
    if i ==2: break

which outputs the ugly

Output:['1', '.', '1', '2', '0', '0', '5', '0', '0', '0', '1', '.', '1', '1', '8', '0', '0', '0', '0', '0', '1', '4', '5', '7', '4']
['1', '.', '1', '1', '9', '4', '7', '0', '0', '0', '1', '.', '1', '1', '8', '1', '1', '0', '0', '0', '8', '2', '8', '5']

I can't find how to attach my input file, sorry about that :(

Woops can't edit anymore, sorry for double post.
Interestingly, the representation of the data in an array or as a lone element is different, I don't understand at all what is going on:

        
              with open('EU-1H.txt', 'r') as f:
    a = f.readline()
    a = a[2:].split(',')
print a
for elem in a:
    print elem, type(elem)

outputs:

Output:['1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00', '\x001\x00.\x001\x001\x008\x000\x000\x000\x000\x000\x00', '\x001\x004\x005\x007\x004\x00\r\x00']
1.12005000 <type 'str'>
1.11800000 <type 'str'>
14574
 <type 'str'>

When trying float(elem) instead of type(elem) I get the following error

Error:1.12005000
Traceback (most recent call last):
  File "reader.py", line 8, in <module>
    print float(elem)
ValueError: invalid literal for float(): 1

Which led me to googling a bit and made me try the same above code with instead

        
              print elem, repr(elem)

and led to

Output:
1.12005000 '1\x00.\x001\x002\x000\x000\x005\x000\x000\x000\x00'

Note that in the representation, there is this curious "1" at the beginning. I tried to remove it with a[3:] in the above code, which led to:

        
              with open('EU-1H.txt', 'r') as f:
    a = f.readline()
    a = a[3:].split(',')
for elem in a:
    print elem

Output:.12005000
1.11800000
14574

So if I understand correctly, there is a mix of character representations in my file, which makes all the methods fail. Is that true, or is there a simpler explanation?
Is there a way to force the character encoding to something wherever it comes from?

I'm almost ready to alter my writing program to make it write a stupid line at the beginning just so that I can put it aside and then get my data...
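
On the question of forcing an encoding: `io.open` accepts an `encoding` argument on both Python 2.7 and Python 3 and decodes the text while reading. The sketch below assumes the file is UTF-16 with a byte order mark, which is a guess based on the two odd leading characters and the `\x00` bytes in the output above. It writes a small stand-in file first so that it runs on its own; the name `EU-1H-demo.txt` is made up for the example.

```python
import io
import os
import tempfile

# Create a stand-in file encoded as UTF-16 with a BOM (assumed encoding).
path = os.path.join(tempfile.mkdtemp(), 'EU-1H-demo.txt')
with io.open(path, 'w', encoding='utf-16', newline='') as out:
    out.write(u'1.12005000,1.11800000,14574\r\n1.11947000,1.11811000,8285\r\n')

fullarray = []
# The 'utf-16' codec reads the BOM, picks the byte order and drops the BOM,
# so no manual slicing of the first line is needed.
with io.open(path, 'r', encoding='utf-16') as f:
    for line in f:
        a, b, c = line.strip().split(u',')
        fullarray.append([float(a), float(b), int(c)])

print(fullarray)
```

If the real file turns out to use a different encoding, only the `encoding` argument of the second `io.open` call would need to change.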

**Larz60+** · Sep-04-2017, 10:28 PM

It expects the file to be in the same directory. If it isn't, you can turn the filename
into a full path. Add

    import os

to the top of the script. Before you call the read_csv_data function, process the file name with

    filename = os.path.abspath('EU-1H.txt')

and change the call to read_csv_data to

    read_csv_data(filename)

If that doesn't work, check your filename, because the code I gave you was tested and works.

Krookroo · Sep-04-2017, 10:35 PM

        
              import csv
   
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print('Error reading {filename}')
   
if __name__ == '__main__':
    read_csv_data('EU-1H.txt')

in terminal:

        
              dfuu@pupute:/media/dfuu/Data/mt4 exchange files$ python reader.py 
Error reading {filename}
dfuu@pupute:/media/dfuu/Data/mt4 exchange files$ ls
EU-1H.txt  reader.py  reader.py~

**Larz60+** · (This post was last modified: Sep-04-2017, 10:55 PM by Larz60+.)

I think your current working directory is not where reader.py is located.
You need to pass the full path.
So I think you need to set

        
              filename  = os.path.abspath('/media/dfuu/Data/mt4/EU-1H.txt')
read_csv_data(filename)

Krookroo · (This post was last modified: Sep-04-2017, 11:42 PM by Krookroo.)

        
              import csv
import os
 
filename  = os.path.abspath('/media/dfuu/Data/mt4/EU-1H.txt')
 
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error:
        print('Error reading {filename}')
    
if __name__ == '__main__':
    read_csv_data(filename)

and terminal is:

        
              dfuu@pupute:/media/dfuu/Data/mt4$ python reader.py 
Error reading {filename}
dfuu@pupute:/media/dfuu/Data/mt4$ ls
EU-1H.txt  reader.py  reader.py~

I was expecting it wouldn't change anything, as the directory I'm in with my terminal is generally the working directory. I really suspect a problem with my input file, which is not a true csv. (I don't know what a "true csv" really is; I just described above how I wrote the file using another language, mql5, with which everything seems a pain to do. Unfortunately I don't have access to another language to extract this data. There is an option in it to write data to files in csv format, but it was doing ridiculous things and I also had that issue of the first few messed-up characters, so I decided to make a txt output formatted as csv.)
I think it would be easier to attach the file so that you can see what it is and work with that.
I'm currently trying to work with the other lines and not the first one, but I can't find how to convert a single one to a float. This is getting ridiculous; I didn't expect to be stuck for 10 hours writing numbers to a file and retrieving them with Python...

Edit: I just went to ask that other program to write data in "csv" format.

        
          
          
              
              1.12035 1.11749 7979
1.11812 1.11597 18181
1.13148 1.11499 30360
1.13176 1.12344 57786
1.12441 1.11997 24175
1.12261 1.12067 14455
1.12466 1.12198 10255
1.12643 1.12209 29588
1.12887 1.12095 33628
1.12243 1.11729 35058
1.11896 1.11737 12049
1.11971 1.11828 8376
1.11934 1.11714 9451
1.12193 1.11732 26088

            

        
      

Data looks like this. Your above code raises the same error.

Okay, last post in there, and the bottom line:

When I open the data file in emacs, copy/paste the content into leafpad and save it, all the suggestions above work (mine included).
It appears my data file was encoded in iso435143625642562, which apparently made the character representation unbearable.
Apparently, other users of mql5 have stopped trying to open files for writing in utf8 (when I said it had the reputation of making everything become a pain...)
I'll try to see if there is a way in python to reencode this stuff properly. There should be.
Thanks a lot for your time and effort, it was greatly appreciated :)
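
Re-encoding in Python can be done by decoding with the source encoding and writing the text back out as UTF-8, which is roughly what the emacs/leafpad round trip does by hand. A minimal sketch, again assuming the source is UTF-16 with a BOM (not confirmed in this thread); `reencode` and both file names are hypothetical.

```python
import io
import os
import tempfile

def reencode(src, dst, src_encoding='utf-16', dst_encoding='utf-8'):
    """Copy src to dst, decoding with src_encoding and encoding with dst_encoding."""
    with io.open(src, 'r', encoding=src_encoding) as fin:
        with io.open(dst, 'w', encoding=dst_encoding) as fout:
            for line in fin:
                fout.write(line)

# Self-contained demo: build a UTF-16 file, convert it, read the result as bytes.
folder = tempfile.mkdtemp()
src = os.path.join(folder, 'data-utf16.txt')
dst = os.path.join(folder, 'data-utf8.txt')
with io.open(src, 'w', encoding='utf-16') as f:
    f.write(u'1.12035000,1.11749000,7979\n')

reencode(src, dst)
with io.open(dst, 'rb') as f:
    converted = f.read()
print(repr(converted))
```

After the conversion the file contains no NUL bytes and no BOM, so the plain `split(',')` approach from the first post should work on it.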

DeaD_EyE · Sep-05-2017, 08:20 AM

Modify the code, to get more detailed output of the occurring error.
Without a detailed error message, it's hard to find an error.

        
              import csv
import os
  
filename  = os.path.abspath('/media/dfuu/Data/mt4/EU-1H.txt')
  
def read_csv_data(filename):
    try:
        buffer = None
        with open(filename) as csvfile:
            buffer = csv.reader(csvfile, delimiter=',')
            for row in buffer:
                print(row)
    except csv.Error as e:
        print('Error reading {filename}')
        print(e)
     
if __name__ == '__main__':
    read_csv_data(filename)

Maybe you still have a problem with parsing the text.
In your first example it looks like a byte order mark for UTF-8.

You can work around this problem if you open the file with the right encoding.

        
              with open(filename, encoding='utf-8-sig') as csvfile:

But before you do this, first look at which error you get.
Otherwise it's guessing, and guessing is not good in programming.
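
One way to take the guessing out of the encoding question is to look at the first bytes of the file. The sketch below compares them with the byte order mark constants from the standard `codecs` module; `sniff_bom` is a hypothetical helper, and a file without a BOM simply yields `None`. Note that the `encoding` argument of the built-in `open` shown above is Python 3 only; on Python 2.7, `io.open` takes the same argument.

```python
import codecs

# (BOM bytes, codec name that consumes that BOM when decoding).
# The 4-byte UTF-32 marks are listed first because the UTF-32 LE mark
# starts with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

def sniff_bom(first_bytes):
    """Return a codec name if first_bytes starts with a known BOM, else None."""
    for bom, name in BOMS:
        if first_bytes.startswith(bom):
            return name
    return None

# In real use the bytes would come from the file itself, for example:
#     with open(filename, 'rb') as f: first_bytes = f.read(4)
print(sniff_bom(b'\xff\xfe1\x00'))      # two-byte mark followed by '1' and a NUL
print(sniff_bom(b'\xef\xbb\xbf1.1'))    # three-byte UTF-8 mark
print(sniff_bom(b'1.12'))               # no mark at all
```

A two-character prefix followed by NUL-separated digits, as in the first post, would match the first case rather than the UTF-8 one, but checking the actual bytes of the file is the only way to be sure.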

wavic · Sep-05-2017, 08:35 AM

Well, just strip() these ... things:

        
              >>> "��1.12005000,1.11800000,14574".strip("�")
'1.12005000,1.11800000,14574'
