Reading floats and ints from csv-like file

RE: Reading floats and ints from csv-like file - Krookroo - Sep-05-2017

@DeaD_EyE:
Do you think there is more to it than the file being written with mixed encodings, which makes it very hard to process in Python?
After my last post I spent an hour trying to force encodings and decodings, both in Python and in the original program, with no success. The program that writes the file has an option to choose the encoding it uses when opening the file for writing, but that option doesn't work and does nothing.

Quote: You can work around this problem if you open the file with the right encoding.

with open(filename, encoding='utf-8-sig') as csvfile:

This didn't work for me, because the first byte is not compatible with that encoding so it raised an error. All the encoding/decoding was failing on the first byte of the file.
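One way to see what those first bytes actually are is to open the file in binary mode and compare them against the byte order marks in the standard `codecs` module. The following is only a sketch: `guess_encoding_from_bom` and `read_text` are hypothetical helper names, and it assumes the file uses a single encoding throughout and starts with a BOM, which may not hold for a file with mixed encodings. For what it's worth, a two-byte UTF-16 BOM decoded as UTF-8 with `errors='replace'` would show up as two leading � characters, so that is one possibility worth ruling out.

```python
import codecs

# Checked in this order because the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(raw, default="utf-8"):
    """Return a codec name based on the leading bytes (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM found: fall back to an assumed default.
    return default


def read_text(path):
    """Read a whole text file using the encoding guessed from its BOM."""
    with open(path, "rb") as fh:
        head = fh.read(4)  # the longest BOM is four bytes
    with open(path, encoding=guess_encoding_from_bom(head)) as fh:
        return fh.read()
```

Printing the first few raw bytes (`open(path, "rb").read(4)`) for two or three of the generated files should also show quickly whether the prefix is the same in each of them.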

---------------------------------------------------------------------------------

Quote:Well, just strip() these ... things:
>>> "��1.12005000,1.11800000,14574".strip("�")
'1.12005000,1.11800000,14574'

How would I know these things will be the same for the other files that my script will generate? (To be more precise: they're not.) If I have to manually adapt the stripping to each file, I might as well manually copy/paste the data into another file that I save in the right encoding. And if I know these will always be the first two characters, I think my solution of handling the first line with

a = file.readline()
a = a[2:]

is more general. Even then, it seems they will always be the first two characters, but I can't be sure of that.

RE: Reading floats and ints from csv-like file - Larz60+ - Sep-05-2017

Using strip is probably safer.
You can get the numeric value of the � character by using ord(character).
This one has a value of 65533 decimal, or 0xfffd in hex.
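A quick check of that value in the interpreter, as a minimal sketch (the variable name `ch` is just for illustration):

```python
import unicodedata

ch = "\ufffd"  # the black-diamond question mark

print(ord(ch))               # 65533
print(hex(ord(ch)))          # 0xfffd
print(unicodedata.name(ch))  # REPLACEMENT CHARACTER
```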

RE: Reading floats and ints from csv-like file - hbknjr - Sep-05-2017

Didn't read the whole post, but the problem seems to be related to the encoding.

1- Try setting chcp 65001 in your console, which changes the code page to UTF-8. It could be that console encoding is different.

2- Try using codecs.

import codecs
with codecs.open("EU-1H.tx.txt",encoding='utf-8') as f:
....
....

RE: Reading floats and ints from csv-like file - DeaD_EyE - Sep-05-2017

Phew, when the whole thing is this undefined, it's not easy to solve correctly.

I have the absolute brute-force method for it:

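import string


def filter_text(text, allowed=string.digits + '.,'):
    # Keep only digits, dots and commas; everything else is dropped.
    return ''.join(filter(lambda c: c in allowed, text))


def process_line(line):
    # Split the cleaned line on commas and convert every field to float.
    return list(map(float, filter_text(line).split(',')))


if __name__ == '__main__':
    # errors='replace' turns undecodable bytes into U+FFFD instead of raising.
    with open('test.csv', errors='replace') as fd:
        for line in fd:
            print(process_line(line))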

Test this against your data. With this code it is even possible to parse the data when ordinary characters appear between the numbers.
You should call it brute_force_reader.py
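To illustrate what the brute-force filter does, here is a small self-contained sketch that applies the same idea to the sample line quoted earlier in the thread. The variable `sample` is made up for the demonstration, and the helpers are re-declared only so the snippet runs on its own.

```python
import string


def filter_text(text, allowed=string.digits + ".,"):
    # Same idea as above: keep only digits, dots and commas.
    return "".join(c for c in text if c in allowed)


def process_line(line):
    return [float(field) for field in filter_text(line).split(",")]


# Sample line with two leading replacement characters and a newline.
sample = "\ufffd\ufffd1.12005000,1.11800000,14574\n"
print(process_line(sample))      # [1.12005, 1.118, 14574.0]

# Caveat: signs and exponents are filtered out as well.
print(filter_text("-1.5e3,2"))   # 1.53,2
```

The second call shows the price of the approach: anything outside the allowed set is silently dropped, so negative numbers or scientific notation would be misread unless those characters are added to `allowed`. A line that ends up empty after filtering would also make `float('')` raise a `ValueError`.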

RE: Reading floats and ints from csv-like file - sparkz_alot - Sep-05-2017

As of Unicode 10, the black diamond question mark (U+FFFD, hex fffd) is used "to replace an unknown, unrecognized or unrepresentable character". Trying to decode it as utf-8 will fail, because the decoder has already said it doesn't know what the original bytes were. One possible cause is that the characters do not lie in the Basic Multilingual Plane of 65,536 code points. Might we ask what language the original file is written in?

It seems you should be able to test for those characters and, if they exist, strip them out; if they don't, proceed as normal.
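A rough sketch of that test-then-strip idea follows. It assumes the file was opened with `errors='replace'`, so the unreadable bytes arrive as U+FFFD at the start of a line. The names `strip_replacement`, `parse_field` and `parse_line` are hypothetical, and the int-or-float handling is just one way to get both types out of the same line.

```python
REPLACEMENT = "\ufffd"


def strip_replacement(line):
    """Drop leading U+FFFD characters if present, however many there are."""
    if line.startswith(REPLACEMENT):
        return line.lstrip(REPLACEMENT)
    return line  # nothing to strip, proceed as normal


def parse_field(field):
    """Return an int when the field looks like one, otherwise a float."""
    try:
        return int(field)
    except ValueError:
        return float(field)


def parse_line(line):
    cleaned = strip_replacement(line).strip()
    return [parse_field(field) for field in cleaned.split(",")]


print(parse_line("\ufffd\ufffd1.12005000,1.11800000,14574\n"))
# [1.12005, 1.118, 14574]
```

Because `lstrip` removes any number of leading replacement characters, this does not depend on there being exactly two of them.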

RE: Reading floats and ints from csv-like file - Krookroo - Sep-05-2017

Thanks for your input, guys. Lots of leads; I'll look into all this tomorrow.