Python Forum
Reading floats and ints from csv-like file
#11
@DeaD_EyE:
Do you think there is more to it than the fact that the file is written with mixed encodings, which makes it extremely hard to process in Python?
I spent an hour after my last post trying to force encodings and decodings, both in Python and in the original program, without success (the program that writes the file has an option for choosing the encoding it writes the file in... which doesn't work and does nothing).

Quote: You can work around this problem if you open the file with the right encoding.

with open(filename, encoding='utf-8-sig') as csvfile:
This didn't work for me: the first byte is not valid in that encoding, so it raised an error. Every encoding/decoding attempt failed on the first byte of the file.
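Instead of guessing encodings, one option is to read the first few bytes in binary mode and compare them with the byte-order marks (BOMs) defined in Python's codecs module. For reference, 'utf-8-sig' only strips the UTF-8 BOM (EF BB BF); a UTF-16 BOM (FF FE or FE FF) is not valid UTF-8 at all, which would fail on the very first byte. A minimal sketch; the helper name sniff_bom is hypothetical, and the idea that the leading bytes are a BOM is an assumption, not something confirmed in this thread.

```python
import codecs

# Byte-order marks from the standard library, longest first: the UTF-32-LE
# BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE), so order matters.
KNOWN_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def sniff_bom(head):
    """Return (bom_bytes, encoding_name) for a leading BOM, else (b'', None).

    `head` should be the first few raw bytes of the file (4 is enough).
    """
    for bom, name in KNOWN_BOMS:
        if head.startswith(bom):
            return bom, name
    return b'', None
```

Usage would be something like: with open(filename, 'rb') as f: print(sniff_bom(f.read(4))). The returned name only describes the BOM; in the "mixed encodings" case the bytes after it may well be something else, such as plain ASCII.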

---------------------------------------------------------------------------------
Quote: Well, just strip() these ... things:

>>> "��1.12005000,1.11800000,14574".strip("�")
'1.12005000,1.11800000,14574'
How would I know these characters will be the same in the other files my script will generate (to be more precise: they aren't)? If I have to adapt the stripping by hand for each file, I might as well copy/paste the data manually into another file saved in the right encoding. And if I know these will always be the first two characters, I think my solution of handling the first line with
a = file.readline()
a = a[2:]
is more general. Even then, it seems it will always be the first two characters, but I can't be sure of that.
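If the number of junk characters is not guaranteed to be two, the slice a[2:] could be swapped for a pattern that removes whatever precedes the first number, however long that run is. A sketch, assuming the first line is a data row that starts with a number (a text header line would be mangled); clean_first_line is a hypothetical helper name.

```python
import re

# Everything before the first character that can begin a number
# (digit, sign or decimal point). Assumes the line starts with numeric data.
LEADING_JUNK = re.compile(r'^[^0-9+\-.]+')


def clean_first_line(line):
    """Drop any run of non-numeric characters at the start of `line`."""
    return LEADING_JUNK.sub('', line)
```

It would be applied only to the line returned by file.readline(), in place of a[2:]; the remaining lines pass through unchanged.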
#12
Using strip is probably safer.
You can get the numeric value of the � character with ord(character).
This one has a value of 65533 decimal, or 0xfffd hex.
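A quick check of those numbers in the interpreter. Writing the character as the escape '\ufffd' (or chr(0xFFFD)) also avoids copy-pasting the diamond glyph into the strip() call. The sample line is the one quoted earlier in the thread.

```python
REPLACEMENT = '\ufffd'  # U+FFFD REPLACEMENT CHARACTER, shown as a black diamond

code_point = ord(REPLACEMENT)
print(code_point, hex(code_point))  # 65533 0xfffd

# strip() removes any run of these characters from both ends, however many.
line = REPLACEMENT * 2 + '1.12005000,1.11800000,14574'
print(line.strip(REPLACEMENT))  # 1.12005000,1.11800000,14574
```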
#13
I didn't read the whole thread, but the problem seems to be related to the encoding.

1- Try running chcp 65001 in your console, which changes the code page to UTF-8. The console may be using a different encoding.

2- Try using codecs.
import codecs

with codecs.open("EU-1H.tx.txt", encoding='utf-8') as f:
    ...  # read and process the decoded text from f here
#14
Phew, when the whole thing is this loosely defined, it's not easy to solve correctly.

I have the absolute brute-force method for it:

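import string


def filter_text(text, allowed=string.digits + '.,'):
    # Keep only digits, decimal points and commas; everything else
    # (U+FFFD, BOMs, the trailing newline) is dropped.
    return ''.join(filter(lambda c: c in allowed, text))


def process_line(line):
    # Convert every comma-separated field to float.
    return list(map(float, filter_text(line).split(',')))


if __name__ == '__main__':
    # errors='replace' turns undecodable bytes into U+FFFD instead of raising.
    with open('test.csv', errors='replace') as fd:
        for line in fd:
            if filter_text(line):  # skip lines with nothing numeric, e.g. blank ones
                print(process_line(line))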
Test this against your data. This code can even parse lines where ordinary characters appear between the numbers.
You should call it brute_force_reader.py.
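Since the thread is about floats and ints, a variant of the same brute-force idea could keep integer columns as int instead of converting everything to float. A sketch under two assumptions: integer fields are written without a decimal point, and (unlike filter_text above) minus signs should be kept so negative values survive. parse_number and parse_line are hypothetical names.

```python
import string

# Characters to keep; everything else (U+FFFD, BOMs, newlines) is discarded.
ALLOWED = set(string.digits + '.,-')


def parse_number(field):
    """int for fields without a decimal point, float otherwise (assumption)."""
    return float(field) if '.' in field else int(field)


def parse_line(line):
    """Return the numbers on one line, or [] if nothing numeric is left."""
    cleaned = ''.join(c for c in line if c in ALLOWED)
    if not cleaned:
        return []
    return [parse_number(field) for field in cleaned.split(',')]
```

The reading loop from the script above stays the same, with parse_line(line) in place of process_line(line).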
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#15
As of Unicode 10, the black diamond question mark (U+FFFD, hex fffd) is used "to replace an unknown, unrecognized or unrepresentable character". It is itself a valid character (encoded in UTF-8 as EF BF BD), so decoding it as UTF-8 works; its presence means an earlier decoding step already failed to interpret the original bytes and substituted it, and those original bytes can't be recovered from it. One possible cause is that the characters do not lie in the Basic Multilingual Plane of 65,536 code points. May we ask what language the original file is written in?
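A small demonstration of where the diamonds can come from, under the assumption (not confirmed in the thread) that the file starts with the UTF-16-LE byte-order mark FF FE followed by plain ASCII data. The raw bytes are constructed from the sample line quoted earlier.

```python
# Hypothetical raw file content: UTF-16-LE BOM (FF FE) + plain ASCII data.
raw = b'\xff\xfe1.12005000,1.11800000,14574'

# Strict UTF-8 decoding fails: 0xFF and 0xFE can never occur in UTF-8.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes one U+FFFD for each byte it cannot decode.
replaced = raw.decode('utf-8', errors='replace')

# U+FFFD itself is an ordinary character with a valid UTF-8 encoding,
# so encoding and decoding the replaced text again works without errors.
fffd_utf8 = '\ufffd'.encode('utf-8')
round_trip = replaced.encode('utf-8').decode('utf-8')

print(strict_failed)          # True
print(ascii(replaced[:3]))    # '\ufffd\ufffd1'
print(fffd_utf8)              # b'\xef\xbf\xbd'
```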

It seems you should be able to test for those characters: if they are present, strip them out; if not, proceed as normal.
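The test-and-strip idea could be combined with the csv module roughly like this: look at the first line, strip any leading U+FFFD or U+FEFF characters if present, and hand everything to csv.reader. A sketch; the set of junk characters and the helper name read_numeric_rows are assumptions, and every field is converted to float.

```python
import csv
import io
import itertools

# Characters treated as leading junk (an assumption): U+FFFD, the
# replacement character, and U+FEFF, a byte-order mark that survived decoding.
JUNK = '\ufffd\ufeff'


def read_numeric_rows(stream):
    """Yield each CSV row of `stream` (a text file object) as a list of floats."""
    first = stream.readline()
    if first.startswith(tuple(JUNK)):
        first = first.lstrip(JUNK)  # remove the whole run, however long
    # csv.reader accepts any iterable of lines, so put the cleaned first
    # line back in front of the remaining, untouched lines.
    for row in csv.reader(itertools.chain([first], stream)):
        if row:  # blank lines come back as empty lists
            yield [float(field) for field in row]


if __name__ == '__main__':
    # Sample input built from the line quoted earlier in the thread.
    sample = io.StringIO('\ufffd\ufffd1.12005000,1.11800000,14574\n')
    print(list(read_numeric_rows(sample)))  # [[1.12005, 1.118, 14574.0]]
```

For a real file, the csv documentation recommends opening it with newline=''; adding errors='replace' would turn undecodable leading bytes into the U+FFFD characters this function strips.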
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSuse 42.3, freeBSD 11, Raspian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition
#16
Thanks for your input, guys. Lots of leads; I'll look into all of this tomorrow.