Python Forum
csv troubles - Printable Version




csv troubles - DPaul - Aug-13-2020

Hi,

I downloaded a CSV database with names, dates, etc.
I can read it partially, line by line, splitting on ',' to get the values I want,
except that some of the lines, especially in the person's first name, contain strange characters.
I cannot even read past the first such occurrence.
How can I read past these characters without losing the rest of the line?

1924,"DUPONT, Clément",FRA,Men,Rugby,Silver

f = open(file,'r')
for line in f:
    try:
        l = line[:-1]
        l = l.split(',')
    except:
        pass
f.close()
Output:
Traceback (most recent call last):
  File "E:/Python/Olympics2.py", line 13, in <module>
    for line in f:
  File "K:\Python383\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6872: character maps to <undefined>



RE: csv troubles - Gribouillis - Aug-13-2020

You are trying to decode the file with the cp1252 codec. The file is probably encoded in UTF-8 or ISO-8859-1. Use the encoding parameter of the open() function (or io.open, or codecs.open). You can also use chardet (its command-line tool is called chardetect) to guess the file's encoding.
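
As a minimal sketch of the suggestion above, assuming the file really is UTF-8: passing encoding to open() avoids the cp1252 default, and the standard csv module keeps a quoted field such as "DUPONT, Clément" in one piece, whereas a plain split(',') would cut it in two. The helper name read_rows and the temporary sample file are made up for illustration.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-8"):
    """Return every row of a CSV file as a list of strings."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample file containing the line quoted in the question.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf-8", newline="") as f:
        f.write('1924,"DUPONT, Clément",FRA,Men,Rugby,Silver\n')
    rows = read_rows(sample_path)

print(len(rows[0]))  # 6 fields: the quoted name stays in one piece
```

If the guess about the encoding is wrong, open() will still raise UnicodeDecodeError, so another value (for example "iso-8859-1") can be passed as the second argument.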


RE: csv troubles - DPaul - Aug-13-2020

So I installed chardet (who invents these names?), and I got this result:

Output:
{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
It is no surprise that it's an encoding problem, but I thought UTF-8 was Python's default for reading files.
When I declare my file as such, I discover some wonderful first names like Désiré and Frédéric...
Thanks for your help,
Paul
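
On the point about UTF-8 being the standard: Python source files default to UTF-8, but when open() is called in text mode without an encoding argument, it falls back to a platform-dependent default (cp1252 on many Windows setups, as the traceback shows). The sketch below prints that default and shows why UTF-8 bytes can fail under cp1252. The name "Ángel" is only an illustrative value, not taken from the file, and decodes_cleanly is a hypothetical helper.

```python
import locale


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data can be decoded with the given codec without errors."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Encoding that open() falls back to when no encoding argument is given
# (platform dependent; often cp1252 on Windows).
print(locale.getpreferredencoding(False))

# "Á" is stored in UTF-8 as the bytes C3 81, and cp1252 has no character at 0x81.
raw = "Ángel".encode("utf-8")
print(decodes_cleanly(raw, "cp1252"))  # False
print(decodes_cleanly(raw, "utf-8"))   # True

# "é" does decode under cp1252, but into two wrong characters instead of one.
mojibake = "Clément".encode("utf-8").decode("cp1252")
print(len("Clément"), len(mojibake))  # 7 8
```

This is why part of the file could be read before the error: accented letters whose UTF-8 bytes all exist in cp1252 decode silently into wrong characters, and only a byte like 0x81 stops the loop.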


RE: csv troubles - Gribouillis - Aug-13-2020

You could perhaps use the utf8 mode, as described here
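
For reference, UTF-8 mode is available in Python 3.7 and later and can be enabled with the -X utf8 command-line option or the PYTHONUTF8=1 environment variable. In that mode open() defaults to UTF-8 regardless of the locale. A short sketch for checking from inside a script whether the mode is active (the helper name utf8_mode_enabled is hypothetical):

```python
import locale
import sys


def utf8_mode_enabled() -> bool:
    """Return True if the interpreter was started in UTF-8 mode."""
    return bool(sys.flags.utf8_mode)


print(utf8_mode_enabled())
# In UTF-8 mode this reports UTF-8; otherwise it depends on the locale.
print(locale.getpreferredencoding(False))
```

Running the script as "python -X utf8 your_script.py" should make the first line print True. Passing encoding="utf-8" explicitly to open() remains the more portable choice, since it does not depend on how the interpreter was started.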


RE: csv troubles - DPaul - Aug-13-2020

(Aug-13-2020, 03:34 PM)Gribouillis Wrote: You could perhaps use the utf8 mode, as described here

You learn a new thing every day.
Some days, more than one.

Thx,
Paul