Python Forum
csv troubles
#1
Hi,

I downloaded a csv database with names, dates, etc.
I can read part of it line by line, split on ',' and get the values I want,
except that some lines, especially in the person's first name, contain strange characters.
I cannot even read past the first such occurrence.
How can I read past these characters without losing the rest of the line?
1924,"DUPONT, Clément",FRA,Men,Rugby,Silver
f = open(file,'r')
for line in f:
    try:
        l = line[:-1]
        l = l.split(',')
    except:
        pass
f.close()
Output:
Traceback (most recent call last):
  File "E:/Python/Olympics2.py", line 13, in <module>
    for line in f:
  File "K:\Python383\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 6872: character maps to <undefined>
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
#2
You are trying to decode the file with the cp1252 codec, but it is probably encoded in UTF-8 or ISO 8859-1. Pass the encoding parameter to the open() function (or io.open, or codecs.open). You can also use the chardet package (its command-line tool is chardetect) to guess the file's encoding.
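A minimal sketch of that suggestion, assuming the file really is UTF-8. The sample row is the one from the question; the temporary file name is made up for this example. It also swaps split(',') for the csv module, since the quoted name field itself contains a comma.

```python
import csv
import os
import tempfile

# Hypothetical sample file: one row copied from the question, saved as UTF-8.
sample = '1924,"DUPONT, Clément",FRA,Men,Rugby,Silver\n'
path = os.path.join(tempfile.mkdtemp(), "olympics_sample.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(sample)

# Pass the encoding explicitly instead of relying on the platform default
# (cp1252 in the traceback above). newline="" is what the csv docs recommend.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

# The quoted field stays in one piece, comma and accent included.
print(rows[0])
```

If the encoding cannot be determined at all, open() also accepts errors="replace", which substitutes a placeholder character for undecodable bytes so the rest of the line is still readable.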
#3
So I installed chardet (who invents these names?), and I got this result:

Output:
{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
It is no surprise that it's an encoding problem, but I thought UTF-8 was Python's default for reading files.
When I declare my file as such, I discover some wonderful first names like Désiré and Frédéric...
Thanks for your help,
Paul
#4
You could perhaps use the utf8 mode, as described here
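For reference, a rough sketch of what UTF-8 mode changes, assuming Python 3.7 or later, where the mode can be switched on with the -X utf8 option or the PYTHONUTF8=1 environment variable. The helper name run_probe is made up for this example; it starts a child interpreter and reports sys.flags.utf8_mode together with the encoding that open() would use by default.

```python
import subprocess
import sys

# Small probe script: report whether UTF-8 mode is on and which encoding
# open() would use by default for text files.
probe = (
    "import locale, sys;"
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def run_probe(*interpreter_args):
    """Run the probe in a child interpreter and return its output fields."""
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


print("default :", run_probe())
print("-X utf8 :", run_probe("-X", "utf8"))
```

On a Windows setup like the one in the traceback, the first line would be expected to report cp1252 and the second UTF-8. With the mode enabled, open(file, 'r') decodes as UTF-8 without an explicit encoding argument.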
#5
(Aug-13-2020, 03:34 PM)Gribouillis Wrote: You could perhaps use the utf8 mode, as described here

You learn a new thing every day.
Some days, more than one.

Thx,
Paul