Python Forum
selecting a particular column in csv file shows error - Printable Version




selecting a particular column in csv file shows error - raady07 - Mar-15-2018

I am trying to select particular columns from a Twitter username database stored in a CSV file.
I first tried the following code with a small CSV file that I made myself, and it runs fine. But when I read the much larger file, it gives me an error.

import csv

filename = 'twitter-gender-classifier.csv'
# filename = 'test.csv'

with open(filename) as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    data2 = []
    for row in readCSV:
        data = []
        data.append(row[14]) # appending names
        data.append(row[5])  # appending gender
        data2.append(data)

    print(data2)
The same code works fine with the test.csv file.

With the twitter-gender-classifier.csv file it gives me this error:
Error:
C:\Users\Dileep.Kumar\AppData\Local\Programs\Python\Python36\python.exe C:/Users/Dileep.Kumar/PycharmProjects/Twitter_Gender_Classification/test2.py
Traceback (most recent call last):
  File "C:/Users/Dileep.Kumar/PycharmProjects/Twitter_Gender_Classification/test2.py", line 9, in <module>
    for row in readCSV:
  File "C:\Users\Dileep.Kumar\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1009: character maps to <undefined>



RE: selecting a particular column in csv file shows error - stranac - Mar-15-2018

Looks like the file is using an encoding different from the one your Python uses by default (cp1252, according to the traceback).
You'll need to specify the encoding when opening the file: https://docs.python.org/3/library/functions.html#open
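For illustration, a minimal sketch of passing the encoding to open() before handing the file to csv.reader. The helper name read_columns is made up for this example, and which encoding name to pass is an open question here, since nothing so far establishes what the file actually uses.

```python
import csv


def read_columns(path, encoding, columns=(14, 5)):
    """Return the chosen columns of every row, decoding the file as `encoding`.

    Hypothetical helper: the default column indexes 14 and 5 are the ones
    used in the original post (name and gender).
    """
    # newline='' is the setting the csv module documentation recommends
    # for file objects handed to csv.reader.
    with open(path, encoding=encoding, newline='') as csvfile:
        return [[row[i] for i in columns] for row in csv.reader(csvfile)]
```

If the encoding passed in does not match the bytes in the file, this still raises UnicodeDecodeError, just like the original code.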


RE: selecting a particular column in csv file shows error - raady07 - Mar-15-2018

I checked the encoding and it is utf8, so I changed the line to
with open(filename, encoding='utf8') as csvfile:
Then it gives another error:
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 927: invalid start byte



RE: selecting a particular column in csv file shows error - stranac - Mar-15-2018

Judging by that error, it isn't utf8.
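In UTF-8, bytes in the range 0x80 to 0xBF are continuation bytes and can never begin a character, which is why 0x89 is reported as an "invalid start byte". A rough sketch for locating the first offending byte and looking at what surrounds it; first_decode_error is a made-up helper name, and it works on bytes read with open(filename, 'rb').

```python
def first_decode_error(raw, encoding='utf-8', context=20):
    """Return (offset, surrounding bytes) for the first byte sequence that
    fails to decode, or None if `raw` decodes cleanly.

    Hypothetical helper; `context` is how many bytes to show on each side.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start / exc.end delimit the offending bytes within `raw`
        return exc.start, raw[max(0, exc.start - context):exc.end + context]
    return None
```

Looking at the bytes around the reported position often hints at what the text was supposed to be, and so at the real encoding.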


RE: selecting a particular column in csv file shows error - raady07 - Mar-15-2018

When I checked the file, it gave me this:

[Image: dda5b509725975e23d83cfae88c6ae2c.html]

Please help.


RE: selecting a particular column in csv file shows error - stranac - Mar-15-2018

Can you attach the file itself?
Would be much more useful than a broken link to download an image that shows something.

Also, how was the file created?


RE: selecting a particular column in csv file shows error - raady07 - Mar-15-2018

The issue is resolved.
A small modification that ignores the encoding errors helped:

with open(filename, encoding='utf8', errors='ignore') as csvfile:
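One thing to be aware of: errors='ignore' silently drops every byte that cannot be decoded, so some characters may go missing from the names. A small sketch of the difference between 'ignore' and 'replace', using a made-up byte string rather than data from the actual file; count_undecodable is a hypothetical helper.

```python
# Made-up sample: 0x89 is not valid UTF-8 at this position.
raw = b'caf\x89 latte'

ignored = raw.decode('utf-8', errors='ignore')    # the bad byte vanishes
replaced = raw.decode('utf-8', errors='replace')  # the bad byte becomes U+FFFD


def count_undecodable(raw_bytes, encoding='utf-8'):
    """Rough count of undecodable sequences (hypothetical helper).

    Counts U+FFFD replacement characters, so a U+FFFD that is genuinely
    present in the data would be counted as well.
    """
    return raw_bytes.decode(encoding, errors='replace').count('\ufffd')
```

Running count_undecodable on the file's raw bytes (read with open(filename, 'rb')) gives an idea of how much text 'ignore' is throwing away.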



RE: selecting a particular column in csv file shows error - Larz60+ - Mar-15-2018

Take a look at chardet: https://pypi.python.org/pypi/chardet
It might be useful here.
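chardet guesses the encoding statistically from the raw bytes. If installing a package is not an option, a much cruder standard-library alternative (this is not chardet, just a sketch) is to try a short list of candidate encodings in order. The function name guess_encoding and the candidate list are assumptions made for this example.

```python
def guess_encoding(raw, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Return the first candidate that decodes `raw` without error, else None.

    Hypothetical helper. A successful decode does not prove the guess is
    right: latin-1 maps every possible byte, so it never fails and only
    makes sense as the last resort in the list.
    """
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

The bytes would come from open(filename, 'rb').read(), and the result can then be passed as the encoding argument of open(). It is still worth eyeballing a few non-ASCII names afterwards to check that they look right.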