Python Forum

Full Version: selecting a particular column in csv file shows error
I am trying to select particular columns from a Twitter username database stored in a CSV file.
I tried the following code on a small CSV file that I made myself, and it runs fine. But when I read the large data file, it gives me an error.

import csv

filename = 'twitter-gender-classifier.csv'
# filename = 'test.csv'

with open(filename) as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    data2 = []
    for row in readCSV:
        data = []
        data.append(row[14]) # appending names
        data.append(row[5])  # appending gender
        data2.append(data)

    print(data2)
The same code works fine with the test.csv file.

With the twitter-gender-classifier.csv file it gives me this error:
Error:
C:\Users\Dileep.Kumar\AppData\Local\Programs\Python\Python36\python.exe C:/Users/Dileep.Kumar/PycharmProjects/Twitter_Gender_Classification/test2.py
Traceback (most recent call last):
  File "C:/Users/Dileep.Kumar/PycharmProjects/Twitter_Gender_Classification/test2.py", line 9, in <module>
    for row in readCSV:
  File "C:\Users\Dileep.Kumar\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1009: character maps to <undefined>
It looks like the file uses an encoding different from the one your Python uses by default (cp1252, according to the traceback).
You'll need to specify the encoding when opening the file: https://docs.python.org/3/library/functions.html#open
I checked the file's encoding and it is reported as utf8, so I changed the line to
with open(filename, encoding='utf8') as csvfile:
but then it raises another error:
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 927: invalid start byte
Judging by that error, it isn't utf8.
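One way to narrow this down is to try a few candidate encodings on the raw bytes and see which ones decode without an error. This is only a rough sketch: the function name, the candidate list, and the sample bytes are made up for illustration, not taken from your file. Keep in mind that latin-1 accepts every possible byte, so "it decoded" does not prove the encoding is the right one.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error.

    Hypothetical helper for illustration; returns None if every candidate fails.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


# Made-up sample bytes: 0x89 cannot start a UTF-8 sequence,
# and 0x9d has no mapping in Python's cp1252 codec.
sample = b"name,gender\n\x89abc\x9d,male\n"
print(first_working_encoding(sample))
```

For the real file you would read the bytes with open(filename, 'rb') and pass them in, then open the file again with whichever encoding looks plausible and check that the names read correctly.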
When I checked the file, it gave me this:

[Image: dda5b509725975e23d83cfae88c6ae2c.html]

Please help.
Can you attach the file itself?
Would be much more useful than a broken link to download an image that shows something.

Also, how was the file created?
The issue is resolved.
A small modification helped: telling open() to ignore the encoding errors.

with open(filename, encoding='utf8', errors='ignore') as csvfile:
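For anyone reading later, here is a small sketch of what errors='ignore' actually does compared with errors='replace'. The sample bytes and the read_rows helper are invented for illustration. With 'ignore' the undecodable bytes are dropped silently, while 'replace' puts U+FFFD in their place so the affected rows are still visible.

```python
import csv
import io

# Made-up data: 0x89 is not valid UTF-8 at this position.
raw = b"name,gender\nJos\x89,male\n"


def read_rows(raw, errors):
    """Decode raw bytes as UTF-8 with the given error handler and parse as CSV."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8",
                            errors=errors, newline="")
    return list(csv.reader(text))


ignored = read_rows(raw, "ignore")    # the bad byte is dropped without a trace
replaced = read_rows(raw, "replace")  # the bad byte becomes U+FFFD
print(ignored[1])
print(replaced[1])
```

If the dropped characters matter (for example in the names column), 'replace' at least makes it possible to find the rows that were damaged.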
Take a look at chardet: https://pypi.python.org/pypi/chardet
It might be useful for detecting the file's encoding.
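A minimal sketch of how that could look, assuming chardet is installed (pip install chardet). The guess_encoding name and the sample_size parameter are my own; chardet.detect itself takes bytes and returns a dict that includes 'encoding' and 'confidence'. The result is a guess, not a guarantee.

```python
def guess_encoding(path, sample_size=100_000):
    """Return chardet's guess for the encoding of the file at path.

    Hypothetical helper; reads only the first sample_size bytes
    so that large files stay fast to check.
    """
    import chardet  # third-party package, assumed to be installed

    with open(path, "rb") as f:
        raw = f.read(sample_size)
    return chardet.detect(raw)


# Example usage (file name taken from the first post):
# result = guess_encoding('twitter-gender-classifier.csv')
# print(result['encoding'], result['confidence'])
```

You can then pass the reported encoding to open(filename, encoding=...) and check whether the file reads cleanly.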