Python Forum
utf-8 error with pandas read_csv
#1
I'm trying to read in several large data files (~600-700k rows each) as dataframes so I can clean and append them to create a large panel dataset. On import, I get the following error:
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 4: invalid continuation byte
When I restrict the read to nrows=5000, it works, but the error reappears somewhere between 5,000 and 6,000 rows. The file itself doesn't seem to be the problem: I've had no issues importing it, or the other files, into R. Here's the link to the publicly available .xlsx file, which I converted to CSV before reading it into Python: https://www.foreignlaborcert.doleta.gov/..._FY17.xlsx. Thanks in advance for your help in getting this issue resolved!

import pandas as pd
df_17 = pd.read_csv("C:\\Users\\bryanlm\\Python Projects\\Immigration\\LCA Dataset\\Aggregation\\17_H-1B_Disclosure_Data_FY17.csv")
Output:
df_17 = pd.read_csv("C:\\Users\\bryanlm\\Python Projects\\Immigration\\LCA Dataset\\Aggregation\\17_H-1B_Disclosure_Data_FY17.csv", nrows = 5900)
Traceback (most recent call last):
  File "<ipython-input-4-c62aa366fb87>", line 1, in <module>
    df_17 = pd.read_csv("C:\\Users\\bryanlm\\Python Projects\\Immigration\\LCA Dataset\\Aggregation\\17_H-1B_Disclosure_Data_FY17.csv", nrows = 5900)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 903, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1094, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1141, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1240, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1256, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1494, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 4: invalid continuation byte
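A possible diagnosis, offered as a sketch and not a confirmed fix: byte 0xd6 is not valid on its own in UTF-8, but it is the letter "Ö" in Windows-1252 (cp1252) and Latin-1, which is what a CSV exported from Excel on Windows often uses. That would also explain why nrows=5000 works: the first such character would simply sit past row 5000. The helper name first_working_encoding, the candidate list, and the sample row below are all made up for illustration; the real file's encoding is an assumption until checked.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Hypothetical helper: return the first candidate encoding that decodes `raw`.

    Note that latin-1 maps every byte to a character, so it never fails;
    it is only a last-resort fallback, not proof of the true encoding.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Made-up sample row containing the same byte (0xd6) reported in the error.
sample = b"EMPLOYER_NAME\nBJ\xd6RK CONSULTING\n"

detected = first_working_encoding(sample)
decoded = sample.decode(detected)
print(detected)
print(decoded)
```

For the real file, the bytes could be read with open(path, "rb").read() and passed to the helper; whichever encoding it reports can then be given to pandas through the encoding argument, for example pd.read_csv(path, encoding="cp1252"). It is worth spot-checking a few names with accented characters afterwards, since a wrong single-byte encoding decodes without error but produces the wrong letters.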