Python Forum
utf-8 error with pandas read_csv
#1
I'm trying to read in several large data files (~600-700k rows each) as dataframes so I can clean and append them to create a large panel dataset. On import, I get the following error:
Error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 4: invalid continuation byte
When I restrict the read to nrows=5000 it works, but somewhere between 5000 and 6000 rows the error comes back. As far as I can tell there isn't anything wrong with the file: I've had no issues importing it, or the other files, into R. Here's the link to the publicly available .xlsx file that I converted to CSV before reading it into Python: https://www.foreignlaborcert.doleta.gov/..._FY17.xlsx. Thanks in advance for your help in getting this issue resolved!

import pandas as pd
df_17 = pd.read_csv("C:\\Users\\bryanlm\\Python Projects\\Immigration\\LCA Dataset\\Aggregation\\17_H-1B_Disclosure_Data_FY17.csv")
Output:
df_17 = pd.read_csv("C:\\Users\\bryanlm\\Python Projects\\Immigration\\LCA Dataset\\Aggregation\\17_H-1B_Disclosure_Data_FY17.csv", nrows = 5900)
Traceback (most recent call last):
  File "<ipython-input-4-c62aa366fb87>", line 1, in <module>
    df_17 = pd.read_csv("C:\\Users\\bryanlm\\Python Projects\\Immigration\\LCA Dataset\\Aggregation\\17_H-1B_Disclosure_Data_FY17.csv", nrows = 5900)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "C:\Users\bryanlm\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 903, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1094, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1141, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1240, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1256, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1494, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 4: invalid continuation byte
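A note on what the error suggests, with a small diagnostic sketch. In UTF-8, 0xd6 is the lead byte of a two-byte sequence and must be followed by a byte in the range 0x80-0xBF; "invalid continuation byte" means it was not, so the file is probably not UTF-8 at that point. In single-byte Windows encodings such as cp1252 or latin-1, 0xd6 is the letter "Ö". That would also explain why nrows=5000 works: the first such byte apparently sits after row 5000. The helper below, find_undecodable_lines, is a hypothetical name, and the sample bytes are made up for illustration (they are not taken from the actual file); cp1252 is only a guess that fits a CSV exported from Excel on Windows.

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="utf-8", limit=5):
    """Return up to `limit` (line_number, raw_bytes) pairs for lines
    that cannot be decoded with `encoding`. Line numbers are 1-based."""
    bad = []
    with open(path, "rb") as fh:  # binary mode: no decoding happens on read
        for lineno, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError:
                bad.append((lineno, raw))
                if len(bad) >= limit:
                    break
    return bad


# Made-up sample: a tiny CSV whose third line contains the cp1252 byte 0xd6.
sample = b"id,name\n1,Smith\n2,B\xd6HM\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

bad_lines = find_undecodable_lines(sample_path)
for lineno, raw in bad_lines:
    # Show the raw bytes next to a trial decode under the guessed encoding.
    print(lineno, raw, "->", raw.decode("cp1252").rstrip())

os.remove(sample_path)
```

If the offending lines of the real CSV read sensibly under the trial encoding, the usual next step is to pass that encoding to pandas, for example pd.read_csv(path, encoding="cp1252"). Note that latin-1 never raises because every byte maps to some character, so it is worth checking that the decoded text actually looks right rather than just that the error went away.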