Problem with importing a CSV file

Problem with importing a CSV file - Chopan2211 - Nov-05-2019

I have a problem importing a CSV file. I have tried a raw string (r'...'), single forward slashes (/) and double backslashes (\\). The problem is the same every time.
But I can easily import the same file in RStudio with

df <- read.csv2(file="G:\\Analyser\\2019 OS\\test.csv",head=TRUE)

Does anybody have a clue what I'm doing wrong? I'm using Anaconda Spyder, and I can also import the file directly.

import pandas as pd
pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None)

My error message is:

Traceback (most recent call last):

  File "<ipython-input-2-90f87efa86d2>", line 3, in <module>
    pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
    data = parser.read(nrows)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
    ret = self._engine.read(nrows)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1995, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error

ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

RE: Problem with importing a CSV file - buran - Nov-05-2019

What does your CSV file look like, especially line 4? Can you provide, let's say, the top 10 lines?

RE: Problem with importing a CSV file - Axel_Erfurt - Nov-05-2019

try

pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None, error_bad_lines=False)

RE: Problem with importing a CSV file - Chopan2211 - Nov-05-2019

[Image: view?usp=drivesdk]

I have tried but I still get an error message.

pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None, error_bad_lines=False)

The error message:

pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None, error_bad_lines=False)
b'Skipping line 4: expected 1 fields, saw 2\nSkipping line 7: expected 1 fields, saw 2\nSkipping line 44: expected 1 fields, saw 2\nSkipping line 47: expected 1 fields, saw 2\nSkipping line 48: expected 1 fields, saw 2\nSkipping line 62: expected 1 fields, saw 2\nSkipping line 68: expected 1 fields, saw 2\nSkipping line 83: expected 1 fields, saw 2\nSkipping line 98: expected 1 fields, saw 2\nSkipping line 101: expected 1 fields, saw 2\nSkipping line 103: expected 1 fields, saw 2\nSkipping line 181: expected 1 fields, saw 2\nSkipping line 245: expected 1 fields, saw 2\nSkipping line 314: expected 1 fields, saw 2\n'
Traceback (most recent call last):

  File "<ipython-input-1-aef5c76db53b>", line 3, in <module>
    pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None, error_bad_lines=False)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
    data = parser.read(nrows)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
    ret = self._engine.read(nrows)

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1995, in read
    data = self._reader.read(nrows)

  File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read

  File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory

  File "pandas/_libs/parsers.pyx", line 991, in pandas._libs.parsers.TextReader._read_rows

  File "pandas/_libs/parsers.pyx", line 1123, in pandas._libs.parsers.TextReader._convert_column_data

  File "pandas/_libs/parsers.pyx", line 1176, in pandas._libs.parsers.TextReader._convert_tokens

  File "pandas/_libs/parsers.pyx", line 1299, in pandas._libs.parsers.TextReader._convert_with_dtype

  File "pandas/_libs/parsers.pyx", line 1315, in pandas._libs.parsers.TextReader._string_convert

  File "pandas/_libs/parsers.pyx", line 1553, in pandas._libs.parsers._string_box_utf8

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 95: invalid continuation byte

RE: Problem with importing a CSV file - buran - Nov-05-2019

Unfortunately, that image does not help. You need to open the file in a text editor like Notepad or Notepad++.
I understand these are confidential data (client LEI, transactions), so look at the data on lines 4, 7, 44, etc. yourself. You probably have a comma in some of the cells. If so, at least those cells should be quoted. Is this the case? For example:

Output:
id,name
1,"some, value"

As you can see, there are 2 columns, but line 2 has the separator inside the value.
It's also possible that all values are quoted.
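
A minimal sketch of the quoting rule described above. The sample data is made up for illustration and is read from an in-memory string rather than from the real file. It only shows that pandas treats a comma inside double quotes as part of the value, not as a separator.

```python
import io

import pandas as pd

# Hypothetical two-column sample: the comma inside the quotes is data,
# not a field separator.
sample = 'id,name\n1,"some, value"\n2,plain\n'

df = pd.read_csv(io.StringIO(sample))

print(df.shape)            # number of rows and columns that were parsed
print(df.loc[0, "name"])   # the quoted value, comma included
```

If the real file has unquoted commas inside values, the parser counts more fields than it expects, which matches the shape of the "expected 1 fields, saw 2" message.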

RE: Problem with importing a CSV file - Chopan2211 - Nov-06-2019

I converted it to an Excel file instead, and that worked in Spyder. I do not know what the problem was, because I could not see any strange cells or anything similar. I also imported the CSV file easily in RStudio. Anyway, thank you all for the help.

RE: Problem with importing a CSV file - DeaD_EyE - Nov-06-2019

A UnicodeDecodeError occurs if the source file can't be decoded as UTF-8, which is the default encoding.
The function pd.read_csv does not seem to have a kwarg to ignore encoding errors.

One way could be to open the file in text mode and pass the file object to pandas.

with open("G:\\Analyser\\2019 OS\\test.csv", errors='ignore' ) as fd:
    data = pd.read_csv(fd, header=None, error_bad_lines=False)
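
To make clear what errors='ignore' does, here is a small sketch using only built-in byte and string methods. The sample text is an assumption, chosen because the Latin-1 byte for "å" is 0xE5, the byte named in the UnicodeDecodeError above. The point is that ignoring errors silently drops characters, while decoding with a matching encoding keeps them.

```python
# Hypothetical sample line, encoded the way a Windows tool might save it.
raw = "blå;1,5\n".encode("latin1")

# Decoding as UTF-8 while ignoring errors: the undecodable byte is dropped.
ignored = raw.decode("utf-8", errors="ignore")

# Decoding with the encoding the bytes were written in keeps every character.
decoded = raw.decode("latin1")

print(repr(ignored))
print(repr(decoded))
```

So the workaround gets the file loaded, but text columns may lose letters along the way.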

Take a look at the documentation for pd.read_csv.

This is the signature:

pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

The first argument filepath_or_buffer is described as:

Quote:
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)

The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv

I haven't tested the example above, but it should work. In this case, decoding errors are ignored.
I guess your file uses an encoding other than UTF-8.
It could be:

latin1 (ISO/IEC 8859-1)
latin9 (ISO/IEC 8859-15)
Windows-1252 (CP 1252 / (Western European) / ANSI)
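
One rough way to narrow down the candidates is to try decoding the raw bytes with each encoding in turn. This is only a sketch: the helper name first_working_encoding and the candidate order are my own choices, not something from pandas. A successful decode does not prove the encoding is the right one, because latin1 accepts every byte.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
        return name
    return None


# Hypothetical byte samples standing in for the contents of a file.
print(first_working_encoding(b"plain ascii"))
print(first_working_encoding("blå".encode("latin1")))
```

For a real file, the bytes could come from open(path, 'rb').read(), and the returned name could then be passed as the encoding argument of pd.read_csv.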

There is also a module called ftfy, which can fix text damaged by bad encoding.

import ftfy


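# Read with the platform's default encoding, dropping bytes that cannot be decoded.
with open('file_with_bad_encoding.txt', errors='ignore') as src:
    fixed_text = ftfy.fix_text(src.read())
# Write the repaired text back out explicitly as UTF-8.
with open('file_with_fixed_encoding.txt', 'w', encoding='utf-8') as dst:
    dst.write(fixed_text)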

After this, the file is encoded as UTF-8, and most errors from wrong encoding/decoding should be fixed.
Knowing the correct encoding of the input file is the better option.
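
One more thing worth checking, since the file loads fine in R: read.csv2 defaults to sep=";" and dec=",", while pd.read_csv defaults to a comma separator and a dot as decimal mark. The file contents were never posted, so this is a guess, but a semicolon-separated file with decimal commas would explain "expected 1 fields, saw 2". A sketch with made-up data, assuming Latin-1 encoding:

```python
import io

import pandas as pd

# Hypothetical file contents: semicolon-separated, decimal commas, Latin-1 bytes.
raw = "namn;belopp\nblå;1,5\ngrön;2,25\n".encode("latin1")

# Roughly what read.csv2 does in R, with the encoding named explicitly.
df = pd.read_csv(io.BytesIO(raw), sep=";", decimal=",", encoding="latin1")

print(df)
print(df.dtypes)
```

With the real file, the path string would replace io.BytesIO(raw), and the encoding may need to be adjusted.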