Thread: Problem with importing a CSV file (/thread-22250.html)
Python Forum (https://python-forum.io), Python Coding, General Coding Help
Problem with importing a CSV file - Chopan2211 - Nov-05-2019

I have a problem importing a CSV file. I have tried an r'...' raw string, single / and double \\, both ways. The same problem. But I can easily import the same file in RStudio with

    df <- read.csv2(file="G:\\Analyser\\2019 OS\\test.csv",head=TRUE)

Does anybody have a clue what I'm doing wrong? I'm using Anaconda Spyder, and I can also import the file directly.

    import pandas as pd
    pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None)

My error message is:

    Traceback (most recent call last):
      File "<ipython-input-2-90f87efa86d2>", line 3, in <module>
        pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
        data = parser.read(nrows)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
        ret = self._engine.read(nrows)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1995, in read
        data = self._reader.read(nrows)
      File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
      File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
      File "pandas/_libs/parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
      File "pandas/_libs/parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
      File "pandas/_libs/parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error
    ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

RE: Problem with importing a CSV file - buran - Nov-05-2019

What does your CSV file look like, especially line 4? Can you provide, let's say, the top 10 lines?

RE: Problem with importing a CSV file - Axel_Erfurt - Nov-05-2019

Try

    pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None, error_bad_lines=False)

RE: Problem with importing a CSV file - Chopan2211 - Nov-05-2019

I have tried that, but I still get an error message.

    pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None, error_bad_lines=False)

The error message:

    b'Skipping line 4: expected 1 fields, saw 2\nSkipping line 7: expected 1 fields, saw 2\nSkipping line 44: expected 1 fields, saw 2\nSkipping line 47: expected 1 fields, saw 2\nSkipping line 48: expected 1 fields, saw 2\nSkipping line 62: expected 1 fields, saw 2\nSkipping line 68: expected 1 fields, saw 2\nSkipping line 83: expected 1 fields, saw 2\nSkipping line 98: expected 1 fields, saw 2\nSkipping line 101: expected 1 fields, saw 2\nSkipping line 103: expected 1 fields, saw 2\nSkipping line 181: expected 1 fields, saw 2\nSkipping line 245: expected 1 fields, saw 2\nSkipping line 314: expected 1 fields, saw 2\n'
    Traceback (most recent call last):
      File "<ipython-input-1-aef5c76db53b>", line 3, in <module>
        pd.read_csv("G:\\Analyser\\2019 OS\\test.csv", header=None, error_bad_lines=False)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
        return _read(filepath_or_buffer, kwds)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
        data = parser.read(nrows)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
        ret = self._engine.read(nrows)
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1995, in read
        data = self._reader.read(nrows)
      File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
      File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
      File "pandas/_libs/parsers.pyx", line 991, in pandas._libs.parsers.TextReader._read_rows
      File "pandas/_libs/parsers.pyx", line 1123, in pandas._libs.parsers.TextReader._convert_column_data
      File "pandas/_libs/parsers.pyx", line 1176, in pandas._libs.parsers.TextReader._convert_tokens
      File "pandas/_libs/parsers.pyx", line 1299, in pandas._libs.parsers.TextReader._convert_with_dtype
      File "pandas/_libs/parsers.pyx", line 1315, in pandas._libs.parsers.TextReader._string_convert
      File "pandas/_libs/parsers.pyx", line 1553, in pandas._libs.parsers._string_box_utf8
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 95: invalid continuation byte

RE: Problem with importing a CSV file - buran - Nov-05-2019

Unfortunately, that image does not help. You need to open the file in a text editor like Notepad or Notepad++. I understand these are confidential data (client LEI, transactions), so look at the data on lines 4, 7, 44, etc. yourself; you probably have a comma in some of the cells. If that is the case, at least those cells should be quoted. Is this the case? For example, a file can have 2 columns while one line has the separator inside a value; that value then has to be quoted. It's also possible that all values are quoted.

RE: Problem with importing a CSV file - Chopan2211 - Nov-06-2019

I converted it to an Excel file instead, and that worked in Spyder. I don't know what the problem was, because I could not see any strange cells or anything similar. I could also easily import the CSV file in RStudio. Anyway, thank you all for the help.

RE: Problem with importing a CSV file - DeaD_EyE - Nov-06-2019

UnicodeDecodeError occurs if the source file can't be decoded as UTF-8, which is the default encoding. The function pd.read_csv does not seem to have a kwarg to ignore encoding errors. One way could be to open the file in text mode and pass the file object to pandas.

    with open("G:\\Analyser\\2019 OS\\test.csv", errors='ignore') as fd:
        data = pd.read_csv(fd, header=None, error_bad_lines=False)

Take a look at the documentation for pd.read_csv.
This is the function signature:

    pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
        names=None, index_col=None, usecols=None, squeeze=False, prefix=None,
        mangle_dupe_cols=True, dtype=None, engine=None, converters=None,
        true_values=None, false_values=None, skipinitialspace=False,
        skiprows=None, nrows=None, na_values=None, keep_default_na=True,
        na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False,
        infer_datetime_format=False, keep_date_col=False, date_parser=None,
        dayfirst=False, iterator=False, chunksize=None, compression='infer',
        thousands=None, decimal=b'.', lineterminator=None, quotechar='"',
        quoting=0, escapechar=None, comment=None, encoding=None, dialect=None,
        tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True,
        skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True,
        memory_map=False, float_precision=None)

The first argument, filepath_or_buffer, is described as:

Quote: filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any \

I haven't tested the example above, but it should work. In this case, decoding errors are ignored. I guess your file is in an encoding other than UTF-8. It could be:
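One candidate that fits the traceback is Latin-1 (or the closely related cp1252): in both, the byte 0xe5 is the letter "å". The thread never confirms the real encoding, so the sketch below is only an illustration under that assumption, using a made-up in-memory sample instead of the original file.

```python
import io

import pandas as pd

# Hypothetical sample data (not from the thread): "å" encoded as Latin-1 is the byte 0xe5.
raw = "name,city\nAnna,Luleå\nBo,Umeå\n".encode("latin-1")

# Decoding the same bytes as UTF-8 fails, like in the traceback above.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Telling pandas the (assumed) encoding lets it decode the bytes correctly.
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(utf8_ok)
print(df)
```

Passing the right encoding keeps the characters, while errors='ignore' silently drops them. Newer pandas versions (1.3 and later) also accept an encoding_errors argument in read_csv, which did not exist when this thread was written.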
There is also a module called ftfy which can repair text damaged by bad encoding.

    import ftfy

    # errors='ignore' drops bytes that cannot be decoded
    with open('file_with_bad_encoding.txt', errors='ignore') as src:
        fixed_text = ftfy.fix_text(src.read())

    # write explicitly as UTF-8 so the result does not depend on the platform default
    with open('file_with_fixed_encoding.txt', 'w', encoding='utf-8') as dst:
        dst.write(fixed_text)

After this, the file is encoded as UTF-8, and most errors from wrong encoding/decoding should be fixed. Still, it is better to know the right encoding of the input file.
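A note on the R call from the first post: read.csv2 defaults to ";" as the field separator and "," as the decimal mark, while pd.read_csv defaults to "," and ".". If the file is semicolon-delimited with decimal commas (an assumption, since the thread never shows its contents), that would explain "Expected 1 fields in line 4, saw 2": pandas splits on the commas inside the numbers. A minimal sketch with made-up sample data:

```python
import io

import pandas as pd

# Hypothetical sample in the layout read.csv2 expects:
# ";" between fields, "," as decimal mark, Latin-1 encoded text.
sample = "id;amount;comment\n1;10;ok\n2;7;sen\n3;3,5;på väg\n"
raw = sample.encode("latin-1")

# With the default separator "," only the last row contains a comma,
# so pandas is expected to reject it with a tokenizing error.
try:
    pd.read_csv(io.BytesIO(raw), encoding="latin-1")
    default_failed = False
except pd.errors.ParserError:
    default_failed = True

# Mirroring the read.csv2 defaults parses the same bytes cleanly.
df = pd.read_csv(io.BytesIO(raw), sep=";", decimal=",", encoding="latin-1")
print(default_failed)
print(df.dtypes)
print(df)
```

For the original file, the same three arguments (sep, decimal, encoding) would be passed together with the path from the first post; whether they match the real file can only be checked by opening it in a text editor.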