Python Forum
Error message when using a .csv dataset
#1
I'm doing some experiments with the classic Auto MPG dataset, building a model to predict the fuel efficiency of late-1970s and early-1980s automobiles.
I wrote two versions of the code:
one downloads the dataset from the UCI Machine Learning repository (auto-mpg.data), and the other loads the file from my computer (auto-mpg.csv).
When I run the program using the dataset from UCI Machine Learning, it works well. When I run the program using the dataset from my computer (auto-mpg.csv), it doesn't work.
Both codes are shown below.
1. Using the dataset from the UCI Machine Learning repository
!pip install -q seaborn
from __future__ import absolute_import, division, print_function, unicode_literals

import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)
dataset_path = tf.keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail(5)
dataset.isna().sum()
dataset = dataset.dropna()
origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
dataset.tail()
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")
train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
2. Using the dataset from my computer:
dataset_path = pd.read_csv(r'C:\Users\ee0547\Documents\DISSERTAÇÃO DALPIAZ\EXEMPLOS DE REDES_22_05_2020\auto-mpg.csv')
dataset_path
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail(5)
Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-3381f4da9f77> in <module>
      3 raw_dataset = pd.read_csv(dataset_path, names=column_names,
      4                       na_values="?", comment='\t',
----> 5                       sep=" ", skipinitialspace=True)
      6
      7 dataset = raw_dataset.copy()

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675
--> 676         return _read(filepath_or_buffer, kwds)
    677
    678     parser_f.__name__ = name

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    429     # See https://github.com/python/mypy/issues/1297
    430     fp_or_buf, _, compression, should_close = get_filepath_or_buffer(
--> 431         filepath_or_buffer, encoding, compression
    432     )
    433     kwds["compression"] = compression

~\Anaconda3\lib\site-packages\pandas\io\common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    198     if not is_file_like(filepath_or_buffer):
    199         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 200         raise ValueError(msg)
    201
    202     return filepath_or_buffer, None, compression, False

ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>
#2
I suspect this is because your file path is "too" complex; I didn't dive into the problem, but I would suggest using "/" instead of "\" and simplifying the path (remove spaces and special symbols like Ã), e.g. try a filename like "C:/Users/ee0547/Documents/problem1/your_file.csv".
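For example, something like this (the filename below is only a placeholder; a raw string is an alternative if you want to keep backslashes):
import pandas as pd

# Forward slashes avoid backslash-escape problems in plain Windows path strings
raw_dataset = pd.read_csv("C:/Users/ee0547/Documents/problem1/your_file.csv")

# Alternatively, a raw string keeps the backslashes literal
raw_dataset = pd.read_csv(r"C:\Users\ee0547\Documents\problem1\your_file.csv")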
#3
Thanks for your help.
I tried to simplify the path, but it still didn't work.
I think the problem is related to the way the file is obtained when I use this code:
dataset_path = tf.keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path
versus when I use this code:
dataset_path = pd.read_csv(r'C:\Users\ee0547\Documents\auto-mpg.csv')
dataset_path
For both codes, everything goes well until this code is reached:

column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail(5)
The first code runs without any error, but the second one raises the error shown in my previous post.
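To illustrate the difference I mean, here is a small sketch of what each call gives back (paths shortened; the separator for my local .csv may still need checking):
import pandas as pd
import tensorflow as tf

# tf.keras.utils.get_file downloads the file and returns its local path as a string,
# which is the kind of argument pd.read_csv expects
dataset_path = tf.keras.utils.get_file(
    "auto-mpg.data",
    "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
print(type(dataset_path))   # <class 'str'>

# pd.read_csv already parses the file and returns a DataFrame, not a path
local_data = pd.read_csv(r'C:\Users\ee0547\Documents\auto-mpg.csv')
print(type(local_data))     # <class 'pandas.core.frame.DataFrame'>

# Passing that DataFrame back into pd.read_csv is what raises
# "ValueError: Invalid file path or buffer object type"; with the local file,
# the path string itself would go to pd.read_csv instead
# (same options as above; sep may need to match how the .csv is actually delimited)
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(r'C:\Users\ee0547\Documents\auto-mpg.csv',
                          names=column_names, na_values="?",
                          comment='\t', sep=" ", skipinitialspace=True)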