Python Forum
Error message when using a .csv dataset
#1
I'm doing some experiments with the classic Auto MPG dataset, building a model to predict the fuel efficiency of late-1970s and early-1980s automobiles.
I wrote two versions of the code:
one downloads the dataset from the UCI Machine Learning repository (auto-mpg.data), and the other loads the file from my computer (auto-mpg.csv).
When I run the program using the dataset from UCI Machine Learning, it works well. When I run the program using the dataset from my computer (auto-mpg.csv), it doesn't work.
Both codes are shown below.
1. Using the dataset from the UCI Machine Learning repository
!pip install -q seaborn
from __future__ import absolute_import, division, print_function, unicode_literals

import pathlib

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)
dataset_path = tf.keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail(5)
dataset.isna().sum()
dataset = dataset.dropna()
origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
dataset.tail()
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")
train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
2. Using the dataset from my computer:
dataset_path = pd.read_csv(r'C:\Users\ee0547\Documents\DISSERTAÇÃO DALPIAZ\EXEMPLOS DE REDES_22_05_2020\auto-mpg.csv')
dataset_path
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail(5)
Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-3381f4da9f77> in <module>
      3 raw_dataset = pd.read_csv(dataset_path, names=column_names,
      4                       na_values="?", comment='\t',
----> 5                       sep=" ", skipinitialspace=True)
      6
      7 dataset = raw_dataset.copy()

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    674         )
    675
--> 676         return _read(filepath_or_buffer, kwds)
    677
    678     parser_f.__name__ = name

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    429     # See https://github.com/python/mypy/issues/1297
    430     fp_or_buf, _, compression, should_close = get_filepath_or_buffer(
--> 431         filepath_or_buffer, encoding, compression
    432     )
    433     kwds["compression"] = compression

~\Anaconda3\lib\site-packages\pandas\io\common.py in get_filepath_or_buffer(filepath_or_buffer, encoding, compression, mode)
    198     if not is_file_like(filepath_or_buffer):
    199         msg = f"Invalid file path or buffer object type: {type(filepath_or_buffer)}"
--> 200         raise ValueError(msg)
    201
    202     return filepath_or_buffer, None, compression, False

ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>
#2
I suspect this is because your file path is "too" complex; I didn't dive into the problem, but I would suggest using "/" instead of "\" and simplifying the path (remove spaces and special symbols like Ã), e.g. try a filename like "C:/Users/ee0547/Documents/problem1/your_file.csv".
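For example, something like this (the filename below is only a placeholder; a raw string is an alternative if you want to keep backslashes):
import pandas as pd

# Forward slashes avoid backslash-escape problems in plain Windows path strings
raw_dataset = pd.read_csv("C:/Users/ee0547/Documents/problem1/your_file.csv")

# Alternatively, a raw string keeps the backslashes literal
raw_dataset = pd.read_csv(r"C:\Users\ee0547\Documents\problem1\your_file.csv")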
#3
Thanks for your help.
I tried to simplify the path, but it still didn't work.
I think the problem is related to the way the file is obtained when I use this code:
dataset_path = tf.keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path
versus when I use this code:
dataset_path = pd.read_csv(r'C:\Users\ee0547\Documents\auto-mpg.csv')
dataset_path
For both codes, everything goes well until this code is reached:

column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset.tail(5)
The first code runs without any error, but the second one raises the error shown in my previous post.
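To illustrate the difference I mean, here is a small sketch of what each call gives back (paths shortened; the separator for my local .csv may still need checking):
import pandas as pd
import tensorflow as tf

# tf.keras.utils.get_file downloads the file and returns its local path as a string,
# which is the kind of argument pd.read_csv expects
dataset_path = tf.keras.utils.get_file(
    "auto-mpg.data",
    "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
print(type(dataset_path))   # <class 'str'>

# pd.read_csv already parses the file and returns a DataFrame, not a path
local_data = pd.read_csv(r'C:\Users\ee0547\Documents\auto-mpg.csv')
print(type(local_data))     # <class 'pandas.core.frame.DataFrame'>

# Passing that DataFrame back into pd.read_csv is what raises
# "ValueError: Invalid file path or buffer object type"; with the local file,
# the path string itself would go to pd.read_csv instead
# (same options as above; sep may need to match how the .csv is actually delimited)
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(r'C:\Users\ee0547\Documents\auto-mpg.csv',
                          names=column_names, na_values="?",
                          comment='\t', sep=" ", skipinitialspace=True)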