Python Forum
Loading .csv data using Pandas
#1
I am trying to load a dataset from my local computer using pandas. When I run the code below, I get the error shown after it. Can anyone please help me?

#!/usr/bin/env python

'''
An example file to show how to use the feature-selection code in ml_lib
'''
import pandas
from tqdm import tqdm
import pandas as pd
import csv
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn import svm

import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import feature_select
import depmeas

if __name__=='__main__':
    NUM_CV = 3
    RANDOM_SEED = 123
    MAX_ITER = 1000

    leuk = pd.read_csv(r'C:/Users/pc/Desktop/dataset/leukemia.csv')
    X = leuk['data']
    y = leuk['target']
   
    # split the data for testing
    (X_train, X_test, y_train, y_test) = train_test_split(X, y, test_size=0.3, random_state=RANDOM_SEED)

    # perform feature selection
    num_features_to_select = 25
    K_MAX = 1000
    estimator = depmeas.mi_tau
    n_jobs = -1
    feature_ranking = feature_select.feature_select(X_train, y_train, num_features_to_select=num_features_to_select, K_MAX=K_MAX, estimator=estimator, n_jobs=n_jobs)
    num_selected_features = len(feature_ranking )
    # for each feature, compute the accuracy on the test data as we add features
    mean_acc = np.empty((num_selected_features,))
    var_acc  = np.empty((num_selected_features,))
    for ii in tqdm(range(num_selected_features), desc='Computing Classifier Performance...'):
        classifier = svm.SVC(random_state=RANDOM_SEED,max_iter=MAX_ITER)
        X_test_in = X_test[:,feature_ranking [0:ii+1]]
        scores = cross_val_score(classifier, X_test_in, y_test, cv=NUM_CV, n_jobs=-1)

        mu = scores.mean()
        sigma_sq = scores.std()
        
        mean_acc[ii] = mu
        var_acc[ii] = sigma_sq

    x = np.arange(num_selected_features)+1
    y = mean_acc
    yLo = mean_acc-var_acc/2.
    yHi = mean_acc+var_acc/2.
    
    plt.plot(x,y)
    plt.fill_between(x,yLo,yHi,alpha=0.2)
    plt.grid(True)
    plt.title('Leukemia Dataset Feature Selection\n Total # Features=%d' % (X.shape[1]))
    plt.xlabel('# Selected Features')
    plt.ylabel('SVC Classifier Accuracy')
    plt.show()
Error:
Traceback (most recent call last):
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/pc/PycharmProjects/MymrmrTest/feature_select_test.py", line 39, in <module>
    X = leuk['data']
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'data'

Process finished with exit code 1
#2
Start by checking whether your dataframe actually has a column named data. The KeyError is raised by the line X = leuk['data'], which means pandas could not find a column with that name in the file you loaded.
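Here is a rough sketch of that check. I don't know what columns your leukemia.csv really has, so the small inline CSV and the column names in it (gene_1, gene_2, gene_3, label) are made up for illustration. The 'data' and 'target' keys look like what scikit-learn's dataset loaders return, whereas pd.read_csv gives you a DataFrame whose columns are whatever is in the file's header row. Replace target_col with the real name of your label column once you have printed the column list.

```python
import io

import pandas as pd

# Hypothetical stand-in for leukemia.csv; the real file's columns are unknown.
csv_text = """gene_1,gene_2,gene_3,label
0.1,1.2,3.4,ALL
0.5,0.9,2.8,AML
0.3,1.1,3.0,ALL
"""

# With the real file this would be pd.read_csv(r'C:/Users/pc/Desktop/dataset/leukemia.csv')
leuk = pd.read_csv(io.StringIO(csv_text))

# Step 1: look at the column names that actually exist.
print(leuk.columns.tolist())
print("'data' in columns:", "data" in leuk.columns)

# Step 2: pick the label column by its real name (assumed to be 'label' here).
target_col = "label"
if target_col not in leuk.columns:
    raise KeyError(f"{target_col!r} not found; available columns: {leuk.columns.tolist()}")

# Step 3: every other column is treated as a feature.
# Converting to NumPy arrays keeps positional slicing like X_test[:, idx] working later on.
X = leuk.drop(columns=[target_col]).to_numpy()
y = leuk[target_col].to_numpy()

print(X.shape, y.shape)
```

If 'data' is not in the printed list, that confirms the cause of the KeyError. Note that the rest of your script indexes X_test with [:, ...], which works on NumPy arrays but not directly on a DataFrame, which is why the sketch converts with to_numpy().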

