Python Forum
Loading .csv data using Pandas
#1
I was trying to load a dataset from my local computer using pandas. When I run the code below I get the following error. Can anyone please help me?

#!/usr/bin/env python

'''
An example file to show how to use the feature-selection code in ml_lib
'''
import pandas
from tqdm import tqdm
import pandas as pd
import csv
from sklearn.datasets import fetch_mldata
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn import svm

import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import feature_select
import depmeas

if __name__=='__main__':
    NUM_CV = 3
    RANDOM_SEED = 123
    MAX_ITER = 1000

    leuk = pd.read_csv(r'C:/Users/pc/Desktop/dataset/leukemia.csv')
    X = leuk['data']
    y = leuk['target']
   
    # split the data for testing
    (X_train, X_test, y_train, y_test) = train_test_split(X, y, test_size=0.3, random_state=RANDOM_SEED)

    # perform feature selection
    num_features_to_select = 25
    K_MAX = 1000
    estimator = depmeas.mi_tau
    n_jobs = -1
    feature_ranking = feature_select.feature_select(X_train, y_train, num_features_to_select=num_features_to_select, K_MAX=K_MAX, estimator=estimator, n_jobs=n_jobs)
    num_selected_features = len(feature_ranking )
    # for each feature, compute the accuracy on the test data as we add features
    mean_acc = np.empty((num_selected_features,))
    var_acc  = np.empty((num_selected_features,))
    for ii in tqdm(range(num_selected_features), desc='Computing Classifier Performance...'):
        classifier = svm.SVC(random_state=RANDOM_SEED,max_iter=MAX_ITER)
        X_test_in = X_test[:,feature_ranking [0:ii+1]]
        scores = cross_val_score(classifier, X_test_in, y_test, cv=NUM_CV, n_jobs=-1)

        mu = scores.mean()
        sigma_sq = scores.std()
        
        mean_acc[ii] = mu
        var_acc[ii] = sigma_sq

    x = np.arange(num_selected_features)+1
    y = mean_acc
    yLo = mean_acc-var_acc/2.
    yHi = mean_acc+var_acc/2.
    
    plt.plot(x,y)
    plt.fill_between(x,yLo,yHi,alpha=0.2)
    plt.grid(True)
    plt.title('Leukemia Dataset Feature Selection\n Total # Features=%d' % (X.shape[1]))
    plt.xlabel('# Selected Features')
    plt.ylabel('SVC Classifier Accuracy')
    plt.show()
Error:
Traceback (most recent call last):
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/pc/PycharmProjects/MymrmrTest/feature_select_test.py", line 39, in <module>
    X = leuk['data']
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'data'

Process finished with exit code 1
#2
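You should start by checking whether your DataFrame actually has a column named 'data'. The KeyError most likely comes from this line: X = leuk['data']. pd.read_csv takes its column names from the header row of the CSV file, so leuk['data'] only works if that header contains a column called data.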
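A minimal sketch of that check, assuming a CSV whose last column holds the class label. The layout of the real leukemia.csv is not shown in this thread, so the inline sample data and the column names gene_1, gene_2, gene_3 and label are hypothetical stand-ins. Print the real column names first and substitute whichever one holds the class label.

```python
import io

import pandas as pd

# Hypothetical stand-in for leukemia.csv; the real file's header is unknown.
csv_text = """gene_1,gene_2,gene_3,label
0.1,1.2,3.4,ALL
0.5,0.9,2.8,AML
0.3,1.1,3.0,ALL
"""

# With the real file this would be pd.read_csv(r'C:/Users/pc/Desktop/dataset/leukemia.csv').
leuk = pd.read_csv(io.StringIO(csv_text))

# Step 1: look at the column names pandas parsed from the header row.
print(leuk.columns.tolist())

# Step 2: select the label column by its actual name (assumed to be 'label' here).
target_col = "label"
if target_col not in leuk.columns:
    raise KeyError(f"{target_col!r} not found; available columns: {leuk.columns.tolist()}")

# Step 3: split into a feature matrix and a target vector as NumPy arrays.
y = leuk[target_col].to_numpy()
X = leuk.drop(columns=[target_col]).to_numpy()

print(X.shape, y.shape)
```

Converting to NumPy arrays with to_numpy() also means that positional indexing such as X_test[:, feature_ranking[0:ii+1]] later in the script operates on an array rather than on a DataFrame, which does not accept that indexing form.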