 Fetching data from Sklearn
#1
I am a Python beginner using PyCharm as my IDE with Python 3.7.3. I am working with some existing code, and when I run it I get an error.
The code is:

from tqdm import tqdm

from sklearn.datasets import fetch_mldata
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn import svm
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import depmeas
import feature_select

if __name__ == '__main__':
    NUM_CV = 3
    RANDOM_SEED = 123
    MAX_ITER = 1000

    leuk = fetch_mldata('leukemia', transpose_data=True)
    X = leuk['data']
    y = leuk['target']

    # split the data for testing
    (X_train, X_test, y_train, y_test) = train_test_split(X, y, test_size=0.3, random_state=RANDOM_SEED)

    # perform feature selection
    num_features_to_select = 25
    K_MAX = 1000
    estimator = depmeas.mi_tau
    n_jobs = -1
    feature_ranking_idxs = feature_select.feature_select(X_train, y_train,
                                                         num_features_to_select=num_features_to_select, K_MAX=K_MAX,
                                                         estimator=estimator, n_jobs=n_jobs)
    num_selected_features = len(feature_ranking_idxs)
    # for each feature, compute the accuracy on the test data as we add features
    mean_acc = np.empty((num_selected_features,))
    std_acc = np.empty((num_selected_features,))
    for ii in tqdm(range(num_selected_features), desc='Computing Classifier Performance...'):
        classifier = svm.SVC(random_state=RANDOM_SEED, max_iter=MAX_ITER)
        X_test_in = X_test[:, feature_ranking_idxs[0:ii + 1]]
        scores = cross_val_score(classifier, X_test_in, y_test, cv=NUM_CV, n_jobs=-1)

        mu = scores.mean()
        sigma_sq = scores.std()

        mean_acc[ii] = mu
        std_acc[ii] = sigma_sq

    x = np.arange(num_selected_features) + 1
    y = mean_acc
    yLo = mean_acc - std_acc / 2.
    yHi = mean_acc + std_acc / 2.

    plt.plot(x, y)
    plt.fill_between(x, yLo, yHi, alpha=0.2)
    plt.grid(True)
    plt.title('Leukemia Dataset Feature Selection\n Total # Features=%d' % (X.shape[1]))
    plt.xlabel('# Selected Features')
    plt.ylabel('SVC Classifier Accuracy')
    plt.show()

Error:
C:\Users\pc\PycharmProjects\MymrmrTest\venv\Scripts\python.exe C:/Users/pc/PycharmProjects/MymrmrTest/feature_selection_test.py
C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\sklearn\externals\joblib\__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)
C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\sklearn\utils\deprecation.py:85: DeprecationWarning: Function fetch_mldata is deprecated; fetch_mldata was deprecated in version 0.20 and will be removed in version 0.22. Please use fetch_openml.
  warnings.warn(msg, category=DeprecationWarning)
C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\sklearn\utils\deprecation.py:85: DeprecationWarning: Function mldata_filename is deprecated; mldata_filename was deprecated in version 0.20 and will be removed in version 0.22. Please use fetch_openml.
  warnings.warn(msg, category=DeprecationWarning)
Traceback (most recent call last):
  File "C:/Users/pc/PycharmProjects/MymrmrTest/feature_selection_test.py", line 34, in <module>
    leuk = fetch_mldata('leukemia', transpose_data=True)
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\sklearn\utils\deprecation.py", line 86, in wrapped
    return fun(*args, **kwargs)
  File "C:\Users\pc\PycharmProjects\MymrmrTest\venv\lib\site-packages\sklearn\datasets\mldata.py", line 126, in fetch_mldata
    mldata_url = urlopen(urlname)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\pc\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 502: Bad Gateway

Process finished with exit code 1
#2
Executing

fetch_mldata('leukemia', transpose_data=True)
I got the following error:

Error:
http.client.RemoteDisconnected: Remote end closed connection without response
So the problem is probably with the data source rather than with your code.
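One way to check that guess is to wrap the download call and see which kind of exception comes back. The sketch below is only an illustration: diagnose_fetch is a hypothetical helper name, not part of sklearn, and it assumes the fetch raises the standard urllib / connection exceptions seen in the two error messages above.

```python
from urllib.error import HTTPError, URLError


def diagnose_fetch(fetch):
    """Call fetch() and describe where a failure came from.

    `fetch` is any zero-argument callable that performs the download,
    e.g. lambda: fetch_mldata('leukemia', transpose_data=True).
    """
    try:
        fetch()
    except HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass).
        # A 5xx status means the remote server failed, not the local code.
        side = 'server error' if 500 <= err.code < 600 else 'request error'
        return '%s: HTTP %d %s' % (side, err.code, err.reason)
    except (URLError, ConnectionError) as err:
        # RemoteDisconnected is a ConnectionError subclass, so it lands here.
        return 'connection failed: %r' % (err,)
    return 'ok'
```

A 502 or a dropped connection reported by this helper points at the remote host; a different exception (for example a TypeError) would propagate unchanged and point at the calling code instead.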
#3
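As of version 0.20, scikit-learn deprecated the fetch_mldata function and added fetch_openml as its replacement. According to the deprecation warning in your output, fetch_mldata will be removed in version 0.22.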

fetch_mldata() is deprecated
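A rough sketch of how the loading step in the script from post #1 might be switched over is below. It rests on assumptions I have not verified here: that the dataset is published on OpenML under the name 'leukemia' with a version 1, and that the data comes back with samples in rows (so no transpose_data argument is needed). load_leukemia and its fetch parameter are hypothetical names made up for this example.

```python
def _as_array(obj):
    # Newer scikit-learn releases may return pandas objects from
    # fetch_openml; convert them to NumPy so that positional slicing
    # such as X_test[:, idxs] in the original script keeps working.
    to_numpy = getattr(obj, 'to_numpy', None)
    return to_numpy() if callable(to_numpy) else obj


def load_leukemia(fetch=None):
    """Return (X, y) for the leukemia dataset.

    `fetch` is a hypothetical hook for swapping in another loader;
    by default sklearn.datasets.fetch_openml is used.
    """
    if fetch is None:
        # Imported lazily; requires scikit-learn >= 0.20.
        from sklearn.datasets import fetch_openml
        fetch = fetch_openml

    # ASSUMPTION: the dataset exists on OpenML as 'leukemia', version 1.
    # Check openml.org for the exact name or id if this lookup fails.
    leuk = fetch('leukemia', version=1)
    return _as_array(leuk['data']), _as_array(leuk['target'])
```

In the original script, the three lines that call fetch_mldata and read leuk['data'] and leuk['target'] would then become X, y = load_leukemia(), and the fetch_mldata import can be dropped.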