Python Forum
load returns None
#1
Dear readers,

I have a piece of code whose output I am trying to save to disk and load back in order to reduce memory usage. Colab crashes because it runs out of memory, and somebody advised me to use NumPy's save and load functions. However, I can't resolve the issue. According to a forum post, this is how they should be used. What is going wrong here?

Save function
np.save('el.npy', x)
x = np.save('el.npy', x)
Load function
np.load('el.npy', allow_pickle=True)
Which returns:
array(None, dtype=object)
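A note on the snippet above: `np.save` writes the array to disk and returns `None`, so `x = np.save('el.npy', x)` replaces `x` with `None`. If the save line is then run again (for example by re-running the notebook cell), the file is overwritten with a pickled `None`, which would load back as `array(None, dtype=object)`. The sketch below reproduces the symptom under that assumption; the small array and the temporary directory are stand-ins, not the original data.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "el.npy")  # throwaway location
x = np.arange(6, dtype=np.float32).reshape(2, 3)   # stand-in array

result = np.save(path, x)   # np.save returns None
x = result                  # same effect as: x = np.save(path, x)

np.save(path, x)            # a second save now stores None, not the array
bad = np.load(path, allow_pickle=True)
print(repr(bad))

# Calling np.save without assigning its result keeps x intact.
x = np.arange(6, dtype=np.float32).reshape(2, 3)
np.save(path, x)
good = np.load(path)
print(good.shape)
```

If this is what happened, dropping the assignment and regenerating `x` before saving again should be enough. `allow_pickle=True` is not needed for a plain numeric array.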
#2
Generally speaking, questions should have sufficient information to reproduce the problem. I'm not familiar with the libraries you're using, but I would have tinkered with the code if I could reproduce the problem.
Feel like you're not getting the answers you want? Check out the help/rules for things like what to include/not include in a post, how to use code tags, how to ask smart questions, and more.

Pro-tip - there's an inverse correlation between the number of lines of code posted and my enthusiasm for helping with a question :)
#3
Well, I don't know if it's OK to post all of my code here. It is a lot, and I felt that would be a bit rude. I am also using a lot of libraries, and it is all connected. So here goes.

%tensorflow_version 1.x
!wget https://raw.githubusercontent.com/prrao87/fine-grained-sentiment/master/data/sst/sst_train.txt
%%capture
!python -m spacy download en_core_web_md en
!pip install spacy
%%capture
!pip install chart_studio
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
from sklearn import preprocessing
import spacy
from spacy.lang.en import English
from spacy import displacy
nlp = spacy.load('en')
from IPython.display import HTML
import logging
logging.getLogger('tensorflow').disabled = True #OPTIONAL - to disable outputs from Tensorflow
# Read the training data (tab-separated, no header row)
df = pd.read_csv('sst_train.txt', sep='\t', header=None, names=['polarity', 'sentence'])
df['polarity'] = df['polarity'].str.replace('__label__', '')
df['polarity'] = df['polarity'].astype(int)#.astype('category')
df = df[df.polarity != 3]
df['polarity'].replace(to_replace={1: 0, 2: 0, 4: 1, 5: 1}, inplace=True)
df['polarity'].value_counts()
train_df_t = df
# example of random undersampling to balance the class distribution
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
# define undersample strategy
undersample = RandomUnderSampler(sampling_strategy='majority')
# fit and apply the transform
X_under, y_under = undersample.fit_resample(df['sentence'].to_numpy().reshape(-1, 1), df['polarity'].to_numpy())
# summarize class distribution
print(Counter(y_under))
import numpy as np
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_under, y_under, test_size=0.2, random_state=42, stratify = y_under)
print(Counter(y_train))
print(Counter(y_test))
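def tensor_to_numpy(embeddings):
  # Evaluate a TF 1.x tensor in a fresh session and return it as a NumPy array.
  # (%%time is a cell magic and only works on the first line of a cell, so it is dropped here.)
  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    x = sess.run(embeddings)
  return x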
url_elmo = "https://tfhub.dev/google/elmo/3"
url_w2v = "https://tfhub.dev/google/Wiki-words-500/2"
embed_elmo = hub.Module(url_elmo)
embed_w2v = hub.load(url_w2v)
from tqdm import trange

# testing batched training
batch_size = 64
x_train_elmo = []
n = X_train.shape[0]

for fr in trange(0, n, batch_size):
  to = min(fr + batch_size, n)

  embeddings_train_elmo = embed_elmo(X_train.squeeze()[fr:to])
  res = tensor_to_numpy(embeddings_train_elmo)

  x_train_elmo.append(res)

x_train_elmo = np.concatenate(x_train_elmo, axis=0)
np.save('elmo.npy', x_train_elmo)
x_train_elmo = np.load('elmo.npy', allow_pickle=True)
x_train_elmo
x_train_elmo.shape
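Since the stated goal is to keep memory usage down, one option is to write each batch straight into a `.npy` file instead of collecting every batch in a list and concatenating at the end. The sketch below assumes the output width per row is known in advance; `fake_embed`, the width of 8, and the file location are hypothetical stand-ins for the ELMo call, not code from this thread.

```python
import os
import tempfile

import numpy as np


def fake_embed(batch, dim=8):
    """Hypothetical stand-in for the embedding call: one vector per input row."""
    return np.full((len(batch), dim), 1.0, dtype=np.float32)


sentences = np.array(["s%d" % i for i in range(10)])
n, dim, batch_size = len(sentences), 8, 4
path = os.path.join(tempfile.mkdtemp(), "elmo.npy")

# Create the .npy file up front and fill it batch by batch.
out = np.lib.format.open_memmap(path, mode="w+", dtype=np.float32, shape=(n, dim))
for fr in range(0, n, batch_size):
    to = min(fr + batch_size, n)
    out[fr:to] = fake_embed(sentences[fr:to], dim)
out.flush()
del out  # release the writable memory map

# mmap_mode="r" reads rows from disk on demand instead of loading everything.
loaded = np.load(path, mmap_mode="r")
print(loaded.shape)
```

With this layout only one batch of embeddings is held in RAM at a time; whether that is enough to stop the Colab crash depends on where the memory is actually going.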
#4
Yeah, large amounts of code aren't helpful to us. What you should do instead is hard-code any "inputs" to the problem code that come from "upstream" and provide a "minimal reproducible example". This is true not just on this forum, but on Stack Overflow, and really any other time someone is asking a question about code.
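For this thread, a minimal reproducible example might look roughly like the sketch below: the upstream embedding step is replaced by a small hard-coded array, leaving only the save/load round trip in question. The values and the temporary path are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hard-coded stand-in for the upstream result (hypothetical values).
x_train_elmo = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "elmo.npy")
np.save(path, x_train_elmo)
reloaded = np.load(path)

print(reloaded.shape, reloaded.dtype)
print(np.array_equal(reloaded, x_train_elmo))
```

A dozen lines like these can be run by anyone with NumPy installed, with no TensorFlow, spaCy, or dataset download required.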