I manage a dataset of AMEX and NASDAQ stocks over the last 40 years. I encountered a memory error, shown as follows:
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 1379, in take_nd
out = np.empty(out_shape, dtype=dtype)
MemoryError
Many thanks for helping.
Post the full error traceback message in error tags and, if possible, the relevant piece of code in Python code tags. You can find help here.
If I reduce the sample length to 10 years, the Python code works without any memory error.
Here is the full traceback:
Error:
Traceback (most recent call last):
File "<ipython-input-1-4d852114e435>", line 1, in <module>
runfile('C:/Users/lokac/Desktop/Data/Momentum wrds.py', wdir='C:/Users/lokac/Desktop/Data')
File "C:\Users\lokac\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\lokac\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/lokac/Desktop/Data/Momentum wrds.py", line 89, in <module>
port = pd.merge(_tmp_ret, umd, on=['permno'], how='inner')
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 58, in merge
return op.get_result()
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 596, in get_result
concat_axis=0, copy=self.copy)
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5203, in concatenate_block_managers
concatenate_join_units(join_units, concat_axis, copy=copy),
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5332, in concatenate_join_units
for ju in join_units]
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5332, in <listcomp>
for ju in join_units]
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5632, in get_reindexed_values
fill_value=fill_value)
File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 1379, in take_nd
out = np.empty(out_shape, dtype=dtype)
MemoryError
You are obviously running out of memory. pandas is a memory hog - see this article. Quoting the author:
Quote: my rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset
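To see how that rule of thumb applies to a given DataFrame, you can measure its in-memory size with `DataFrame.memory_usage(deep=True)`. This is only a rough sketch: the helper name `estimate_ram_needed` and the demo frame are made up for illustration, and the 5x to 10x factors are simply the ones from the quote, not a guarantee.

```python
import numpy as np
import pandas as pd

def estimate_ram_needed(df, low=5, high=10):
    """Return (dataset_bytes, low_bytes, high_bytes) using the 5x-10x rule of thumb.

    Hypothetical helper; the factors come from the quoted rule of thumb.
    """
    # deep=True also counts the real size of object (string) columns
    size = int(df.memory_usage(deep=True).sum())
    return size, size * low, size * high

# Small stand-in frame; the real data would be e.g. _tmp_ret or umd
demo = pd.DataFrame({
    "permno": np.arange(1000, dtype="int64"),
    "ret": np.zeros(1000, dtype="float64"),
})

size, ram_low, ram_high = estimate_ram_needed(demo)
print(f"dataset: {size / 1e6:.3f} MB, "
      f"suggested RAM: {ram_low / 1e6:.3f} to {ram_high / 1e6:.3f} MB")
```

Running this on both inputs of the merge before calling `pd.merge` gives a first idea of whether the full 40-year sample can fit at all.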
You should probably find a way to split your data into chunks and process it in smaller portions, or increase the amount of available RAM.
Iterating is what you need. Split the data into chunks small enough to fit into memory, then process each chunk, save the result, and proceed with the next chunk.
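One way to chunk the merge from the traceback could look like the sketch below. It assumes both frames can be held in memory on their own and only the merged result is too large. The function name `merge_in_chunks`, the `reduce_fn` callback and the demo columns `ret` and `momr` are hypothetical; only `permno` and the `how='inner'` merge come from the original code.

```python
import numpy as np
import pandas as pd

def merge_in_chunks(left, right, key, n_chunks, reduce_fn):
    """Inner-merge left and right block by block over the values of `key`.

    reduce_fn shrinks each merged block (aggregate, filter, ...) so that
    only the reduced pieces are kept in memory. Hypothetical helper.
    """
    results = []
    keys = left[key].unique()
    for key_block in np.array_split(keys, n_chunks):
        if len(key_block) == 0:
            continue
        # Rows sharing a key always land in the same block,
        # so each block's inner join is complete on its own.
        left_part = left[left[key].isin(key_block)]
        right_part = right[right[key].isin(key_block)]
        merged = pd.merge(left_part, right_part, on=[key], how="inner")
        if not merged.empty:
            results.append(reduce_fn(merged))
    if not results:
        # Nothing matched: return an empty frame with the merged columns
        return pd.merge(left.iloc[:0], right.iloc[:0], on=[key], how="inner")
    return pd.concat(results, ignore_index=True)

# Tiny stand-ins for _tmp_ret and umd
left_demo = pd.DataFrame({"permno": [1, 1, 2, 3], "ret": [0.1, 0.2, 0.3, 0.4]})
right_demo = pd.DataFrame({"permno": [1, 2, 2, 4], "momr": [1, 2, 3, 1]})

# Identity reduce_fn just to check the result against a single full merge
chunked = merge_in_chunks(left_demo, right_demo, "permno", 2, lambda m: m)
full = pd.merge(left_demo, right_demo, on=["permno"], how="inner")
print(len(chunked), len(full))
```

In real use `reduce_fn` would do the per-block work (for example a groupby aggregation) or write the block to disk, so the complete merged table never has to exist in memory at once.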
It's better to use a generator. It hands you one chunk at a time instead of holding all of them in memory, which makes it very memory efficient.
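A generator version might look like this sketch. It assumes the large table is stored as a CSV file and the smaller one fits in memory. `pd.read_csv(..., chunksize=...)` returns an iterator of DataFrames, so only one chunk is loaded at a time. The name `merged_chunks`, the inline CSV text and the `ret` and `momr` columns are made up for the demo.

```python
import io
import pandas as pd

def merged_chunks(csv_source, right, key, chunksize):
    """Lazily yield one merged chunk at a time (hypothetical helper).

    Splitting the left table by rows is safe for an inner join because
    each left row's matches do not depend on any other left row.
    """
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        yield pd.merge(chunk, right, on=[key], how="inner")

# In-memory stand-in for a large CSV file on disk
csv_text = "permno,ret\n1,0.1\n1,0.2\n2,0.3\n3,0.4\n"
umd_demo = pd.DataFrame({"permno": [1, 2], "momr": [1, 2]})

total_rows = 0
for part in merged_chunks(io.StringIO(csv_text), umd_demo, "permno", chunksize=2):
    # Process or save each part here, then let it go out of scope
    total_rows += len(part)

print(total_rows)
```

Nothing is read or merged until the loop asks for the next chunk, so peak memory is roughly one chunk plus its merged result rather than the whole dataset.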