Python Forum

Memory Error
I manage a dataset of NYSE, AMEX and NASDAQ stocks covering the last 40 years, and I get the following memory error:
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 1379, in take_nd
    out = np.empty(out_shape, dtype=dtype)
MemoryError

Many thanks for helping.

Please post the full error traceback in error tags and, if possible, the relevant piece of code in Python code tags. You can find help here.

If I reduce the sample to 10 years, the Python code runs without any memory error.
Here is the full traceback:
Error:
Traceback (most recent call last):
  File "<ipython-input-1-4d852114e435>", line 1, in <module>
    runfile('C:/Users/lokac/Desktop/Data/Momentum wrds.py', wdir='C:/Users/lokac/Desktop/Data')
  File "C:\Users\lokac\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\Users\lokac\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/lokac/Desktop/Data/Momentum wrds.py", line 89, in <module>
    port = pd.merge(_tmp_ret, umd, on=['permno'], how='inner')
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 58, in merge
    return op.get_result()
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 596, in get_result
    concat_axis=0, copy=self.copy)
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5203, in concatenate_block_managers
    concatenate_join_units(join_units, concat_axis, copy=copy),
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5332, in concatenate_join_units
    for ju in join_units]
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5332, in <listcomp>
    for ju in join_units]
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\internals.py", line 5632, in get_reindexed_values
    fill_value=fill_value)
  File "C:\Users\lokac\Anaconda3\lib\site-packages\pandas\core\algorithms.py", line 1379, in take_nd
    out = np.empty(out_shape, dtype=dtype)
MemoryError

You are obviously running out of memory.
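
One likely reason the 40-year sample fails while 10 years works (an assumption, since the script itself was not posted): the failing line merges on 'permno' alone. If both _tmp_ret and umd hold one row per stock per month, every row of one frame pairs with every row of the other for the same stock, so the merged frame grows with the square of the sample length; 4x as many years means roughly 16x as many rows. A toy sketch with made-up data (merged_rows is a hypothetical helper):

```python
import pandas as pd

def merged_rows(n_months: int, n_stocks: int = 3) -> int:
    """Hypothetical helper: rows produced by an inner merge on 'permno' alone
    when both inputs hold one row per stock per month (toy data)."""
    # Hypothetical stand-ins for _tmp_ret and umd.
    left = pd.DataFrame(
        [(p, m) for p in range(n_stocks) for m in range(n_months)],
        columns=["permno", "date"],
    )
    right = left.rename(columns={"date": "form_date"})
    # 'permno' repeats in both frames, so the merge is many-to-many:
    # every left row for a stock pairs with every right row for that stock.
    return len(pd.merge(left, right, on=["permno"], how="inner"))

print(merged_rows(12))  # 3 stocks * 12 * 12 = 432
print(merged_rows(48))  # 3 stocks * 48 * 48 = 6912, 16x the rows for 4x the months
```

Passing validate='one_to_one' (or 'one_to_many' / 'many_to_one') to pd.merge makes pandas raise a MergeError when the keys are not unique as expected, instead of silently building a many-to-many result.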

pandas is a memory hog; see this article. Quoting the author:

Quote:
my rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset
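
To apply this rule of thumb, you can measure how much memory a DataFrame actually occupies with DataFrame.memory_usage(deep=True) and multiply by 5 to 10. A minimal sketch on a synthetic frame (the column names are borrowed from the traceback, the data is invented, and footprint_mib is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def footprint_mib(df: pd.DataFrame) -> float:
    """Hypothetical helper: in-memory size of a DataFrame in MiB.

    deep=True also counts the contents of object (e.g. string) columns.
    """
    return df.memory_usage(deep=True).sum() / 1024**2

# Synthetic stand-in: 1,000,000 rows of (permno, monthly return).
n = 1_000_000
df = pd.DataFrame({
    "permno": np.arange(n, dtype=np.int64),
    "ret": np.zeros(n, dtype=np.float64),
})

size = footprint_mib(df)  # two 8-byte columns, about 15.3 MiB
print(f"DataFrame: {size:.1f} MiB")
print(f"Rule-of-thumb RAM: {5 * size:.0f} to {10 * size:.0f} MiB")
```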

You should probably find a way to split your data into chunks and process it in smaller portions, or increase the amount of available RAM.

Iterating is what you need: split the data into chunks small enough to fit into memory, process each chunk, save the results, and then proceed with the next chunk.
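
A minimal sketch of that loop, assuming the raw data sits in a CSV file: pd.read_csv with chunksize returns an iterator of DataFrames, so only one chunk is held in memory at a time. The small in-memory CSV, its columns, and the filter step are hypothetical placeholders for the real file and processing.

```python
import io
import pandas as pd

# Toy CSV standing in for a large file on disk (hypothetical columns).
csv_text = "permno,date,ret\n" + "\n".join(
    f"{p},{m},{m / 100:.2f}" for p in range(1, 4) for m in range(1, 5)
)

results = []
# chunksize=5 makes read_csv yield DataFrames of at most 5 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=5):
    # Process the chunk: here, keep only returns above 2.5% (placeholder step).
    processed = chunk[chunk["ret"] > 0.025]
    # Save the (smaller) result before reading the next chunk. It is kept in
    # a list here; writing each piece with to_csv(..., mode="a") also works.
    results.append(processed)

out = pd.concat(results, ignore_index=True)
print(len(out))
```

This only pays off if each processed chunk is much smaller than the raw chunk; otherwise, write the pieces to disk rather than collecting them.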

It's better to use a generator: it produces one chunk at a time, so only the current chunk needs to be in memory.
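
Applied to the failing merge, a generator can yield the inner merge on 'permno' one batch of stocks at a time. Because the join key is 'permno', rows for different stocks never combine, so the batches together contain the same rows as one full merge. This is a sketch under that assumption; merged_batches and the toy frames are hypothetical, and it only saves memory if each batch is filtered, aggregated, or written out before the next one is requested.

```python
from typing import Iterator

import numpy as np
import pandas as pd

def merged_batches(
    left: pd.DataFrame, right: pd.DataFrame, batch_size: int
) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: yield the inner merge on 'permno' for
    batch_size stocks at a time instead of building it all at once."""
    permnos = left["permno"].unique()
    for start in range(0, len(permnos), batch_size):
        batch = permnos[start:start + batch_size]
        # Only this batch's merged rows exist in memory at this point.
        yield pd.merge(
            left[left["permno"].isin(batch)],
            right[right["permno"].isin(batch)],
            on=["permno"],
            how="inner",
        )

# Hypothetical stand-ins for _tmp_ret and umd: 10 stocks x 6 months each.
ret = pd.DataFrame({"permno": np.repeat(np.arange(10), 6), "ret": 0.01})
umd = pd.DataFrame({"permno": np.repeat(np.arange(10), 6), "momr": 1})

total = 0
for part in merged_batches(ret, umd, batch_size=3):
    # Reduce each batch here (e.g. filter by date) before keeping anything;
    # this sketch only counts rows.
    total += len(part)
print(total)  # 10 stocks * 6 * 6 = 360
```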