Python Forum
ResourceExhaustedError: OOM - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: ResourceExhaustedError: OOM (/thread-28355.html)



ResourceExhaustedError: OOM - Marvin93 - Jul-15-2020

Hello everyone,

I am training a very small model many times in a row. I know this is probably not what most people do, but I just want to try a lot of different parameter combinations. I do this with a couple of for loops, training the model over and over again. I am using TensorBoard and a dictionary to write the results to a CSV file.
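To make the setup concrete, here is a rough schematic of what the loop looks like. The model building and training are stubbed out with a placeholder function (the real code uses Keras), and the parameter names and values are just examples:

```python
import csv

# Placeholder for building and training a small model.
# In the real code this would construct a Keras model, call model.fit(...),
# and return the metrics of interest.
def build_and_train(lr, units):
    # ... build model, train it, evaluate ...
    return {"lr": lr, "units": units, "val_loss": 0.0}  # dummy value

results = []
for lr in [0.1, 0.01, 0.001]:      # first hyperparameter
    for units in [16, 32, 64]:     # second hyperparameter
        results.append(build_and_train(lr, units))

# write one row per training run to a csv file
with open("runs.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["lr", "units", "val_loss"])
    writer.writeheader()
    writer.writerows(results)
```

So each iteration builds and trains a fresh model, and only the resulting metrics are kept in the dictionary that gets written to the CSV.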

Now I am getting this error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1000,1000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node training_492/SGD/gradients/dense_2194/MatMul_grad/MatMul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
What I am wondering is how this can happen, since the model is very small. I don't think the dictionary is getting too big, so I suppose every trained model is still kept in memory, even though I overwrite the variable in every loop iteration. Am I understanding that right, or can anyone tell me the actual reason? Is there a smart solution to the problem?

Otherwise I will just split the program into a bunch of smaller programs that each train far fewer models.

Regards
Marvin