ResourceExhaustedError: OOM - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: ResourceExhaustedError: OOM (/thread-28355.html)
ResourceExhaustedError: OOM - Marvin93 - Jul-15-2020

Hello everyone,

I am training a very small model multiple times in a row. I know this is probably not what most people do, but I just want to try a lot of different combinations of parameters. I do that with a couple of for loops, training the model over and over again. I am using TensorBoard and a dictionary to write the data to a CSV file. Now I am getting this error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1000,1000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training_492/SGD/gradients/dense_2194/MatMul_grad/MatMul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

What I am wondering is how this can happen, since the model is very small. I don't think the dictionary is getting too big, so I suppose every trained model is still kept in memory, even though I overwrite it in every loop iteration. Am I understanding that right, or can anyone tell me the reason for it? Is there a smart solution for the problem? Otherwise I will just split the program into a bunch of smaller programs that each train far fewer models.

Regards
Marvin
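A minimal sketch of one common fix for this situation (an assumption on my part, since no reply appears in the thread): when Keras models are built repeatedly in a loop, graph state from earlier iterations can pile up even after the Python variable is reassigned, so calling `tf.keras.backend.clear_session()` at the top of each iteration, plus `del model` and `gc.collect()` at the bottom, lets the GPU allocator reuse the memory. The data shapes, layer sizes, and parameter grid below are placeholders, not the poster's actual setup:

```python
import gc
import itertools

import numpy as np
import tensorflow as tf

# Toy stand-ins for the real training data (shapes are illustrative only).
x_train = np.random.rand(64, 10).astype("float32")
y_train = np.random.randint(0, 2, size=(64,)).astype("float32")

results = {}  # parameter combination -> final training loss

# Hypothetical parameter grid; swap in whatever the for loops iterate over.
for units, lr in itertools.product([8, 16], [0.01, 0.1]):
    # Discard graph state left over from the previous iteration, so old
    # layers and optimizer slots do not keep accumulating on the GPU.
    tf.keras.backend.clear_session()

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss="binary_crossentropy")
    history = model.fit(x_train, y_train, epochs=1, verbose=0)
    results[(units, lr)] = history.history["loss"][-1]

    # Drop the Python reference as well, then ask the garbage collector
    # to release the backing memory before the next combination starts.
    del model
    gc.collect()
```

With this pattern the `results` dictionary keeps only the scalar metrics per combination, so memory use stays roughly flat across iterations instead of growing with every trained model.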