Python Forum
ResourceExhaustedError: OOM
#1
Hello everyone,

I am training a very small model many times in a row. I know this is probably not what most people do, but I want to try a lot of different parameter combinations. I do this with a couple of nested for loops that train the model over and over again. I am using TensorBoard, and I collect the results in a dictionary that I write to a CSV file.

Now I am getting this error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1000,1000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node training_492/SGD/gradients/dense_2194/MatMul_grad/MatMul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
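The node names in the error (`training_492`, `dense_2194`) come from auto-generated counters, which suggests that hundreds of models built in the loop are still registered in the same TensorFlow/Keras state. A minimal sketch to check this, assuming `tf.keras` and a hypothetical helper `first_layer_names`, builds a few tiny models with and without `tf.keras.backend.clear_session()` between them and compares the generated layer names:

```python
import tensorflow as tf


def first_layer_names(n_models, reset_between_models):
    """Hypothetical helper: build n tiny models, return each first layer's auto-generated name."""
    names = []
    for _ in range(n_models):
        if reset_between_models:
            # Reset Keras' global state, including the layer-name counters.
            tf.keras.backend.clear_session()
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(4, activation="relu", input_shape=(3,)),
            tf.keras.layers.Dense(1),
        ])
        names.append(model.layers[0].name)
    return names


# Without a reset, every new model gets a name with a larger numeric suffix.
print(first_layer_names(3, reset_between_models=False))
# With a reset, each model starts from a clean state and reuses the same name.
print(first_layer_names(3, reset_between_models=True))
```

If the first list shows a growing suffix and the second shows one repeated name, that matches the idea that old models are never discarded when the loop simply reassigns the variable.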
What I am wondering is how this can happen, since the model is very small. I don't think the dictionary is getting too big, so I suppose every trained model is still kept in memory, even though I overwrite the model variable in every iteration of the loop. Am I understanding that correctly, or can anyone tell me the reason? Is there a smart solution to this problem?
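One common remedy is to call `tf.keras.backend.clear_session()` at the start of every loop iteration, so the state left over from earlier models is discarded before the next one is built. The sketch below assumes `tf.keras`, SGD as in the error message, and hypothetical `build_model`/`run_grid` helpers with a toy architecture and parameter grid; results are collected as dictionaries and written to a CSV file, mirroring the setup described above:

```python
import csv
import itertools

import numpy as np
import tensorflow as tf


def build_model(units, learning_rate):
    # Hypothetical tiny regression model standing in for the real one.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
        loss="mse",
    )
    return model


def run_grid(x, y, units_options, lr_options, csv_path):
    """Train one model per parameter combination and write the results to CSV."""
    results = []
    for units, lr in itertools.product(units_options, lr_options):
        # Discard the graph and Keras state left over from the previous
        # iteration, so old models do not keep accumulating in memory.
        tf.keras.backend.clear_session()
        model = build_model(units, lr)
        history = model.fit(x, y, epochs=2, batch_size=32, verbose=0)
        results.append({
            "units": units,
            "learning_rate": lr,
            "final_loss": float(history.history["loss"][-1]),
        })
        del model  # drop this iteration's Python reference to the model

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["units", "learning_rate", "final_loss"])
        writer.writeheader()
        writer.writerows(results)
    return results


if __name__ == "__main__":
    # Toy data: the target is the sum of the 20 input features.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 20)).astype("float32")
    y = x.sum(axis=1, keepdims=True)
    run_grid(x, y, [8, 16], [0.01, 0.001], "results.csv")
```

In graph-mode (TF 1.x) Keras, reassigning `model` alone likely frees nothing, because the old layers remain part of the default graph; `clear_session()` resets that graph, which should keep memory use roughly constant across iterations.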

Otherwise, I will just split the program into a bunch of smaller programs that each train far fewer models.
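That fallback can also be automated. A minimal sketch, assuming a hypothetical `TRAIN_SCRIPT` and `train_in_fresh_process` helper, runs each parameter combination in a fresh Python interpreter via `subprocess`, so all GPU memory used by that run is released when the process exits. The training code inside the script is a toy stand-in for the real model:

```python
import json
import subprocess
import sys

# Hypothetical training script executed in a brand-new interpreter per run.
# It reads its parameters as JSON from argv and prints its result as JSON.
TRAIN_SCRIPT = r"""
import json, sys
import numpy as np
import tensorflow as tf

params = json.loads(sys.argv[1])
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 20)).astype("float32")
y = x.sum(axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(params["units"], activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=params["learning_rate"]), loss="mse")
history = model.fit(x, y, epochs=2, batch_size=32, verbose=0)
print(json.dumps({**params, "final_loss": float(history.history["loss"][-1])}))
"""


def train_in_fresh_process(params):
    """Train one model in a child process and return its result dictionary."""
    proc = subprocess.run(
        [sys.executable, "-c", TRAIN_SCRIPT, json.dumps(params)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if training crashes
    )
    # The result is the last line printed to stdout; TensorFlow's own
    # log messages normally go to stderr.
    return json.loads(proc.stdout.strip().splitlines()[-1])
```

Calling `train_in_fresh_process({"units": 8, "learning_rate": 0.01})` inside the existing for loops returns one result dictionary per run. Each new interpreter has to import TensorFlow again, so this trades some speed for complete isolation between runs.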

Regards
Marvin