Jul-15-2020, 05:21 PM
Hello everyone,
I am training a very small model many times in a row. I know this is probably not what most people do, but I just want to try a lot of different parameter combinations, so I am doing it with a couple of for loops, training the model over and over again. I am using TensorBoard and a dictionary to write the data to a CSV file.
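To make the setup concrete, here is a minimal sketch of that kind of loop. All names (the parameter grid, train_model) are hypothetical placeholders, and train_model is a stand-in for the real Keras build/compile/fit call so the sketch runs without TensorFlow:

```python
import csv
import itertools
import random

# Hypothetical parameter grid -- not the actual values from my runs.
param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "units": [10, 50, 100],
}

def train_model(learning_rate, units):
    """Placeholder for the real Keras training; returns a fake
    final loss so this sketch is runnable without a GPU."""
    return random.random() / (units * learning_rate)

# Try every combination and log one CSV row per trained model.
with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["learning_rate", "units", "loss"])
    writer.writeheader()
    for lr, units in itertools.product(param_grid["learning_rate"],
                                       param_grid["units"]):
        loss = train_model(lr, units)
        writer.writerow({"learning_rate": lr, "units": units, "loss": loss})
```

In the real script, train_model builds a fresh small Keras model on each iteration and the result variable is overwritten every time through the loop.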
Now i am getting this error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1000,1000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training_492/SGD/gradients/dense_2194/MatMul_grad/MatMul_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

What I am wondering is how this can happen, since the model is very small. I don't think the dictionary is getting too big, so I suppose every trained model is still kept in memory, even though I overwrite the variable in every for loop. Am I understanding that right, or can anyone tell me the reason? Is there any smart solution for this problem?
Otherwise I will just split the program into a bunch of smaller programs that each train far fewer models.
Regards
Marvin