Jan-27-2024, 06:26 AM
I have been working on a GAN model that is designed to output SMILES sequences. During the training step I get the error "ValueError: No gradients provided for any variable"; it occurs in the gan_model for the generator's trainable variables.
I have implemented my loss function with TensorFlow ops (tf.reduce_mean) and checked that it is differentiable, yet I still run into the "no gradients provided for any variable" error. My generator uses an attention mechanism, and as far as I can tell none of the functions in the generator are non-differentiable or receive incorrect input.
The Wasserstein loss function returns plausible values, and the output shapes look correct.
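To narrow down which variables are disconnected, this is the kind of per-variable check I can run inside my training step (a minimal sketch; critic_model, protein_batch, s0, c0 and latent_batch are placeholders for my actual critic and input batches):

import tensorflow as tf

# Any variable that reports None here is disconnected from the loss graph.
with tf.GradientTape() as tape:
    fake_smiles = generator_model([protein_batch, s0, c0, latent_batch], training=True)
    critic_scores = critic_model(fake_smiles, training=True)
    gen_loss = wasserstein_loss(-tf.ones_like(critic_scores), critic_scores)

grads = tape.gradient(gen_loss, generator_model.trainable_variables)
for var, grad in zip(generator_model.trainable_variables, grads):
    print(var.name, "-> None gradient!" if grad is None else "-> ok")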
Here is the relevant code I used for defining the loss function and the generator.
LOSS FUNCTION
import tensorflow as tf

def wasserstein_loss(y_true, y_pred):
    loss = tf.reduce_mean(y_true * y_pred)
    return loss

# Standalone differentiability check: the gradient w.r.t. y_pred is just y_true / n.
y_true = tf.constant([1.0, -1.0, 1.0], dtype=tf.float32)
y_pred = tf.constant([0.5, -0.5, 0.2], dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(y_pred)
    loss = wasserstein_loss(y_true, y_pred)

gradient = tape.gradient(loss, y_pred)
print("Loss:", loss.numpy())
print("Gradient:", gradient.numpy())
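The generator below ends by passing the softmax probabilities through a softargmax relaxation. Mine follows the standard continuous-relaxation-of-argmax pattern (a sketch, my exact implementation may differ slightly; note that with beta=1e10 the softmax effectively saturates to a hard argmax):

def softargmax(x, beta=1e10):
    # Differentiable relaxation of argmax: a softmax sharpened by beta,
    # dotted with the index positions so gradients can still flow.
    indices = tf.range(x.shape[-1], dtype=x.dtype)
    return tf.reduce_sum(tf.nn.softmax(x * beta, axis=-1) * indices, axis=-1)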
GENERATOR

def generator(latent_dim, num_protein_tokens, num_smiles_tokens,
              max_protein_seq_length, max_smiles_length):
    init_hidden_state = Input(shape=(max_smiles_length,), name='s0')
    init_cell_state = Input(shape=(max_smiles_length,), name='c0')
    hidden_state = init_hidden_state
    cell_state = init_cell_state

    input_latent = Input(shape=(max_smiles_length, latent_dim,), name='input_latent')
    input_protein = Input(shape=(max_protein_seq_length,), name='input_protein')
    embedding_protein = Embedding(num_protein_tokens, 25, mask_zero=True,
                                  input_length=max_protein_seq_length)(input_protein)
    lstm_protein = Bidirectional(LSTM(75, return_sequences=True))(embedding_protein)

    # Decode one SMILES timestep at a time, attending over the protein encoding.
    lstm_combined_outputs = []
    for t in range(max_smiles_length):
        context = one_step_attention(lstm_protein, hidden_state)
        lstm_combined, hidden_state, cell_state = post_activation_LSTM_cell(
            inputs=context, initial_state=[hidden_state, cell_state])
        lstm_combined_outputs.append(lstm_combined)

    lstm_combined_outputs = Concatenate(axis=1)(lstm_combined_outputs)
    concat_layer = Concatenate(axis=2)([input_latent, lstm_combined_outputs])
    generated_smiles_array = TimeDistributed(
        Dense(num_smiles_tokens, activation='softmax', name='output_smiles'))(concat_layer)
    output_smiles = softargmax(generated_smiles_array, beta=1e10)

    generator_model = Model(
        inputs=[input_protein, init_hidden_state, init_cell_state, input_latent],
        outputs=output_smiles, name='generator')
    return generator_model

generator_model = generator(latent_dim, num_protein_tokens, num_smiles_tokens,
                            max_protein_seq_length, max_smiles_length)
gen_optimizer = RMSprop(learning_rate=0.00005)
generator_model.compile(loss=wasserstein_loss, optimizer=gen_optimizer)

I have defined a separate attention mechanism which uses RepeatVector followed by two Dense layers, a softmax and a dot product (sketched below). I am willing to share more code on my training loop and the attention.
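For reference, my one_step_attention follows the standard RepeatVector / Dense / softmax / Dot pattern described above (a sketch; the Dense layer sizes here are illustrative, not my exact values):

from tensorflow.keras.layers import RepeatVector, Concatenate, Dense, Softmax, Dot

# Shared layer instances so the same attention weights are reused at every decoder step.
repeator = RepeatVector(max_protein_seq_length)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation='tanh')
densor2 = Dense(1, activation='relu')
activator = Softmax(axis=1)  # normalize over the protein timesteps
dotor = Dot(axes=1)

def one_step_attention(a, s_prev):
    # a: encoder outputs (batch, Tx, 150); s_prev: previous decoder hidden state.
    s_prev = repeator(s_prev)            # (batch, Tx, state_dim)
    concat = concatenator([a, s_prev])   # (batch, Tx, 150 + state_dim)
    energies = densor2(densor1(concat))  # (batch, Tx, 1) attention scores
    alphas = activator(energies)         # attention weights over the Tx timesteps
    context = dotor([alphas, a])         # (batch, 1, 150) context vector
    return context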