Using Autoencoder for Data Augmentation of numerical Dataset in Python - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Using Autoencoder for Data Augmentation of numerical Dataset in Python (/thread-28137.html)
Using Autoencoder for Data Augmentation of numerical Dataset in Python - Marvin93 - Jul-06-2020

Hello everyone, I was coding an autoencoder. The plan is to try to use it for data augmentation of a numerical dataset. I know it might not work properly, but I want to at least try. I found example code for the MNIST dataset online and tried to adjust it to my numerical dataset. The results are extremely bad and I don't really know where to start improving it. The loss looks very strange to me. I am still a beginner in machine learning and coding, and this is my first autoencoder. I rather expect that generating new data will not work properly, or at best will only reinforce the average, but I want at least the autoencoder itself to work properly. Can anyone help me make it work? My code is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras import models, layers
from keras import applications
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

embedding_dim = 5

# Input layer
input_data = layers.Input(shape=(27,))
# Encoding layer
encoded = layers.Dense(embedding_dim, activation='relu')(input_data)
# Decoding layer
decoded = layers.Dense(27, activation='sigmoid')(encoded)
autoencoder = models.Model(input_data, decoded)
autoencoder.summary()

# Encoder
encoder = models.Model(input_data, encoded)
encoder.summary()

# Decoder
encoded_input = layers.Input(shape=(embedding_dim,))
decoder_layers = autoencoder.layers[-1]  # applying the last layer
decoder = models.Model(encoded_input, decoder_layers(encoded_input))

print(input_data)
print(encoded)

autoencoder.compile(
    optimizer='adadelta',
    loss='binary_crossentropy'
)

data = pd.read_csv("C:/Users/...", header=0)
feature_spalten = [...]
x = data[feature_spalten]
sc = StandardScaler()
x = sc.fit_transform(x)

x_train, x_test = train_test_split(x, train_size=0.8, test_size=0.2)
print(x_train.shape, x_test.shape)

history = autoencoder.fit(x_train, x_train, epochs=50, batch_size=10, shuffle=True,
                          validation_data=(x_test, x_test))

plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.legend()
plt.show()
plt.close()

encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

print(x_test[1])
print(encoded_imgs[1])
print(decoded_imgs[1])
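The training output pasted below shows the loss turning negative and growing without bound. One likely reason: after StandardScaler the reconstruction targets are no longer confined to [0, 1], and binary cross-entropy is unbounded below for such targets, so the optimizer can keep lowering it forever without learning anything useful. A small stand-alone check (a sketch with a made-up target value, independent of the actual dataset) illustrates this:

import numpy as np

def bce(y, p):
    # element-wise binary cross-entropy, the quantity Keras averages per batch
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = -1.5                        # a perfectly ordinary value after StandardScaler
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, bce(y, p))         # becomes more and more negative as p -> 0

With targets that are guaranteed to lie in [0, 1], as with normalized MNIST pixels, the loss cannot go below zero, which is why the copied image example behaves sensibly while the standardized numerical data does not.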
Epoch 1/50  666/666 [==============================] - 0s 319us/step - loss: 0.6085 - val_loss: 0.4956
Epoch 2/50  666/666 [==============================] - 0s 112us/step - loss: -0.1682 - val_loss: -0.8901
Epoch 3/50  666/666 [==============================] - 0s 111us/step - loss: -5.9375 - val_loss: -10.3128
Epoch 4/50  666/666 [==============================] - 0s 121us/step - loss: -41.6272 - val_loss: -63.0064
Epoch 5/50  666/666 [==============================] - 0s 121us/step - loss: -212.9176 - val_loss: -290.3859
Epoch 6/50  666/666 [==============================] - 0s 124us/step - loss: -848.1797 - val_loss: -1062.8757
Epoch 7/50  666/666 [==============================] - 0s 114us/step - loss: -2863.6273 - val_loss: -3405.0128
Epoch 8/50  666/666 [==============================] - 0s 134us/step - loss: -8527.4483 - val_loss: -9693.6851
Epoch 9/50  666/666 [==============================] - 0s 107us/step - loss: -22757.1002 - val_loss: -24704.1394
Epoch 10/50  666/666 [==============================] - 0s 124us/step - loss: -55154.4632 - val_loss: -57120.8508
Epoch 11/50  666/666 [==============================] - 0s 119us/step - loss: -122513.5889 - val_loss: -122792.2679
Epoch 12/50  666/666 [==============================] - 0s 175us/step - loss: -251940.6556 - val_loss: -243550.9253
Epoch 13/50  666/666 [==============================] - 0s 138us/step - loss: -483814.7004 - val_loss: -456335.7720
Epoch 14/50  666/666 [==============================] - 0s 118us/step - loss: -879216.8864 - val_loss: -809321.0520
Epoch 15/50  666/666 [==============================] - 0s 112us/step - loss: -1525024.4664 - val_loss: -1372953.5451
Epoch 16/50  666/666 [==============================] - 0s 121us/step - loss: -2531758.5153 - val_loss: -2241380.7342
Epoch 17/50  666/666 [==============================] - 0s 139us/step - loss: -4059733.6131 - val_loss: -3533760.1696
Epoch 18/50  666/666 [==============================] - 0s 111us/step - loss: -6241526.3448 - val_loss: -5315857.7776
Epoch 19/50  666/666 [==============================] - 0s 124us/step - loss: -9271648.9710 - val_loss: -7795804.1207
Epoch 20/50  666/666 [==============================] - 0s 117us/step - loss: -13382943.2812 - val_loss: -11118669.4349
Epoch 21/50  666/666 [==============================] - 0s 115us/step - loss: -18866546.7188 - val_loss: -15493272.2056
Epoch 22/50  666/666 [==============================] - 0s 112us/step - loss: -26015345.1170 - val_loss: -21135213.2486
Epoch 23/50  666/666 [==============================] - 0s 114us/step - loss: -35157265.7533 - val_loss: -28333500.4743
Epoch 24/50  666/666 [==============================] - 0s 126us/step - loss: -46649575.3033 - val_loss: -37229013.6085
Epoch 25/50  666/666 [==============================] - 0s 111us/step - loss: -60915819.1827 - val_loss: -48325016.3341
Epoch 26/50  666/666 [==============================] - 0s 120us/step - loss: -78454795.1451 - val_loss: -61818716.5065
Epoch 27/50  666/666 [==============================] - 0s 136us/step - loss: -99496161.2260 - val_loss: -77740755.5057
Epoch 28/50  666/666 [==============================] - 0s 121us/step - loss: -124530331.4595 - val_loss: -96700581.5273
Epoch 29/50  666/666 [==============================] - 0s 136us/step - loss: -154009247.6106 - val_loss: -118998226.7941
Epoch 30/50  666/666 [==============================] - 0s 115us/step - loss: -188228319.0390 - val_loss: -144467681.6220
Epoch 31/50  666/666 [==============================] - 0s 121us/step - loss: -227326350.8589 - val_loss: -173524223.5126
Epoch 32/50  666/666 [==============================] - 0s 132us/step - loss: -271506365.9110 - val_loss: -206106330.4663
Epoch 33/50  666/666 [==============================] - 0s 130us/step - loss: -320878682.9309 - val_loss: -242436692.1731
Epoch 34/50  666/666 [==============================] - 0s 123us/step - loss: -375914161.0013 - val_loss: -282676327.0760
Epoch 35/50  666/666 [==============================] - 0s 129us/step - loss: -436207667.8823 - val_loss: -326130269.9530
Epoch 36/50  666/666 [==============================] - 0s 131us/step - loss: -501011729.8258 - val_loss: -373403217.1961
Epoch 37/50  666/666 [==============================] - 0s 140us/step - loss: -571414574.2252 - val_loss: -424185912.5576
Epoch 38/50  666/666 [==============================] - 0s 134us/step - loss: -647325289.2252 - val_loss: -479097395.8665
Epoch 39/50  666/666 [==============================] - 0s 136us/step - loss: -729058020.4386 - val_loss: -537883819.2931
Epoch 40/50  666/666 [==============================] - 0s 151us/step - loss: -815980009.9711 - val_loss: -600199536.3999
Epoch 41/50  666/666 [==============================] - 0s 210us/step - loss: -907683117.3093 - val_loss: -665806459.6033
Epoch 42/50  666/666 [==============================] - 0s 142us/step - loss: -1004339026.0079 - val_loss: -734108616.1733
Epoch 43/50  666/666 [==============================] - 0s 153us/step - loss: -1104456387.0238 - val_loss: -805104765.1138
Epoch 44/50  666/666 [==============================] - 0s 132us/step - loss: -1208102874.6092 - val_loss: -878959172.9569
Epoch 45/50  666/666 [==============================] - 0s 156us/step - loss: -1316799990.3904 - val_loss: -955931473.1196
Epoch 46/50  666/666 [==============================] - 0s 130us/step - loss: -1428856534.9670 - val_loss: -1035392761.0133
Epoch 47/50  666/666 [==============================] - 0s 127us/step - loss: -1544756835.5037 - val_loss: -1116145906.3891
Epoch 48/50  666/666 [==============================] - 0s 175us/step - loss: -1661658106.3890 - val_loss: -1197924988.1799
Epoch 49/50  666/666 [==============================] - 0s 124us/step - loss: -1780243449.8799 - val_loss: -1281141478.4630
Epoch 50/50  666/666 [==============================] - 0s 141us/step - loss: -1900183853.8796 - val_loss: -1365599635.7473

[ 0.03959699 -0.99163165 -0.75003885 -0.25651321  0.18778244 -1.00187989
  1.00413377 -0.98694688  0.41052097 -1.44455093 -1.98743372 -0.93995741
  0.02268564  0.02345748 -0.69441083  0.177332   -0.12558926 -0.60611171
  0.          0.24519559  0.13795889 -0.33991206  0.23129338  0.55440042
 -0.26234019  0.10895555  1.48348411]

[  0.       255.93634  141.89441    0.         0.      ]

[1.5810438e-03 0.0000000e+00 7.0749174e-06 0.0000000e+00 4.0274216e-07
 1.7075544e-02 7.3435016e-02 2.9222088e-06 3.9918396e-01 0.0000000e+00
 0.0000000e+00 0.0000000e+00 1.3559917e-02 9.7694965e-03 0.0000000e+00
 1.0237135e-02 4.2446300e-01 1.1339870e-03 0.0000000e+00 9.2203043e-02
 1.7725935e-02 5.6416985e-02 6.1349845e-01 9.4470326e-03 2.8716803e-01
 0.0000000e+00 9.2083311e-01]
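For reference, a minimal sketch of how the same model could be set up for real-valued, standardized features: a linear output layer and a mean-squared-error loss replace the sigmoid/binary cross-entropy pair used above. The layer sizes mirror the 27-column dataset from the post; the optimizer and other settings are illustrative choices, not the author's original ones.

from keras import models, layers

n_features = 27      # matches the 27 input columns used in the post
embedding_dim = 5

input_data = layers.Input(shape=(n_features,))
encoded = layers.Dense(embedding_dim, activation='relu')(input_data)
# linear output: reconstructions may take any real value, as StandardScaler-ed data does
decoded = layers.Dense(n_features, activation='linear')(encoded)

autoencoder = models.Model(input_data, decoded)
encoder = models.Model(input_data, encoded)

decoder_input = layers.Input(shape=(embedding_dim,))
decoder = models.Model(decoder_input, autoencoder.layers[-1](decoder_input))

# mean squared error is a natural reconstruction loss for real-valued features
autoencoder.compile(optimizer='adam', loss='mse')

# x_train / x_test would be the standardized arrays from the original script:
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=10,
#                 shuffle=True, validation_data=(x_test, x_test))

With real-valued targets and MSE, the loss stays non-negative and actually measures reconstruction error, so the training and validation curves become interpretable again.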
RE: Using Autoencoder for Data Augmentation of numerical Dataset in Python - hussainmujtaba - Jul-10-2020

You should use the loss function 'sparse_categorical_crossentropy' instead of 'binary_crossentropy', as MNIST has more than two categories. For a guide, you can take a look at this article about autoencoders.

RE: Using Autoencoder for Data Augmentation of numerical Dataset in Python - Marvin93 - Jul-10-2020

(Jul-10-2020, 06:47 AM) hussainmujtaba Wrote: You should use the loss function 'sparse_categorical_crossentropy' instead of 'binary_crossentropy', as MNIST has more than two categories.

Hey, thanks. I will take a look at the article. But I am not using the MNIST dataset; I am using my own numerical dataset from a CSV file, and it has only one class. Actually, it has no labels at all. As far as I know, autoencoders don't need them, because they just encode and decode the data. But as I said, I don't really know where to start to make it run. And the example uses binary_crossentropy as well.
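For the data-augmentation goal of the thread, one approach that is sometimes tried once the autoencoder reconstructs well is to encode the real rows, add a little noise to the latent codes, and decode them again. A rough sketch, assuming a trained encoder/decoder pair like the ones above; the helper name and the noise level are made up for illustration, and whether the generated rows are actually useful depends on the data, as the original post already suspects.

import numpy as np

# Hypothetical helper (name and defaults are illustrative): augment a
# standardized feature matrix x by jittering its latent codes and decoding.
def augment_with_autoencoder(encoder, decoder, x, noise_std=0.1, n_copies=1):
    codes = encoder.predict(x)                      # latent representation of the real rows
    batches = []
    for _ in range(n_copies):
        noisy = codes + np.random.normal(0.0, noise_std, size=codes.shape)
        batches.append(decoder.predict(noisy))      # decode the perturbed codes
    return np.vstack(batches)

# Example usage with the objects from the sketch above:
# x_new = augment_with_autoencoder(encoder, decoder, x_train, noise_std=0.05, n_copies=2)
# x_new lives in the standardized space; sc.inverse_transform(x_new) maps it
# back to the original feature scale.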