Using Autoencoder for Data Augmentation of numerical Dataset in Python - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: Using Autoencoder for Data Augmentation of numerical Dataset in Python (/thread-28137.html)
Using Autoencoder for Data Augmentation of numerical Dataset in Python - Marvin93 - Jul-06-2020

Hello everyone, I was coding an autoencoder. The plan is to try to use it for data augmentation of a numerical dataset. I know it might not work properly, but I want to at least try. I found example code for the MNIST dataset online and tried to adjust it to my numerical dataset. The results are extremely bad and I don't really know where to start improving it. The loss looks very strange to me. I am still a beginner in machine learning and coding, and this is my first autoencoder. I rather expect that generating new data will not work properly, or at best will only reinforce the average, but I want at least the autoencoder itself to work properly. Can anyone help me make it work? My code is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras import models, layers
from keras import applications
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

embedding_dim = 5

# Input layer
input_data = layers.Input(shape=(27,))
# Encoding layer
encoded = layers.Dense(embedding_dim, activation='relu')(input_data)
# Decoding layer
decoded = layers.Dense(27, activation='sigmoid')(encoded)
autoencoder = models.Model(input_data, decoded)
autoencoder.summary()

# Encoder
encoder = models.Model(input_data, encoded)
encoder.summary()

# Decoder
encoded_input = layers.Input(shape=(embedding_dim,))
decoder_layers = autoencoder.layers[-1]  # applying the last layer
decoder = models.Model(encoded_input, decoder_layers(encoded_input))

print(input_data)
print(encoded)

autoencoder.compile(
    optimizer='adadelta',
    loss='binary_crossentropy'
)

data = pd.read_csv("C:/Users/...", header=0)
feature_spalten = [...]
x = data[feature_spalten]
sc = StandardScaler()
x = sc.fit_transform(x)

x_train, x_test = train_test_split(x, train_size=0.8, test_size=0.2)
print(x_train.shape, x_test.shape)

history = autoencoder.fit(x_train, x_train, epochs=50, batch_size=10, shuffle=True,
                          validation_data=(x_test, x_test))

plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.legend()
plt.show()
plt.close()

encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

print(x_test[1])
print(encoded_imgs[1])
print(decoded_imgs[1])
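The training output pasted below shows the loss turning negative and growing without bound. One likely reason: after StandardScaler the reconstruction targets are no longer confined to [0, 1], and binary cross-entropy is unbounded below for such targets, so the optimizer can keep lowering it forever without learning anything useful. A small stand-alone check (a sketch with a made-up target value, independent of the actual dataset) illustrates this:

import numpy as np

def bce(y, p):
    # element-wise binary cross-entropy, the quantity Keras averages per batch
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

y = -1.5                        # a perfectly ordinary value after StandardScaler
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, bce(y, p))         # becomes more and more negative as p -> 0

With targets that are guaranteed to lie in [0, 1], as with normalized MNIST pixels, the loss cannot go below zero, which is why the copied image example behaves sensibly while the standardized numerical data does not.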
Epoch 1/50  666/666 [==============================] - 0s 319us/step - loss: 0.6085 - val_loss: 0.4956
Epoch 2/50  666/666 [==============================] - 0s 112us/step - loss: -0.1682 - val_loss: -0.8901
Epoch 3/50  666/666 [==============================] - 0s 111us/step - loss: -5.9375 - val_loss: -10.3128
Epoch 4/50  666/666 [==============================] - 0s 121us/step - loss: -41.6272 - val_loss: -63.0064
Epoch 5/50  666/666 [==============================] - 0s 121us/step - loss: -212.9176 - val_loss: -290.3859
Epoch 6/50  666/666 [==============================] - 0s 124us/step - loss: -848.1797 - val_loss: -1062.8757
Epoch 7/50  666/666 [==============================] - 0s 114us/step - loss: -2863.6273 - val_loss: -3405.0128
Epoch 8/50  666/666 [==============================] - 0s 134us/step - loss: -8527.4483 - val_loss: -9693.6851
Epoch 9/50  666/666 [==============================] - 0s 107us/step - loss: -22757.1002 - val_loss: -24704.1394
Epoch 10/50  666/666 [==============================] - 0s 124us/step - loss: -55154.4632 - val_loss: -57120.8508
Epoch 11/50  666/666 [==============================] - 0s 119us/step - loss: -122513.5889 - val_loss: -122792.2679
Epoch 12/50  666/666 [==============================] - 0s 175us/step - loss: -251940.6556 - val_loss: -243550.9253
Epoch 13/50  666/666 [==============================] - 0s 138us/step - loss: -483814.7004 - val_loss: -456335.7720
Epoch 14/50  666/666 [==============================] - 0s 118us/step - loss: -879216.8864 - val_loss: -809321.0520
Epoch 15/50  666/666 [==============================] - 0s 112us/step - loss: -1525024.4664 - val_loss: -1372953.5451
Epoch 16/50  666/666 [==============================] - 0s 121us/step - loss: -2531758.5153 - val_loss: -2241380.7342
Epoch 17/50  666/666 [==============================] - 0s 139us/step - loss: -4059733.6131 - val_loss: -3533760.1696
Epoch 18/50  666/666 [==============================] - 0s 111us/step - loss: -6241526.3448 - val_loss: -5315857.7776
Epoch 19/50  666/666 [==============================] - 0s 124us/step - loss: -9271648.9710 - val_loss: -7795804.1207
Epoch 20/50  666/666 [==============================] - 0s 117us/step - loss: -13382943.2812 - val_loss: -11118669.4349
Epoch 21/50  666/666 [==============================] - 0s 115us/step - loss: -18866546.7188 - val_loss: -15493272.2056
Epoch 22/50  666/666 [==============================] - 0s 112us/step - loss: -26015345.1170 - val_loss: -21135213.2486
Epoch 23/50  666/666 [==============================] - 0s 114us/step - loss: -35157265.7533 - val_loss: -28333500.4743
Epoch 24/50  666/666 [==============================] - 0s 126us/step - loss: -46649575.3033 - val_loss: -37229013.6085
Epoch 25/50  666/666 [==============================] - 0s 111us/step - loss: -60915819.1827 - val_loss: -48325016.3341
Epoch 26/50  666/666 [==============================] - 0s 120us/step - loss: -78454795.1451 - val_loss: -61818716.5065
Epoch 27/50  666/666 [==============================] - 0s 136us/step - loss: -99496161.2260 - val_loss: -77740755.5057
Epoch 28/50  666/666 [==============================] - 0s 121us/step - loss: -124530331.4595 - val_loss: -96700581.5273
Epoch 29/50  666/666 [==============================] - 0s 136us/step - loss: -154009247.6106 - val_loss: -118998226.7941
Epoch 30/50  666/666 [==============================] - 0s 115us/step - loss: -188228319.0390 - val_loss: -144467681.6220
Epoch 31/50  666/666 [==============================] - 0s 121us/step - loss: -227326350.8589 - val_loss: -173524223.5126
Epoch 32/50  666/666 [==============================] - 0s 132us/step - loss: -271506365.9110 - val_loss: -206106330.4663
Epoch 33/50  666/666 [==============================] - 0s 130us/step - loss: -320878682.9309 - val_loss: -242436692.1731
Epoch 34/50  666/666 [==============================] - 0s 123us/step - loss: -375914161.0013 - val_loss: -282676327.0760
Epoch 35/50  666/666 [==============================] - 0s 129us/step - loss: -436207667.8823 - val_loss: -326130269.9530
Epoch 36/50  666/666 [==============================] - 0s 131us/step - loss: -501011729.8258 - val_loss: -373403217.1961
Epoch 37/50  666/666 [==============================] - 0s 140us/step - loss: -571414574.2252 - val_loss: -424185912.5576
Epoch 38/50  666/666 [==============================] - 0s 134us/step - loss: -647325289.2252 - val_loss: -479097395.8665
Epoch 39/50  666/666 [==============================] - 0s 136us/step - loss: -729058020.4386 - val_loss: -537883819.2931
Epoch 40/50  666/666 [==============================] - 0s 151us/step - loss: -815980009.9711 - val_loss: -600199536.3999
Epoch 41/50  666/666 [==============================] - 0s 210us/step - loss: -907683117.3093 - val_loss: -665806459.6033
Epoch 42/50  666/666 [==============================] - 0s 142us/step - loss: -1004339026.0079 - val_loss: -734108616.1733
Epoch 43/50  666/666 [==============================] - 0s 153us/step - loss: -1104456387.0238 - val_loss: -805104765.1138
Epoch 44/50  666/666 [==============================] - 0s 132us/step - loss: -1208102874.6092 - val_loss: -878959172.9569
Epoch 45/50  666/666 [==============================] - 0s 156us/step - loss: -1316799990.3904 - val_loss: -955931473.1196
Epoch 46/50  666/666 [==============================] - 0s 130us/step - loss: -1428856534.9670 - val_loss: -1035392761.0133
Epoch 47/50  666/666 [==============================] - 0s 127us/step - loss: -1544756835.5037 - val_loss: -1116145906.3891
Epoch 48/50  666/666 [==============================] - 0s 175us/step - loss: -1661658106.3890 - val_loss: -1197924988.1799
Epoch 49/50  666/666 [==============================] - 0s 124us/step - loss: -1780243449.8799 - val_loss: -1281141478.4630
Epoch 50/50  666/666 [==============================] - 0s 141us/step - loss: -1900183853.8796 - val_loss: -1365599635.7473

[ 0.03959699 -0.99163165 -0.75003885 -0.25651321  0.18778244 -1.00187989
  1.00413377 -0.98694688  0.41052097 -1.44455093 -1.98743372 -0.93995741
  0.02268564  0.02345748 -0.69441083  0.177332   -0.12558926 -0.60611171
  0.          0.24519559  0.13795889 -0.33991206  0.23129338  0.55440042
 -0.26234019  0.10895555  1.48348411]

[  0.       255.93634  141.89441    0.         0.      ]

[1.5810438e-03 0.0000000e+00 7.0749174e-06 0.0000000e+00 4.0274216e-07
 1.7075544e-02 7.3435016e-02 2.9222088e-06 3.9918396e-01 0.0000000e+00
 0.0000000e+00 0.0000000e+00 1.3559917e-02 9.7694965e-03 0.0000000e+00
 1.0237135e-02 4.2446300e-01 1.1339870e-03 0.0000000e+00 9.2203043e-02
 1.7725935e-02 5.6416985e-02 6.1349845e-01 9.4470326e-03 2.8716803e-01
 0.0000000e+00 9.2083311e-01]
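For reference, a minimal sketch of how the same model could be set up for real-valued, standardized features: a linear output layer and a mean-squared-error loss replace the sigmoid/binary cross-entropy pair used above. The layer sizes mirror the 27-column dataset from the post; the optimizer and other settings are illustrative choices, not the author's original ones.

from keras import models, layers

n_features = 27      # matches the 27 input columns used in the post
embedding_dim = 5

input_data = layers.Input(shape=(n_features,))
encoded = layers.Dense(embedding_dim, activation='relu')(input_data)
# linear output: reconstructions may take any real value, as StandardScaler-ed data does
decoded = layers.Dense(n_features, activation='linear')(encoded)

autoencoder = models.Model(input_data, decoded)
encoder = models.Model(input_data, encoded)

decoder_input = layers.Input(shape=(embedding_dim,))
decoder = models.Model(decoder_input, autoencoder.layers[-1](decoder_input))

# mean squared error is a natural reconstruction loss for real-valued features
autoencoder.compile(optimizer='adam', loss='mse')

# x_train / x_test would be the standardized arrays from the original script:
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=10,
#                 shuffle=True, validation_data=(x_test, x_test))

With real-valued targets and MSE, the loss stays non-negative and actually measures reconstruction error, so the training and validation curves become interpretable again.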
RE: Using Autoencoder for Data Augmentation of numerical Dataset in Python - hussainmujtaba - Jul-10-2020

You should use the loss function 'sparse_categorical_crossentropy' instead of 'binary_crossentropy', as MNIST has more than two categories. For a guide, you can take a look at this article about autoencoders.

RE: Using Autoencoder for Data Augmentation of numerical Dataset in Python - Marvin93 - Jul-10-2020

(Jul-10-2020, 06:47 AM) hussainmujtaba Wrote: You should use the loss function 'sparse_categorical_crossentropy' instead of 'binary_crossentropy', as MNIST has more than two categories.

Hey, thanks. I will take a look at the article. But I am not using the MNIST dataset; I am using my own numerical dataset from a CSV file, and it has only one class. Actually, it has no labels at all. As far as I know, autoencoders don't need them, because they just encode and decode the data. But as I said, I don't really know where to start to make it run. And the example uses binary_crossentropy as well.
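For the data-augmentation goal of the thread, one approach that is sometimes tried once the autoencoder reconstructs well is to encode the real rows, add a little noise to the latent codes, and decode them again. A rough sketch, assuming a trained encoder/decoder pair like the ones above; the helper name and the noise level are made up for illustration, and whether the generated rows are actually useful depends on the data, as the original post already suspects.

import numpy as np

# Hypothetical helper (name and defaults are illustrative): augment a
# standardized feature matrix x by jittering its latent codes and decoding.
def augment_with_autoencoder(encoder, decoder, x, noise_std=0.1, n_copies=1):
    codes = encoder.predict(x)                      # latent representation of the real rows
    batches = []
    for _ in range(n_copies):
        noisy = codes + np.random.normal(0.0, noise_std, size=codes.shape)
        batches.append(decoder.predict(noisy))      # decode the perturbed codes
    return np.vstack(batches)

# Example usage with the objects from the sketch above:
# x_new = augment_with_autoencoder(encoder, decoder, x_train, noise_std=0.05, n_copies=2)
# x_new lives in the standardized space; sc.inverse_transform(x_new) maps it
# back to the original feature scale.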