Solved – Training Autoencoder with Softmax Layer

autoencoders, neural networks

I have read several tutorials on training an autoencoder that can then be combined with a classification layer (e.g. softmax layer) for learning a classifier model.

I am trying to build this in keras, but I am getting blocked at the point where I take my trained encoder layer, which learned a compressed mapping of my input data matrix $X$, and combine it with a softmax layer to learn the classification model. How do I train this part? Should I re-use the training set I used to train the autoencoder, only this time fitting on the class labels?

EDIT:
To clarify, these are the steps I am taking to train a basic stacked autoencoder with a labeled training and test set:

  1. Take the full training set and train the network to reconstruct it from itself, with 1 hidden layer and 1 output layer, such that the hidden layer has fewer neurons than there are input features.
  2. Once the hidden layer from step 1 is trained, it is taken out and connected to a new softmax layer. Here, the training set is used again, but this time it is fitted against its class labels, which were not used in step 1.
  3. Following step 2, I can use the model for making predictions on my test set.

Is this correct?

Best Answer

I don't know why this was downvoted, but I figured out the answer, though it may be obvious.

The training set is used to train a single compression/encoder layer: the autoencoder learns to approximate its own input.
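For example, this first training step might look like the following in keras. This is just a minimal sketch: the sizes, activations, and names like input_dim and encoding_dim are placeholder assumptions, not from the question. It uses the functional API so that autoencoder.layers[1] is the encoder layer, matching the indexing in the code further down.

from keras.layers import Input, Dense
from keras.models import Model

input_dim = 784    # hypothetical number of input features
encoding_dim = 32  # hidden layer is narrower than the input

inputs = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(inputs)    # encoder: autoencoder.layers[1]
decoded = Dense(input_dim, activation='sigmoid')(encoded)   # decoder reconstructs the input

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='rmsprop', loss='mse')

# The input is its own target: the network learns to approximate itself
autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)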

Once this is done, the weights/layer responsible for the encoding part are saved and paired with a classification layer (e.g. a softmax layer) to learn a supervised classifier. This is done by using the same training set as before, this time fitting against the labels/classes of that training set, which weren't used previously.

After the classifier is trained, it can be used to make predictions or to check performance on the test set.

For example, if you already had a trained autoencoder and wanted to use its encoding layer with a softmax layer, you could do the following with keras:

# For a single-input model with 10 classes (categorical classification):

import keras
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import f1_score

# Reuse the trained encoder layer (layers[1] of the autoencoder) as the first layer
model = Sequential()
model.add(autoencoder.layers[1])
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(y_train, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(x_train, one_hot_labels, epochs=10, batch_size=32)

# Overall F1 score (macro-averaged) on the test set
f1_score(y_test, np.argmax(model.predict(x_test), axis=1), average='macro')

In the stacked autoencoder case, the procedure is the same, except with more encoding layers. There is discussion about this using keras here and here.
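For illustration, here is a minimal sketch of the stacked case, assuming two encoding layers with hypothetical sizes 128 and 32 (the question doesn't specify any): each encoding layer learned by the autoencoder is reused under the softmax layer.

from keras.layers import Input, Dense
from keras.models import Model, Sequential

inputs = Input(shape=(input_dim,))
encoded1 = Dense(128, activation='relu')(inputs)       # first encoding layer
encoded2 = Dense(32, activation='relu')(encoded1)      # second, narrower encoding layer
decoded1 = Dense(128, activation='relu')(encoded2)
decoded2 = Dense(input_dim, activation='sigmoid')(decoded1)

stacked_autoencoder = Model(inputs, decoded2)
stacked_autoencoder.compile(optimizer='rmsprop', loss='mse')
stacked_autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)

# Reuse both trained encoding layers (layers[1] and layers[2]) under the softmax
classifier = Sequential()
classifier.add(stacked_autoencoder.layers[1])
classifier.add(stacked_autoencoder.layers[2])
classifier.add(Dense(10, activation='softmax'))
classifier.compile(optimizer='rmsprop',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
classifier.fit(x_train, one_hot_labels, epochs=10, batch_size=32)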