Solved – Training accuracy increases abruptly to 99% at the first epoch. Is this normal?

adam, conv-neural-network, tensorflow

I expected my training process to look like this:

[image: expected training-accuracy curve, rising gradually over epochs]

But my training accuracy jumped to 99% after only one epoch instead of increasing steadily.

Suspecting the learning rate was too high, I tried various learning rates (0.001–0.00001), but training accuracy still reached ~99% within one epoch and then stayed at 99–100%.

Meanwhile, validation accuracy reached ~70% after one epoch, rose to about 80% over epochs 2–3, and then oscillated for the rest of training.

Another problem: the training loss is almost 0, so only the weight-decay term remains in the loss function, which drives all the weights toward 0.
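To see why a near-zero data loss lets weight decay collapse the weights, here is a minimal numpy sketch (plain SGD for illustration; the variable names and constants are hypothetical, not taken from the actual training setup): once the data-loss gradient vanishes, each step only applies the L2 penalty's gradient, shrinking the weights geometrically toward 0.

```python
import numpy as np

# Sketch: when the data loss is ~0, only the weight-decay gradient
# remains, and every update shrinks the weights toward 0.
w = np.array([1.0, -2.0, 0.5])   # hypothetical weights
lr, weight_decay = 0.01, 0.1     # hypothetical hyperparameters

for _ in range(1000):
    data_grad = np.zeros_like(w)        # training loss ~0 -> no gradient
    decay_grad = 2 * weight_decay * w   # gradient of weight_decay * ||w||^2
    w -= lr * (data_grad + decay_grad)

print(np.abs(w).max())  # far smaller than the initial max of 2.0
```

Each step multiplies `w` by `(1 - 2 * lr * weight_decay)`, so the weights decay exponentially whenever the task loss stops contributing a gradient.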

I don't know what the problem is. Is the input image size too small?


Below is my detailed setup:


I'm using the convolutional layer (1–5) weights from AlexNet to fine-tune my CNN.

The difference from AlexNet is that my input image size is 87×33, unlike the 224×224 input of AlexNet.

(Accordingly, my last (conv5) feature map shape is 4×2×256; the fully connected layers are the same as AlexNet's.)

The network has two classes (positive, negative).

There are ~40k positive and ~40k negative training images.

(Random crop, warping, and flipping are applied to each sample.)
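The augmentation above (random crop plus flip, with the 87×33 target size from the question) can be sketched in numpy as follows; warping is omitted for brevity, and the `augment` helper and the source image size are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img, crop_h=33, crop_w=87):
    """Random crop to (crop_h, crop_w), then random horizontal flip."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    out = img[top:top + crop_h, left:left + crop_w]
    if rng.random() < 0.5:        # random horizontal flip
        out = out[:, ::-1]
    return out

# Hypothetical source image, slightly larger than the crop size.
sample = rng.standard_normal((40, 100, 3))
aug = augment(sample)
print(aug.shape)  # (33, 87, 3)
```

In a real TensorFlow pipeline the same effect would typically come from the library's image ops, but the numpy version makes the crop/flip logic explicit.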

I use the Adam optimizer in TensorFlow.


EDIT:

I checked that the training samples differ from one feed to the next,
and I'm fairly sure that isn't the problem.
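One cheap way to verify that successive batches really differ is to hash each batch's raw bytes and count the distinct hashes. This is a sketch with a stand-in data source (the `next_batch` function here is hypothetical, not the actual pipeline); in practice you would hash the arrays your real feed produces:

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)

def next_batch(batch_size=4):
    # Stand-in for the real data pipeline (assumption): returns a
    # batch of 33x87 RGB images as a numpy array.
    return rng.standard_normal((batch_size, 33, 87, 3))

# Hash 10 consecutive batches; identical batches collapse to one hash.
hashes = {hashlib.md5(next_batch().tobytes()).hexdigest() for _ in range(10)}
print(len(hashes))  # 10 distinct hashes -> the batches differ
```

If the count comes back as 1, the loop is feeding the same samples every step, which would explain an instant 99% training accuracy.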

By the way, I realized that I use a subset of ImageNet data for fine-tuning (and as far as I know, AlexNet is itself trained on ImageNet).

Although I only used the pre-trained weights for the conv layers (1–5),
and only used ImageNet data for the negative training images (not the positives),
I think this is the cause of my problem.

Does that make sense?

Best Answer

Check that each batch is giving different samples; it looks like you might be feeding the same samples to the network every time. Another thing to understand is that you are using a pretrained model, which means many patterns have already been learnt. If your data fits those patterns, it's possible your problem was effectively already solved by the model.
