Different Input Sizes for Training and Prediction in CNN for Image Segmentation

computer-vision, conv-neural-network, data-preprocessing, neural-networks

I’m relatively inexperienced with deep learning, and I’m trying to reimplement a CNN architecture for segmentation of medical images based on a paper. The paper states that the input images are of size 448×448, and that random 224×224 sub-images are cropped from them in order to have more data to train on.

Due to my lack of experience, I’m not sure what the most likely interpretation of this is. Does it mean they trained the network with input size 224×224, and when running it on unseen data they cut each input image into four pieces and feed those to the network? Or do they resize the input layer to 448×448 and reuse the weights from the network trained on 224×224? Or is there some other, more likely interpretation that I’m unaware of?

Best Answer

I am also relatively inexperienced, but I think “crop” here just means using a subset of the image for training. It is a common form of data augmentation when you need more training data but only have a limited number of images. From a single 448×448 image you can randomly draw many different 224×224 cropped sub-images, positioned anywhere within the original.
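As a minimal sketch of what such random cropping could look like (the paper’s actual pipeline is unknown, so the function names here are illustrative), the key point for segmentation is that the image and its mask must be cropped at the same position so they stay aligned:

```python
import numpy as np

def random_crop(image, mask, crop_size=224):
    """Take one random crop_size x crop_size crop from an image and its
    segmentation mask, using the same offsets so the pair stays aligned."""
    h, w = image.shape[:2]
    top = np.random.randint(0, h - crop_size + 1)
    left = np.random.randint(0, w - crop_size + 1)
    return (image[top:top + crop_size, left:left + crop_size],
            mask[top:top + crop_size, left:left + crop_size])

# From one 448x448 training image you can draw many distinct crops.
image = np.zeros((448, 448, 3), dtype=np.float32)
mask = np.zeros((448, 448), dtype=np.int64)
crop_img, crop_mask = random_crop(image, mask)
print(crop_img.shape, crop_mask.shape)  # (224, 224, 3) (224, 224)
```

Calling `random_crop` on each training image once per epoch effectively gives the network a different 224×224 view of the same case every time it is seen.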

As for the different input sizes for training and prediction: could you also use the cropped size for prediction? I think the input sizes have to match.
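One way to keep the sizes consistent, along the lines of the “cut into four pieces” idea from the question, is to tile the full image at prediction time and stitch the per-tile outputs back together. This is a sketch under assumptions, not the paper’s method; `model` below is a hypothetical stand-in for any callable that maps a 224×224 patch to a same-sized per-pixel prediction:

```python
import numpy as np

def predict_tiled(model, image, tile=224):
    """Cut a 448x448 image into four non-overlapping 224x224 tiles,
    predict each one, and stitch the results back together."""
    h, w = image.shape[:2]
    out = np.zeros((h, w), dtype=np.int64)
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            patch = image[top:top + tile, left:left + tile]
            out[top:top + tile, left:left + tile] = model(patch)
    return out

# Usage with a dummy "model" that labels every pixel 1:
dummy_model = lambda patch: np.ones(patch.shape[:2], dtype=np.int64)
image = np.zeros((448, 448, 3), dtype=np.float32)
seg = predict_tiled(dummy_model, image)
print(seg.shape)  # (448, 448)
```

Non-overlapping tiles can produce visible seams at tile borders; overlapping tiles with averaged predictions are a common refinement, at the cost of more forward passes.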
