Normally you represent each colour channel as a feature map.
So (if you follow pylearn conventions) your input will have shape (n_samples, 3, size_y, size_x). Since images are usually stored colour-channel last (n_images, size_y, size_x, 3), in python you usually reformat your image collection with feature_maps = np.rollaxis(image_collection, 3, 1)
Fully connected layers can only deal with input of a fixed size, because it requires a certain amount of parameters to "fully connect" the input and output. While convolutional layers just "slide" the same filters across the input, so it can basically deal with input of an arbitrary spatial size.
In the example network with fully-connected layers at the end, a 224*224 image will output an 1000d vector of class scores. If we apply the network on a larger image, the network will fail because of the inconsistency between the input and parameters of the first fully-connected layer.
One the other hand, if we use a fully convolutional network, when applied to a larger image we'll get 1000 "heatmaps" of class scores.
As shown in the following figure (from the FCN segmentation paper), the upper network gives one score per class, and after the conversion (convolutionalization), we can get the a heatmap per class for a larger image.
About "stride", on the same page, in the section Spatial arrangement:
When the stride is 1 then we move the filters one pixel at a time. When the stride is 2 (or uncommonly 3 or more, though this is rare in practice) then the filters jump 2 pixels at a time as we slide them around. This will produce smaller output volumes spatially.
Best Answer
You should work with each image as a volume of dimensions 224x224x5. You still do 2D convolution over the first 2 dimensions as usual, but keep the entire 3rd dimension. For instance, if you use a 7x7 convolution window, each filter will produce a 224x224x1 volume as output (with stride = 1 and zero padding), and the convolutional layer as a whole will produce a 224x224xN volume, where N is the number of filters. ConvNetJS and many other conv nets libraries take this approach.