A feature vector is the result of applying any deterministic function to an input image (or portion of an image) to get a vector output; if the function would normally give an array output, the array can be reshaped into a vector. Normally you would use several such functions, each producing a vector. The length of the vector produced by any one function does not need to be the same as for the other functions, but each function must produce the same output length for every image. You would then concatenate all of the different feature vectors for one image into one vector. You would repeat this for the other images, applying the same functions to them, with each image contributing a row (or column) to a 2D array of data that, taken together, describes a number of different images.
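As a rough illustration of the construction described above, here is a minimal Python/NumPy sketch. The three feature functions (`mean_intensity`, `intensity_histogram`, `downsampled_pixels`), the 32x32 image size, and the random test images are all hypothetical choices made for the example; any deterministic functions with a fixed output length would do.

```python
import numpy as np

def mean_intensity(img):
    # Scalar result wrapped as a length-1 vector.
    return np.array([img.mean()])

def intensity_histogram(img, bins=8):
    # Fixed number of bins, so the output length is always `bins`.
    counts, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return counts.astype(float)

def downsampled_pixels(img):
    # Array output (4x4 for a 32x32 image) reshaped into a vector.
    return img[::8, ::8].reshape(-1)

# Hypothetical set of feature functions; their output lengths differ
# from each other but are constant from image to image.
FEATURE_FUNCTIONS = [mean_intensity, intensity_histogram, downsampled_pixels]

def feature_vector(img):
    # Concatenate the per-function vectors into one vector for this image.
    return np.concatenate([f(img) for f in FEATURE_FUNCTIONS])

# Stand-in data: five random 32x32 "images" with values in [0, 1).
rng = np.random.default_rng(0)
images = [rng.random((32, 32)) for _ in range(5)]

# One row per image: 1 + 8 + 16 = 25 features each.
data = np.vstack([feature_vector(img) for img in images])
```

Here `data` has shape `(5, 25)`: five images, each described by the same 25 concatenated features.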
Now, for any one classification task, you typically do not know ahead of time which features are going to be most relevant to making the classification decision. Feature selection is the process of analyzing the feature vectors over a number of samples, together with the known classification results for those samples, deciding which portions of the data matrix are most relevant to predicting the classification, and then discarding the other portions as not being worth computing with. In short, it is the process of selecting the relevant information.
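One simple way to picture this is to score each column of the data matrix against the known labels and keep only the best-scoring columns. The sketch below assumes a particular scoring rule (absolute correlation between each feature and a binary label) purely for illustration; it is only one of many possible criteria, and the synthetic data, the helper `correlation_scores`, and the choice `k = 2` are all made up for the example.

```python
import numpy as np

# Synthetic stand-in for a data matrix: 200 samples, 6 features,
# with known binary classification results in `labels`.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
labels = rng.integers(0, 2, size=n_samples)
data = rng.normal(size=(n_samples, n_features))
data[:, 2] += 3.0 * labels  # make column 2 informative on purpose

def correlation_scores(X, y):
    # Absolute Pearson correlation of each column of X with y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom

scores = correlation_scores(data, labels)

# Keep the k highest-scoring columns and throw the rest away.
k = 2
keep = np.sort(np.argsort(scores)[-k:])
reduced = data[:, keep]
```

After this, `reduced` holds only the `k` selected columns, and `keep` records which columns of the original matrix they came from, so the same columns can be extracted from new images later.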
I do not know the distinction being made between forward and backwards feature selection.