Two things, for starters.
One, definitely do not work in RGB. Your default should be the Lab (aka CIE L\*a\*b\*) colorspace. Discard `L`. From your image it looks like the `a` coordinate gives you the most information, but you should probably do a principal component analysis on `a` and `b` and work along the first (most important) component, just to keep things simple. If this does not work, you can try switching to a 2D model.
Just to get a feeling for it: in `a`, the three yellowish coins have standard deviations below 6 and means of 137 ("gold"), 154, and 162 -- they should be distinguishable.
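The PCA step above can be sketched as follows. This is a minimal sketch assuming you already have the image in Lab (e.g. via OpenCV's `cvtColor` or scikit-image's `rgb2lab`); the function name and the idea of passing flattened `a`/`b` pixel samples are mine, not from any particular library:

```python
import numpy as np

def principal_ab_axis(a, b):
    """Project flattened a* and b* pixel samples onto the first
    principal component of the (a, b) cloud.

    a, b : 1D arrays of equal length (e.g. pixels from coin regions).
    Returns the 1D projection; compare per-coin means/stds on it.
    """
    X = np.column_stack([a, b]).astype(float)
    X -= X.mean(axis=0)                       # center the cloud
    # eigendecomposition of the 2x2 covariance; eigh sorts ascending
    w, V = np.linalg.eigh(np.cov(X, rowvar=False))
    pc1 = V[:, -1]                            # direction of largest variance
    return X @ pc1
```

In practice you would fit the component on pixels from all coins together, then compare the per-coin mean and spread along that single axis, exactly as with the raw `a` values above.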
Second, the lighting issue. Here you'll have to carefully define your problem. If you want to distinguish close colors under any lighting and in any context -- you can't, not like this, anyway. If you are only worried about local variations in brightness, Lab will mostly take care of this. If you want to be able to work both under daylight and incandescent light, can you ensure uniform white background, like in your example image? Generally, what are your lighting conditions?
Also, your image was taken with a fairly cheap camera, by the looks of it. It probably has some sort of automatic white balance feature, which messes up the colors pretty badly -- turn it off if you can. It also looks like the image was coded in YCbCr at some point (happens a lot if it's a video camera) or in a similar variant of JPEG; the color information is severely undersampled. In your case it might actually be good -- it means the camera has done some denoising for you in the color channels. On the other hand, it probably means that at some point the color information was also quantized more strongly than brightness -- that's not so good. The main thing here is: the camera matters, and what you do should depend on the camera you are going to use.
If anything here does not make sense -- leave a comment.
Here are some tips for improving the performance of Vowpal Wabbit (VW) models:
- Shuffle the data prior to training. Having a non-random ordering of your dataset can really mess VW up.
- You're already using multiple passes, which is good. Try also decaying the learning rate between passes, with `--decay_learning_rate=.95`.
- Play around with the learning rate. I've had cases where `--learning_rate=10` was great and other cases where `--learning_rate=0.001` was great.
- Try `--oaa 16` or `--log_multi 16` rather than `--ect 16`. I usually find `ect` to be less accurate. However, `oaa` is pretty slow. I've found `--log_multi` to be a good compromise between speed and accuracy. On 10,000 training examples, `--oaa 16` should be fine.
- Play with the loss function. `--loss_function=hinge` can sometimes yield large improvements in classification models.
- Play with the `--l1` and `--l2` parameters, which regularize your model. `--l2` in particular is useful with text data. Try something like `--l2=1e-6`.
- For text data, try `--ngram=2` and `--skips=2` to add n-grams and skip-grams to your models. This can help a lot.
- Try `--autolink=2` or `--autolink=3` to fit a quadratic or cubic spline model.
- Try FTRL optimization with `--ftrl`. This can be useful with text data or datasets with some extremely rare and some extremely common features.
- Try some learning reductions:
  - Try a shallow neural network with `--nn=1` or `--nn=10`.
  - Try a radial kernel SVM with `--ksvm --kernel=rbf --bandwidth=1`. (This can be very slow.)
  - Try a polynomial kernel SVM with `--ksvm --kernel=poly --degree=3`. (This can be very slow.)
  - Try a GBM with `--boosting=25`. This can be a little slow.
VW is extremely flexible, so it often takes a lot of fine-tuning to get a good model on a given dataset. You can get a lot more tuning ideas here: https://github.com/JohnLangford/vowpal_wabbit/wiki/Command-line-arguments
Regarding the post you linked to: that person used VW with squared loss on an unbalanced classification problem. That's a silly thing to do, and pretty much guarantees that any linear model will always predict the dominant class. If you're worried about class balance, VW supports importance weights, so you can over-weight the rarer classes.
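VW accepts an importance weight directly after the label in its input format (`label weight | features`), so one way to handle imbalance is to weight each example inversely to its class frequency when writing the training file. The helper below is my own sketch, not part of VW; the weighting scheme (n / (k * count)) is one common convention, not the only one:

```python
from collections import Counter

def vw_weighted_lines(examples):
    """examples: list of (label, feature_dict) pairs, labels 1..K
    (as used with --oaa K). Yields VW-format lines whose importance
    weights are inversely proportional to class frequency, so rare
    classes count more during training."""
    counts = Counter(lbl for lbl, _ in examples)
    n = len(examples)
    k = len(counts)
    for label, feats in examples:
        weight = n / (k * counts[label])   # "balanced" weighting
        feat_str = " ".join(f"{name}:{val}" for name, val in feats.items())
        yield f"{label} {weight:.4f} | {feat_str}"
```

Each yielded line can be written straight to the training file you feed VW.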
Edit: You have 100 classes and 10,000 training examples? That's an average of 100 observations per class, which isn't that many to learn from, no matter what model you use.
This is a very good question, and you need to understand this to gain a deeper understanding of deep learning.
Basically, you have raw images; let's take one image. This image has 3 channels, and in each channel the pixel values range from 0 to 255.
Our goal here is to squash the range of values for all the pixels in the three channels down to a very small range. This is where preprocessing comes in. But don't think preprocessing only involves the mean and standard-deviation techniques; there are many others, like PCA, whitening, etc.
1) Using the mean: computing the mean of, say, the first red pixel's values across all the training images gives you the average red value present at that position across the training set. Do the same for every position in the red, green, and blue channels, and you end up with an average image over all the training images.
Now if you subtract this mean image from all the training images, you obviously transform the pixel values; the images are no longer interpretable to the human eye, and the pixel values now range from negative to positive, with the mean at zero.
2) If you then also divide by the standard deviation, you essentially squash the previous pixel-value range into a much smaller one.
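The two steps above can be sketched in a few lines of NumPy. This is a minimal sketch (the function name and array layout are my own choices, assuming images stacked as `(N, H, W, 3)`):

```python
import numpy as np

def standardize_images(train, eps=1e-8):
    """train: array of shape (N, H, W, 3) with pixel values 0..255.

    Subtracts the per-pixel mean image and divides by the per-pixel
    standard deviation, both computed over the training set only.
    Returns the standardized images plus the mean/std so the *same*
    statistics can be applied to validation and test images.
    """
    train = train.astype(np.float64)
    mean_image = train.mean(axis=0)        # (H, W, 3) average image
    std_image = train.std(axis=0) + eps    # eps avoids division by zero
    return (train - mean_image) / std_image, mean_image, std_image
```

Note that at test time you must reuse the training-set `mean_image` and `std_image` rather than recomputing them on the test data.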
BUT WHY ALL THIS? From my experience, doing this preprocessing on the images and then giving the transformed images to the classifier makes the model train faster and perform better. That's why.
As you are into deep learning, look into batch normalization once you understand this normalization concept.