Your approach is along the lines of the popular histogram of oriented gradients (HOG) approach. See here and the corresponding Wikipedia entry. Now, unless you already have some labelled data, training such a system is quite laborious. If possible, I would start by experimenting with an available implementation, such as the one offered by scikit-image.
There are other features, such as Local Binary Patterns (LBP), but they are not as powerful as HOG. See the corresponding module of scikit-image for a list of features and their implementations.
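To make the core HOG idea concrete, here is a minimal NumPy sketch (the function name and parameters are my own, purely illustrative; for real work use scikit-image's implementation). It histograms gradient orientations, weighted by gradient magnitude:

```python
import numpy as np

def gradient_orientation_histogram(img, n_bins=9):
    """Toy HOG-like descriptor: an n_bins histogram of unsigned gradient
    orientations (0..180 degrees), weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))  # gradients along rows, columns
    mag = np.hypot(gx, gy)                   # gradient magnitude per pixel
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    # Normalize so the descriptor is invariant to overall contrast
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical step edge produces mostly horizontal gradients,
# so the histogram mass concentrates in the first orientation bin.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
h = gradient_orientation_histogram(img)
```

Note this is only the heart of the idea: the full HOG descriptor computes such histograms over small cells and then normalizes them over overlapping blocks, which is what gives it robustness to local illumination changes.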
As for CNNs, you should not need to extract any features: the system learns the features automatically. That is one of the nice properties of deep architectures. A huge number of papers show that these systems learn edge-oriented filter features (along the same lines as the idea you are considering).
Note that these features do not consider color, which may be an interesting feature for you to add. Alternatively, you could extract the features from each color channel separately.
Hope this helps.
This is similar to the difference between Pearson correlation and cosine similarity.
As explained here, for example, the Pearson correlation is the cosine similarity between two demeaned vectors. So the normalized cross-correlation that you show is related to the Pearson correlation, while your proposal is related to the more general cosine similarity.
The advantage of demeaning is that it removes the influence of overall levels. To illustrate with a simple example, generate two (ideally) uncorrelated vectors from a standard normal distribution (mean 0, standard deviation 1) in R.
set.seed(101)
f0 <- rnorm(100)
t0 <- rnorm(100)
Define a function to compute the cosine similarity (no demeaning), and compare it against the Pearson correlation (the cor() function):
cossim <- function(x, y) sum(x*y) / sqrt(sum(x^2) * sum(y^2))
cor(f0,t0)
# [1] 0.1078112
cossim(f0,t0)
# [1] 0.1093093
These aren't exactly 0, due to random sampling.
Now just add 4 units to both of these poorly correlated vectors (by either measure) and see what happens.
f4 <- f0 + 4
t4 <- t0 + 4
cor(f4,t4)
# [1] 0.1078112
cossim(f4,t4)
# [1] 0.9499962
Demeaning keeps the Pearson correlation at its original value despite the shift in overall levels, but without demeaning you now find an almost perfect cosine similarity.
Best Answer
For two vectors $v_i$ and $v_j$ of length $n$:

1. When the two vectors are normalized to zero mean and unit length ($v \leftarrow \frac{v-\bar{v}}{||v-\bar{v}||_2}$), their Pearson correlation coefficient $r (= corr(v_i, v_j))$ relates to their Euclidean distance $d (= ||v_i-v_j||_2)$ by $r = 1 - d^2/2$, or equivalently $r = v_i^T v_j$. Reference: http://t.cn/RL5JcKt.

2. When the two vectors are normalized to zero mean and unit standard deviation (or unit variance) ($v \leftarrow \frac{v-\bar{v}}{std(v-\bar{v})}$), then $r = 1 - d^2/(2(n-1))$, or equivalently $r = v_i^T v_j/(n-1)$.
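Both identities are easy to verify numerically. A quick NumPy sketch (the random vectors and seed are my own, arbitrary choices; note that `ddof=1` gives the sample standard deviation with the $n-1$ denominator that the second identity assumes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
vi = rng.standard_normal(n)
vj = rng.standard_normal(n)
r = np.corrcoef(vi, vj)[0, 1]  # Pearson correlation coefficient

# Case 1: normalize to zero mean and unit length
a = vi - vi.mean(); a /= np.linalg.norm(a)
b = vj - vj.mean(); b /= np.linalg.norm(b)
d1 = np.linalg.norm(a - b)
# r = 1 - d^2/2 and r = v_i^T v_j
check1 = (1 - d1**2 / 2, float(a @ b))

# Case 2: normalize to zero mean and unit sample standard deviation
c = (vi - vi.mean()) / vi.std(ddof=1)
e = (vj - vj.mean()) / vj.std(ddof=1)
d2 = np.linalg.norm(c - e)
# r = 1 - d^2/(2(n-1)) and r = v_i^T v_j / (n-1)
check2 = (1 - d2**2 / (2 * (n - 1)), float(c @ e) / (n - 1))
```

All four quantities in `check1` and `check2` agree with `r` up to floating-point error.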