The two most common methods (in my experience) for comparing signals are correlation and mean squared error (MSE). Informally, imagine each signal as a point in N-dimensional space (this is easier if you picture them as 3D points). Correlation then measures whether the two points lie in the same direction from the origin, while the MSE measures whether they are in the same place (independent of where the origin is, so long as both signals share it). Which works better depends somewhat on the types of signal and noise in your system.
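To make that geometric picture concrete, here is a minimal sketch (the function names are my own) computing both quantities for raw arrays:

```cpp
#include <cmath>
#include <cstddef>

// Cosine of the angle between two signals: 1 when they point in the
// same direction from the origin, regardless of magnitude.
double cosine_similarity(const double* x, const double* y, std::size_t n) {
    double xy = 0, xx = 0, yy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        xy += x[i] * y[i];
        xx += x[i] * x[i];
        yy += y[i] * y[i];
    }
    return xy / std::sqrt(xx * yy);
}

// Mean squared error: 0 only when the two points coincide.
double mse(const double* x, const double* y, std::size_t n) {
    double s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += (x[i] - y[i]) * (x[i] - y[i]);
    return s / n;
}
```

For `x = {1, 2, 3}` and `y = {2, 4, 6}` the cosine similarity is exactly 1 (same direction) while the MSE is 14/3 (different place), which is the distinction described above.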
The MSE appears to be roughly equivalent to your example:
```cpp
double mse = 0;
for (int i = 0; i < N; ++i)
    mse += (x[i] - y[i]) * (x[i] - y[i]);
mse /= N;
```
Note, however, that this isn't really the Pearson correlation, which would look more like:
```cpp
double xx = 0, xy = 0, yy = 0;
for (int i = 0; i < N; ++i)
{
    xx += (x[i] - x_mean) * (x[i] - x_mean);
    xy += (x[i] - x_mean) * (y[i] - y_mean);
    yy += (y[i] - y_mean) * (y[i] - y_mean);
}
double ppmcc = xy / std::sqrt(xx * yy);
```
given the signal means x_mean and y_mean. This is fairly close to the pure correlation:
```cpp
double corr = 0;
for (int i = 0; i < N; ++i)
    corr += x[i] * y[i];
```
However, I think the Pearson correlation will be more robust when the signals have a strong DC component (because the means are subtracted), and since it is normalised, scaling one of the signals will not cause a proportional change in the correlation.
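A quick way to check that robustness (a sketch, with the helper name and test values my own) is to compute the Pearson coefficient for a signal against a scaled-and-shifted copy of itself; the result stays exactly 1:

```cpp
#include <cmath>
#include <cstddef>

// Pearson correlation coefficient, computing the means internally.
double pearson(const double* x, const double* y, std::size_t n) {
    double x_mean = 0, y_mean = 0;
    for (std::size_t i = 0; i < n; ++i) {
        x_mean += x[i];
        y_mean += y[i];
    }
    x_mean /= n;
    y_mean /= n;

    double xx = 0, xy = 0, yy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        xx += (x[i] - x_mean) * (x[i] - x_mean);
        xy += (x[i] - x_mean) * (y[i] - y_mean);
        yy += (y[i] - y_mean) * (y[i] - y_mean);
    }
    return xy / std::sqrt(xx * yy);
}
```

With `x = {1, 2, 3, 4}` and `y[i] = 2 * x[i] + 100` (a gain plus a large DC offset), `pearson(x, y, 4)` is still 1, whereas the raw sum-of-products above would be dominated by the offset.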
Finally, if the particular example in your question is a problem then you could also consider the mean absolute error (L1 norm):
```cpp
double mae = 0;
for (int i = 0; i < N; ++i)
    mae += std::abs(x[i] - y[i]);
mae /= N;
```
I'm aware of all three approaches being used in various signal- and image-processing applications; without knowing more about your particular application, I couldn't say which is likely to work best. The MAE and the MSE are less sensitive to exactly how the data are presented to them, but if the mean error is not really the metric you're interested in, they won't give you the results you're looking for. The correlation approaches can be better if you care more about the "direction" of your signal than the actual values involved, but they are more sensitive to how the data are presented and almost certainly require some centring and normalisation to give the results you expect.
You might want to look up Phase Correlation, Cross Correlation, Normalised Correlation and Matched Filters. Most of these are used to match some sub-signal in a larger signal with some unknown time lag, but in your case you could just use the value they give for zero time lag if you know there is no lag between the two signals.
Best Answer
Qualifications
It so happens that in the Iris data set the rows (as this data set is usually presented) are values of four variables, all with the same dimensions and units. However, I will not assume reference to this specific data set.
For more on that data set, one starting point is
What aspects of the Iris data set make it so successful ...
Moreover, your question title asks about similarity between variables (features, attributes, etc.), but the specific details hint at an interest in similarity between observations (items, cases, etc.). I will focus on measuring similarity of variables, particularly given your specific mention of correlation, which reflects a common misunderstanding of what correlation measures.
Note that what appears as rows and what appears as columns in data is a matter of convention or convenience and is otherwise not fundamental. In other words, a data set can always be transposed.
Correlation does not measure similarity
Contrary to your statement, correlation does not measure similarity, if similarity means that the highest value of a measure is achieved if and only if all values are identical. (Anyone can reverse the game, define a measure, and then give it some name from their language as a label. Examples abound in all sciences.)
The first argument against that is that correlation can be applied to variables which are in quite different units, so that it is then nonsensical to ask whether values are similar. So, if the variables are rainfall and wheat yield, the units of measurement are different; correlation can be calculated so long as there are paired values, but it makes no sense to ask whether 20 mm rainfall is similar to 20 kg/ha wheat yield.
The second argument against that is that you can achieve perfect correlation, with value $1$, between $y$ and $x$ so long as $y = a + bx$ for any $a$ and any positive $b$. So $10^\text{anything}\,y$ and $y$ have correlation $1$, but their values are similar only if the exponent is close to $0$.
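One line of algebra shows why the shift and the scale drop out: if $y_i = a + b x_i$ with $b > 0$, then $\bar y = a + b \bar x$, so $y_i - \bar y = b(x_i - \bar x)$ and

$$r = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_i (x_i - \bar x)^2 \sum_i (y_i - \bar y)^2}} = \frac{b \sum_i (x_i - \bar x)^2}{|b| \sum_i (x_i - \bar x)^2} = \operatorname{sgn}(b) = 1,$$

whatever the values of $a$ and $b > 0$.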
Similarity can be defined in many ways: you need to choose
To your question: you need to firm up exactly what you mean by similarity, but for variables $x, y$ on the same measurement scale, summary measures of similarity based on the differences $x - y$ could all make sense; measures based on the ratios $y/x$ or $x/y$ could all make sense so long as all values are of the same sign and none is zero; measures based on comparing $\log y$ and $\log x$ could make sense so long as all values are positive. Further, you have to decide whether you want your measure to have the same units as the original variable, or to be free of units so that the similarity between different variables can be compared.
For the Iris data all these conditions are satisfied.
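As a sketch of what such summary measures might look like (the function names are my own; the log version assumes strictly positive values):

```cpp
#include <cmath>
#include <cstddef>

// Mean absolute difference: carries the same units as the data,
// so it only makes sense when x and y share a measurement scale.
double mean_abs_diff(const double* x, const double* y, std::size_t n) {
    double s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += std::abs(x[i] - y[i]);
    return s / n;
}

// Mean absolute log-ratio: unit-free, symmetric in x and y,
// but requires every value to be strictly positive.
double mean_abs_log_ratio(const double* x, const double* y, std::size_t n) {
    double s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += std::abs(std::log(y[i] / x[i]));
    return s / n;
}
```

For `x = {1, 2, 4}` and `y = {2, 4, 8}` the first measure gives 7/3 in the original units, while the second gives $\log 2$ regardless of units, illustrating the same-units versus unit-free choice above.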
Indeed, the point of the exercise may well be to underline that the vague concept of similarity can be made precise in many different ways.