Qualifications
It so happens that in the Iris data set the rows (as this data set is usually presented) are values on four variables, all with the same dimensions and units. However, I will not assume that you are working with this specific data set.
For more on that data set, one starting point is
What aspects of the Iris data set make it so successful ...
Moreover, your question title asks about similarity between variables (features, attributes, etc.), but the specific details hint at an interest in similarity between observations (items, cases, etc.). I will focus on measuring similarity of variables, particularly given your specific mention of correlation, which reflects a common misunderstanding of correlation.
Note that what appears as rows and what appears as columns in data is a matter of convention or convenience and is otherwise not fundamental. In other words, a data set can always be transposed.
Correlation does not measure similarity
Contrary to your statement, correlation does not measure similarity if similarity means that the highest value of a measure is achieved if and only if all values are identical. (Anyone can reverse the game and define a measure and then give it some name from their language as a label. Examples abound in all sciences.)
The first argument against that is that correlation can be applied to variables which are in quite different units, so that it is then nonsensical to ask whether values are similar. So, if the variables are rainfall and wheat yield, the units of measurement are different; correlation can be calculated so long as there are paired values, but it makes no sense to ask whether 20 mm rainfall is similar to 20 kg/ha wheat yield.
The second argument against that is that you can achieve perfect correlations with value $1$ between $y$ and $x$ so long as $y = a + bx$ for any $a$ and any positive $b$. So $10^\text{anything} y$ and $y$ have correlation 1 but their values are similar only if the exponent is close to 0.
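To make the second argument concrete, here is a minimal sketch in plain Python. The helper `pearson` and the numbers are my own illustration, not taken from any data set: with $y = 7 + 1000x$ the correlation is 1 (up to rounding), yet no pair of values is remotely close.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only: y = a + b*x with a = 7 and b = 1000.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [7.0 + 1000.0 * v for v in x]

r = pearson(x, y)                                    # perfect linear relation
largest_gap = max(abs(b - a) for a, b in zip(x, y))  # how far apart the values are

print(r, largest_gap)
```

The correlation says the points lie on a line with positive slope; it says nothing about whether that line is $y = x$.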
Similarity can be defined in many ways: you need to choose
To your question: you need to firm up exactly what you mean by similarity. For variables $x, y$ on the same measurement scale, summary measures of similarity based on the differences $x - y$ could all make sense. Measures based on the ratios $y/x$ or $x/y$ could all make sense so long as all values are of the same sign and not zero. Measures based on comparing $\log y$ and $\log x$ could make sense so long as all values are positive. Further, you have to decide whether you want your measure to have the same units as the original variable, or to be free of units so that the similarity between different variables can be compared.
For the Iris data all these conditions are satisfied.
Indeed, the point of the exercise may well be to underline that the vague concept of similarity can be made precise in many different ways.
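As a rough sketch of how a few of these choices might look in practice (the function name, the particular summaries and the toy values are my own assumptions, one selection among many possibilities): the difference-based summaries keep the original units, while the ratio-based ones are unit-free and are only computed here when all values are positive.

```python
import math

def similarity_summaries(x, y):
    """A few possible summaries of how similar paired values x and y are."""
    n = len(x)
    diffs = [b - a for a, b in zip(x, y)]
    out = {
        # Same units as the data; both are 0 if and only if x and y are identical.
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "rms_diff": math.sqrt(sum(d * d for d in diffs) / n),
    }
    # Log-ratio summaries need strictly positive values.
    if all(a > 0 and b > 0 for a, b in zip(x, y)):
        log_ratios = [math.log(b / a) for a, b in zip(x, y)]
        # Unit-free; the geometric mean ratio is 1 when y matches x on average.
        out["geometric_mean_ratio"] = math.exp(sum(log_ratios) / n)
        out["mean_abs_log_ratio"] = sum(abs(v) for v in log_ratios) / n
    return out

# Toy values: y is exactly double x, so the correlation would be 1.
x = [1.0, 2.0, 4.0]
y = [2.0, 4.0, 8.0]
summaries = similarity_summaries(x, y)
print(summaries)
```

Here the correlation is perfect, but the difference and ratio summaries both report that the values are not similar.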
Best Answer
Aside: It sounds like your underlying problem (though not your direct questions) is related to calibration, on which a fair bit has been written. If your device is not as close as you'd like to the commercial one, it may not matter so much, as long as it's fairly consistent in the way it responds. A calibration curve (in most cases, just a line) is often used to adjust readings on devices to match some standard (whence the scale on which the readings are made can be correspondingly adjusted for any such consistent bias). So the methodology of calibration may be of use to you if your device has some bias compared to the commercial one.
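A minimal sketch of a straight-line calibration, assuming made-up paired readings and a simplification of my own: the reference values are regressed directly on the device readings by ordinary least squares. The calibration literature discusses other ways of setting this up, so treat this as an illustration only.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical paired readings on the same items.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]   # commercial (standard) device
device = [10.4, 20.9, 31.1, 41.6, 52.0]      # your device, reading a little high

intercept, slope = fit_line(device, reference)

def calibrate(reading):
    """Map a raw device reading onto the reference scale."""
    return intercept + slope * reading

for raw, ref in zip(device, reference):
    print(raw, round(calibrate(raw), 2), ref)
```

If the device responds consistently, the calibrated readings track the reference closely even though the raw readings are biased.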
Your direct question sounds like you probably want equivalence testing; in particular, a two one-sided tests (TOST) procedure.
The more usual way of setting this up amounts to setting a pair of equivalence bounds around your gold standard measurement (values which are "close enough" to call equivalent) and then showing that you'd reject the hypothesis that the population mean of your measurement would lie above the upper bound and also that it would lie below the lower bound (and so you would conclude it will lie between the bounds).
[This can also be recast as seeing if a two sided confidence interval for the parameter lies entirely within the pair of equivalence bounds.]
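The confidence-interval version can be sketched as follows. This assumes paired differences (your device minus the standard) at a single setting, equivalence bounds chosen by you, and a Student's t critical value that the caller looks up and passes in as `t_crit`; the function name and the numbers are hypothetical. A TOST at level $\alpha$ corresponds to a $1 - 2\alpha$ two-sided interval, so $\alpha = 0.05$ uses the 0.95 quantile of t.

```python
import math
import statistics

def tost_by_confidence_interval(diffs, lower, upper, t_crit):
    """Equivalence check: does the CI for the mean difference sit inside (lower, upper)?

    diffs        : paired differences (new device minus standard)
    lower, upper : equivalence bounds for the mean difference
    t_crit       : (1 - alpha) quantile of Student's t with len(diffs) - 1
                   degrees of freedom, supplied by the caller (e.g. from a table)
    """
    n = len(diffs)
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    ci_low = mean - t_crit * se
    ci_high = mean + t_crit * se
    equivalent = lower < ci_low and ci_high < upper
    return ci_low, ci_high, equivalent

# Hypothetical differences from 10 paired measurements.
diffs = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1]
T_095_DF9 = 1.833  # 0.95 quantile of t with 9 degrees of freedom (alpha = 0.05)

ci_low, ci_high, equivalent = tost_by_confidence_interval(diffs, -0.5, 0.5, T_095_DF9)
print(ci_low, ci_high, equivalent)
```

Whether equivalence is declared depends entirely on the bounds: the same data that pass with bounds of ±0.5 would fail with bounds of ±0.05.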
See for example Walker & Nowacki (2011) [1]; there's a discussion of TOST in industrial applications in Richter & Richter (2002) [2].
However, a caveat: Presumably you're testing your device not at one value but across the range of the device. Given that there may be more bias at one value than another (indeed, it's possible to be biased low in one place and high in another), you probably want to look at equivalence at each value for the standard device rather than a simple TOST setting (in that case establishing equivalence bands, which may not necessarily be equally wide at every value -- e.g. if equivalence is in percentage terms). This brings us back nearer to the calibration problem I mentioned at the start.
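A rough sketch of checking equivalence separately at each value of the standard, with bands defined in percentage terms. Everything here is an assumption for illustration: the readings are invented, every level has the same number of readings so that one t critical value (passed in by the caller) applies, and the band is ±5% of the reference level.

```python
import math
import statistics

def equivalence_by_level(readings_by_level, pct, t_crit):
    """For each reference level, check whether the confidence interval for the
    mean device reading lies within +/- pct percent of that level."""
    results = {}
    for level, readings in readings_by_level.items():
        n = len(readings)
        mean = statistics.fmean(readings)
        half_width = t_crit * statistics.stdev(readings) / math.sqrt(n)
        margin = pct / 100.0 * level   # band width scales with the level
        results[level] = (level - margin < mean - half_width
                          and mean + half_width < level + margin)
    return results

# Hypothetical repeated device readings at two reference levels.
readings_by_level = {
    10.0: [10.1, 9.9, 10.0, 10.2, 9.8],
    100.0: [106.0, 107.0, 105.0, 106.5, 105.5],
}
T_095_DF4 = 2.132  # 0.95 quantile of t with 4 degrees of freedom

results = equivalence_by_level(readings_by_level, pct=5.0, t_crit=T_095_DF4)
print(results)
```

In this made-up example the device is within the band at the low level but biased high at the upper level, which a single pooled test could easily hide.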
[1]: Walker, E., & Nowacki, A. S. (2011).
Understanding Equivalence and Noninferiority Testing.
Journal of General Internal Medicine, 26(2), 192–196.
http://doi.org/10.1007/s11606-010-1513-8
(Ignore the 'noninferiority' material there; you're just after the equivalence part.)
[2]: Richter, S. J., & Richter, C. (2002).
A Method for Determining Equivalence in Industrial Applications.
Quality Engineering, 14(3), 375–380.