First of all, you need a spatial calibration, like a picture of a ruler. Let me know if you need a demo of how to do that. Then, since it looks like you can assume the banana is more or less aligned with the image rows and columns, what I'd do is sum the image horizontally (after segmenting the banana from the background) to get the vertical profile, which is the banana's width in pixels at each row. Then examine that profile to see where the tips of the banana are (say, the top and bottom 20% of the length) and take the mean of the profile in between the tips. Where the profile starts and stops gives you the length in pixels, and the mean gives you the width in pixels, so you can multiply both by your spatial calibration to convert them from pixels to cm. This is so simple that it looks like a homework question. Is it?
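In case it helps, here is a rough sketch of those steps in plain Python. It is only an illustration under some assumptions: the banana has already been segmented into a binary mask (1 = banana, 0 = background), it runs roughly vertically in the image, and you already have a calibration factor in cm per pixel. The names `measure_banana`, `mask`, `cm_per_pixel`, and `tip_fraction` are made up for this example.

```python
def measure_banana(mask, cm_per_pixel, tip_fraction=0.2):
    """Estimate length and mean width (in cm) from a binary mask.

    mask: list of rows, each a list of 0/1 values (1 = banana pixel).
    cm_per_pixel: spatial calibration (assumed known, e.g. from a ruler image).
    tip_fraction: fraction of the length treated as a tip at each end (must be < 0.5).
    """
    # Sum each row horizontally: the vertical profile is the width in pixels per row.
    profile = [sum(row) for row in mask]

    # Rows with a nonzero width tell us where the banana starts and stops.
    rows = [i for i, width in enumerate(profile) if width > 0]
    if not rows:
        raise ValueError("mask contains no banana pixels")
    first, last = rows[0], rows[-1]
    length_px = last - first + 1

    # Ignore the tapered tips and average the width over the middle section.
    skip = int(tip_fraction * length_px)
    middle = profile[first + skip : last + 1 - skip]
    mean_width_px = sum(middle) / len(middle)

    # Convert both measurements from pixels to cm.
    return length_px * cm_per_pixel, mean_width_px * cm_per_pixel


# Tiny synthetic example: widths per row, padded with empty rows above and below.
widths = [0, 0, 1, 2, 4, 4, 4, 4, 4, 4, 2, 1, 0]
mask = [[1] * w + [0] * (6 - w) for w in widths]
length_cm, width_cm = measure_banana(mask, cm_per_pixel=0.5)
print(length_cm, width_cm)  # prints: 5.0 2.0
```

With a real image you would build the mask by thresholding first, and take `cm_per_pixel` from the ruler picture (known distance in cm divided by the same distance in pixels).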