MATLAB: How to sort and get the coordinates of specific blobs

http://dsp.stackexchange.com/questions/5930/find-circle-in-noisy-data

I've attached an image to give you an idea of what I'm trying to do. This is a scanned picture. I'm fairly new to this kind of thing, so I don't know much yet.
I put four blue dots on the four corners of the paper so I could use them as control points (I'm trying to align this image with another one using the cp2tform function). I extracted the coordinates of the blobs manually in MS Paint, and that worked. There's no problem with the control points for the base exam paper, since they will be the same for every computation, but for the moving image they will be different for each scanned image. Please help me find a way to separate those four blobs from the rest and extract their coordinates. Thanks, guys.

Best Answer

First I'd straighten the image by finding the centerline, getting its angle (Orientation from regionprops()), and then calling imrotate.
To do that, take the red channel, which will give the most contrast, then invert the image (to make the lines white and the background dark). To find the centerline, first call imdilate (to make sure it's connected and has no breaks or gaps), then threshold, label with bwlabel(), and call
measurements = regionprops(labeledImage, 'MajorAxisLength', 'Orientation');
asking for MajorAxisLength and Orientation, and look for the blob with the largest MajorAxisLength; its Orientation is the angle of the centerline. Then use imrotate, passing in the orientation, to straighten the image.
straightenedImage = imrotate(grayImage, orientationAngle);
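Here is a minimal sketch of those straightening steps, run on a synthetic page so it is self-contained. Everything about the test image (its size, the 5 degree tilt, the small distractor mark) and the threshold of 128 are assumptions for illustration, not values taken from your scan. I also assume the centerline should end up horizontal, which is why the rotation angle is the negated Orientation; for a vertical centerline you would need to offset the angle by 90 degrees instead.

```matlab
% Synthetic stand-in for the scan (assumption): white page, one long dark
% line tilted 5 degrees counterclockwise, plus a small distractor mark.
[x, y] = meshgrid(1:300, 1:200);
tiltDeg = 5;
along  = (x - 150) * cosd(tiltDeg) - (y - 100) * sind(tiltDeg);
across = (x - 150) * sind(tiltDeg) + (y - 100) * cosd(tiltDeg);
ink = abs(along) <= 120 & abs(across) <= 1.5;
ink(40:44, 50:60) = true;
page = uint8(255 * ~ink);
rgbImage = cat(3, page, page, page);

% Red channel, inverted so that lines are white on a dark background.
invertedImage = 255 - rgbImage(:, :, 1);

% Dilate to close small breaks, then threshold and label.
dilatedImage = imdilate(invertedImage, ones(3));
binaryImage = dilatedImage > 128;          % threshold is an assumption
labeledImage = bwlabel(binaryImage);

% The centerline is taken to be the blob with the largest MajorAxisLength.
measurements = regionprops(labeledImage, 'MajorAxisLength', 'Orientation');
[~, longest] = max([measurements.MajorAxisLength]);
orientationAngle = measurements(longest).Orientation;

% Rotate by the negated angle so the line becomes horizontal. The inverted
% image is rotated so that imrotate's zero fill matches the dark background.
straightenedImage = imrotate(invertedImage, -orientationAngle, 'bilinear', 'crop');
```

On a real scan you would replace the synthetic block with imread of your file and tune the threshold and the dilation size by eye.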
I wouldn't bother with cp2tform. That's just more complicated than it needs to be.
See my Image Segmentation Tutorial if you need an example: http://www.mathworks.com/matlabcentral/fileexchange/?term=authorid%3A31862
What might be better than getting all 50 centroids in one shot would be to get a sub-image for each question. This would involve straightening the image (described above), then getting the vertical profile by calling
verticalProfile = mean(grayImage, 2);
and thresholding that profile to identify where each line starts and stops. Then have a loop where you extract the rows for the question, threshold, and call regionprops. Do that for both the left half and the right half of the band of rows you extracted.
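A rough sketch of that profile-and-loop idea follows, again on a made-up image so it can be run as is. The page size, the three question bands, the mark positions, and both thresholds (250 for the profile, 128 for the marks) are assumptions for illustration only. The centroids come back in sub-image coordinates, so the column offset of the right half and the starting row of the band would have to be added back to get page coordinates.

```matlab
% Synthetic straightened page (assumption): white background, three
% question bands, each with one dark mark in the left and right halves.
grayImage = 255 * ones(120, 200, 'uint8');
grayImage(20:29,   30:39)   = 0;
grayImage(20:29,   150:159) = 0;
grayImage(60:69,   70:79)   = 0;
grayImage(60:69,   120:129) = 0;
grayImage(100:109, 10:19)   = 0;
grayImage(100:109, 180:189) = 0;

% Mean gray level of every row; rows containing ink are darker.
verticalProfile = mean(grayImage, 2);
inBand = verticalProfile < 250;            % profile threshold is an assumption

% Rising and falling edges of the thresholded profile mark each band.
edges = diff([0; double(inBand); 0]);
startRows = find(edges == 1);
stopRows  = find(edges == -1) - 1;

% For each band, threshold the left and right halves and measure the blobs.
midCol = floor(size(grayImage, 2) / 2);
centroids = cell(numel(startRows), 2);     % column 1 = left, column 2 = right
for k = 1:numel(startRows)
    band = grayImage(startRows(k):stopRows(k), :);
    halves = {band(:, 1:midCol), band(:, midCol+1:end)};
    for h = 1:2
        marks = halves{h} < 128;           % mark threshold is an assumption
        props = regionprops(marks, 'Centroid');
        centroids{k, h} = vertcat(props.Centroid);   % [x y] per blob
    end
end
```

From there, comparing each centroid's x position against the known column positions of the answer choices would tell you which one was marked.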
It doesn't look too hard - I could probably do it within an hour. I mean, to give the letter selected for each question. Come back with a good attempt at the coding if you have trouble.