Let's assume you have a point P = [X; Y; Z] in the camera frame. Its normalized camera coordinates are x' = X/Z and y' = Y/Z. You then project it onto the image plane with u = fx * x' + cx and v = fy * y' + cy, where fx and fy are the focal lengths in pixels and cx, cy are the coordinates of the principal point in the image. You can reverse the math to get your coordinates, but keep two things in mind: a single pixel only gives you x' and y', so you need to know the depth Z to recover X and Y, and this is only the projection of a simple pinhole camera, without any distortion taken into account. If you want to include the distortion, the whole process gets a lot more complicated.
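To make the formulas concrete, here is a minimal sketch of the pinhole projection and its inverse. The function names `project` and `unproject` and the example intrinsics (fx = fy = 800, cx = 320, cy = 240) are made up for illustration; it assumes the point is already in the camera frame, that you know the depth Z for the inverse, and it ignores distortion completely.

```python
def project(point, fx, fy, cx, cy):
    """Project a camera-frame point [X, Y, Z] to pixel coordinates (u, v)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Normalized camera coordinates
    x_n = X / Z
    y_n = Y / Z
    # Apply focal lengths (in pixels) and principal point
    return fx * x_n + cx, fy * y_n + cy


def unproject(u, v, depth, fx, fy, cx, cy):
    """Invert the projection; the depth Z has to be known from elsewhere."""
    x_n = (u - cx) / fx
    y_n = (v - cy) / fy
    return x_n * depth, y_n * depth, depth


# Hypothetical intrinsics, only for this example
u, v = project((1.0, 2.0, 4.0), fx=800, fy=800, cx=320, cy=240)
X, Y, Z = unproject(u, v, depth=4.0, fx=800, fy=800, cx=320, cy=240)
```

Without the depth, `unproject` can only give you the direction (x_n, y_n, 1) of the ray through the pixel, not the point itself.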
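Another thing you could do to get the camera coordinates from your world coordinates is [x'; y'; z'] = R*[X; Y; Z] + t, where [X; Y; Z] is now the point in world coordinates and [x'; y'; z'] is the same point in the camera frame. Since you said you have the orientation and location of your camera, you can compute the rotation matrix R and the translation vector t from them.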
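As a rough sketch of how R and t follow from the camera pose: if the orientation is given as a camera-to-world rotation matrix (its columns are the camera axes expressed in world coordinates) and the location is the camera center C in world coordinates, then R is the transpose of that matrix and t = -R*C. That convention is an assumption on my side, yours may be the other way around. The helper names below are made up for the example.

```python
def transpose(m):
    """Transpose a 3x3 matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def world_to_camera(point_world, cam_rotation, cam_position):
    """Apply [x'; y'; z'] = R*[X; Y; Z] + t.

    cam_rotation: assumed camera-to-world rotation (3x3, list of rows).
    cam_position: camera center C in world coordinates.
    """
    R = transpose(cam_rotation)                 # world-to-camera rotation
    t = [-c for c in mat_vec(R, cam_position)]  # t = -R * C
    rotated = mat_vec(R, point_world)
    return [r + ti for r, ti in zip(rotated, t)]


# Camera 5 units behind the world origin, looking along +Z, no rotation
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
p_cam = world_to_camera([1, 2, 0], identity, [0, 0, -5])
```

The result can then be fed into the pinhole projection, as long as its z' is positive (the point is in front of the camera).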
It would be a lot easier to just calibrate the camera and use the resulting camera parameters, since the calibration also estimates the distortion. If you don't do that, you have to work out the distortion by hand, and that's a lot more work.
EDIT: Another possible, but not very accurate, method would be to relate the known size of the object to its size in each image. You know the exact height and width of your object, so you can calculate the pixel ratio (let's say your object is 10 cm and it is 100 pixels in the image, so 10 pixels equal 1 cm). If you do that for all 4 pictures you have, you get such a ratio for each of them. With that you can calculate the distance to the object, which would be your first coordinate. Then you could try to triangulate the other positions against each other. Keep in mind that this method is not very accurate, since it also doesn't take distortion into account, and the result will vary depending on your picture quality.
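A small sketch of the pixel-ratio idea, using the numbers from the example above. Note that turning the ratio into a distance additionally needs the focal length in pixels; the value 800 below is made up, and the formula assumes a pinhole camera with the object roughly parallel to the image plane.

```python
def pixels_per_unit(size_px, size_real):
    """How many pixels one real-world unit covers at the object's distance."""
    return size_px / size_real


def distance_from_size(focal_px, size_real, size_px):
    """Pinhole similar triangles: Z = f * real_size / pixel_size.

    The result is in the same unit as size_real.
    """
    return focal_px * size_real / size_px


ratio = pixels_per_unit(size_px=100, size_real=10)        # pixels per cm
distance_cm = distance_from_size(focal_px=800, size_real=10, size_px=100)
```

Doing this for each of the pictures gives one distance estimate per picture, which is the starting point for the triangulation step.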