My question is quite simple. I have two images. In the first one, I know the locations of points $P1$, $P2$, $P3$, and $P4$. In the second image, I know the locations of $P2'$, $P3'$, $P4'$, and a point $Q'$. Is there any way I could find the exact location of the corresponding point $Q$ in the first image?
Edit
I have prepared a gist to replicate this problem in Python: gist.
I think this is a mathematical challenge, and I'm seeking a solution for my project in which I'm creating a CV application to correlate points between two sheets of paper.
Indeed, a perspective transform would solve the issue, but unfortunately I don't know the location of $P1'$ in image 2. I also don't have access to the camera calibration parameters, such as the focal length. What I do know is that the sheet is a DIN A4 in the 3D world, and I know its dimensions in millimeters. I also know that it is a rectangle whose angles, due to the perspective, are not $90^\circ$ in the projected image (on the screen).
I have tried to work with affine transformations, but that is not leading me closer to the solution, as the weak perspective model is not accurate enough. As an example, let's say that the locations of the points, measured on the first image, are:
- $P1 (2498, 3169)$
- $P2 (521, 3199)$
- $P3 (681, 762)$
- $P4 (2290, 776)$.
On the second image are:
- $P2' (2209, 1009)$
- $P3' (2634, 2908)$
- $P4' (271, 2870)$
- $Q' (1368, 2096)$
For reference, both of the images are $4032×3024$ pixels. So the center would be at $C=(2016,1512)$. I grabbed all the points myself manually and the solution I got for $Q$ is $(1581,1405)$.
Best Answer
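First, the given coordinates have to be corrected with respect to the center of the image. The image is $3024$ pixels wide and $4032$ pixels tall (the question quotes the size as $4032 \times 3024$, but the $y$ coordinates go up to $3199$, so the image is in portrait orientation). Therefore, the coordinates of the center are $( \dfrac{3024}{2}, \dfrac{4032}{2}) = (1512, 2016) $. Shifting the origin to the center and flipping the $y$ axis so that it points upward, i.e. $(x, y) \to (x - 1512, 2016 - y)$, the corrected coordinates become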
$P_1 = (986, -1153)$
$P_2 = (-991, -1183)$
$P_3 = (-831, 1254) $
$P_4 = (778, 1240) $
And
$ P_2' = (697, 1007) $
$P_3' = (1122, -892)$
$P_4' = (-1241, -854)$
$Q' = (-144, -80) $
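The centering step can be sketched in Python as follows. This is only an illustration: the helper names `to_centered` and `to_pixels` are made up here, and the principal point is assumed to sit at the exact image center. The $y$ axis is flipped so that it points upward, which is what the corrected values above imply.

```python
WIDTH, HEIGHT = 3024, 4032          # image size in pixels (portrait orientation)
CX, CY = WIDTH / 2, HEIGHT / 2      # assumption: principal point at the image center


def to_centered(p):
    """Pixel coordinates (origin top-left, y down) -> centered coordinates (y up)."""
    x, y = p
    return (x - CX, CY - y)


def to_pixels(p):
    """Inverse of to_centered."""
    x, y = p
    return (x + CX, CY - y)


first = {"P1": (2498, 3169), "P2": (521, 3199), "P3": (681, 762), "P4": (2290, 776)}
second = {"P2'": (2209, 1009), "P3'": (2634, 2908), "P4'": (271, 2870), "Q'": (1368, 2096)}

for name, p in {**first, **second}.items():
    print(name, to_centered(p))
```

The same `to_pixels` helper converts the final centered result back to top-left pixel coordinates.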
Next, we have to find the focal length $z_0$ of the camera. This is only possible if the four given points are the four corners of a rectangle, as is the case here. Consider the four rays from the camera center through the image points, with unknown scale factors $t_i$:
$ R_i = t_i \begin{bmatrix} P_i \\ z_0 \end{bmatrix} $
If the $R_i$'s are vertices of a rectangle then two conditions must be satisfied:
$R_2 - R_1 = R_3 - R_4 $
$ (R_2 - R_1) \cdot (R_3 - R_2) = 0 $
The first of these equations gives a linear system of three equations in the four unknown $t_i$'s. Its solution is
$ (t_1, t_2, t_3, t_4) = \lambda \mathbf{v} $
where $\mathbf{v} \in \mathbb{R}^4 $ is now a known vector and $\lambda$ is an arbitrary scale factor.
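Written out with $P_i = (x_i, y_i)$, the first condition is the homogeneous system below. This is my own expansion of $R_2 - R_1 = R_3 - R_4$, not part of the original derivation. The common factor $z_0$ cancels from the third row, so $\mathbf{v}$ can be found before $z_0$ is known.

$ \begin{bmatrix} -x_1 & x_2 & -x_3 & x_4 \\ -y_1 & y_2 & -y_3 & y_4 \\ -1 & 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} = \mathbf{0} $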
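The second equation leads to a quadratic equation in $z_0$. Substituting $R_i = \lambda v_i \begin{bmatrix} P_i \\ z_0 \end{bmatrix}$ into it gives
$ (v_2 P_2 - v_1 P_1) \cdot ( v_3 P_3 - v_2 P_2) + z_0^2 (v_3 - v_2)(v_2 - v_1) = 0 $
and therefore, taking the negative root,
$ z_0 = - \sqrt{ - \dfrac{ (v_2 P_2 - v_1 P_1) \cdot ( v_3 P_3 - v_2 P_2) }{ (v_3 - v_2)(v_2 - v_1) }} $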
Taking $\lambda = 1$, we can compute the $R_i$'s, and from them the ratio $r = \dfrac{\| R_1 R_2 \|}{ \|R_2 R_3 \|} $.
This ratio comes to $r \approx \dfrac{1}{\sqrt{2}} $, as expected for an A4 sheet.
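The first-image step can be sketched in plain Python as below. It is a hedged illustration, not the answer's VBA code: the function names (`cross2`, `null_vector`, `first_image`) are made up, the null vector is obtained with Cramer's rule instead of a general linear solver, and the negative root for $z_0$ is taken as in the answer. The perpendicularity condition is used in its expanded form, $(v_2 P_2 - v_1 P_1) \cdot (v_3 P_3 - v_2 P_2) + z_0^2 (v_2 - v_1)(v_3 - v_2) = 0$.

```python
import math

# Centered image coordinates of the sheet corners P1..P4 in the first image.
P = [(986, -1153), (-991, -1183), (-831, 1254), (778, 1240)]


def cross2(o, a, b):
    """z-component of (a - o) x (b - o): twice the signed area of triangle o, a, b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def null_vector(P):
    """Solve t2*p2 - t1*p1 = t3*p3 - t4*p4 with p_i = (x_i, y_i, 1), up to scale."""
    p1, p2, p3, p4 = P
    # Cramer's rule on the 3x4 homogeneous system, written with triangle areas.
    t = [cross2(p2, p3, p4), cross2(p1, p3, p4), cross2(p1, p2, p4), cross2(p1, p2, p3)]
    return [ti / t[0] for ti in t]          # normalize so that v1 = 1


def first_image(P):
    v = null_vector(P)
    # In-plane parts of R2 - R1 and R3 - R2 (with lambda = 1).
    a = [v[1] * P[1][k] - v[0] * P[0][k] for k in range(2)]
    b = [v[2] * P[2][k] - v[1] * P[1][k] for k in range(2)]
    num = a[0] * b[0] + a[1] * b[1]
    den = (v[1] - v[0]) * (v[2] - v[1])
    z0 = -math.sqrt(-num / den)             # negative root, as in the answer
    R = [(vi * x, vi * y, vi * z0) for vi, (x, y) in zip(v, P)]
    r = math.dist(R[0], R[1]) / math.dist(R[1], R[2])
    return v, z0, R, r


v, z0, R, r = first_image(P)
print("v  =", [round(x, 5) for x in v])
print("z0 =", round(z0, 1))
print("r  =", round(r, 4), "vs 1/sqrt(2) =", round(1 / math.sqrt(2), 4))
```

If the four points are not consistent with a rectangle, `-num / den` can become negative and `math.sqrt` raises a `ValueError`; the sketch does not try to handle that case.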
Moving on to the second image, only three of the vertices are known. Using the same $z_0$, we write
$ R_i' = t_i' Q_i' , \quad Q_i' = \begin{bmatrix} P_i' \\ z_0 \end{bmatrix}, \quad i = 2, 3, 4 $
We can take $t_2' = K $, where $K \gt 0$ is a chosen constant. This leaves two unknowns, $t_3' $ and $t_4'$, to be determined. To determine them, we impose a right angle at $R_3'$ and the side ratio $r$ found from the first image:
$(R_2' - R_3') \cdot (R_4' - R_3') = 0 $
$\| R_4' - R_3'\|^2 = r^2 \| R_2' - R_3' \|^2 $
Solving this quadratic system leads to the values of $t_3'$ and $t_4'$. There can be more than one solution. In this case we have to choose the right one based on the direction of the normal vector to the plane defined by $R_2', R_3'$ and $R_4'$.
Having determined the correct values of $t_3'$ and $t_4'$ we now have $R_2', R_3', R_4'$.
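One way to solve this small system numerically is sketched below. It is not the answer's own solver: it eliminates $t_4'$ using the orthogonality condition (which is linear in $t_4'$ once $t_3'$ is fixed) and then finds the roots of the remaining one-variable equation by a grid scan plus bisection. The function name, the scan range `s_max`, and the grid size are assumptions. The numeric values of `z0` and `r` in the usage lines are rounded results of my own evaluation of the first-image step, and the rays are taken as $(P_i', z_0)$, i.e. the same focal length is assumed for both images.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve_second_image(P2, P3, P4, z0, r, K=1.0, s_max=5.0, steps=5000):
    """Return all (t3, t4) with t3, t4 > 0 such that R2 = K*a, R3 = t3*b, R4 = t4*c
    satisfy (R2 - R3).(R4 - R3) = 0 and |R4 - R3| = r * |R2 - R3|."""
    a, b, c = [(x, y, z0) for x, y in (P2, P3, P4)]
    ab, ac, bc = dot(a, b), dot(a, c), dot(b, c)
    aa, bb, cc = dot(a, a), dot(b, b), dot(c, c)

    def t4_of(s):
        # Orthogonality: K*u*ac - K*s*ab - s*u*bc + s^2*bb = 0, solved for u.
        return s * (K * ab - s * bb) / (K * ac - s * bc)

    def h(s):
        # Residual of the length condition after eliminating t4.
        u = t4_of(s)
        lhs = u * u * cc - 2 * s * u * bc + s * s * bb
        rhs = K * K * aa - 2 * K * s * ab + s * s * bb
        return lhs - r * r * rhs

    pole = K * ac / bc                      # t4_of is undefined at this value of s
    roots = []
    grid = [s_max * (i + 0.5) / steps for i in range(steps)]
    for lo, hi in zip(grid, grid[1:]):
        if lo <= pole <= hi or h(lo) * h(hi) > 0:
            continue                        # skip the pole and intervals without a sign change
        for _ in range(60):                 # plain bisection
            mid = 0.5 * (lo + hi)
            if h(lo) * h(mid) <= 0:
                hi = mid
            else:
                lo = mid
        s = 0.5 * (lo + hi)
        if t4_of(s) > 0:
            roots.append((s, t4_of(s)))
    return roots


# Centered points P2', P3', P4' of the second image.
z0, r = -3122.35, 0.708                     # rounded values from my own evaluation (assumption)
for t3, t4 in solve_second_image((697, 1007), (1122, -892), (-1241, -854), z0, r):
    print("t3' =", round(t3, 4), " t4' =", round(t4, 4))
```

The function returns every positive root it finds within the scan range; if it returns more than one, the choice between them still has to be made as described above.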
We can now find the point $R_{Q'}$ whose corresponding image is point $Q'$ in the second image, by intersecting the ray from $Q'$ with the plane defined by $R_2', R_3', R_4'$.
The following step is to decompose the vector $R_{Q'} - R_3' $ into two (perpendicular) components along the two vectors $(R_2' - R_3') $ and $(R_4' - R_3')$, which is a trivial task.
So now we have
$ R_{Q'} = R_3' + \alpha (R_2' - R_3') + \beta (R_4' - R_3')$
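Since the two edge vectors are perpendicular, the coefficients are plain projections. Spelled out (my own expansion of the decomposition step):

$ \alpha = \dfrac{(R_{Q'} - R_3') \cdot (R_2' - R_3')}{\| R_2' - R_3' \|^2}, \qquad \beta = \dfrac{(R_{Q'} - R_3') \cdot (R_4' - R_3')}{\| R_4' - R_3' \|^2} $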
Now we go back to the first image in which we know $R_2, R_3, R_4$ and write
$ R_Q = R_3 + \alpha (R_2 - R_3) + \beta (R_4 - R_3) $
Finally we have to intersect the ray from the origin to $R_Q$ with the plane $z = z_0$, again a trivial task.
Once we do that, we have the $2D$ vector $Q$.
Taking the center of view into consideration and adjusting the coordinates $Q$, we obtain the position coordinates in the image.
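The last three steps (ray and plane intersection, decomposition, and projection back onto $z = z_0$) can be sketched as two small Python helpers. The names `plane_coords` and `transfer` are made up for this illustration, and the toy rectangle at the bottom is synthetic data chosen so the result can be checked by hand; it is not the data of the question.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def plane_coords(q_img, z0, R2, R3, R4):
    """Intersect the ray through image point q_img with the plane of R2, R3, R4
    and return (alpha, beta) along the edges R2 - R3 and R4 - R3."""
    e1, e2 = sub(R2, R3), sub(R4, R3)
    n = cross(e1, e2)                       # plane normal
    ray = (q_img[0], q_img[1], z0)
    tau = dot(n, R3) / dot(n, ray)          # the point on the plane is tau * ray
    d = sub(tuple(tau * x for x in ray), R3)
    # Plain projections; valid because e1 and e2 are perpendicular.
    return dot(d, e1) / dot(e1, e1), dot(d, e2) / dot(e2, e2)


def transfer(alpha, beta, z0, R2, R3, R4):
    """Rebuild the 3D point from (alpha, beta) in another frame and project it onto z = z0."""
    e1, e2 = sub(R2, R3), sub(R4, R3)
    RQ = tuple(R3[k] + alpha * e1[k] + beta * e2[k] for k in range(3))
    return (z0 * RQ[0] / RQ[2], z0 * RQ[1] / RQ[2])


# Toy check with a fronto-parallel rectangle (synthetic data, z0 = -1).
alpha, beta = plane_coords((0.25, 0.5), -1.0, (0, 2, -2), (0, 0, -2), (1, 0, -2))
print(alpha, beta)                                                  # 0.5 0.5
print(transfer(alpha, beta, -1.0, (0, 2, -4), (0, 0, -4), (1, 0, -4)))  # (0.125, 0.25)
```

For the actual problem, `plane_coords` would be called with $Q'$ and the primed points of the second image, and `transfer` with the unprimed points of the first image.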
After running the above procedure on the given data, I obtained
$Q = (77.892, 621.349) $
Taking into account the coordinates of the center of view, $(1512, 2016)$, and flipping the $y$ axis back, the coordinates of $Q$ relative to the top left corner of the image are
$ (1512 + 77.892, \; 2016 - 621.349) \approx ( 1590, 1395 )$
which is very close to the OP's own estimate of $(1581, 1405)$.
Below is the implementation of the above steps, written as an Excel VBA script. The source code for the functions "solve_rd_system" and "intersect_three_quadrics" is included in this Excel file as a macro (VBA script). Click on the link to open the online file, then click on "Editing" and choose "Open in Desktop App". This will open the file in your desktop Excel program. Click "View", then select "Macros".
This is an image of the Excel worksheet that produced these results.
The input data is in columns $F$ and $G$. The rest is output.