You were doing well up until the line $H'^{-1} = H'^T$. There’s no particular reason to believe that $H$ is orthogonal; quite the opposite, in fact. (I’m going to switch here to the more conventional name $\mathtt P$ for the projection matrix and also make use of the block decomposition $\mathtt P = \left[\mathtt M \mid \mathbf p_4\right]$.) The first two columns of $\mathtt P$, $\mathbf p_1$ and $\mathbf p_2$, are the vanishing points of the world $x$- and $y$-axes. Unless the camera happens to be positioned just right relative to the $x$-$y$ plane, these vectors will not be orthogonal in the image. The upshot is that you have to compute the inverse of $\begin{bmatrix}\mathbf p_1 & \mathbf p_2 & \mathbf p_4\end{bmatrix}$, not its transpose.
This leads to the next potential problem: this matrix might not be invertible. The submatrix $\mathtt M$ is effectively the composition of a rotation and a nonsingular affine transformation, so we know the first three columns are linearly independent, but it’s entirely possible that $\mathbf p_4 = \lambda \mathbf p_1 + \mu \mathbf p_2$, that is, that the image of the world origin lies on the vanishing line of the $x$-$y$ plane. This occurs when the camera center lies on this plane. In the specific construction in your question, this isn’t really an issue since you’re reprojecting onto the $x$-$y$ plane, but keep this in mind when generalizing to other planes.
Thus, the reprojection onto the $x$-$y$ plane is given by the homography matrix $$\mathtt H = \begin{bmatrix}\mathbf p_1 & \mathbf p_2 & \mathbf p_4\end{bmatrix}^{-1}.$$ This can be generalized to any plane that doesn’t contain the camera center by inserting an appropriate coordinate transformation $\mathtt B$ into the cascade, i.e., start with $$w\begin{bmatrix}x_c \\ y_c \\ 1 \end{bmatrix} = \mathtt{PB}\begin{bmatrix}x_w\\y_w\\0\\1\end{bmatrix}.$$
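As a quick numerical sanity check, here is a minimal NumPy sketch of this homography. The function names and the camera parameters (`K`, `R`, `C`) are made up for illustration; they are not taken from your question.

```python
import numpy as np

def plane_homography(P):
    """Return H = [p1 p2 p4]^{-1}, mapping image points to the world x-y plane."""
    P = np.asarray(P, dtype=float)
    A = P[:, [0, 1, 3]]          # drop p3, since z_w = 0 on the plane
    if np.linalg.matrix_rank(A) < 3:
        raise ValueError("camera center lies on the x-y plane; no homography exists")
    return np.linalg.inv(A)      # the inverse, not the transpose

def image_to_plane(H, x_c, y_c):
    """Apply H to an image point and dehomogenize."""
    v = H @ np.array([x_c, y_c, 1.0])
    return v[:2] / v[2]

# Hypothetical camera, made up for illustration: P = K R [I | -C].
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])   # camera looking straight down at the plane
C = np.array([1.0, -2.0, 5.0])   # camera center, 5 units above the plane
P = K @ R @ np.hstack([np.eye(3), -C[:, None]])

# Round trip: project a point of the plane, then map its image back.
x_w, y_w = 0.5, 1.5
img = P @ np.array([x_w, y_w, 0.0, 1.0])
recovered = image_to_plane(plane_homography(P), img[0] / img[2], img[1] / img[2])
```

If everything is set up correctly, `recovered` should agree with `(x_w, y_w)` to within rounding error.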
For comparison, here’s a construction that makes direct use of back-mapping the image points. Given a plane with homogeneous coordinate vector $\mathbf\Pi$, its intersection with the line through a fixed point $\mathbf C$ not on the plane and an arbitrary point $\mathbf X$ is $\left(\mathbf\Pi^T\mathbf X\right) \mathbf C - \left(\mathbf C^T\mathbf\Pi\right) \mathbf X$. The world coordinates of the camera center can be recovered from $\mathtt P$: since $\mathtt P\mathbf C = 0$, its inhomogeneous Cartesian coordinates are $\tilde{\mathbf C}=-\mathtt M^{-1}\mathbf p_4$. Finally, an image point $\mathbf x$ back-projects to a ray that intersects the plane at infinity at $\left((\mathtt M^{-1}\mathbf x)^T,0\right)^T$. Putting this all together, and adding a final matrix $\mathtt B$ that imposes a coordinate system on the plane, we get $$\mathtt H = \mathtt B \left(\mathbf C \mathbf \Pi^T-\mathbf C^T \mathbf \Pi \mathtt I_4 \right) \begin{bmatrix}\mathtt M^{-1}\\\mathbf 0^T\end{bmatrix}.$$ With $\mathbf\Pi=(0,0,1,0)^T$ (the $x$-$y$ plane) and $$\mathtt B = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\end{bmatrix}$$ this produces the same homography matrix as above, up to an irrelevant scale factor.
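The agreement between the two constructions can be checked numerically. The sketch below is only an illustration: the function names and the camera are hypothetical, and the homogeneous camera center is taken as the right null vector of $\mathtt P$ (computed via SVD), which is an equivalent way of recovering it because $\mathtt P\mathbf C=0$ and the overall scale of $\mathbf C$ does not matter here.

```python
import numpy as np

def backprojection_homography(P, plane, B):
    """H = B (C Pi^T - (C^T Pi) I_4) [M^{-1}; 0^T] for a 3x4 camera matrix P."""
    P = np.asarray(P, dtype=float)
    Pi = np.asarray(plane, dtype=float)
    M = P[:, :3]
    # Homogeneous camera center: right null vector of P, so that P @ C = 0.
    C = np.linalg.svd(P)[2][-1]
    # X -> (Pi^T X) C - (C^T Pi) X, the intersection of line CX with the plane.
    onto_plane = np.outer(C, Pi) - (C @ Pi) * np.eye(4)
    # x -> ((M^{-1} x)^T, 0)^T, the point at infinity on the back-projected ray.
    to_infinity = np.vstack([np.linalg.inv(M), np.zeros((1, 3))])
    return B @ onto_plane @ to_infinity

def apply_h(H, x, y):
    """Apply a 3x3 homography to a point and dehomogenize."""
    v = H @ np.array([x, y, 1.0])
    return v[:2] / v[2]

# Hypothetical camera, made up for illustration: P = K R [I | -C].
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])
center = np.array([1.0, -2.0, 5.0])
P = K @ R @ np.hstack([np.eye(3), -center[:, None]])

Pi = np.array([0.0, 0.0, 1.0, 0.0])                # the x-y plane
B = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

via_backprojection = apply_h(backprojection_homography(P, Pi, B), 300.0, 200.0)
via_inverse = apply_h(np.linalg.inv(P[:, [0, 1, 3]]), 300.0, 200.0)
```

The two matrices differ by a scalar factor, so the comparison is made on dehomogenized points rather than on the matrix entries.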
The details of the anatomy of the projection matrix that I used above can be found in Hartley and Zisserman’s Multiple View Geometry In Computer Vision and other standard references on the subject.
The location on the image plane will give you a ray on which the object lies. You’ll need to use other information to determine where along this ray the object actually is, though. That information is lost when the object is projected onto the image plane. Assuming that the object is somewhere on the road plane is a huge simplification. Now, instead of trying to find the inverse of a perspective mapping, you only need to find a perspective projection of the image plane onto the road. That’s a fairly straightforward construction similar to the one used to derive the original perspective projection.
Start by working in camera-relative coordinates. A point $\mathbf p_i$ on the image plane has coordinates $(x_i,y_i,f)^T$. The original projection maps all points on the ray $\mathbf p_i t$ onto this point. Now, we’re assuming that the road is a plane, so it can be represented by an equation of the form $\mathbf n\cdot(\mathbf p_o-\mathbf r)=0$, where $\mathbf n$ is a normal to the plane and $\mathbf r$ is some known point on it. We seek the intersection of the ray and this plane, which will satisfy $\mathbf n\cdot(\mathbf p_i t-\mathbf r)=0$. Solving for $t$ and substituting gives $$\mathbf p_o = {\mathbf n\cdot \mathbf r \over \mathbf n\cdot \mathbf p_i}\mathbf p_i.$$ Moving to homogeneous coordinates, this mapping is the linear transformation represented by the matrix $$
M = \pmatrix{1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ {n_x \over \mathbf n\cdot\mathbf r} & {n_y \over \mathbf n\cdot\mathbf r} & {n_z \over \mathbf n\cdot\mathbf r} & 0},
$$ i.e., $$
\mathbf p_o = M\pmatrix{x_i \\ y_i \\ f \\ 1}.
$$ Once you have this, it should be obvious how to complete the mapping back to world coordinates.
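Here is a minimal NumPy sketch of that matrix in action, assuming $\mathbf n$ and $\mathbf r$ are already expressed in camera coordinates. The function name and the sample numbers are made up for illustration.

```python
import numpy as np

def image_to_road_camera_coords(x_i, y_i, f, n, r):
    """Intersect the ray through image point (x_i, y_i, f) with the plane n.(p - r) = 0.

    Everything is in camera-relative coordinates. Requires n.r != 0 (the plane
    misses the camera center) and n.p_i != 0 (the ray is not parallel to the plane).
    """
    n = np.asarray(n, dtype=float)
    r = np.asarray(r, dtype=float)
    M = np.eye(4)
    M[3, :3] = n / (n @ r)   # last row: n / (n.r)
    M[3, 3] = 0.0            # with a zero in the corner
    p = M @ np.array([x_i, y_i, f, 1.0])
    return p[:3] / p[3]      # dehomogenize to get p_o

# Made-up example: road 1.5 units from the camera along the camera's y-axis.
p_o = image_to_road_camera_coords(0.1, 0.2, 1.0, n=(0.0, 1.0, 0.0), r=(0.0, 1.5, 0.0))
```

Dividing by the last coordinate reproduces the factor $\mathbf n\cdot\mathbf r \over \mathbf n\cdot\mathbf p_i$ from the formula above.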
All that’s left is to find the parameters $\mathbf n$ and $\mathbf r$ that describe the road plane in camera coordinates. That’s also pretty simple. Since we’re taking the road to be the plane $y=0$ in world coordinates, its normal there is $(0,1,0)^T$. As for a known point on the road, the origin will do. Another reasonable choice is the point at which the camera’s optical axis meets the road, since the camera-relative coordinates of that point will be of the form $(0,0,z)^T$. Convert both of these into camera-relative coordinates, and you’re done.
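The conversion might look like the following sketch. It assumes the common extrinsics convention $\mathbf X_{\text{cam}} = R(\mathbf X_{\text{world}} - \mathbf C)$, where $R$ is the camera rotation and $\mathbf C$ the camera center in world coordinates; if your camera model uses a different convention, adjust accordingly. The function name and sample values are hypothetical.

```python
import numpy as np

def road_plane_in_camera_coords(R, C, n_world=(0.0, 1.0, 0.0), r_world=(0.0, 0.0, 0.0)):
    """Express the road plane (normal n, point r) in camera-relative coordinates.

    Assumes X_cam = R (X_world - C); this convention is an assumption.
    """
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)
    n_cam = R @ np.asarray(n_world, dtype=float)          # directions only rotate
    r_cam = R @ (np.asarray(r_world, dtype=float) - C)    # points translate, then rotate
    return n_cam, r_cam

# Made-up camera: tilted 0.3 rad about its x-axis, 2 units off the road.
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, c, -s],
              [0.0, s, c]])
C = np.array([0.0, 2.0, 0.0])
n_cam, r_cam = road_plane_in_camera_coords(R, C)
```

The resulting `n_cam` and `r_cam` are what go into the matrix $M$ above.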
Note that you don’t necessarily need to know anything about the camera to compute a perspective transformation that will map from the image plane to the road plane. If you can somehow find four pairs of corresponding points, no three of which are collinear in either plane, i.e., a pair of quadrilaterals that correspond to each other on these two planes, a planar perspective transformation that relates them can be computed fairly easily. See here for details. Essentially, you calibrate the camera view by matching a region of the image to a known region in the road plane.
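One standard way to compute such a transformation is the direct linear transform: each correspondence contributes two linear equations in the nine entries of the homography, and the solution is the null vector of the resulting $8\times9$ system. The sketch below is a bare-bones NumPy version of that technique, without the coordinate normalization a production implementation would use; the function names and the pixel and road coordinates are invented for illustration.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H src from four point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, -u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, -v])
    A = np.array(rows, dtype=float)
    h = np.linalg.svd(A)[2][-1]      # null vector of the 8x9 system
    return h.reshape(3, 3)

def apply_homography(H, x, y):
    """Apply H to a point and dehomogenize."""
    w = H @ np.array([x, y, 1.0])
    return w[:2] / w[2]

# Made-up calibration: image corners (pixels) of a 3.5 x 10 rectangle on the road.
image_quad = [(220.0, 400.0), (420.0, 400.0), (370.0, 250.0), (270.0, 250.0)]
road_quad = [(0.0, 0.0), (3.5, 0.0), (3.5, 10.0), (0.0, 10.0)]
H = homography_from_points(image_quad, road_quad)
road_point = apply_homography(H, 320.0, 300.0)   # some pixel between the markings
```

With exactly four correspondences the fit is exact; with more, the same SVD gives a least-squares estimate.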
Update 2018.10.22: If you have the complete camera matrix $P$, which you do, there’s a fairly straightforward way to construct the back-mapping to points on the road with a few matrix operations. We choose a coordinate system for the road plane, which gives us a $4\times3$ matrix $M$ that maps from these plane coordinates to world coordinates, i.e., $\mathbf X = M\mathbf x$. The image of this point is $PM\mathbf x$. If $PM$ is invertible, which it will be unless the camera center is on the road plane, the matrix $(PM)^{-1}$ maps from image to plane coordinates, and so the back-mapping from image to world coordinates on the road is $M(PM)^{-1}$. For the plane $Y=0$, a natural choice for $M$ is $$M=\begin{bmatrix}1&0&0\\0&0&0\\0&1&0\\0&0&1\end{bmatrix},$$ which simply inserts a $Y$-coordinate of zero to obtain world coordinates. You can adjust the origin of this coordinate system by changing the last column of $M$.
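In NumPy this comes down to a couple of lines. The sketch below uses a made-up camera (`K`, `C`) purely to exercise the formula; the names are hypothetical.

```python
import numpy as np

# Plane-to-world embedding for the road plane Y = 0: (x, z, 1) -> (x, 0, z, 1).
M_ROAD = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

def image_to_road_world(P, x_img, y_img, M=M_ROAD):
    """Back-map an image point to world coordinates on the road via M (PM)^{-1}."""
    back = M @ np.linalg.inv(P @ M)   # 4x3; fails if the camera center is on the plane
    X = back @ np.array([x_img, y_img, 1.0])
    return X[:3] / X[3]

# Hypothetical camera with identity rotation: P = K [I | -C].
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
C = np.array([1.0, 2.0, -4.0])        # camera center, off the road plane
P = K @ np.hstack([np.eye(3), -C[:, None]])

# Round trip: project a road point, then back-map its image.
X_road = np.array([0.5, 0.0, 6.0, 1.0])
img = P @ X_road
X_back = image_to_road_world(P, img[0] / img[2], img[1] / img[2])
```

`X_back` should reproduce the first three coordinates of `X_road`, with the $Y$-coordinate coming out as zero by construction.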
Best Answer
You can just change coordinates so that your parallel plane becomes the plane $Z=0$.
$P\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}=PA^{-1}\begin{bmatrix}X\\Y\\Z-h\\1\end{bmatrix}$ where $A=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&-h\\0&0&0&1\\\end{bmatrix}$.
I think you can then proceed as normal with $PA^{-1}$ by continuing with $Z-h=0$ (i.e., $Z=h$).
That's the argument that first occurs to me anyway: change world coordinates to make your plane coincide with the plane $Z=0$. I have not seen it written down, and I have not applied it, but that seems reasonable.
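A small numerical experiment along these lines, as a sketch only: the function name and the camera are invented, and the "proceed as normal" step is taken to mean inverting columns 1, 2 and 4 of $PA^{-1}$, which is the usual homography for the plane where the third coordinate vanishes.

```python
import numpy as np

def homography_for_plane_z(P, h):
    """Image -> (X, Y) homography for the world plane Z = h, via P A^{-1}."""
    A = np.eye(4)
    A[2, 3] = -h                       # A maps (X, Y, Z, 1) to (X, Y, Z - h, 1)
    Q = P @ np.linalg.inv(A)           # camera matrix in the shifted coordinates
    return np.linalg.inv(Q[:, [0, 1, 3]])   # the plane is "Z - h = 0" for Q

# Hypothetical camera, made up for illustration: P = K R [I | -C].
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])
C = np.array([1.0, -2.0, 5.0])
P = K @ R @ np.hstack([np.eye(3), -C[:, None]])

# Round trip for a point on the plane Z = 1.
h = 1.0
img = P @ np.array([0.5, 1.5, h, 1.0])
v = homography_for_plane_z(P, h) @ (img / img[2])
xy = v[:2] / v[2]
```

In this experiment `xy` comes back as the original $(X, Y)$, which supports the argument at least for this example.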