Here is a robust 4 step solution that gives correct result for any $Ray$ and $Rect$ according to my tests.
Step 1:
Construct 4 vectors from the origin of the ray $\mathbf{r_o}$ to each corner of the rectangle $Rect$, in counter-clockwise order as seen from $\mathbf{r_o}$. Let's call these vectors $\mathbf{v_1}$...$\mathbf{v_4}$:
$$\mathbf{v_1}=\mathbf{p_o}-\mathbf{r_o}+0.5*rectWidth*\mathbf{p_x}+0.5*rectHeight*\mathbf{p_y}$$
$$\mathbf{v_2}=\mathbf{p_o}-\mathbf{r_o}+0.5*rectWidth*\mathbf{p_x}-0.5*rectHeight*\mathbf{p_y}$$
$$\mathbf{v_3}=\mathbf{p_o}-\mathbf{r_o}-0.5*rectWidth*\mathbf{p_x}-0.5*rectHeight*\mathbf{p_y}$$
$$\mathbf{v_4}=\mathbf{p_o}-\mathbf{r_o}-0.5*rectWidth*\mathbf{p_x}+0.5*rectHeight*\mathbf{p_y}$$
To ensure that these points are in counter-clockwise order as seen from $\mathbf{r_o}$, check if
$$(\mathbf{p_x}\times\mathbf{p_y})\cdot(\mathbf{p_o}-\mathbf{r_o})<0$$
If the above condition doesn't hold swap vectors $\mathbf{v_1}$ and $\mathbf{v_3}$ to swap the winding order of the points to satisfy the winding requirement.
Step 2:
Construct 4 normal vectors $\mathbf{n_1}$...$\mathbf{n_4}$ for planes that lay on each edge of $Rect$ and $\mathbf{r_o}$ using the above vectors as follows:
$$\mathbf{n_1}=\frac{\mathbf{v_1} \times \mathbf{v_2}}{||\mathbf{v_1} \times \mathbf{v_2}||}$$
$$\mathbf{n_2}=\frac{\mathbf{v_2} \times \mathbf{v_3}}{||\mathbf{v_2} \times \mathbf{v_3}||}$$
$$\mathbf{n_3}=\frac{\mathbf{v_3} \times \mathbf{v_4}}{||\mathbf{v_3} \times \mathbf{v_4}||}$$
$$\mathbf{n_4}=\frac{\mathbf{v_4} \times \mathbf{v_1}}{||\mathbf{v_4} \times \mathbf{v_1}||}$$
Step 3:
Out of these 4 normals pick a normal $\mathbf{n_{min}}$ and $\mathbf{v_a}$ and $\mathbf{v_b}$ that were used for cross product for the normal, that results smallest value for $\mathbf{n_i} \cdot \mathbf{r_d}$ (i.e. the most perpendicular plane to $\mathbf{r_d}$). For example if $\mathbf{n_{min}}=\mathbf{n_2}$, then $\mathbf{v_a}=\mathbf{v_2}$ and $\mathbf{v_b}=\mathbf{v_3}$.
If $\mathbf{n_{min}} \cdot \mathbf{r_d}>=0$, then $\mathbf{r'_d}=\mathbf{r_d}$ since $Ray$ intersects $Rect$. If not, continue to the last step
Step 4:
Project the ray direction $\mathbf{r_d}$ to the plane defined by $\mathbf{n_{min}}$:
$$\mathbf{r_{pd}}=\mathbf{r_d}-(\mathbf{n_{min}} \cdot \mathbf{r_d})*\mathbf{n_{min}}$$
And clamp $\mathbf{r_{pd}}$ to the edge defined by $\mathbf{v_a}$ and $\mathbf{v_b}$ as follows to get the final result $\mathbf{r'_d}$:
$$
\mathbf{r'_d}=
\begin{cases}
\mathbf{\hat{v_a}} & \text{if }(\mathbf{r_{pd}} \times \mathbf{v_a}) \cdot \mathbf{r_d}<0\\
\mathbf{\hat{v_b}} & \text{if }(\mathbf{v_b} \times \mathbf{r_{pd}}) \cdot \mathbf{r_d}<0\\
\mathbf{\hat{r_{pd}}} & \text{otherwise}\\
\end{cases}
$$
Because I need to calculate steps 1 and 2 for solid angle calculation anyway, the additional cost of this algorithm is only steps 3 and 4 in my case.
Rewritten on 2016-11-12. The OP raised very good questions in the comments. Note that this is not intended as an exhaustive answer (as one might expect from, say, a mathematician?), but more like observations from someone who routinely uses bilinear interpolation for numerical data.
How can a bilinear interpolation be defined for an arbitrary quadrilateral,
i.e. without running into singularities?
Bilinear interpolation is usually defined as
$$f(u,v) = (1-u) (1-v) F_{00} + u (1 - v) F_{01} + (1-u) v F_{10} + u v F_{11}$$
where $0 \le u, v \le 1$ and
$$\begin{array}{}
f(0,0) = F_{00} \\
f(0,1) = F_{01} \\
f(1,0) = F_{10} \\
f(1,1) = F_{11} \\
f(\frac{1}{2},\frac{1}{2}) = \frac{F_{00}+F_{01}+F_{10}+F_{11}}{4}
\end{array}$$
If we use notation
$$p(t; p_0, p_1) = (1-t) p_0 + t p_1 = p_0 + t (p_1 - p_0)$$
for the simplest form of linear interpolation, with $0 \le t \le 1$, $p(0;p_0,p_1) = p_0$, $p(1;p_0,p_1) = p_1$, then bilinear interpolation can be written as
$$f(u,v) = p(u; p(v; F_{00}, F_{01}), p(v; F_{10}, F_{11}))$$
so this simply extends the single-variable linear interpolation to two variables and $2^2 = 4$ samples.
Bilinear interpolation is not often used for arbitrary quadrilaterals. After pondering the questions OP posed in the comments, I realized that the typical form used for interpolation,
$$\begin{cases}
x(u,v) = x_{00} + u ( x_{10} - x_{00}) + v ( x_{01} - x_{00} ) \\
y(u,v) = y_{00} + u ( y_{10} - y_{00}) + v ( y_{01} - y_{00} ) \\
f(u,v) = (1-v) \left ( (1-u) f_{00} + u f_{10} \right ) + (v) \left ( (1-u) f_{10} + u f_{11} \right )
\end{cases}$$
is not applicable to arbitrary quadrilaterals, as it assumes it to be a parallelogram, i.e. with
$$\begin{cases}
x_{11} = x_{10} + x_{01} - x_{00} \\
y_{11} = y_{10} + y_{01} - y_{00}
\end{cases}$$
Solving $x = x(u,v)$, $y = y(u,v)$ for $u$ and $v$ yields
$$\begin{cases}
A = x_{00} (y_{01} - y_{10}) + x_{01} (y_{10} - y_{00}) + x_{10} (y_{00} - y_{01}) \\
u = \frac{ (x_{01} - x_{00}) y - (y_{01} - y_{00}) x + x_{00} y_{01} - y_{00} x_{01} }{A} \\
v = \frac{ (x_{00} - x_{10}) y - (y_{00} - y_{10}) x - x_{00} y_{10} + y_{00} x_{10} }{A}
\end{cases}$$
where $$A = \left(\vec{p}_{10} - \vec{p}_{00}\right) \times \left(\vec{p}_{01} - \vec{p}_{00}\right)$$
where $\times$ signifies the 2D analog of vector cross product, so $\lvert A \rvert$ is the area of the parallelogram. Thus, exactly one solution exists for all non-degenerate parallelograms.
For the most common use case, a regular rectangular axis-aligned grid of samples $p_{ji}$, $0 \le j, i \in \mathbb{Z}$, we have
$$\begin{cases}
x = a_x + b_x i \\
y = a_y + b_y j
\end{cases}$$
with $b_x \ne 0$, $b_y \ne 0$, corresponding to interpolation parameters
$$\begin{cases}
i = \left\lfloor \frac{x - a_x}{b_x} \right\rfloor \\
j = \left\lfloor \frac{y - a_y}{b_y} \right\rfloor \\
u = \frac{x - a_x}{b_x} - i \\
v = \frac{y - a_y}{b_y} - j
\end{cases}$$
so that
$$p(x,y) = (1-v) \left ( (1-u) p_{j,i} + (u) p_{j,i+1} \right ) + (v) \left ( (1-u) p_{j+1,i} + (u) p_{j+1,i+1} \right )$$
To apply bilinear interpolation to an arbitrary quadrilateral, we need to use
$$\begin{cases}
x(u,v) = (1-u)(1-v) x_{00} + (u)(1-v) x_{10} + (1-u)(v) x_{01} + (u)(v) x_{11} \\
y(u,v) = (1-u)(1-v) y_{00} + (u)(1-v) y_{10} + (1-u)(v) y_{01} + (u)(v) y_{11} \\
f(u,v) = (1-u)(1-v) f_{00} + (u)(1-v) f_{10} + (1-u)(v) f_{01} + (u)(v) f_{11}
\end{cases}$$
In some cases it is sufficient to produce additional samples, for example so that each quadrilateral can be split into four sub-quadrilaterals, doubling the resolution. Then, we do not need to solve for $x$ and $y$, and only need to compute
$$\begin{array}{cc}
x\left(\frac{1}{2},0\right), & y\left(\frac{1}{2},0\right), & f\left(\frac{1}{2},0\right) \\
x\left(\frac{1}{2},1\right), & y\left(\frac{1}{2},1\right), & f\left(\frac{1}{2},1\right) \\
x\left(0,\frac{1}{2}\right), & y\left(0,\frac{1}{2}\right), & f\left(0,\frac{1}{2}\right) \\
x\left(1,\frac{1}{2}\right), & y\left(1,\frac{1}{2}\right), & f\left(1,\frac{1}{2}\right)
\end{array}$$
However, solving $(u,v)$ for some specific $(x,y)$ is quite complicated. Indeed, I was surprised how complicated it turns out to be! (I apologize for misrepresenting this case as "easy" in a previous edit. Mea culpa.)
In practice, we first try to solve $u$ or $v$, and then the other by substituting into one of the equations above. If we decide we wish to solve $u$ first, we need to solve
$$U_2 u^2 + U_1 u + U_0 = 0$$
where
$$\begin{cases}
U_2 = (y_{00}-y_{01}) (x_{10}-x_{11}) - (x_{00}-x_{01}) (y_{10}-y_{11}) \\
U_1 = (y_{00}-y_{01}-y_{10}+y_{11}) x - (x_{00}-x_{01}-x_{10}+x_{11}) y + (x_{11}-2 x_{10}) y_{00} + (2 x_{00}-x_{01}) y_{10} + y_{01} x_{10} - y_{11} x_{00} \\
U_0 = (y_{10}-y_{00}) x - (x_{10}-x_{00}) y + y_{00} x_{10} - x_{00} y_{10}
\end{cases}$$
The possible solutions are
$$\begin{cases}
u = \frac{-U_1 \pm \sqrt{ U_1^2 - 4 U_2 U_0}}{2 U_2}, & U_2 \ne 0 \\
u = \frac{-U_0}{U_1}, & U_2 = 0, U_1 \ne 0 \\
u = 0, & U_2 = 0, U_0 = 0
\end{cases}$$
If we find $0 \le u \le 1$, we solve for $v$ by substituting into $X(u,v) = x$,
$$v = \frac{ (y_{00} - y_{10}) u + y - y_{00} }{ (y_{00} - y_{01} - y_{10} + y_{11}) u - y_{00} + y_{01} }$$
or into $Y(u,v) = y$,
$$v = \frac{ (x_{00} - x_{10}) u + x - x_{00} }{ (x_{00} - x_{01} - x_{10} + x_{11}) u - x_{00} + x_{01} }$$
If we find no solutions, we try to solve for $v$ in
$$V_2 v^2 + V_1 v + V_0 = 0$$
where
$$\begin{cases}
V_2 = (x_{00}-x_{01}) (y_{10}-y_{11}) - (y_{00}-y_{01}) (x_{10}-x_{11}) \\
V_1 = (x_{00}-x_{01}-x_{10}+x_{11}) y - (y_{00}-y_{01}-y_{10}+y_{11}) x + (y_{11}-2 y_{10}) x_{00} + (2 y_{00}-y_{01}) x_{10} + x_{01} y_{10} - x_{11} y_{00} \\
V_0 = (x_{10}-x_{00}) y - (y_{10}-y_{00}) x + x_{00} y_{10} - y_{00} x_{10}
\end{cases}$$
The possible solutions are similar to those for $u$:
$$\begin{cases}
v = \frac{-V_1 \pm \sqrt{ V_1^2 - 4 V_2 V_0}}{2 V_2}, & V_2 \ne 0 \\
v = \frac{-V_0}{V_1}, & V_2 = 0, V_1 \ne 0 \\
v = 0, & V_2 = 0, V_0 = 0
\end{cases}$$
If you find $0 \le v \le 1$, you solve for $u$ by substituting into $X(u,v) = x$,
$$u = \frac{(x_{00} - x_{01}) v + x - x_{00} }{ (x_{00} - x_{01} - x_{10} + x_{11}) v - x_{00} + x_{10} }$$
or into $Y(u,v) = y$,
$$u = \frac{(y_{00} - y_{01}) v + y - y_{00} }{ (y_{00} - y_{01} - y_{10} + y_{11}) v - y_{00} + y_{10} }$$
It is also possible to solve $(u,v)$ numerically, by calculating $X(u,v)$ and $Y(u,v)$ repeatedly with different $u$, $v$, until $\lvert X(u,v) - x \rvert \le \epsilon$ and $\lvert Y(u,v) - y \rvert \le \epsilon$, where $\epsilon$ is the maximum acceptable error in $x$ and $y$ (maximum distance to correct $(x,y)$ being $\sqrt{2}\epsilon$).
There are a number of different methods for the numerical search. Some of the following observations may come in handy, when implementing a numerical search:
$$\begin{array}{rl}
\frac{d \, X(u,v)}{d\,u} = & x_{10} - x_{00} + v ( x_{11} - x_{01} - x_{10} + x_{00} ) \\
\frac{d \, X(u,v)}{d\,v} = & x_{01} - x_{00} + u ( x_{11} - x_{01} - x_{10} + x_{00} ) \\
\frac{d \, Y(u,v)}{d\,u} = & y_{10} - y_{00} + v ( y_{11} - y_{01} - y_{10} + y_{00} ) \\
\frac{d \, Y(u,v)}{d\,v} = & y_{01} - y_{00} + u ( y_{11} - y_{01} - y_{10} + y_{00} ) \\
X(u + du, v) - X(u, v) = & du \left ( x_{10} - x_{00} + v ( x_{11} - x_{01} - x_{10} + x_{00} ) \right ) \\
X(u, v + dv) - X(u, v) = & dv \left ( x_{01} - x_{00} + u ( x_{11} - x_{01} - x_{10} + x_{00} ) \right ) \\
Y(u + du, v) - Y(u, v) = & du \left ( y_{10} - y_{00} + v ( y_{11} - y_{01} - y_{10} + y_{00} ) \right ) \\
Y(u, v + dv) - Y(u, v) = & dv \left ( y_{01} - y_{00} + u ( y_{11} - y_{01} - y_{10} + y_{00} ) \right )
\end{array}$$
In other words, it is true that the bilinear interpolation is quite difficult for arbitrary quadrilaterals, and very problematic for self-intersecting quadrilaterals. However, the most common quadrilateral types -- rectangles and parallelograms -- are easy, and even the general case is solvable at least numerically, even in the presence of singularities.
Why bilinear interpolation with quadrilaterals?
As I've shown above, for the rectangles and parallelograms -- the only quadrilaterals I've used bilinear interpolation with in real solutions --, bilinear interpolation is easy and simple.
Indeed, the emphasis on quadrilaterals (in the sense of arbitrary quadrilaterals) seems incorrect, as bilinear interpolation is mostly used with rectangles or parallelograms.
Perhaps the emphasis should be on that bilinear interpolation uses two variables to interpolate between four known values; or more generally, $k$-linear interpolation uses $k$ variables to interpolate between $2^k$ values. Trilinear interpolation is similarly common for cuboids with vertices
$$\begin{cases}
\vec{p}_{011} = \vec{p}_{010} + \vec{p}_{001} - \vec{p}_{000} \\
\vec{p}_{101} = \vec{p}_{100} + \vec{p}_{001} - \vec{p}_{000} \\
\vec{p}_{110} = \vec{p}_{100} + \vec{p}_{010} - \vec{p}_{000} \\
\vec{p}_{111} = \vec{p}_{100} + \vec{p}_{010} + \vec{p}_{001} - 2 \vec{p}_{000}
\end{cases}$$
i.e. cuboids defined by one vertex and three edge vectors.
Regular grids are ubiquitous, and linear mapping is the simplest interpolation method, with easy properties. Cubic interpolation and other interpolation methods do produce better results, but are computationally more expensive, and the properties may produce unwanted behaviour: most typically, the interpolated value is no longer guaranteed to reside within the range spanned by the constants.
Best Answer
In practice, raytracers tend to decompose surfaces into triangular meshes first, because the math needed is so much simpler.
However, you asked how to calculate the intersection between a line and a quadrilateral in 3D, so here goes.
Let's define your ray using an unit vector $\hat{n}$ (unit referring to unit length, $\lvert\hat{n}\rvert=1$) that passes through some point $\vec{p}_0$, $$\vec{p}_{RAY}(t) = \vec{p}_0 + t\,\hat{n}$$
The quadrilateral is a bit more complicated. Let's say the four corners are $\vec{p}_1$, $\vec{p}_2$, $\vec{p}_3$, and $\vec{p}_1$, with $\vec{p}_1$ and $\vec{p}_4$ diagonally across from each others. We can trace the surface using two variables, $0 \le u, v \le 1$, where $u=0,v=0$ refers to $\vec{p}_1$, $u=1,v=0$ to $\vec{p}_2$, $u=0,v=1$ to $\vec{p}_3$, and $u=1,v=1$ to $\vec{p}_4$, using bilinear interpolation of the coordinates: $$\vec{p}_{QUAD}(u,v) = \left ( \vec{p}_1 (1-u) + u \vec{p}_2 \right ) (1-v) + v \left ( \vec{p}_3 (1-u) + u \vec{p}_4 \right )$$ which is, after rearranging the terms, the same as $$\vec{p}_{QUAD}(u,v) = \vec{p}_1 + u \; v \; ( \vec{p}_4 - \vec{p}_3 - \vec{p}_2 + \vec{p}_1 ) + u \; ( \vec{p}_2 - \vec{p}_1 ) + v \; ( \vec{p}_3 - \vec{p}_1 )$$
The intersection is, of course, $$\vec{p}_{RAY}(t) = \vec{p}_{QUAD}(u,v)$$ In three dimensions, that is actually three equations with three unknowns, $$\begin{cases} x_0 + t n_x = x_1 + u \; v \; ( x_4 - x_3 - x_2 + x_1 ) + u \; ( x_2 - x_1 ) + v \; ( x_3 - x_1 ) \\ y_0 + t n_y = y_1 + u \; v \; ( y_4 - y_3 - y_2 + y_1 ) + u \; ( y_2 - y_1 ) + v \; ( y_3 - y_1 ) \\ z_0 + t n_z = z_1 + u \; v \; ( z_4 - z_3 - z_2 + z_1 ) + u \; ( z_2 - z_1 ) + v \; ( z_3 - z_1 ) \end{cases}$$ This can be solved, but the solution contains dozens of terms, and is therefore terribly slow to compute. (I asked Maple for the exact solution. There are two (i.e., $(t_1,u_1,v_1)$ and $(t_2,u_2,v_2)$, but as they don't fit in one screenful, I decided they are way too long to reproduce here.)
Biquadratic and bicubic Bézier surfaces can be solved the exact same way, it's just that there are more terms (up to $u^3$ and $v^3$, and 9 (biquadratic) 16 (bicubic) coordinates per dimension), and thus the result is even more complicated and slow to compute.