Rewritten on 2016-11-12. The OP raised very good questions in the comments. Note that this is not intended as an exhaustive answer (as one might expect from, say, a mathematician?), but more like observations from someone who routinely uses bilinear interpolation for numerical data.
How can a bilinear interpolation be defined for an arbitrary quadrilateral,
i.e. without running into singularities?
Bilinear interpolation is usually defined as
$$f(u,v) = (1-u) (1-v) F_{00} + u (1 - v) F_{10} + (1-u) v F_{01} + u v F_{11}$$
where $0 \le u, v \le 1$ and
$$\begin{array}{}
f(0,0) = F_{00} \\
f(0,1) = F_{01} \\
f(1,0) = F_{10} \\
f(1,1) = F_{11} \\
f(\frac{1}{2},\frac{1}{2}) = \frac{F_{00}+F_{01}+F_{10}+F_{11}}{4}
\end{array}$$
If we use notation
$$p(t; p_0, p_1) = (1-t) p_0 + t p_1 = p_0 + t (p_1 - p_0)$$
for the simplest form of linear interpolation, with $0 \le t \le 1$, $p(0;p_0,p_1) = p_0$, $p(1;p_0,p_1) = p_1$, then bilinear interpolation can be written as
$$f(u,v) = p(u; p(v; F_{00}, F_{01}), p(v; F_{10}, F_{11}))$$
so this simply extends the single-variable linear interpolation to two variables and $2^2 = 4$ samples.
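In code, the nested form is a one-liner on top of single-variable lerp. A minimal Python sketch (the names `lerp` and `bilerp` are mine, not standard):

```python
def lerp(t, p0, p1):
    # p(t; p0, p1) = (1 - t) * p0 + t * p1
    return p0 + t * (p1 - p0)

def bilerp(u, v, f00, f10, f01, f11):
    # Bilinear interpolation as two nested linear interpolations:
    # interpolate along v on each u-edge, then along u between the results.
    return lerp(u, lerp(v, f00, f01), lerp(v, f10, f11))
```

You can check the corner conditions directly: `bilerp(0, 0, ...)` returns $F_{00}$, `bilerp(1, 0, ...)` returns $F_{10}$, and `bilerp(0.5, 0.5, ...)` returns the average of the four samples.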
Bilinear interpolation is not often used for arbitrary quadrilaterals. After pondering the questions OP posed in the comments, I realized that the typical form used for interpolation,
$$\begin{cases}
x(u,v) = x_{00} + u ( x_{10} - x_{00}) + v ( x_{01} - x_{00} ) \\
y(u,v) = y_{00} + u ( y_{10} - y_{00}) + v ( y_{01} - y_{00} ) \\
f(u,v) = (1-v) \left ( (1-u) f_{00} + u f_{10} \right ) + (v) \left ( (1-u) f_{01} + u f_{11} \right )
\end{cases}$$
is not applicable to arbitrary quadrilaterals, as it assumes the quadrilateral is a parallelogram, i.e. with
$$\begin{cases}
x_{11} = x_{10} + x_{01} - x_{00} \\
y_{11} = y_{10} + y_{01} - y_{00}
\end{cases}$$
Solving $x = x(u,v)$, $y = y(u,v)$ for $u$ and $v$ yields
$$\begin{cases}
A = x_{00} (y_{01} - y_{10}) + x_{01} (y_{10} - y_{00}) + x_{10} (y_{00} - y_{01}) \\
u = \frac{ (x_{01} - x_{00}) y - (y_{01} - y_{00}) x + x_{00} y_{01} - y_{00} x_{01} }{A} \\
v = \frac{ (x_{00} - x_{10}) y - (y_{00} - y_{10}) x - x_{00} y_{10} + y_{00} x_{10} }{A}
\end{cases}$$
where
$$A = \left(\vec{p}_{01} - \vec{p}_{00}\right) \times \left(\vec{p}_{10} - \vec{p}_{00}\right)$$
and $\times$ signifies the 2D analog of the vector cross product, so $\lvert A \rvert$ is the area of the parallelogram. Thus, exactly one solution exists for every non-degenerate parallelogram.
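As a sketch of the parallelogram inverse in Python (the function name is mine; the formulas are the ones just above, with $p_{11}$ implied by $p_{11} = p_{10} + p_{01} - p_{00}$):

```python
def parallelogram_uv(p00, p10, p01, xy):
    # Invert the affine map of a parallelogram spanned by p00, p10, p01.
    # Returns the interpolation parameters (u, v) for the point xy.
    (x00, y00), (x10, y10), (x01, y01) = p00, p10, p01
    x, y = xy
    A = x00 * (y01 - y10) + x01 * (y10 - y00) + x10 * (y00 - y01)
    if A == 0:
        raise ValueError("degenerate parallelogram")
    u = ((x01 - x00) * y - (y01 - y00) * x + x00 * y01 - y00 * x01) / A
    v = ((x00 - x10) * y - (y00 - y10) * x - x00 * y10 + y00 * x10) / A
    return u, v
```

For example, the centre of the parallelogram with $p_{00}=(0,0)$, $p_{10}=(2,0)$, $p_{01}=(0,3)$ maps back to $(u,v) = (\frac12, \frac12)$.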
For the most common use case, a regular axis-aligned rectangular grid of samples $p_{j,i}$ with $i, j \in \mathbb{Z}$, $i, j \ge 0$, we have
$$\begin{cases}
x = a_x + b_x i \\
y = a_y + b_y j
\end{cases}$$
with $b_x \ne 0$, $b_y \ne 0$, corresponding to interpolation parameters
$$\begin{cases}
i = \left\lfloor \frac{x - a_x}{b_x} \right\rfloor \\
j = \left\lfloor \frac{y - a_y}{b_y} \right\rfloor \\
u = \frac{x - a_x}{b_x} - i \\
v = \frac{y - a_y}{b_y} - j
\end{cases}$$
so that
$$p(x,y) = (1-v) \left ( (1-u) p_{j,i} + (u) p_{j,i+1} \right ) + (v) \left ( (1-u) p_{j+1,i} + (u) p_{j+1,i+1} \right )$$
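The grid case is short enough to show in full. A Python sketch (my own; no boundary clamping, so the query point is assumed to lie strictly inside the grid):

```python
import math

def grid_bilinear(grid, ax, bx, ay, by, x, y):
    # grid[j][i] holds the sample at x = ax + bx*i, y = ay + by*j.
    s = (x - ax) / bx
    t = (y - ay) / by
    i, j = math.floor(s), math.floor(t)   # cell indices
    u, v = s - i, t - j                   # fractional position in the cell
    return ((1 - v) * ((1 - u) * grid[j][i] + u * grid[j][i + 1])
            + v * ((1 - u) * grid[j + 1][i] + u * grid[j + 1][i + 1]))
```

A production version would clamp `i` and `j` to the valid index range (or otherwise handle queries outside the grid).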
To apply bilinear interpolation to an arbitrary quadrilateral, we need to use
$$\begin{cases}
x(u,v) = (1-u)(1-v) x_{00} + (u)(1-v) x_{10} + (1-u)(v) x_{01} + (u)(v) x_{11} \\
y(u,v) = (1-u)(1-v) y_{00} + (u)(1-v) y_{10} + (1-u)(v) y_{01} + (u)(v) y_{11} \\
f(u,v) = (1-u)(1-v) f_{00} + (u)(1-v) f_{10} + (1-u)(v) f_{01} + (u)(v) f_{11}
\end{cases}$$
In some cases it is sufficient to produce additional samples instead, for example so that each quadrilateral is split into four sub-quadrilaterals, doubling the resolution. Then, we do not need to solve for $(u,v)$ at all, and only need to compute the four edge midpoints and the shared centre,
$$\begin{array}{ccc}
x\left(\frac{1}{2},0\right), & y\left(\frac{1}{2},0\right), & f\left(\frac{1}{2},0\right) \\
x\left(\frac{1}{2},1\right), & y\left(\frac{1}{2},1\right), & f\left(\frac{1}{2},1\right) \\
x\left(0,\frac{1}{2}\right), & y\left(0,\frac{1}{2}\right), & f\left(0,\frac{1}{2}\right) \\
x\left(1,\frac{1}{2}\right), & y\left(1,\frac{1}{2}\right), & f\left(1,\frac{1}{2}\right) \\
x\left(\frac{1}{2},\frac{1}{2}\right), & y\left(\frac{1}{2},\frac{1}{2}\right), & f\left(\frac{1}{2},\frac{1}{2}\right)
\end{array}$$
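A sketch of this subdivision step in Python (names mine): the forward bilinear map is evaluated componentwise at the four edge midpoints and at the centre $(\frac12,\frac12)$, which the four sub-quadrilaterals share as a common vertex.

```python
def quad_point(u, v, c00, c10, c01, c11):
    # Forward bilinear map; works for x, y, or f given the corner values.
    return ((1 - u) * (1 - v) * c00 + u * (1 - v) * c10
            + (1 - u) * v * c01 + u * v * c11)

def subdivide(c00, c10, c01, c11):
    # Each corner is a tuple (x, y, f).  Returns the five additional
    # samples needed to split the quadrilateral into four
    # sub-quadrilaterals: four edge midpoints, then the centre.
    return [tuple(quad_point(u, v, a, b, c, d)
                  for a, b, c, d in zip(c00, c10, c01, c11))
            for (u, v) in [(0.5, 0.0), (0.5, 1.0), (0.0, 0.5),
                           (1.0, 0.5), (0.5, 0.5)]]
```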
However, solving $(u,v)$ for some specific $(x,y)$ is quite complicated. Indeed, I was surprised how complicated it turns out to be! (I apologize for misrepresenting this case as "easy" in a previous edit. Mea culpa.)
In practice, we first solve for $u$ or $v$, and then obtain the other by substituting into one of the equations above. If we decide to solve for $u$ first, we need to solve
$$U_2 u^2 + U_1 u + U_0 = 0$$
where
$$\begin{cases}
U_2 = (x_{00}-x_{10}) (y_{01}-y_{11}) - (y_{00}-y_{10}) (x_{01}-x_{11}) \\
U_1 = (x_{00}-x_{01}-x_{10}+x_{11}) y - (y_{00}-y_{01}-y_{10}+y_{11}) x + (y_{11}-2 y_{01}) x_{00} + (2 y_{00}-y_{10}) x_{01} + x_{10} y_{01} - x_{11} y_{00} \\
U_0 = (x_{01}-x_{00}) y - (y_{01}-y_{00}) x + x_{00} y_{01} - y_{00} x_{01}
\end{cases}$$
(These are the $V$ coefficients further below with the subscripts $01$ and $10$ interchanged, as expected from the symmetry $u \leftrightarrow v$.)
The possible solutions are
$$\begin{cases}
u = \frac{-U_1 \pm \sqrt{ U_1^2 - 4 U_2 U_0}}{2 U_2}, & U_2 \ne 0 \\
u = \frac{-U_0}{U_1}, & U_2 = 0, U_1 \ne 0
\end{cases}$$
If $U_2 = U_1 = 0$, the equation degenerates to $U_0 = 0$: every $u$ is a solution when $U_0 = 0$, and none exists otherwise.
If we find $0 \le u \le 1$, we solve for $v$ by substituting into $Y(u,v) = y$,
$$v = \frac{ (y_{00} - y_{10}) u + y - y_{00} }{ (y_{00} - y_{01} - y_{10} + y_{11}) u - y_{00} + y_{01} }$$
or into $X(u,v) = x$,
$$v = \frac{ (x_{00} - x_{10}) u + x - x_{00} }{ (x_{00} - x_{01} - x_{10} + x_{11}) u - x_{00} + x_{01} }$$
If we find no solutions, we try to solve for $v$ in
$$V_2 v^2 + V_1 v + V_0 = 0$$
where
$$\begin{cases}
V_2 = (x_{00}-x_{01}) (y_{10}-y_{11}) - (y_{00}-y_{01}) (x_{10}-x_{11}) \\
V_1 = (x_{00}-x_{01}-x_{10}+x_{11}) y - (y_{00}-y_{01}-y_{10}+y_{11}) x + (y_{11}-2 y_{10}) x_{00} + (2 y_{00}-y_{01}) x_{10} + x_{01} y_{10} - x_{11} y_{00} \\
V_0 = (x_{10}-x_{00}) y - (y_{10}-y_{00}) x + x_{00} y_{10} - y_{00} x_{10}
\end{cases}$$
The possible solutions are similar to those for $u$:
$$\begin{cases}
v = \frac{-V_1 \pm \sqrt{ V_1^2 - 4 V_2 V_0}}{2 V_2}, & V_2 \ne 0 \\
v = \frac{-V_0}{V_1}, & V_2 = 0, V_1 \ne 0
\end{cases}$$
As before, if $V_2 = V_1 = 0$ the equation degenerates to $V_0 = 0$, which either every $v$ or no $v$ satisfies.
If we find $0 \le v \le 1$, we solve for $u$ by substituting into $X(u,v) = x$,
$$u = \frac{(x_{00} - x_{01}) v + x - x_{00} }{ (x_{00} - x_{01} - x_{10} + x_{11}) v - x_{00} + x_{10} }$$
or into $Y(u,v) = y$,
$$u = \frac{(y_{00} - y_{01}) v + y - y_{00} }{ (y_{00} - y_{01} - y_{10} + y_{11}) v - y_{00} + y_{10} }$$
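Putting the $v$-first branch together in Python (my own sketch of the procedure above; the $\epsilon$ tolerances are illustrative, not tuned):

```python
def invert_bilinear(p00, p10, p01, p11, xy, eps=1e-12):
    # Map (x, y) back to (u, v) for an arbitrary quadrilateral by
    # solving the quadratic in v, then recovering u by substitution.
    # Returns the first solution with 0 <= u, v <= 1, or None.
    (x00, y00), (x10, y10), (x01, y01), (x11, y11) = p00, p10, p01, p11
    x, y = xy
    v2 = (x00 - x01) * (y10 - y11) - (y00 - y01) * (x10 - x11)
    v1 = ((x00 - x01 - x10 + x11) * y - (y00 - y01 - y10 + y11) * x
          + (y11 - 2 * y10) * x00 + (2 * y00 - y01) * x10
          + x01 * y10 - x11 * y00)
    v0 = (x10 - x00) * y - (y10 - y00) * x + x00 * y10 - y00 * x10
    if abs(v2) > eps:                      # genuine quadratic
        disc = v1 * v1 - 4 * v2 * v0
        if disc < 0:
            return None
        roots = [(-v1 + disc ** 0.5) / (2 * v2),
                 (-v1 - disc ** 0.5) / (2 * v2)]
    elif abs(v1) > eps:                    # degenerates to linear
        roots = [-v0 / v1]
    else:
        roots = []
    for v in roots:
        if -eps <= v <= 1 + eps:
            # Recover u from X(u,v) = x, falling back to Y(u,v) = y
            # when the X denominator vanishes.
            den = (x00 - x01 - x10 + x11) * v - x00 + x10
            if abs(den) > eps:
                u = ((x00 - x01) * v + x - x00) / den
            else:
                den = (y00 - y01 - y10 + y11) * v - y00 + y10
                u = ((y00 - y01) * v + y - y00) / den
            if -eps <= u <= 1 + eps:
                return u, v
    return None
```

For the quadrilateral $p_{00}=(0,0)$, $p_{10}=(1,0)$, $p_{01}=(0,1)$, $p_{11}=(2,2)$, the point $(x,y)=(0.625, 0.375)$ inverts to $(u,v)=(0.5, 0.25)$.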
It is also possible to solve $(u,v)$ numerically, by calculating $X(u,v)$ and $Y(u,v)$ repeatedly with different $u$, $v$, until $\lvert X(u,v) - x \rvert \le \epsilon$ and $\lvert Y(u,v) - y \rvert \le \epsilon$, where $\epsilon$ is the maximum acceptable error in $x$ and $y$ (maximum distance to correct $(x,y)$ being $\sqrt{2}\epsilon$).
There are a number of different methods for the numerical search. Some of the following observations may come in handy when implementing one:
$$\begin{array}{rl}
\frac{\partial X(u,v)}{\partial u} = & x_{10} - x_{00} + v ( x_{11} - x_{01} - x_{10} + x_{00} ) \\
\frac{\partial X(u,v)}{\partial v} = & x_{01} - x_{00} + u ( x_{11} - x_{01} - x_{10} + x_{00} ) \\
\frac{\partial Y(u,v)}{\partial u} = & y_{10} - y_{00} + v ( y_{11} - y_{01} - y_{10} + y_{00} ) \\
\frac{\partial Y(u,v)}{\partial v} = & y_{01} - y_{00} + u ( y_{11} - y_{01} - y_{10} + y_{00} ) \\
X(u + \Delta u, v) - X(u, v) = & \Delta u \left ( x_{10} - x_{00} + v ( x_{11} - x_{01} - x_{10} + x_{00} ) \right ) \\
X(u, v + \Delta v) - X(u, v) = & \Delta v \left ( x_{01} - x_{00} + u ( x_{11} - x_{01} - x_{10} + x_{00} ) \right ) \\
Y(u + \Delta u, v) - Y(u, v) = & \Delta u \left ( y_{10} - y_{00} + v ( y_{11} - y_{01} - y_{10} + y_{00} ) \right ) \\
Y(u, v + \Delta v) - Y(u, v) = & \Delta v \left ( y_{01} - y_{00} + u ( y_{11} - y_{01} - y_{10} + y_{00} ) \right )
\end{array}$$
Because $X$ and $Y$ are linear in each variable separately, the finite differences above are exact for any step $\Delta u$, $\Delta v$.
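As one possible numerical search (my own sketch, not prescribed by the formulas above), here is Newton's method on the residual $(X(u,v)-x,\, Y(u,v)-y)$, using the partial derivatives listed above as the exact Jacobian. The starting guess $(\frac12,\frac12)$ works well for reasonable quadrilaterals, but convergence is not guaranteed in general:

```python
def invert_bilinear_newton(p00, p10, p01, p11, xy, eps=1e-9, max_iter=50):
    # Numerically solve X(u,v) = x, Y(u,v) = y by Newton's method.
    (x00, y00), (x10, y10), (x01, y01), (x11, y11) = p00, p10, p01, p11
    x, y = xy
    sx = x11 - x01 - x10 + x00
    sy = y11 - y01 - y10 + y00
    u = v = 0.5
    for _ in range(max_iter):
        X = (1-u)*(1-v)*x00 + u*(1-v)*x10 + (1-u)*v*x01 + u*v*x11
        Y = (1-u)*(1-v)*y00 + u*(1-v)*y10 + (1-u)*v*y01 + u*v*y11
        rx, ry = X - x, Y - y
        if abs(rx) <= eps and abs(ry) <= eps:
            return u, v
        # Jacobian entries dX/du, dX/dv, dY/du, dY/dv from above
        xu = x10 - x00 + v * sx
        xv = x01 - x00 + u * sx
        yu = y10 - y00 + v * sy
        yv = y01 - y00 + u * sy
        det = xu * yv - xv * yu
        if det == 0:
            return None          # singular Jacobian; give up
        u -= (yv * rx - xv * ry) / det
        v -= (xu * ry - yu * rx) / det
    return None
```

Since the residual is only quadratic in $(u,v)$, Newton's method typically converges in a handful of iterations when it converges at all.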
In summary, it is true that bilinear interpolation is quite difficult for arbitrary quadrilaterals, and very problematic for self-intersecting ones. However, the most common quadrilateral types -- rectangles and parallelograms -- are easy, and even the general case is solvable at least numerically, even in the presence of singularities.
Why bilinear interpolation with quadrilaterals?
As I've shown above, for rectangles and parallelograms -- the only quadrilaterals I've used bilinear interpolation with in real solutions -- bilinear interpolation is simple.
Indeed, the emphasis on quadrilaterals (in the sense of arbitrary quadrilaterals) seems incorrect, as bilinear interpolation is mostly used with rectangles or parallelograms.
Perhaps the emphasis should instead be on the fact that bilinear interpolation uses two variables to interpolate between four known values; or more generally, that $k$-linear interpolation uses $k$ variables to interpolate between $2^k$ values. Trilinear interpolation is similarly common for parallelepipeds with vertices
$$\begin{cases}
\vec{p}_{011} = \vec{p}_{010} + \vec{p}_{001} - \vec{p}_{000} \\
\vec{p}_{101} = \vec{p}_{100} + \vec{p}_{001} - \vec{p}_{000} \\
\vec{p}_{110} = \vec{p}_{100} + \vec{p}_{010} - \vec{p}_{000} \\
\vec{p}_{111} = \vec{p}_{100} + \vec{p}_{010} + \vec{p}_{001} - 2 \vec{p}_{000}
\end{cases}$$
i.e. parallelepipeds defined by one vertex and three edge vectors.
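The same nested-lerp construction extends directly to three variables and $2^3 = 8$ samples. A Python sketch (names mine):

```python
def lerp(t, a, b):
    # Linear interpolation: (1 - t) * a + t * b
    return a + t * (b - a)

def trilinear(u, v, w, f):
    # f[i][j][k] is the sample at corner (i, j, k); three nested
    # linear interpolations over the 2**3 = 8 samples.
    return lerp(u,
                lerp(v, lerp(w, f[0][0][0], f[0][0][1]),
                        lerp(w, f[0][1][0], f[0][1][1])),
                lerp(v, lerp(w, f[1][0][0], f[1][0][1]),
                        lerp(w, f[1][1][0], f[1][1][1])))
```

At the centre $(\frac12,\frac12,\frac12)$ this returns the average of the eight corner samples, as expected.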
Regular grids are ubiquitous, and linear mapping is the simplest interpolation method, with easy properties. Cubic interpolation and other methods do produce better results, but are computationally more expensive, and their properties may produce unwanted behaviour: most typically, the interpolated value is no longer guaranteed to stay within the range spanned by the samples.
Your reasoning in (1) is incorrect. It doesn't make sense to say "$T$ is independent of coordinates hence linear", because there is absolutely no link between the concepts of "coordinate-independence" (which is an intuitive term, but a little tough to make rigorous) and "linearity". Here's how to get the answer:
You're considering the function $T: \Bbb{R}^2 \to \Bbb{R}^2$ which doubles the distance of each point from the origin along the same direction. More precisely, given any point $(x,y)$ of the domain, $T(x,y)$ is supposed to be a point on the same line through the origin, but at double the distance. Since $T(x,y)$ lies on the same line as $(x,y)$, there is some $\lambda \in \Bbb{R}$ such that
\begin{align}
T(x,y) = \lambda \cdot (x,y)
\end{align}
The additional condition you're imposing on $T$ is that it has to double the distance. This immediately implies that $\lambda^2 = 4$; since you want it to be in the same direction, you have to choose a positive $\lambda$; hence $\lambda = 2$. Thus, we have deduced that for all $(x,y) \in \Bbb{R}^2$,
\begin{align}
T(x,y) = 2 \cdot (x,y)
\end{align}
Or equivalently, $T= 2 I_{\Bbb{R}^2}$. Since the identity is linear, $T$ is also clearly linear. This implies that the matrix of $T$ with respect to standard basis is twice the identity matrix.
The above explanation tells you how to get the correct answer, but it doesn't say why your thought process and reasoning were wrong; I'll try to explain this now.
First of all, to keep these concepts straight in your mind, I think it is very misleading (for yourself) to write statements like "a linear transformation in polar coordinates" (it is clear what you mean of course, but I always have the impression that poor choice of words leads to self-confusion). A linear transformation is a linear transformation. That's it. By definition, it is a function $T$ between two vector spaces, which is linear; it doesn't matter what "coordinates" you use. It is something defined based solely on the vector space structure of the domain and target space.
The very first "misconception" I'd like to address is regarding notation.
From a purely logical perspective, just because you use the symbols $r$ and $\theta$, it doesn't automatically make things "polar coordinates", and just because you use the symbols $x,y$ it doesn't mean you're working in "cartesian coordinates". This should make sense, because math doesn't care what symbols you use... Mathematics makes just as much sense if you write things in Latin letters $x,y,z,a,b,c,i,j,k$ as it does in Greek letters $\xi,\eta,\theta,\dots$ or even if you decide to write everything in Chinese characters (of course it is tradition to use $x,y$ for cartesian and $r,\theta$ for polar coordinates; my point is purely logical, and I'm not saying to upset hundreds of years of tradition).
Having said all that, here's how to make precise what you intended to say. Polar coordinates should be thought of as a function. Define the sets $A = (0,\infty) \times (0,2\pi)$ and $B = \Bbb{R}^2 \setminus \{(x,0) \in \Bbb{R}^2 : \, x \geq 0\}$, and the function $P:A \to B$ by
\begin{align}
P(r,\theta) = (r \cos \theta, r \sin \theta)
\end{align}
Here, $A$ is thought of as the "space of polar-coordinate parameters" and $B$ is thought of as the "space of cartesian-coordinate parameters", and you can check that the function $P$ is bijective, so it allows you to translate back and forth between the two spaces of parameters.
You made the statement that
In polar coordinates, this is described by $T(r, \theta) = (2r, \theta)$
This is understandable to me, but the more precise way of saying it is the following:
For every $(r,\theta) \in A$, the function $T: \Bbb{R}^2 \to \Bbb{R}^2$ is such that
\begin{align}
(P^{-1} \circ T \circ P)(r,\theta) = (2r, \theta)
\end{align}
Composing with $P^{-1}$ and $P$ is what allows you to state things in terms of polar coordinates. The way to "read" the formula above is that given a point $(r,\theta)$ in "polar-coordinate space", $P(r,\theta)$ gives you a point in the domain of $T$, then after applying $T$ to $P(r,\theta)$, we apply $P^{-1}$ to describe it back in terms of "polar coordinate space".
Now, there is always a problem with polar coordinates. The function $P$ defined above is clearly not linear (in general $P(2r,2\theta) \neq 2 P(r,\theta)$). A second issue with polar coordinates is that the domain and target space of $P$ are the sets $A$ and $B$, which are not all of $\Bbb{R}^2$.
Oftentimes people are too lazy to write out the hidden compositions with $P$ and $P^{-1}$, so they just say "in polar coordinates $T(r,\theta) = \dots$" as opposed to the more precise statement "for every $(r,\theta) \in A$, $(P^{-1} \circ T \circ P)(r,\theta) = \dots$". So, my suggestion to you is to always define $T$ first. Then if you want to "describe $T$ in another coordinate system" (for example cylindrical coordinates if you are in 3D), you need to compose with the appropriate coordinate transformation functions.
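The composition $P^{-1} \circ T \circ P$ is easy to check numerically. A small Python sketch (function names mine; `atan2` returns angles in $(-\pi,\pi]$, so negative angles are shifted into $(0,2\pi)$ to land back in $A$):

```python
import math

def P(r, theta):
    # Polar-to-cartesian parameter map on A = (0, inf) x (0, 2*pi).
    return (r * math.cos(theta), r * math.sin(theta))

def P_inv(x, y):
    # Inverse map back into A.
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    if theta < 0:
        theta += 2 * math.pi
    return (r, theta)

def T(x, y):
    # The linear map T = 2I.
    return (2 * x, 2 * y)

# (P^{-1} o T o P)(r, theta) should equal (2r, theta):
r, theta = 1.7, 2.3
rr, tt = P_inv(*T(*P(r, theta)))
assert abs(rr - 2 * r) < 1e-12 and abs(tt - theta) < 1e-12
```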
Edit In response to Comment
Let's say $\alpha = (r\cos\theta, r\sin\theta)\in\Bbb{R}^2$ is the point of interest. Let $V:= T_{\alpha}\Bbb{R}^2$, this is the vector space we're interested in. The two bases $B_{\text{cart}}$ and $B_{\text{polar}}$ are both bases for the same vector space $V$, which implies there is a linear isomorphism $L:V\to V$ which maps one basis onto the other:
\begin{align}
\begin{cases}
L((e_1)_{\alpha}):= (\hat{r})_{\alpha} &= \cos\theta \cdot (e_1)_{\alpha} + \sin\theta \cdot (e_2)_{\alpha}\\
L((e_2)_{\alpha}):= (\hat{\theta})_{\alpha} &= -\sin\theta \cdot (e_1)_{\alpha} + \cos\theta \cdot (e_2)_{\alpha}
\end{cases}
\end{align}
As for an example of a non-linear function giving rise to linear approximations, consider the following. In $1$-D, let's say $f(x) = e^{3\sin x}$. Then, $f'(x) = 3\cos(x) e^{3\sin x}$. Now, be careful: I'm not saying that the function $x\mapsto f'(x) = 3\cos(x) e^{3\sin x}$ is linear; of course it isn't, it is a highly non-linear function. Instead, what I mean is that if you pick a particular value of $x$, and consider the function $Df_x:\Bbb{R}\to \Bbb{R}$ defined as
\begin{align}
Df_x(h)&:= f'(x) \cdot h = (3\cos(x) e^{3\sin x})\cdot h
\end{align}
then this is certainly a linear function (of $h$). To make this even more concrete, let's say $x=0$. Then, $f'(0) = 3$, so $Df_0(h) = f'(0)\cdot h = 3h$.
In higher dimensions, we have a similar story. Let's consider $f:\Bbb{R}^2\to \Bbb{R}$ defined as $f(x,y) = x^2 + e^{xy}$. If you've taken a course in multivariable calculus, you should know that the Jacobian matrix at a point $(x,y)$ is
\begin{align}
f'(x,y) &=
\begin{bmatrix}
2x + ye^{xy} & x e^{xy}
\end{bmatrix}
\end{align}
In other words, the derivative (which at any given point is a linear transformation) is $Df_{(x,y)}:\Bbb{R}^2\to \Bbb{R}$, given by
\begin{align}
Df_{(x,y)}(h,k) &= (2x+ye^{xy})h + (x e^{xy})k.
\end{align}
This is once again a linear function of $(h,k)$. Again, being even more concrete, observe that if $(x,y) = (1,1)$ then $Df_{(1,1)}(h,k)= (2+e)\cdot h +e\cdot k $, which is clearly linear (as a function of $(h,k)$).
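You can convince yourself of this numerically: a small directional finite difference should approach $Df_{(x,y)}(h,k)$. A Python sketch (my own check, not part of the argument):

```python
import math

def f(x, y):
    return x ** 2 + math.exp(x * y)

def Df(x, y, h, k):
    # The derivative of f at (x, y) as a linear map applied to (h, k).
    return (2 * x + y * math.exp(x * y)) * h + (x * math.exp(x * y)) * k

# Finite-difference check at (1, 1) in direction (h, k) = (1, 1):
t = 1e-6
approx = (f(1 + t, 1 + t) - f(1, 1)) / t
exact = Df(1, 1, 1, 1)       # (2 + e) + e = 2 + 2e
assert abs(approx - exact) < 1e-4
```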
Best Answer
You may be better off generating a parallelogram within a single quadrilateral. Simply connect the midpoints of the sides in rotational order. The parallelogram thus obtained has half the area of the original quadrilateral, regardless of the shape of the latter figure.
Divide and Conquer
How does that work? Let $Q$ be any quadrilateral and $P$ be the parallelogram defined by the midpoints of the sides of $Q$. Divide the plane in half along a diagonal of $Q$, choosing the diagonal through the concave vertex if such exists.
Then the area of $Q$ is divided into two triangular areas. Taking the base to lie along the dividing line, the area of $Q$ is then measured by half the length of the dividing diagonal times the sum of the triangle altitudes.
Meanwhile the area of $P$ is divided into two parallelogram areas, because the sides of $P$ are constructed parallel to the diagonals of $Q$, and the base of $P$ may also be taken along the dividing line. This base measures half the dividing diagonal inside $Q$, and the corresponding altitude measures half the sum of the triangle altitudes, from which the area of $P$ works out to half the previously computed area of $Q$.
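The half-area property is easy to verify numerically with the shoelace formula. A Python sketch (names mine; the example quadrilateral is arbitrary):

```python
def shoelace_area(pts):
    # Unsigned polygon area via the shoelace formula.
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1]
            - pts[(i + 1) % n][0] * pts[i][1] for i in range(n))
    return abs(s) / 2

def midpoint_parallelogram(quad):
    # Connect the midpoints of consecutive sides in rotational order.
    n = len(quad)
    return [((quad[i][0] + quad[(i + 1) % n][0]) / 2,
             (quad[i][1] + quad[(i + 1) % n][1]) / 2) for i in range(n)]

quad = [(0, 0), (4, 1), (5, 4), (1, 3)]
para = midpoint_parallelogram(quad)
assert abs(shoelace_area(para) - shoelace_area(quad) / 2) < 1e-12
```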