Starting with the formulation of the ridge regression problem as
$
\min \| X\beta -y \|_{2}^{2} + \lambda \| \beta \|_{2}^{2}
$
you can write the problem as
$
\min \| A\beta - b \|_{2}^{2}
$
where
$
A=\left[
\begin{array}{c}
X \\
\sqrt{\lambda} I
\end{array}
\right]
$
and
$b=\left[
\begin{array}{c}
y \\
0
\end{array}
\right].
$
The matrix $A$ has full column rank because of the $\sqrt{\lambda}I$ block. Thus the least squares problem has a unique solution
$\hat{\beta}=(A^{T}A)^{-1}A^{T}b.$
Writing this out in terms of $X$ and $y$, and simplifying away the blocks of zeros, we get
$\hat{\beta}=(X^{T}X+\lambda I)^{-1}X^{T}y.$
Nothing in this derivation depends on whether $X$ has more rows than columns, or even on whether $X$ has full rank. The formula is thus applicable to the underdetermined case.
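To make the equivalence concrete, here is a minimal numpy sketch (with made-up sizes and data) that builds the stacked system and checks that its least squares solution matches the closed-form ridge estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 20, 5, 0.5           # made-up sizes and damping parameter
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Stacked (damped) least squares system: A = [X; sqrt(lam) I], b = [y; 0]
A = np.vstack([X, np.sqrt(lam) * np.eye(p)])
b = np.concatenate([y, np.zeros(p)])
beta_stacked, *_ = np.linalg.lstsq(A, b, rcond=None)

# Closed-form ridge solution: (X'X + lam I)^{-1} X'y
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(np.allclose(beta_stacked, beta_closed))  # True
```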
It is an algebraic fact that for $\lambda>0$,
$(X^{T}X+\lambda I)^{-1}X^{T}=X^{T}(XX^{T}+\lambda I)^{-1} $
Thus we also have the option of using
$\hat{\beta}=X^{T}(XX^{T}+\lambda I)^{-1}y$.
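As a quick sanity check, here is a small numpy sketch (again with made-up data) comparing the two formulas on an underdetermined problem, where $X$ has more columns than rows:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 10, 50, 0.3          # underdetermined: more columns than rows
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Primal form: requires solving a p x p system
beta_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Dual form: requires solving only an n x n system (smaller when p > n)
beta_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

print(np.allclose(beta_primal, beta_dual))  # True
```

The dual form only needs a factorization of an $n \times n$ matrix, which is why it is the cheaper option when $X$ has many more columns than rows.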
To answer your specific questions:
Yes, both formulas work for the underdetermined case as well as the overdetermined case. They also work if $\mbox{rank}(X)$ is less than the minimum of the number of rows and columns of $X$. The second version can be more efficient for underdetermined problems because $XX^{T}$ is smaller than $X^{T}X$ in that case.
I'm not aware of any derivation of the alternative version of the formula that starts with some other damped least squares problem and uses the normal equations. In any case, you can derive it in a straightforward fashion with a bit of algebra, as sketched below.
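One short derivation: for $\lambda>0$,
$
(X^{T}X+\lambda I)X^{T}=X^{T}XX^{T}+\lambda X^{T}=X^{T}(XX^{T}+\lambda I).
$
Multiplying this identity on the left by $(X^{T}X+\lambda I)^{-1}$ and on the right by $(XX^{T}+\lambda I)^{-1}$ (both matrices are invertible for $\lambda>0$) gives
$
(X^{T}X+\lambda I)^{-1}X^{T}=X^{T}(XX^{T}+\lambda I)^{-1}.
$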
It's possible that you're thinking of the ridge regression problem in the form
$\min \| \beta \|_{2}^{2} $
subject to
$\| X\beta-y \|_{2}^{2} \leq \epsilon.$
However, this version of the ridge regression problem simply leads to the same damped least squares problem $\min \| X\beta -y \|_{2}^{2} + \lambda \| \beta \|_{2}^{2}$.
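To spell out the connection: if the constraint is active at the optimum (as it is whenever $\beta=0$ does not already satisfy it), then by the KKT conditions there is a multiplier $\mu>0$ such that the solution minimizes
$
\| \beta \|_{2}^{2} + \mu \left( \| X\beta-y \|_{2}^{2} - \epsilon \right),
$
and dividing through by $\mu$ shows that this is the damped least squares objective with $\lambda=1/\mu$, up to a constant that does not affect the minimizer.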
Assuming you have 5 levels in the location variable, $\beta_5$ is the mean difference for New York compared to the reference group (the level that was not included in the model), adjusting for rooms, baths, and square footage, so it should be interpreted as such. The $\beta$s and their associated p-values within a single categorical variable can change if you change your reference group, so it's crucial to mention "compared to what."
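As an illustration of the reference-group point, here is a small numpy sketch with hypothetical data and only the location variable in the model (so each dummy coefficient is exactly a difference in group means); the level names and numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
levels = ["Boston", "Chicago", "Houston", "Miami", "New York"]  # hypothetical locations
location = rng.choice(levels, size=200)
price = rng.normal(300, 20, size=200) + 50 * (location == "New York")  # NY runs higher

def fit_with_reference(ref):
    """OLS with dummy coding: intercept plus one dummy per non-reference level."""
    others = [lv for lv in levels if lv != ref]
    Xd = np.column_stack([np.ones(len(price))] +
                         [(location == lv).astype(float) for lv in others])
    coefs, *_ = np.linalg.lstsq(Xd, price, rcond=None)
    return dict(zip(["intercept"] + others, coefs))

# The New York coefficient is the NY group mean minus the reference group's mean,
# so it changes when the reference group changes.
print(fit_with_reference("Boston")["New York"])
print(fit_with_reference("Miami")["New York"])
```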
If you take the first row of $A$, i.e. $A_{1} = [a_{11}, a_{12}, a_{13}]$, and multiply it by the design matrix $X$ that consists of the $x$'s stacked next to each other, $X=[x_{1},x_{2},\ldots,x_{6}] \in M_{3,6}$, you should get the first coordinates of the $b$'s, $B_{1}=[b_{11},b_{21},b_{31},\ldots,b_{61}] \in M_{1,6}$:
$B_{1} = A_{1} X + \epsilon.$
So essentially you run a linear regression of $B_{1}$, the first coordinates of the $b$'s, onto $X$ to get the first row of $A$; the second coordinates give the second row, and the third coordinates give the third row.
I hope this is making sense.
One warning: think hard about the error term. Is it additive and independent of the $x$'s? If that holds for one of the coordinates, does it also hold for the combination of the three? It seems like a non-trivial question, but I believe my suggestion will give you a reasonable approximation.
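Here is a minimal numpy sketch of the row-by-row recipe, with a made-up $3\times 3$ matrix $A$, six $x$ vectors, and a small additive error independent of the $x$'s:

```python
import numpy as np

rng = np.random.default_rng(3)
A_true = rng.standard_normal((3, 3))         # unknown matrix to recover
X = rng.standard_normal((3, 6))              # x vectors stacked as columns, X in M_{3,6}
B = A_true @ X + 0.01 * rng.standard_normal((3, 6))  # b vectors as columns, with noise

# Row i of A is estimated by regressing the i-th coordinates of the b's on the x's:
# B[i, :] ~ A[i, :] @ X, i.e. solve X^T a_i = B[i, :]^T in the least squares sense.
A_hat = np.vstack([np.linalg.lstsq(X.T, B[i], rcond=None)[0] for i in range(3)])

print(np.round(A_hat - A_true, 2))  # close to zero when the noise is small
```

Equivalently, all three rows can be recovered in one call as `np.linalg.lstsq(X.T, B.T, rcond=None)[0].T`.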