Here I'd take $\alpha \in \mathbb{R}^p$ and $\beta \in \mathbb{R}^k$, just for notational coherence.
One way to transform this problem into an ordinary least squares (OLS) problem is as follows.
We start from the main problem with the Hadamard product:
$$
\text{argmin}_{\alpha,\beta}\|y - \bar{y} \|_2^2, \tag{1}
$$
where $\bar{y} = A\alpha \odot B\beta$ is our approximation vector.
We can rewrite each entry of our approximation vector as:
$$
\bar{y}_i = \left( A_i \alpha \right)\left(B_i \beta\right),
$$
where $A_i$ and $B_i$ denote the $i^\text{th}$ rows of the corresponding matrices.
Just changing the notation, we have
$$
\bar{y}_i = \left(\begin{array}{lll}
a_{i,1} B_i & \ldots & a_{i,p} B_i
\end{array}\right)
\left(\begin{array}{c}
\alpha_1 \beta \\
\vdots \\
\alpha_p \beta
\end{array}\right),
$$
so that
$$
(A\alpha \odot B\beta)_i = (A_i \otimes B_i)(\alpha \otimes \beta),
$$
where $\otimes$ denotes the Kronecker product.
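For concreteness, here is a minimal NumPy check of this row-wise identity (a sketch; the sizes and variable names are illustrative, not from the original problem):

```python
import numpy as np

# Check (A@alpha * B@beta)_i == kron(A_i, B_i) @ kron(alpha, beta);
# all dimensions here are illustrative.
rng = np.random.default_rng(0)
n, p, k = 6, 3, 4
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))
alpha = rng.standard_normal(p)
beta = rng.standard_normal(k)

lhs = (A @ alpha) * (B @ beta)   # Hadamard product of the two fitted vectors
rhs = np.array([np.kron(A[i], B[i]) @ np.kron(alpha, beta) for i in range(n)])
print(np.allclose(lhs, rhs))     # True
```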
Using this equality we can define our new design matrix row-wise as $H_i = A_i \otimes B_i$, and the solution vector as $x = \alpha \otimes \beta$; then we can solve the OLS:
$$
\text{argmin}_{x}\|y - Hx\|_2^2,
$$
finding our global solution, due to the convexity in $x$.
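A minimal sketch of this lifted OLS in NumPy, assuming synthetic noiseless data and illustrative dimensions (with $n \ge pk$ so that $H$ has full column rank):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 3, 4
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, k))
alpha, beta = rng.standard_normal(p), rng.standard_normal(k)

# Row-wise Kronecker (Khatri-Rao) design matrix: H[i] = kron(A[i], B[i]).
H = np.vstack([np.kron(A[i], B[i]) for i in range(n)])
y = (A @ alpha) * (B @ beta)                      # noiseless synthetic target
x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.allclose(x_hat, np.kron(alpha, beta)))   # True when H has full column rank
```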
In order to retrieve the solution vectors, we can rewrite $x$ as a matrix, namely the outer product of our solution vectors $\alpha$ and $\beta$:
$$
X = \left( \begin{array}{lcr}
\alpha_1 \beta_1 & \ldots & \alpha_1 \beta_k \\
\alpha_2 \beta_1 & \ldots & \alpha_2 \beta_k\\
\vdots & \ddots & \vdots \\
\alpha_p \beta_1 & \ldots & \alpha_p \beta_k
\end{array}
\right) = \alpha \beta^T.
$$
We use the singular value decomposition (SVD) to represent this matrix as
$$
X=U\Sigma V^T,
$$
Since $\operatorname{rank}(X)=1$, using the first left and right singular vectors $u$ and $v$ and the first singular value $\sigma$, we can take
$$
\bar{\alpha} = \gamma\sqrt\sigma u, \ \ \text{and} \ \ \bar{\beta} = \frac{1}{\gamma}\sqrt\sigma v,
$$
where $\gamma$, with $|\gamma|>0$, depends on the scale of $\alpha$ and $\beta$ (and on any bias term).
The solution vectors are determined only up to scale, so the solution is not unique and additional assumptions should be made.
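A sketch of the recovery step, assuming we already have $x = \alpha \otimes \beta$ exactly (here taking $\gamma = 1$; names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
p, k = 3, 4
alpha, beta = rng.standard_normal(p), rng.standard_normal(k)

X = np.kron(alpha, beta).reshape(p, k)    # X = alpha @ beta^T, rank 1
U, s, Vt = np.linalg.svd(X)
alpha_bar = np.sqrt(s[0]) * U[:, 0]       # gamma = 1
beta_bar = np.sqrt(s[0]) * Vt[0]

# The pair is recovered only up to a joint sign/scale:
print(np.allclose(np.outer(alpha_bar, beta_bar), np.outer(alpha, beta)))  # True
```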
Hope this will be useful.
Matrix Computations, Golub and Van Loan, 3rd ed., $\S$ 5.7.1, p. 270.
Comparing flop counts for operations on $n\times n$ matrices:
- Gaussian elimination: $\frac{2}{3}n^{3}$
- Householder orthogonalization: $\frac{4}{3}n^{3}$
- Modified Gram–Schmidt: $2n^{3}$
- Bidiagonalization: $\frac{8}{3}n^{3}$
- Singular value decomposition: $12n^{3}$
Three reasons to choose orthogonalization to solve square systems:
1. Flop counts exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered, the $\mathbf{Q}\mathbf{R}$ factorization is comparable in efficiency.
2. Orthogonalization methods have guaranteed stability; there is no "growth factor" to worry about as in Gaussian elimination.
3. In cases of ill-conditioning, the orthogonalization methods give an added measure of reliability. $\mathbf{Q}\mathbf{R}$ with condition estimate is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system.
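To make the comparison concrete, here is a small SciPy sketch solving the same square system with all three factorizations (random data; actual timings and flop counts depend on the BLAS/LAPACK build):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n = 500
M = rng.standard_normal((n, n))
b = rng.standard_normal(n)

x_lu = linalg.lu_solve(linalg.lu_factor(M), b)      # Gaussian elimination (LU)
Q, R = linalg.qr(M)
x_qr = linalg.solve_triangular(R, Q.T @ b)          # Householder QR
U, s, Vt = linalg.svd(M)
x_svd = Vt.T @ ((U.T @ b) / s)                      # SVD

print(np.allclose(x_lu, x_qr), np.allclose(x_lu, x_svd))  # True True
```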
Best Answer
Yes, you can use a Cholesky factorization to solve this, by solving the so-called normal equations (that you offer above) directly. First, you must form the $n\times n$ matrix $Z=X^TWX$, then compute its Cholesky factorization $R^TR=Z$, where $R$ is upper triangular. Then $\beta=R^{-1}R^{-T}X^TWY$.
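A sketch of this in SciPy, with synthetic $X$, $Y$ and a vector `w` standing in for the diagonal of $W$ (names and sizes are illustrative):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
m, n = 200, 10
X = rng.standard_normal((m, n))
w = rng.uniform(0.5, 2.0, m)          # diagonal of W (synthetic weights)
Y = rng.standard_normal(m)

Z = X.T @ (w[:, None] * X)            # Z = X^T W X
R = linalg.cholesky(Z)                # upper triangular, R^T R = Z
rhs = X.T @ (w * Y)
# beta = R^{-1} R^{-T} X^T W Y, via two triangular solves:
beta = linalg.solve_triangular(R, linalg.solve_triangular(R, rhs, trans='T'))
```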
This is generally frowned upon, due to the lower numerical stability compared to other approaches, but it is often the cheapest, and if the normal equations are well-conditioned, it's fine. Indeed, many convex optimization methods use a normal equations approach within their iterative algorithms.
The most commonly offered "stable" method for least squares uses the QR factorization of $W^{1/2}X$, where $W^{1/2}$ satisfies $W^{1/2}W^{1/2}=W$. Since $W$ is diagonal, this is easy to form (in fact, you never need $W$ again if you stick with $W^{1/2}$). Note that $R$ here is the same as the Cholesky factor above, but this is a more stable, albeit more expensive, way to compute it. So again we have $\beta=R^{-1}R^{-T}X^TWY$. But since $X^T=R^TQ^TW^{-1/2}$, this simplifies to $\beta=R^{-1}Q^TW^{1/2}Y$.
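Continuing the same synthetic data, the QR route looks like this (a sketch):

```python
sw = np.sqrt(w)                                     # W^{1/2} is diagonal: just sqrt(w)
# R1 agrees with the Cholesky factor above up to the signs of its rows.
Q, R1 = linalg.qr(sw[:, None] * X, mode='economic')
beta_qr = linalg.solve_triangular(R1, Q.T @ (sw * Y))   # beta = R^{-1} Q^T W^{1/2} Y
```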
For even more numerical stability, particularly if $W$ has a high condition number, you may want to use column pivoting; that is, $QRP^T=W^{1/2}X$, where $P$ is a permutation matrix chosen using the standard column pivoting process. Then $\beta=PR^{-1}Q^TW^{1/2}Y$.
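With SciPy's column-pivoted QR, which returns the permutation as an index array, this becomes (again a sketch on the same data):

```python
Qp, Rp, piv = linalg.qr(sw[:, None] * X, mode='economic', pivoting=True)
beta_piv = np.empty(n)
beta_piv[piv] = linalg.solve_triangular(Rp, Qp.T @ (sw * Y))   # undo the permutation
```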
One way to improve the accuracy of the results is with iterative refinement (look it up!). This is particularly useful for the normal equations approach, but it can be employed in all three options above. It is inexpensive compared to the initial solve step, since it can re-use the Cholesky or QR factorization.
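One refinement step on the normal equations, reusing the Cholesky factor `R` from the first sketch above (assumptions as before):

```python
r = X.T @ (w * (Y - X @ beta))        # residual of the normal equations
d = linalg.solve_triangular(R, linalg.solve_triangular(R, r, trans='T'))
beta = beta + d                       # refined solution, no refactorization needed
```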
I do not believe there is an easy way to reuse a factorization of $X$ or $Y$ to speed things up.