First, note an error in your question: the prior density of $\theta$ is Galenshore with hyperparameters $\alpha$ and $\beta$, not $\theta$ and $\beta$.
Next, ignore all factors that are not functions of the parameter $\theta$; that is, look only at the kernel: $$f(\boldsymbol y \mid \theta) = f(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^n \frac{2}{\Gamma(a)} \theta^{2a} y_i^{2a-1} e^{-\theta^2 y_i^2} \propto (\theta^{2a})^n e^{-\theta^2 \sum y_i^2}.$$ We don't care about $2/\Gamma(a)$ or $y_i^{2a-1}$ because these factors are not functions of $\theta$. Similarly, $$f(\theta \mid \alpha, \beta) = \frac{2}{\Gamma(\alpha)} \beta^{2\alpha} \theta^{2\alpha-1} e^{-\beta^2 \theta^2} \propto \theta^{2\alpha - 1} e^{-\beta^2 \theta^2}.$$ Consequently,
$$\begin{align*}
f(\theta \mid \boldsymbol y) &\propto f(\boldsymbol y \mid \theta)f(\theta \mid \alpha,\beta) \\
&\propto (\theta^{2a})^n e^{-\theta^2 \sum y_i^2} \theta^{2\alpha - 1} e^{-\beta^2 \theta^2} \\
&= \theta^{2(na+\alpha)-1} e^{-\theta^2 (\beta^2 + \sum y_i^2)}
\end{align*}$$
which is the kernel of a density from the same parametric family with posterior parameters $na + \alpha$ and $\sqrt{\beta^2 + \sum_{i=1}^n y_i^2}$.
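If you want to use this update in practice, here is a minimal Python sketch; the function name `galenshore_posterior` and its interface are my own, not from any standard library, and the example numbers are made up.

```python
import numpy as np

def galenshore_posterior(y, a, alpha, beta):
    """Conjugate update for the Galenshore model.

    y           : observations from a Galenshore(a, theta) likelihood
    a           : known shape parameter of the likelihood
    alpha, beta : hyperparameters of the prior theta ~ Galenshore(alpha, beta)

    Returns (alpha_n, beta_n) such that theta | y ~ Galenshore(alpha_n, beta_n).
    """
    y = np.asarray(y)
    alpha_n = y.size * a + alpha
    beta_n = np.sqrt(beta**2 + np.sum(y**2))
    return alpha_n, beta_n

# made-up example: three observations, a = 2, prior Galenshore(1, 1)
print(galenshore_posterior([0.8, 1.2, 0.5], a=2.0, alpha=1.0, beta=1.0))
```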
I suggest you follow Lee David Chung Lin's advice and take $\theta = \lambda^k$. But if you insist on keeping your notation, here is one conjugate prior (you should also check the identifiability of your model).
$$f(x_1,\cdots,x_n|\lambda)\propto \frac{1}{\lambda^{nk}}
e^{-\frac{1}{\lambda^k}\sum x_{i}^{k}}$$
If we choose a prior $g(\lambda)$, then
$$g(\lambda\mid x_1,\cdots ,x_n)\propto \frac{1}{\lambda^{nk}}\,
e^{-\frac{1}{\lambda^k}\sum x_{i}^{k}}\, g(\lambda),$$
so a conjugate prior should be of the form
$$g(\lambda)\propto \frac{1}{\lambda^a}\, e^{-\frac{b}{\lambda^k}},
\hspace{.5cm} \lambda>0.$$
From a standard table of integrals:
$$\int_{0}^{\infty} y^m e^{-b y^k}\, dy=\frac{\Gamma\!\left(\frac{m+1}{k}\right)}{kb^{\frac{m+1}{k}}}$$
(reference: I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, p. 337).
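This identity is easy to check numerically; a quick sketch with arbitrary (made-up) values of $m$, $b$, $k$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

m, b, k = 3.0, 2.0, 1.5  # arbitrary positive values
numeric, _ = quad(lambda y: y**m * np.exp(-b * y**k), 0, np.inf)
closed = gamma((m + 1) / k) / (k * b**((m + 1) / k))
print(np.isclose(numeric, closed))  # True
```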
Substituting $y = 1/\lambda$ in the identity above gives
$$\int_{0}^{\infty} \frac{1}{\lambda^{m+2}}\, e^{-b \frac{1}{\lambda^{k}}}\, d\lambda=\frac{\Gamma\!\left(\frac{m+1}{k}\right)}{kb^{\frac{m+1}{k}}},$$
so you can define the normalised distribution
$$g(\lambda)=NEWG(m,b)= \frac{\frac{1}{\lambda^{m+2}} e^{-b \frac{1}{\lambda^{k}}} }{\frac{\Gamma(\frac{m+1}{k})}{kb^{\frac{m+1}{k}}}} \hspace{.5cm} \lambda>0$$
so the posterior is
$$g(\lambda\mid x_1,\cdots ,x_n) \propto
\frac{1}{\lambda^{m+nk+2}}\, e^{-\left(b+\sum x_{i}^{k}\right) \frac{1}{\lambda^{k}}},$$
which is $NEWG\!\left(m+nk,\; b+\sum x_{i}^{k}\right)$.
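In code the update is just two lines; a sketch, keeping in mind that $NEWG$ is the ad-hoc family defined above, not a standard named distribution, and `newg_posterior` is a name I made up:

```python
import numpy as np

def newg_posterior(x, k, m, b):
    """NEWG(m, b) -> NEWG(m + n*k, b + sum(x_i^k)) conjugate update
    for the Weibull-type likelihood with known shape k."""
    x = np.asarray(x)
    return m + x.size * k, b + np.sum(x**k)
```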
Another option is to reparametrize, e.g. $m_2=mk$.
If instead you use the standard notation $\theta$ (here $\theta=\lambda^k$), the model is
$$f(x|\theta)=\frac{kx^{k-1}}{\theta} e^{-\frac{x^k}{\theta}}$$
Using an inverse-gamma prior
$$g(\theta)\propto \frac{1}{\theta^{\alpha+1}}\, e^{-\frac{\beta}{\theta}},$$
the posterior is
$$g(\theta\mid x_1,\cdots,x_n)\propto f(x_1,\cdots,x_n\mid\theta)\, g(\theta)
=\frac{k^n \left(\prod_{i=1}^{n} x_i\right)^{k-1}}{\theta^n}\, e^{-\frac{\sum_{i=1}^{n} x_i^k}{\theta}}\, \frac{1}{\theta^{\alpha+1}}\, e^{-\frac{\beta}{\theta}}$$
$$\propto \frac{1}{\theta^{n+\alpha+1}}\, e^{-\frac{\beta+\sum_{i=1}^{n} x_i^k}{\theta}},$$
which is inverse gamma with parameters $\left(n+\alpha,\ \beta+\sum_{i=1}^{n} x_i^k\right)$.
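This parameterisation matches SciPy's inverse gamma (shape $\alpha$, scale $\beta$), so the update is easy to verify by simulation. A sketch with simulated data; the helper name `invgamma_posterior` is mine:

```python
import numpy as np
from scipy import stats

def invgamma_posterior(x, k, alpha, beta):
    """Posterior for theta in f(x|theta) = k x^(k-1)/theta * exp(-x^k/theta),
    with an InverseGamma(alpha, beta) prior. Returns a frozen scipy posterior."""
    x = np.asarray(x)
    return stats.invgamma(alpha + x.size, scale=beta + np.sum(x**k))

rng = np.random.default_rng(0)
k = 2.0
x = 1.5 * rng.weibull(k, size=200)   # Weibull scale 1.5, so theta = 1.5**k = 2.25
post = invgamma_posterior(x, k, alpha=2.0, beta=1.0)
print(post.mean())                   # should be close to 2.25
```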
Best Answer
Ridge
The ridge estimator is defined as the solution to the following optimisation problem:
$$ \beta_R = \arg\min_\beta\; {1\over 2}\|X\beta -y\|_2^2 + {\kappa\over 2}\|\beta\|_2^2 $$
i.e., the usual OLS loss (first term, up to a factor $1/2$) plus an L2 regularisation term with tuning "hyper"-parameter $\kappa\ge 0$; for $\kappa=0$ we recover OLS. The problem is convex and differentiable, and setting the gradient in $\beta$ to zero leads to
$$ X^T(X\beta_R -y) + \kappa \beta_R = 0 $$
which is equivalent to $(X^T X + \kappa I)\beta_R = X^Ty$, and for $\kappa>0$ the matrix on the LHS is positive definite (easy to prove) and can therefore be inverted, leading to the standard form:
$$\beta_R = (X^T X+\kappa I)^{-1} X^Ty. $$
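In code this is a single linear solve; a minimal NumPy sketch (the function name `ridge` is mine, and solving the system is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

def ridge(X, y, kappa):
    """Ridge estimator: solves (X^T X + kappa I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + kappa * np.eye(p), X.T @ y)
```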
Bayesian regression
The posterior in $\beta$ is proportional to the likelihood (gaussian) times the prior (gaussian):
$$ p(\beta | y) \propto p(y|\beta) \pi(\beta) $$
A Gaussian times a Gaussian is still Gaussian (as a function of $\beta$), so the posterior is Gaussian with some mean $\beta_G$ and covariance $\Sigma_G$.
I will show that $\beta_G = \beta_R$ in two different ways.
1. Via the MAP
We know the posterior is Gaussian, so its mode and mean coincide (the maximiser of the density is the mean). Therefore we can write:
$$ \beta_G = \arg\max_\beta L(y;\beta)\pi(\beta) $$
where $L(y;\beta)=p(y|\beta)$ is the likelihood of the data $y$ under the Gaussian generating process parameterised by $\beta$, i.e. $\mathcal N(y;X\beta, \sigma^2I)$. Correspondingly, $\pi(\beta)$ is the prior, $\mathcal N(\beta;0,\tau^2 I)$.
Now the $\arg\max$ does not change under a monotonically increasing transformation such as the $\log$, and additive constants are irrelevant. Therefore, we can write
$$\beta_G = \arg\max_\beta \log L(\beta) + \log \pi(\beta)$$
removing additive constants, this leads to
$$\beta_G = \arg\max_\beta -{1\over 2\sigma^2}(y-X\beta)^T(y-X\beta) - {1\over 2\tau^2} \beta^T\beta $$
Changing the sign turns this into a minimisation problem (see how we are getting close to Ridge?). Expanding the quadratic and dropping additive constants and constant multiplicative factors gives:
$$ \beta_G = \arg\min_\beta {1\over \sigma^2}\left(\beta^T X^TX\beta - 2\beta^T X^T y\right) + {1\over \tau^2} \beta^T\beta $$
Massage a bit more (multiply the objective by $\sigma^2$):
$$ \beta_G = \arg\min_\beta \beta^T\left(X^TX+{\sigma^2\over \tau^2}I\right)\beta - 2\beta^TX^T y $$
setting the gradient in $\beta$ to zero leads to
$$\left(X^TX + {\sigma^2\over \tau^2}I\right)\beta_G = X^T y $$
which concludes the proof (just identify $\kappa \leftrightarrow \sigma^2/\tau^2$).
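As a numerical sanity check of this identification, here is a hedged sketch with made-up data: minimise the negative log-posterior directly and compare with the closed-form ridge solution at $\kappa = \sigma^2/\tau^2$ (all names here are mine).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.standard_normal((n, p))
sigma, tau = 0.3, 1.0
y = X @ np.array([1.0, -2.0, 0.5]) + sigma * rng.standard_normal(n)

def neg_log_posterior(b):
    # negative log-likelihood plus negative log-prior, constants dropped
    return (np.sum((y - X @ b)**2) / (2 * sigma**2)
            + np.sum(b**2) / (2 * tau**2))

beta_map = minimize(neg_log_posterior, np.zeros(p)).x
kappa = sigma**2 / tau**2
beta_ridge = np.linalg.solve(X.T @ X + kappa * np.eye(p), X.T @ y)
print(np.allclose(beta_map, beta_ridge, atol=1e-6))  # True
```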
2. Quadratic form matching
You have a product of two Gaussians, which is again a Gaussian. We can ignore the proportionality constants and the $\exp$, so that you're effectively matching the quadratic form in the exponent of the posterior,
$$ -{1\over 2\sigma^2}(y-X\beta)^T(y-X\beta) - {1\over 2\tau^2}\beta^T\beta, $$
against that of $\mathcal N(\beta;\beta_G,\Sigma_G)$, namely $-{1\over 2}(\beta-\beta_G)^T\Sigma_G^{-1}(\beta-\beta_G)$.
Matching quadratic terms in $\beta$,
$$ \beta^T\!\left(X^TX/\sigma^2 + I/\tau^2\right)\!\beta \;\leftrightarrow\; \beta^T\Sigma_G^{-1}\beta, $$
therefore
$$\Sigma_G^{-1} = X^TX/\sigma^2 + I/\tau^2. $$
Matching linear terms in $\beta$,
$$ \beta^T X^Ty/\sigma^2 \;\leftrightarrow\; \beta^T\Sigma_G^{-1}\beta_G, $$
therefore
$$ \beta_G = \Sigma_G X^T y / \sigma^2 = (X^TX + (\sigma^2/\tau^2)\,I)^{-1} X^T y. $$
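For completeness, a short sketch computing both posterior moments, assuming $\sigma$ and $\tau$ are known; `gaussian_posterior` is a name I made up. The mean it returns coincides with the ridge solution at $\kappa=\sigma^2/\tau^2$, as derived above.

```python
import numpy as np

def gaussian_posterior(X, y, sigma, tau):
    """Posterior N(beta_G, Sigma_G) for a Gaussian likelihood
    N(y; X beta, sigma^2 I) and Gaussian prior N(beta; 0, tau^2 I)."""
    p = X.shape[1]
    Sigma_G = np.linalg.inv(X.T @ X / sigma**2 + np.eye(p) / tau**2)
    beta_G = Sigma_G @ (X.T @ y) / sigma**2
    return beta_G, Sigma_G
```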
Side remark
The first approach corresponds to taking the MAP (maximum a posteriori) estimator. MAP estimators typically coincide with regularised MLE estimators: here the MLE is the OLS estimator, it is regularised with a squared L2 norm, and this corresponds to a MAP with a Gaussian likelihood and a Gaussian prior.
You have a similar correspondence with the Lasso, for example, which is the MAP of a Gaussian likelihood with a Laplace prior.
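To spell that last point out: with a Laplace prior $\pi(\beta)\propto e^{-\|\beta\|_1/b}$, the same MAP argument as above gives
$$ \beta_{MAP} = \arg\min_\beta\; {1\over 2\sigma^2}\|y - X\beta\|_2^2 + {1\over b}\|\beta\|_1, $$
i.e. the Lasso with penalty parameter $\sigma^2/b$ (up to the usual scaling conventions).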