I'm not particularly familiar with DSGE models, but a standard maximum likelihood estimation would go like this:
first we can write your matrix equation as:
$$ Y_t = (1-A)^{-1}BY_{t-1} + (1-A)^{-1}E$$
such that we can express the conditional distribution of $Y_t$ given $Y_{t-1}$:
$$ Y_t | Y_{t-1} \sim \mathcal N ( \hat Y_t , \Sigma )$$
where $ \hat Y_t = (1-A)^{-1}BY_{t-1}$, and $\Sigma = \sigma^2 (1 - A)^{-1}(1 - A^T)^{-1}$ (assuming $E \sim \mathcal N (0,\sigma^2 I)$).
This follows from standard properties of the multivariate normal distribution.
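For concreteness, here is a minimal numpy sketch of the reduced form (the matrices `A` and `B`, the dimension, and the noise variance are made-up placeholders, not taken from your model):

```python
import numpy as np

n = 2                                   # dimension of Y_t (hypothetical)
A = np.array([[0.0, 0.3],
              [0.2, 0.0]])              # hypothetical contemporaneous matrix
B = np.array([[0.5, 0.0],
              [0.0, 0.4]])              # hypothetical lag matrix
sigma2 = 0.1                            # noise variance of E

I = np.eye(n)
M = np.linalg.inv(I - A)                # (1 - A)^{-1}

def conditional_mean(y_prev):
    """E[Y_t | Y_{t-1}] = (1 - A)^{-1} B Y_{t-1}."""
    return M @ B @ y_prev

Sigma = sigma2 * M @ M.T                # sigma^2 (1-A)^{-1} (1-A^T)^{-1}
```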
The log-likelihood is therefore:
$$ \log\mathcal{L} = -\frac{1}{2}\sum_t (Y_t - \hat Y_t)^T\Sigma^{-1}(Y_t - \hat Y_t) - \frac{1}{2} N \log |\Sigma| $$
(up to an additive constant, where $N$ is the number of observed transitions).
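In code, this can be evaluated directly from the data (a sketch building on the snippet above; `Y` is assumed to hold $Y_0,\dots,Y_N$ as rows):

```python
def log_likelihood(Y, A, B, sigma2):
    """Log-likelihood (up to a constant) of data Y with rows Y_0, ..., Y_N."""
    n = Y.shape[1]
    I = np.eye(n)
    P = (I - A.T) @ (I - A) / sigma2                    # Sigma^{-1}
    _, logdet_P = np.linalg.slogdet(P)                  # log|P| = -log|Sigma|
    R = Y[1:] - Y[:-1] @ np.linalg.solve(I - A, B).T    # residuals Y_t - hat(Y_t)
    N = R.shape[0]
    return -0.5 * np.einsum('ti,ij,tj->', R, P, R) + 0.5 * N * logdet_P
```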
Now we can use the fact that $\Sigma^{-1} = (1 - A^T)(1 - A)/\sigma^2 $ to simplify some of the terms:
$$ \sigma^2\, \hat Y_t^T \Sigma^{-1} Y_t = Y_{t-1}^TB^T(1-A)Y_t = \pi_{t-1}\pi_t - \omega y_t y_{t-1}$$
$$ \sigma^2\, \hat Y_t^T \Sigma^{-1} \hat Y_t = Y_{t-1}^TB^TBY_{t-1} = y_{t-1}^2$$
Notice that the exponent contains only terms up to quadratic in $A$, so taking the derivative of that part with respect to an element $\theta$ of $A$ gives something linear in $A$, e.g.:
$$ \frac{\partial}{\partial \theta} \left(\frac{1}{2} Y_t^T \Sigma^{-1} Y_t\right) = -\frac{1}{2\sigma^2} Y_t^T \left(\frac{dA^T}{d\theta}(1-A) + (1-A^T) \frac{dA}{d\theta}\right) Y_t$$
The two terms are scalars that are transposes of each other, hence equal, so
$$ = -\frac{1}{\sigma^2} \text{tr}\left( \frac{dA^T}{d\theta}(1-A) Y_t Y_t^T\right)$$
Notice that $\frac{dA^T}{d\theta}$ is a matrix with a single nonzero entry, at the location of $\theta$, so we can write this more compactly in matrix notation:
$$ \frac{\partial}{\partial A} (\frac{1}{2}\sum_t Y_t^T \Sigma^{-1} Y_t) = -\frac{1}{\sigma^2} (1 - A) \left(\sum_t Y_t Y_t^T \right) $$
(However, keep in mind that this should be read as an equation for the nonzero entries of $A$ only.)
Now we are left to deal with the determinant part, for which we can take the derivative as follows:
$$ \frac{\partial}{\partial \theta} \log |\Sigma| = -\text{tr}(\frac{\partial \Sigma^{-1}}{\partial \theta} \Sigma) $$
plugging in the derivative of $\Sigma^{-1}$, we get (note that the $\sigma^2$ factors cancel against $\Sigma$):
$$ \frac{\partial}{\partial \theta} \log |\Sigma| = \text{tr}\left(\frac{dA^T}{d\theta}(1-A^T)^{-1} + (1-A)^{-1} \frac{dA}{d\theta}\right) = 2\,\text{tr}\left( \frac{dA^T}{d\theta}(1-A^T)^{-1} \right) $$
and as before we can write it as
$$ \frac{\partial}{\partial A} \log |\Sigma| = 2(1-A^T)^{-1} $$
Putting everything together:
$$ \sigma^2 \frac{\partial}{\partial A} \log \mathcal{L} = (1 - A) \left(\sum_t Y_t Y_t^T \right) - B \left(\sum_t Y_{t-1} Y_t^T \right) - N\sigma^2(1-A^T)^{-1}.$$
Equating this to zero gives the maximum likelihood equation for $A$, but the equation is nonlinear and therefore can't be solved directly (in fact it can be written as a quadratic, but I don't think that helps much).
Nevertheless, it can easily be solved iteratively with gradient-based methods, using the gradient above. Note that the dependence on the data $Y$ is only via the two correlation matrices defined above, which you only have to compute once. Note also that $\sigma^2$ enters only through the last term; if it is unknown, it can be estimated jointly from its own first-order condition, $\hat\sigma^2 = \frac{1}{nN}\sum_t \|(1-A)Y_t - BY_{t-1}\|^2$, where $n$ is the dimension of $Y_t$.
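A minimal gradient-ascent sketch of this scheme (assuming, for illustration, that $B$ and $\sigma^2$ are known and that `mask` marks the free entries of $A$; the step size and iteration count are arbitrary):

```python
def grad_A(A, B, sigma2, C0, C1, N, mask):
    """Gradient of log L w.r.t. the free entries of A.
    C0 = sum_t Y_t Y_t^T and C1 = sum_t Y_{t-1} Y_t^T carry all data dependence."""
    I = np.eye(A.shape[0])
    G = ((I - A) @ C0 - B @ C1) / sigma2 - N * np.linalg.inv(I - A).T
    return G * mask                     # entries of A fixed at zero stay zero

def fit_A(Y, A0, B, sigma2, mask, lr=1e-5, steps=10000):
    C0 = Y[1:].T @ Y[1:]                # sum_t Y_t Y_t^T, computed once
    C1 = Y[:-1].T @ Y[1:]               # sum_t Y_{t-1} Y_t^T, computed once
    N = Y.shape[0] - 1
    A = A0.copy()
    for _ in range(steps):
        A += lr * grad_A(A, B, sigma2, C0, C1, N, mask)
    return A
```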
[Note: I can't guarantee that every $(-)$ sign and $^T$ is correct; if you are going to use a gradient-based optimization tool, you should verify the result against a numerical gradient, or use a symbolic program to calculate the derivatives. If you want to work out the details of the derivation yourself, the Matrix Cookbook is a useful reference.]
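For instance, a central finite-difference check against the `log_likelihood` function sketched earlier:

```python
def numerical_grad_A(Y, A, B, sigma2, eps=1e-6):
    """Finite-difference gradient of log L w.r.t. every entry of A."""
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            Ap, Am = A.copy(), A.copy()
            Ap[i, j] += eps
            Am[i, j] -= eps
            G[i, j] = (log_likelihood(Y, Ap, B, sigma2)
                       - log_likelihood(Y, Am, B, sigma2)) / (2 * eps)
    return G

# compare on the free entries: numerical_grad_A(...) * mask vs grad_A(...)
```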
As a lighter-weight alternative to full maximum likelihood, a common method is to minimize an exponentially weighted cost function: $$ \sum_i \lambda^{i} e(t-i)^2 $$ where $e(t)$ is the residual error and $\lambda$ is the forgetting factor. If $\lambda=1$, you recover ordinary least squares regression.
You can use recursive least squares (RLS) to find a solution efficiently.
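A minimal sketch of one RLS update step with a forgetting factor (the standard textbook recursions; variable names are my own):

```python
import numpy as np

def rls_update(w, P, x, y, lam=0.99):
    """One recursive-least-squares step: w is the parameter vector,
    P the inverse (weighted) correlation matrix, x the regressor, y the target."""
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    e = y - w @ x                        # a-priori prediction error
    w = w + k * e                        # parameter update
    P = (P - np.outer(k, Px)) / lam      # inverse-correlation update
    return w, P

# typical initialization: w = np.zeros(d), P = 1e3 * np.eye(d)
```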