I think Questions 1 and 2 are interconnected. First, the homogeneity-of-variance assumption comes from $\boldsymbol \epsilon \sim N(\mathbf{0}, \sigma^2 \mathbf{I})$. This assumption can be relaxed to more general variance structures, in which case homogeneity is no longer required. So it really depends on how the distribution of $\boldsymbol \epsilon$ is specified.
Second, the conditional residuals are used to check the distribution of (and thus any assumptions about) $\boldsymbol \epsilon$, whereas the marginal residuals can be used to check the total variance structure.
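To make the distinction concrete, here is a minimal numpy sketch with simulated data from a random-intercept model (the variance components are treated as known, purely for illustration): the marginal residuals still contain the random effects, while the conditional residuals approximate the errors $\boldsymbol \epsilon$ themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a random-intercept model: y_ij = b0 + b1*x_ij + u_j + e_ij
n_groups, n_per = 50, 20
sigma_u, sigma_e = 1.0, 0.5          # variance components assumed known here
g = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=g.size)
u = rng.normal(0, sigma_u, n_groups)
y = 2.0 + 1.5 * x + u[g] + rng.normal(0, sigma_e, g.size)

# Fixed effects by OLS (adequate for this balanced illustration)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Marginal residuals: y - X*beta; they still contain u_j + e_ij,
# so they reflect the TOTAL variance structure
r_marg = y - X @ beta

# BLUP of each random intercept shrinks the group mean of the
# marginal residuals toward zero (known-variance formula)
shrink = sigma_u**2 / (sigma_u**2 + sigma_e**2 / n_per)
u_hat = shrink * np.array([r_marg[g == j].mean() for j in range(n_groups)])

# Conditional residuals: estimates of e_ij, used to check the
# distributional assumptions on the errors
r_cond = r_marg - u_hat[g]

print(r_marg.var(), r_cond.var())
```

In this sketch the marginal residuals have variance near $\sigma_u^2 + \sigma_e^2$ and the conditional ones near $\sigma_e^2$, which is exactly why each type is suited to checking a different assumption.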
AIC and BIC do not target minimizing the amount of autocorrelation in model residuals. Therefore, it is generally not surprising that a model selected by AIC or BIC has some autocorrelation.
The idea behind AIC and BIC is to select a model that describes the data quite well but not "too well", given the limited sample size. Trying to remove all autocorrelation may seem desirable, but if doing so greatly increases model complexity, we may be overfitting. That is, the in-sample results of a complicated model may look nice, but once we collect more data, we will see that the model as estimated in the initial sample does not fit the new data well. In other words, the model does not generalize well. And we want it to generalize well, because we are generally interested in the properties of the population and/or the yet-unobserved samples from it. What AIC and BIC do is prevent overfitting and select a "golden middle" (in a certain sense, which differs between the two criteria).
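As an illustration of that "golden middle" (a hypothetical numpy sketch, not tied to your data): fit AR(p) models of increasing order to a simulated AR(2) series and score them with the Gaussian forms of AIC and BIC. Since BIC's penalty $k\log n$ exceeds AIC's $2k$, BIC never selects a larger order than AIC.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(2) series; the "true" order is 2
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

# Fit AR(p) by OLS for p = 0..8 on the same effective sample,
# then score each fit (Gaussian-likelihood form of the criteria)
max_p = 8
t0 = max_p
n_eff = n - t0
scores = {}
for p in range(max_p + 1):
    X = np.column_stack([np.ones(n_eff)] +
                        [y[t0 - k:n - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y[t0:], rcond=None)
    rss = np.sum((y[t0:] - X @ beta) ** 2)
    k = p + 2                      # AR coefficients + intercept + error variance
    aic = n_eff * np.log(rss / n_eff) + 2 * k
    bic = n_eff * np.log(rss / n_eff) + k * np.log(n_eff)
    scores[p] = (aic, bic)

best_aic = min(scores, key=lambda p: scores[p][0])
best_bic = min(scores, key=lambda p: scores[p][1])
print(best_aic, best_bic)
```

Adding further lags keeps reducing the residual sum of squares (and residual autocorrelation) in-sample, but past the true order the penalty term dominates, which is the overfitting guard described above.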
Regarding the series being very long (over 10,000 observations): with such a sample you will get statistically significant results even when the effect size is very small. This brings us to the question: are the autocorrelations economically significant, or are they tiny from an economic perspective yet statistically significant due to the large sample?
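A quick back-of-the-envelope sketch of that point (with made-up numbers): under the white-noise null, the standard error of a sample autocorrelation is roughly $1/\sqrt{n}$, so with $n = 10{,}000$ even a lag-1 autocorrelation around 0.05 sits several standard errors from zero while being negligible in economic terms.

```python
import numpy as np

rng = np.random.default_rng(2)

# Residual-like series with a deliberately tiny lag-1 autocorrelation
n = 10_000
e = rng.normal(size=n + 1)
x = e[1:] + 0.05 * e[:-1]          # implies lag-1 autocorrelation ~ 0.05

x = x - x.mean()
r1 = np.sum(x[1:] * x[:-1]) / np.sum(x * x)

# Under the white-noise null, sd(r1) is approximately 1/sqrt(n)
se = 1 / np.sqrt(n)
z = r1 / se
print(r1, z)
```

The z-statistic comfortably clears conventional critical values even though an autocorrelation of 0.05 would rarely matter for any economic decision, so statistical significance alone is not the right yardstick here.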
Regarding outliers, it is better to do something about them, as otherwise they will negatively affect the modelling results. How to handle outliers is a whole separate topic, so I will not attempt to elaborate on it here; but I suggest taking some measures to deal with them rather than neglecting them.
If you are only interested in one dependent variable, you may look at its equation alone. As Christoph Hanck correctly notes, the other equations of the model do not affect its estimation if you do equation-by-equation OLS (and that is a preferred method for an unrestricted VAR).
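A small simulated sketch of why equation-by-equation estimation works (hypothetical data): every equation of an unrestricted VAR uses the same set of lagged regressors, so each can be fit by a separate OLS regression without involving the other equations.

```python
import numpy as np

rng = np.random.default_rng(3)

# True bivariate VAR(1): y_t = A y_{t-1} + e_t
A = np.array([[0.5, 0.1],
              [0.2, 0.4]])
n = 5000
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = A @ y[t - 1] + rng.normal(size=2)

# Equation-by-equation OLS: every equation regresses one variable on
# the SAME lagged regressors, so each can be estimated on its own
X = np.column_stack([np.ones(n - 1), y[:-1]])
A_hat = np.empty_like(A)
for i in range(2):
    coef, *_ = np.linalg.lstsq(X, y[1:, i], rcond=None)
    A_hat[i] = coef[1:]            # drop the intercept estimate

print(A_hat)
```

Each row of `A_hat` comes from its own regression, yet together they recover the full VAR coefficient matrix, which is why you can study a single equation in isolation.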
Choosing to look at and report only the encouraging results while ignoring the discouraging ones might do you a disservice. To get reliable results, you should not engage in selective reporting. Instead, try building a model that passes the diagnostic tests.
That could work. Alternatively, consider simultaneous estimation of the single VAR equation + GARCH using e.g. the `rugarch` package in R (if you are using R). That would likely yield similar results with a bit more efficiency.