I understand the Park test for heteroskedasticity has three different forms. The best known is the log form: LN(Residual^2) = intercept + slope * LN(X). The second is a linear form: Residual^2 = intercept + slope * X. In both cases, if the regression coefficient on X (the independent variable you are testing for heteroskedasticity) is statistically significant, then you reject the null hypothesis that the residuals are homoskedastic with respect to the levels of X. That said, do you know how well established the second, linear form is? And what is the third form? For the variables I need to test, it is essential that I can use the linear form, because many of them are percent changes that cannot be logged.
Solved – What are the three forms of the Park test for heteroskedasticity
diagnostic heteroscedasticity regression
Related Solutions
Actually, I'd say just the opposite. Multicollinearity is often scoffed at as a concern. The only time it is a real issue is when one variable can be written as an exact linear function of the others in the model (a male dummy variable is exactly equal to the constant/intercept term minus a female dummy variable; hence, you can't have all three in your model). A prime example is Goldberger's comparison to "micronumerosity."
Perfect multicollinearity means that your model cannot be estimated; imperfect multicollinearity often leads to large standard errors, but no bias or other real problems; heteroskedasticity means that your standard errors are incorrect and your estimates are inefficient.
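The dummy-variable trap above can be verified numerically: with an intercept plus both a male and a female dummy, the design matrix loses full column rank, which is why the model cannot be estimated. A minimal sketch with simulated data (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
male = rng.integers(0, 2, size=100)
female = 1 - male  # female dummy is exactly 1 minus the male dummy

# Intercept + both dummies: the intercept column equals male + female,
# so the three columns are perfectly collinear (rank 2, not 3).
X_trap = np.column_stack([np.ones(100), male, female])
# Dropping one dummy restores full column rank.
X_ok = np.column_stack([np.ones(100), male])

print(np.linalg.matrix_rank(X_trap))  # 2 (3 columns -> rank-deficient)
print(np.linalg.matrix_rank(X_ok))    # 2 (2 columns -> full rank)
```

OLS requires inverting X'X, which is singular in the rank-deficient case; that is the mechanical content of "your model cannot be estimated."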
First, I would build a model that yields the parameter estimates in the form I want to interpret (level change, percent change, etc.), using logs as appropriate. Then I would test for heteroskedasticity. The most accepted option is simply to use robust standard errors, which give you correct standard errors even though the parameter estimates remain inefficient. Alternatively, you can use weighted least squares to get efficient estimates, but this has become less common unless you know the relationship between the variances of your observations (e.g., each depends on the size of the observation, like the population of a country). Indeed, in cross-section econometrics on a data set of any real size, robust standard errors have become required irrespective of the outcome of a BP test; they are applied almost automatically.
There isn't a good test for endogeneity. Your real problem is that the regressor is correlated with the error; OLS forces the regressor to be uncorrelated with the residual, so you won't find any correlation there. Endogeneity is what makes econometrics hard and is a whole topic unto itself.
Park's original one-page paper (here) was more concerned with dealing with heteroskedasticity than with testing for its existence. So, given heteroskedasticity, Park assumes a specific form for it, namely a log-linear relationship between the variance of the error term and one regressor:
$$\sigma^2_{\epsilon _i} = \sigma^2X_i^{\gamma}e^{u_i}$$ $$\Rightarrow \ln \sigma^2_{\epsilon _i} = \ln\sigma^2 + \gamma \ln X_i + u_i$$
To estimate this relationship, one needs to obtain a data series for $\ln \sigma^2_{\epsilon _i}$. Park suggested using the residuals from the original regression as a substitute, i.e.
$$\ln \sigma^2_{\epsilon _i} \approx \ln (\hat \epsilon^2_{1i})$$
One then assumes $u_i$ is "nicely behaved" and estimates the regression
$$\ln (\hat \epsilon^2_{1i})= a + \gamma \ln X_i + u_i$$
Then, to deal with the heteroskedasticity, one transforms the original equation by dividing through by $X_i^{\hat \gamma/2}$.
"Park's test" instead views the auxiliary regression as a test for heteroskedasticity: if $\hat \gamma$ is statistically significant, the null hypothesis of no heteroskedasticity is rejected. In any case, I don't see where the second regression you mention in the question comes into play.
Best Answer
I am less familiar with the Park test. The Wikipedia page only lists what you call the first form. What you are calling the second form is identical to the Breusch-Pagan test, which is very well established, for what that's worth. Regarding the distinction between logged and non-logged predictors and response variables, it may help to read this excellent CV thread: Interpretation of log transformed predictor. In general, the two-stage approach to modeling (i.e., test for assumptions, then fit the standard model if the test is non-significant or a robust model if it is significant) is not recommended (see this excellent CV thread: A principled method for choosing between t test or non-parametric e.g. Wilcoxon in small samples). If you are worried about the possibility of heteroscedasticity, you would be better off just using robust methods, such as the Huber-White heteroscedasticity-consistent 'sandwich' standard errors, by default. For examples of strategies that can be used with heteroscedastic data, it may help to read my answer here: Alternatives to one-way ANOVA for heteroscedastic data.