The VIF has been generalized to deal with logistic regression (assuming you mean a model with a binary dependent variable). In R, you can do this using the vif function in the car package.
As @RichardHardy has said, it is not a test though. At the end you will get some GVIFs and still need to make some subjective decisions. The thing to keep in mind is that high VIFs mean the standard errors of some of your estimates are inflated, so effects that are actually meaningful may not be detected as significant. The books and writings by John Fox, who also co-wrote the car package, are a great resource for understanding multicollinearity.
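For concreteness, here is a minimal sketch with simulated data (all variable names are hypothetical). The multi-level factor is included so that vif() prints the GVIF columns rather than plain VIFs:

```r
library(car)  # provides vif(), which computes GVIFs for lm and glm fits

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)                    # deliberately collinear with x1
g  <- factor(sample(c("a", "b", "c"), n, TRUE))  # multi-df term, so GVIFs are printed
y  <- rbinom(n, 1, plogis(0.5 * x1))             # binary response

fit <- glm(y ~ x1 + x2 + g, family = binomial)
vif(fit)  # columns: GVIF, Df, GVIF^(1/(2*Df)); x1 and x2 will show large values
```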
The capacity to interpret regression relationships as causal generally depends on experimental protocols rather than the assumed structure of the statistical model. Regression models allow us to relate the explanatory variables statistically to the response variable, where this relationship is made conditional on all the explanatory variables in the model. As a default position, that is still just a predictive relationship, and should not be interpreted causally. That is the case in standard linear regression using OLS estimation, and it is also true in logistic regression.
Suppose we want to interpret a regression relationship causally: e.g., we have an explanatory variable $x_k$ and we want to interpret its regression relationship with the response variable $Y$ as a causal relationship (the former causing the latter). The thing we are scared of here is the possibility that the predictive relationship might actually be due to a relationship with some confounding factor: an additional variable outside the regression that is statistically related to $x_k$ and is the real cause of $Y$. If such a confounding factor exists, it will induce a statistical relationship between $x_k$ and $Y$ that we will see in our regression. (The other mistake you can make is to condition on a mediator variable, which also leads to incorrect causal inference.)
So, in order to interpret regression relationships causally, we want to be confident that what we are seeing is not the result of confounding factors outside our analysis. The best way to ensure this is to use controlled experimentation to set $x_k$ via randomisation/blinding, thereby severing any statistical link between this explanatory variable and any would-be confounding factor. In the absence of this, the next best thing is to use an observational (uncontrolled) analysis, but try to bring in as many potential confounding factors as we can, so they are adjusted for in the regression. (There is no guarantee that we have found them all!) There are also other methods, such as using instrumental variables, but these generally hinge on strong assumptions about the nature of those variables.
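To make the confounding story concrete, here is a small simulation (names and numbers are entirely hypothetical): z causes both x and y, x has no causal effect on y at all, and yet the naive regression finds a strong "effect" of x, which disappears once z is brought into the model:

```r
set.seed(42)
n <- 5000
z <- rnorm(n)             # the confounding factor
x <- 0.8 * z + rnorm(n)   # x is driven by z; it does not affect y
y <- 1.0 * z + rnorm(n)   # y is driven by z only

coef(summary(lm(y ~ x)))      # x looks strongly "significant" (spurious)
coef(summary(lm(y ~ x + z)))  # conditioning on z: x's coefficient is near zero
```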
None of the assumptions you mention are necessary or sufficient to infer causality. Those are just model assumptions for the logistic regression, and if they do not hold you can adjust your model accordingly. The main assumption you need for causal inference is that confounding factors are absent. That can be achieved by using a randomisation/blinding protocol in your experiment, or it can be left as a (hope-and-pray) assumption.
Endogeneity:
I will try to use an example here. Let's say there is a group of students planning to sit for the GRE.
• Some of them decided to register for online training courses prior to the GRE exam.
• Naturally, you would want to know whether the online training courses help them obtain good scores.
• To answer this question, it is tempting to use GRE scores as the dependent variable and use a dummy variable to indicate whether or not someone took an online training course prior to the exam (you will have other independent variables in the regression):
o Scores = b0 + b1*Online Course Indicator + b2*Age + b3*Math Major + ... + error
• The twist here is: what if the weaker students (on average) chose to go through the online training program? In that case you might see a negative coefficient on the dummy variable, Online Course Indicator, because the weaker students will have lower scores on average than the stronger ones who did not take the online courses.
• This coefficient could be very misleading, because it would suggest that the online course is ineffective and yields lower scores on average, which is not true.
• The true story here is that the 'Online Course Indicator' also measures the intelligence level of the students to some degree. Why? Because the weaker students are more likely to take online courses prior to the exam.
• Now think about the error term. What does it measure? It captures unobservable things such as intelligence, motivation, etc. So your error is correlated with one of the independent variables, specifically the 'Online Course Indicator'. THIS IS CALLED ENDOGENEITY. (The simulation sketch below makes this concrete.)
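Here is a small simulation of this story (all numbers are hypothetical): unobserved ability raises scores and lowers the chance of taking the course, so even though the true course effect is positive, the estimated coefficient comes out negative:

```r
set.seed(123)
n       <- 2000
ability <- rnorm(n)                             # unobservable to the analyst
course  <- rbinom(n, 1, plogis(-1.5 * ability)) # weaker students enroll more often
scores  <- 150 + 2 * course + 10 * ability + rnorm(n, sd = 5)  # true effect: +2

coef(lm(scores ~ course))  # ability sits in the error term, so this comes out negative
```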
Multicollinearity:
I am going to use the same example. Let's say you decided to include family income and a neighborhood dummy to indicate whether or not the student is from a wealthy neighborhood:
o Scores = b0 + b1*Online Course Indicator + b2*Age + b3*Family Income + b4*Wealthy Neighborhood Dummy + ... + error
Notice that family income and the wealthy-neighborhood dummy are correlated. That is, students from wealthy neighborhoods will have higher family incomes on average, so both variables are measuring the same thing to some extent. This is what we call multicollinearity.
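A quick sketch of what this does to the estimates (hypothetical data): because income largely determines the neighborhood dummy, the two predictors overlap, the VIFs rise above 1, and the standard error on income is noticeably larger than in the model without the dummy:

```r
set.seed(321)
n       <- 1000
income  <- rnorm(n, mean = 60, sd = 15)                # family income (in $1000s)
wealthy <- as.numeric(income + rnorm(n, sd = 5) > 70)  # dummy largely driven by income
scores  <- 150 + 0.2 * income + 1 * wealthy + rnorm(n, sd = 10)

fit_both <- lm(scores ~ income + wealthy)
car::vif(fit_both)  # elevated VIFs flag the overlap between the two predictors

# the standard error on income inflates once the correlated dummy is added:
summary(lm(scores ~ income))$coefficients["income", "Std. Error"]
summary(fit_both)$coefficients["income", "Std. Error"]
```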