Regression based on principal components analysis (PCA) of the independent variables is certainly a way to approach this problem; see this Cross Validated page for one extensive discussion of pros and cons, with links to further related topics. I don't see the point of the regression you propose after choosing the largest components. The "reconstructed" independent variables might suffer from being too highly dependent on the particular sample on which you based the model, and stepwise selection is generally not a good idea. Cross-validation would be a better way to choose the number of components to retain, finding the number of components that minimizes cross-validation error.
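As a rough illustration of choosing the number of components by cross-validation, here is a minimal sketch using only NumPy. The function names (`pcr_cv_error`, `choose_n_components`), the 5-fold setup, and the simulated data are assumptions made for this example, not anything from the original question.

```python
import numpy as np

def pcr_cv_error(X, y, n_components, n_folds=5, seed=0):
    """Cross-validated mean squared error of principal components regression."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Standardize with training-fold statistics only, to avoid leakage
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0, ddof=1)
        Z_tr = (X_tr - mu) / sd
        Z_te = (X[test_idx] - mu) / sd
        # Principal directions from the SVD of the standardized training predictors
        _, _, Vt = np.linalg.svd(Z_tr, full_matrices=False)
        V = Vt[:n_components].T
        # Ordinary least squares of centered y on the retained component scores
        y_mean = y_tr.mean()
        gamma, *_ = np.linalg.lstsq(Z_tr @ V, y_tr - y_mean, rcond=None)
        pred = y_mean + (Z_te @ V) @ gamma
        sq_errors.append((y[test_idx] - pred) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))

def choose_n_components(X, y, n_folds=5, seed=0):
    """Return the component count with the lowest CV error, plus all CV errors."""
    errors = {k: pcr_cv_error(X, y, k, n_folds, seed)
              for k in range(1, X.shape[1] + 1)}
    return min(errors, key=errors.get), errors

# Simulated stand-in data: 200 observations, 5 predictors (hypothetical)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=200)
best_k, cv_errors = choose_n_components(X, y)
print(best_k, cv_errors)
```

Using the same `seed` for every candidate keeps the folds identical across component counts, so the errors are compared on the same splits.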
In your situation, with only 5 predictors you might be just as well served by a standard linear model. Unless you have extremely high correlations among some of your variables, you are unlikely to have the numerical instability issues that can arise in extreme cases. (And if you do have two very highly correlated predictors, you should consider using your knowledge of the subject matter rather than an automated approach to choose one.) Paying attention to model diagnostics will help determine whether the linear model is reasonable.
A standard regression model provides easier-to-interpret coefficients and might be easier to explain to others than PCA. For predictions from a linear model you should consider including all 5 independent variables (even those that aren't "statistically significant"), both because of the limitations of stepwise selection and because the relations of some predictors to the dependent variable will differ if other predictors are removed.
If you have very high collinearity in a standard linear regression, it should show up as large standard errors for the corresponding coefficients, and you might consider approaches noted here like ridge regression to get useful information from all your predictors without overfitting. Ridge regression can be considered a continuous version of the PCA-regression approach, where principal components are differentially weighted instead of being either completely in or completely out of the final model; see section 3.5 of Elements of Statistical Learning.
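To make the "continuous version" idea concrete, the sketch below computes the weight that ridge regression places on each principal component, $d_j^2 / (d_j^2 + \lambda)$ where $d_j$ are the singular values of the standardized predictor matrix, next to the all-or-nothing weights of PCA regression. The function names, the penalty value `lam=10.0`, and the simulated data are assumptions for illustration.

```python
import numpy as np

def ridge_via_svd(Z, y, lam):
    """Ridge coefficients and per-component shrinkage factors.

    Z is assumed to be centered (here also standardized) and y centered.
    """
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    shrink = d**2 / (d**2 + lam)  # weight on each principal component
    # Coefficients: V diag(d / (d^2 + lam)) U^T y
    beta = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))
    return beta, shrink

def pcr_weights(n_predictors, n_components):
    """PCA regression keeps a component fully (1) or drops it fully (0)."""
    w = np.zeros(n_predictors)
    w[:n_components] = 1.0
    return w

# Simulated stand-in data with two highly correlated predictors (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
y = Z @ np.array([1.0, 1.0, 0.5, 0.0, -1.0]) + rng.normal(size=100)
y = y - y.mean()

beta, shrink = ridge_via_svd(Z, y, lam=10.0)
print(np.round(shrink, 3))   # ridge: smooth weights, smallest for low-variance components
print(pcr_weights(5, 3))     # PCA regression with 3 components: weights of 1 or 0
```

Components with small singular values, which is where collinear predictors show up, are shrunk the most by ridge rather than being discarded outright.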
For your second and third questions:
The first page I linked above does a pretty good job of addressing your second question. Yes, choosing a limited number of principal components can help reduce the problems associated with collinearity, as the collinear variables will tend to enter the same principal components together. Two warnings: the predictors should be standardized so that differences in scale don't drive the construction of the principal components, and there's no assurance that the components that capture the greatest variation in the predictors will be those most closely related to the dependent variable.
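The first warning is easy to see in a small simulation. In this sketch (variable names and data are hypothetical), two related predictors differ in scale by a factor of about 1000; without standardization the first principal component is essentially just the large-scale variable.

```python
import numpy as np

def first_pc_loadings(X, standardize):
    """Loadings of the first principal component, with or without scaling."""
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# Simulated stand-in predictors on very different scales (hypothetical)
rng = np.random.default_rng(0)
x_small = rng.normal(size=300)
x_large = 1000.0 * (0.5 * x_small + rng.normal(size=300))
X = np.column_stack([x_small, x_large])

raw = first_pc_loadings(X, standardize=False)
scaled = first_pc_loadings(X, standardize=True)
print(np.round(raw, 4))     # dominated by the large-scale predictor
print(np.round(scaled, 4))  # both predictors contribute equally in magnitude
```

The sign of a principal component is arbitrary, so only the magnitudes of the loadings are meaningful here.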
With respect to your third question, a stepwise approach is inappropriate, as you recognize. I don't see a reason why you couldn't include interaction terms among your selected principal components in a regression, but they would be extremely hard to interpret. That's another reason why I would lean here toward working with the original independent variables rather than with their transformations into principal components.
You seem very interested in using PCA for this predictive model, but remember that it's easy to get fixated on a particular approach. You are in a very good position to compare several approaches, combined with appropriate cross-validation or bootstrapping techniques, to see which works best for your particular needs. If that ends up being PCA that's good, but don't dismiss the other possibilities out of hand.
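One way to set up such a comparison is a small cross-validation harness that scores any fitting procedure on the same folds. The sketch below compares an ordinary linear model with ridge regression; the function names, the penalty values, and the simulated data are assumptions for illustration, and a PCA-based fit could be plugged in through the same `fit_predict` interface.

```python
import numpy as np

def fit_predict_ols(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

def make_fit_predict_ridge(lam):
    """Ridge regression on predictors standardized with training statistics."""
    def fit_predict(X_tr, y_tr, X_te):
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0, ddof=1)
        Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd
        p = Z_tr.shape[1]
        beta = np.linalg.solve(Z_tr.T @ Z_tr + lam * np.eye(p),
                               Z_tr.T @ (y_tr - y_tr.mean()))
        return y_tr.mean() + Z_te @ beta
    return fit_predict

def cv_mse(fit_predict, X, y, n_folds=5, seed=0):
    """K-fold cross-validated mean squared prediction error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        sq_errors.append((y[test_idx] - pred) ** 2)
    return float(np.mean(np.concatenate(sq_errors)))

# Simulated stand-in data: 150 observations, 5 predictors (hypothetical)
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.5, 0.0]) + rng.normal(size=150)

results = {
    "ols": cv_mse(fit_predict_ols, X, y),
    "ridge(lam=1)": cv_mse(make_fit_predict_ridge(1.0), X, y),
    "ridge(lam=10)": cv_mse(make_fit_predict_ridge(10.0), X, y),
}
print(results)
```

Because every method is scored with the same `seed`, the folds are identical and the errors are directly comparable.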
Variance Inflation Factors are defined on the level of regressors. A categorical factor with $k$ levels will (usually) be dummy-coded into $k-1$ separate boolean dummies, so you might, if at all, get $k-1$ VIFs.
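A minimal sketch of that dummy coding, assuming the alphabetically first level is taken as the reference (software packages differ in how they pick it); the function name and the example values are hypothetical.

```python
import numpy as np

def dummy_code(values):
    """Code a categorical variable with k levels as k-1 indicator columns."""
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]  # assumption: first level is the reference
    columns = np.array([[1.0 if v == lvl else 0.0 for lvl in others]
                        for v in values])
    return reference, others, columns

# Hypothetical factor with k = 3 levels
colors = ["red", "green", "blue", "green", "red"]
reference, dummy_levels, D = dummy_code(colors)
print(reference)     # level absorbed into the intercept
print(dummy_levels)  # one regressor per remaining level
print(D.shape)       # (n_observations, k - 1)
```

Each of the $k-1$ columns is a separate regressor, which is why any VIFs would be reported per dummy rather than for the factor as a whole.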
However, collinearity between categorical data is much less well understood than collinearity between numerical regressors. See also here: Collinearity between categorical variables. So I wouldn't be surprised if your software package made a conscious decision not to output VIFs for categorical data.
The VIF for a given predictor variable tells you to what degree that variable is correlated with a linear combination of all the other predictors. This explains VIF pretty well.
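For reference, the standard definition behind that description can be written as follows, where $R_j^2$ is the coefficient of determination from regressing predictor $x_j$ on all the other predictors (with an intercept):

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$$

So $R_j^2 = 0.9$ corresponds to $\mathrm{VIF}_j = 10$, and a predictor that is unrelated to the others has a VIF near 1.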
So, you don't know for sure that Q5, Q6, and Q7 are the only predictors causing multicollinearity in your model, but removing the predictors with a high VIF one at a time and re-running the model can help you figure out which predictors would be most beneficial to remove.
If you have some understanding of what these variables represent, that can help you decide which ones to keep in your model.