We'd probably need to know more about the nature of the missingness and the design of the study.
Generally, if the missingness is random, then your regression on n = 4,000 would still be representative. However, if the missingness is associated with both the outcome and the exposure, it becomes an unaccounted-for confounder. In that case, even with 4,000 observations and only 12 independent variables, your regression results will very likely be off, or even plainly misleading.
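The point about missingness tied to the outcome (and, through it, the exposure) can be sketched with a toy simulation. This is pure Python with made-up numbers, not your data: cases with large outcome values drop out, and the fitted slope is attenuated.

```python
import random

random.seed(0)

n = 10000
x = [random.gauss(0, 1) for _ in range(n)]          # exposure
y = [2 * xi + random.gauss(0, 1) for xi in x]       # outcome, true slope = 2

def slope(xs, ys):
    # OLS slope: cov(x, y) / var(x)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Missingness that depends on the outcome: observations with y >= 0 drop out.
kept = [(xi, yi) for xi, yi in zip(x, y) if yi < 0]
xk = [p[0] for p in kept]
yk = [p[1] for p in kept]

print(slope(x, y))    # close to the true slope of 2
print(slope(xk, yk))  # attenuated: noticeably below 2
```

The complete-case slope is close to 2, while the slope after outcome-dependent dropout is biased toward zero.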
Having said that, you really need to explain why such a drastic cut happened. Some research designs invite a lot of missingness. For instance, online questionnaires with a prize draw often have missingness of this magnitude: many online respondents just enter the survey, click through all the questions without answering, and leave their e-mail to enter the lucky draw. Other designs, like face-to-face interviews, should never have missingness this prominent.
If it's secondary data, then I'd recommend consulting the study design documentation. Some studies take only a subset of participants for further investigation, which can create the illusion that everyone else is missing. For instance, a health study may collect height and weight from all participants but randomly select only 10% of them for a blood test due to cost.
Studying the original questionnaire may also help. Some datasets record N/A as missing. If you have accidentally chosen a question that sits behind a skip pattern, you may lose a lot of your sample. For instance, there could be a question asking whether the respondent had tried crack cocaine, with a few follow-up questions asked only if the answer is yes. If you have picked one of those follow-up questions, massive missingness can result.
Depending on its nature, you can address the missingness differently in your report. But exactly how, and what to say about this problematic missing rate, will depend on your study and your questions.
The problem that you're running into is multicollinearity in the input matrix for your regression. The matrix is 'ill-conditioned', meaning that small errors in the input lead to large errors in the output. The condition number of a matrix is $\frac{\lambda_{max}}{\lambda_{min}}$, the ratio of its largest to smallest eigenvalue (that version might only hold for symmetric matrices); I think the general formula is $||A|| \, ||A^{-1}||$. The normal equations (the equations used to solve for the betas of the regression) are $\beta = (A^TA)^{-1}A^Ty$. So as you can see, if you have a matrix with a large condition number (which your program is telling you that you do), the normal equations make things worse, since forming $A^TA$ effectively squares the condition number. This problem (the multicollinearity) is what's causing your $R^2$ and your betas to have "messed up" values (remember that small errors in the inputs lead to large errors in the output). A large condition number also comes up with very high-dimensional data, but for you it seems to be coming from the fact that your predictor variables are strongly related. Now, what can you do about this?
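To make the conditioning point concrete before the options: for two standardized predictors with correlation r, the Gram matrix is $\begin{pmatrix}1 & r\\ r & 1\end{pmatrix}$, whose eigenvalues are $1 \pm r$, so the condition number can be computed in one line. A pure-Python toy sketch (the correlations are invented for illustration):

```python
# Condition number of the Gram matrix X'X for two standardized predictors
# with correlation r. For the symmetric matrix [[1, r], [r, 1]] the
# eigenvalues are 1 + r and 1 - r, so the condition number is their ratio.

def cond_gram(r):
    lam_max, lam_min = 1 + r, 1 - r
    return lam_max / lam_min

for r in (0.0, 0.9, 0.999):
    print(r, cond_gram(r))
# r = 0     -> 1.0 (orthogonal predictors: perfectly conditioned)
# r = 0.9   -> about 19
# r = 0.999 -> about 1999: tiny input errors can be amplified ~2000-fold
```

Near-duplicate predictors push the smallest eigenvalue toward zero, which is exactly what drives the condition number up.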
(1) You can figure out which of your variables is causing the problem and remove it from the model.
(2) You can consider methods like ridge regression.
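For option (1), one common screening tool (not mentioned above, but standard) is the variance inflation factor, $VIF_j = 1/(1 - R^2_j)$, where $R^2_j$ comes from regressing predictor j on all the others. With just two standardized predictors, $R^2_j$ is simply the squared correlation, so a toy sketch is:

```python
# Variance inflation factor (VIF) for the two-predictor case, where
# R^2 of one predictor regressed on the other is just r squared.

def vif_two_predictors(r):
    return 1.0 / (1.0 - r * r)

print(vif_two_predictors(0.5))    # about 1.33: harmless
print(vif_two_predictors(0.99))   # about 50: a common rule of thumb flags VIF > 10
```

Predictors with very large VIFs are the natural candidates to drop or combine.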
What does ridge regression do? It adds a small perturbation $\lambda I$ to $A^TA$, where $\lambda$ controls the size of the perturbation and $I$ is the identity matrix (zeroes everywhere, ones on the diagonal). This reduces your problem with multicollinearity, but at the expense of adding some bias to the model. I'd suggest reading up on ridge regression or the lasso before just jumping in. I've always found "The Elements of Statistical Learning" to be a good reference; it's free as a PDF online. Good luck.
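Here's a pure-Python sketch of that perturbation on a toy 2-predictor system with near-duplicate predictors (correlation 0.999); the correlations with the outcome are made-up numbers purely for illustration:

```python
# Ridge on a toy system: solve (G + lam*I) beta = g, where G is the
# predictor correlation matrix [[1, r], [r, 1]] and g holds the
# predictor-outcome correlations (invented numbers).

def solve_ridge(r, g1, g2, lam):
    # Cramer's rule for [[1+lam, r], [r, 1+lam]] beta = [g1, g2]
    a = 1.0 + lam
    det = a * a - r * r
    b1 = (g1 * a - r * g2) / det
    b2 = (a * g2 - r * g1) / det
    return b1, b2

r, g1, g2 = 0.999, 0.900, 0.905

print(solve_ridge(r, g1, g2, 0.0))  # plain OLS: wild, opposite-signed betas
print(solve_ridge(r, g1, g2, 0.1))  # ridge: both near 0.4 -- stable but slightly biased
```

With $\lambda = 0$ the two near-duplicate predictors get huge coefficients of opposite sign; a small $\lambda$ stabilizes them at the cost of some shrinkage bias, which is the ridge trade-off described above.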
The simplest output from a regression is a set of coefficients, but that is not sufficient for a true regression analysis. You say that you have used R; R has a built-in dataset called "anscombe" (named after the person who created the data). Use that dataset to fit a regression of y1 vs. x1, then regressions of y2 vs. x2, y3 vs. x3, and y4 vs. x4.
Compare the coefficients (formulas) for the 4 regressions and think about what your conclusions are. Now plot the pairs of data and compare the plots. How does the comparison of the plots compare to the comparison of the regression models?
You could also look up Anscombe's quartet on Wikipedia or Google, but it is much more informative to do it yourself.
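If you'd rather run the exercise in Python, here is a sketch with the quartet's values hard-coded (the same data as R's built-in `anscombe` data frame); it shows only the coefficients, which is exactly the point of the exercise:

```python
# Anscombe's quartet: four x-y pairs that give essentially the same
# fitted line, y ~ 3 + 0.5*x, despite wildly different scatterplots.

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]
xs = [x123, x123, x123, x4]

def fit(x, y):
    # simple least squares: slope = cov(x, y) / var(x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope   # (intercept, slope)

for x, y in zip(xs, ys):
    print(fit(x, y))   # each is roughly (3.0, 0.5)
```

All four fits report nearly identical intercepts and slopes; only plotting the data reveals how different the four relationships really are.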
A more complete regression analysis will include not only the coefficients but also things like residuals, fitted values, standard errors, confidence intervals, diagnostic plots, etc. (the complete list of everything needed depends on the specific data, science, and questions being asked). The above can be produced with about 4 lines of code in R; I don't know how much Python code it would take, but there may be prewritten Python libraries that do the same in far fewer lines than programming it from scratch.
Also, unless your predictor variables (independent variables, though I don't like the independent/dependent names) are perfectly orthogonal to each other, you will get different coefficients from fitting them one at a time than from fitting them all together. And for any dataset of real interest, the other important aspects (standard errors, etc.) will also differ between the two approaches.
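The one-at-a-time versus all-together point can be shown with a tiny made-up dataset, chosen so that y = x1 + 2*x2 exactly and the two predictors are correlated:

```python
# Made-up data with correlated predictors; y = x1 + 2*x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 2.0, 2.0]
y = [a + 2 * b for a, b in zip(x1, x2)]   # [3, 4, 7, 8]

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

c1, c2, cy = centered(x1), centered(x2), centered(y)

# Fit y on x1 alone: slope = cov(x1, y) / var(x1)
b_alone = dot(c1, cy) / dot(c1, c1)

# Fit y on x1 and x2 together (2x2 normal equations via Cramer's rule)
s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
g1, g2 = dot(c1, cy), dot(c2, cy)
det = s11 * s22 - s12 * s12
b1 = (g1 * s22 - s12 * g2) / det
b2 = (s11 * g2 - s12 * g1) / det

print(b_alone)  # 1.8 -- x1 "borrows" credit from the correlated x2
print(b1, b2)   # 1.0 2.0 -- the joint fit recovers the true coefficients
```

Fit alone, x1's coefficient is 1.8 because it soaks up part of x2's effect; fit together, the coefficients come back as 1 and 2. With truly orthogonal predictors the two answers would coincide.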