The simplest outcome from a regression is a set of coefficients, but that is not sufficient for a true regression analysis. You say that you have used R; in R there is a built-in dataset called "anscombe" (after the person who created the data). Use that dataset to fit a regression of y1 vs. x1, then regressions of y2 vs. x2, y3 vs. x3, and y4 vs. x4.
Compare the coefficients (formulas) for the 4 regressions and think about what your conclusions are. Now plot the pairs of data and compare the plots. Do the plots tell the same story as the regression models?
You could also look up Anscombe's quartet on Wikipedia or Google, but it is much more informative to do it yourself.
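If you would rather try this in Python than in R, here is a minimal sketch using only the standard library. The data values are typed in by hand and are meant to match R's built-in anscombe dataset; the helper name fit_line is my own, not from any package.

```python
# Anscombe's quartet, typed in by hand (intended to match R's `anscombe`).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x1, x2 and x3 are identical
quartet = {
    "y1 ~ x1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                       7.24, 4.26, 10.84, 4.82, 5.68]),
    "y2 ~ x2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                       6.13, 3.10, 9.13, 7.26, 4.74]),
    "y3 ~ x3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "y4 ~ x4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


fits = {name: fit_line(x, y) for name, (x, y) in quartet.items()}
for name, (intercept, slope) in fits.items():
    print(f"{name}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Once you have the four sets of coefficients, feed each (x, y) pair to whatever scatterplot tool you like and compare the pictures.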
A more complete regression analysis will include not only the coefficients but also things like residuals, fitted values, standard errors, confidence intervals, diagnostic plots, etc. (the complete list of everything needed in an analysis depends on the specific data, science, and questions being asked). The above can be produced with about 4 lines of code in R. I don't know how much Python code it would take, but there may be prewritten Python code that does the same in far fewer lines than programming it in straight Python would.
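To give a feel for what "more than coefficients" means, here is a rough standard-library Python sketch for a single simple regression (y1 vs. x1 from the anscombe data, typed in by hand). It computes fitted values, residuals, the residual standard error, the standard error of the slope, and a 95% confidence interval. The t quantile is hard-coded because the standard library has no t distribution; that constant and all variable names are my own choices. A statistics package would do all of this for you.

```python
import math

# y1 vs. x1 from Anscombe's quartet, typed in by hand.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx

# Fitted values and residuals, the raw material for diagnostic plots.
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residual standard error on n - 2 degrees of freedom.
sigma = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Standard error of the slope.
se_slope = sigma / math.sqrt(sxx)

# Assumption: 0.975 quantile of Student's t with n - 2 = 9 df, hard-coded.
T_975_DF9 = 2.262
ci_slope = (slope - T_975_DF9 * se_slope, slope + T_975_DF9 * se_slope)

print(f"slope = {slope:.3f}, SE = {se_slope:.3f}")
print(f"95% CI for slope: ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
```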
Also, unless your predictor variables (independent variables, but I don't like the independent/dependent names) are perfectly orthogonal to each other, you will get different coefficients fitting one at a time than fitting them all together, and for any dataset of real interest the other important aspects (standard errors, etc.) will differ depending on whether you do things one at a time or all together.
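A small made-up example shows the effect. Below, y is built exactly as 1 + 2*x1 + 3*x2 with two correlated predictors; the numbers and names are purely illustrative. Fitting both predictors together recovers 2 and 3, while fitting each one alone gives noticeably larger slopes, because each predictor also picks up part of the other's effect.

```python
# Made-up data: two correlated (non-orthogonal) predictors.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # exact, no noise


def centered_cross(u, v):
    """Sum of (u - mean(u)) * (v - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)

# One predictor at a time.
slope_x1_alone = s1y / s11
slope_x2_alone = s2y / s22

# Both predictors together: solve the 2x2 normal equations.
det = s11 * s22 - s12 ** 2
b1_joint = (s22 * s1y - s12 * s2y) / det
b2_joint = (s11 * s2y - s12 * s1y) / det

print(f"alone:    x1 = {slope_x1_alone:.3f}, x2 = {slope_x2_alone:.3f}")
print(f"together: x1 = {b1_joint:.3f}, x2 = {b2_joint:.3f}")
```

If x1 and x2 were orthogonal (s12 equal to zero), the two sets of slopes would coincide.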
We'd probably need to know more about the nature of the missing data and the design of the study.
Generally, if the data are missing at random, then your regression with n=4000 would still be representative. However, if the missingness is associated with both the outcome and the exposure, then it acts like a confounder that is unaccounted for (a form of selection bias). In that case, even if you have 4000 cases and only 12 independent variables, your regression results will very likely be off, or even plainly misleading.
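A quick simulation can make the difference concrete. This is only a sketch under assumptions I made up: a population where the true slope of y on x is 2, one scenario where half the rows are lost completely at random, and one where every row with a high outcome is lost. Real missingness mechanisms are rarely this tidy.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx


# Hypothetical population: y = 2 * x + noise, so the true slope is 2.
full = []
for _ in range(20000):
    x = rng.gauss(0, 1)
    full.append((x, 2 * x + rng.gauss(0, 1)))

# Scenario 1: each row survives with probability 0.5, regardless of x or y.
kept_random = [p for p in full if rng.random() < 0.5]

# Scenario 2: rows with a high outcome (y >= 1) are lost. Because y depends
# on x, this missingness is tied to both the outcome and the exposure.
kept_selected = [p for p in full if p[1] < 1]

print(f"full data:            slope = {slope(full):.2f}")
print(f"missing at random:    slope = {slope(kept_random):.2f}")
print(f"missing by outcome:   slope = {slope(kept_selected):.2f}")
```

In the random scenario you lose precision but the slope stays close to the full-data value; in the outcome-driven scenario the slope is pulled toward zero no matter how many rows remain.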
Having said that, you really need to explain why there is such a drastic cut. Some research designs invite a lot of missing data. For instance, online questionnaires with a prize draw usually have missingness of this magnitude. Many online respondents may just enter the survey, click through all the questions without answering, and leave their e-mail address to enter the lucky draw. Other designs, like face-to-face interviews, should never have missingness this prominent.
If it's secondary data, then I'd recommend consulting the study design documentation. Some studies take only a subset of participants for further investigation, which may create the illusion that the others are all missing. For instance, a health study may collect the height and weight of all participants, but randomly select only 10% of them for a blood test due to cost.
Studying the original questionnaire may also help. Some datasets record N/A (not applicable) as missing. If you have accidentally chosen a question that comes after a skip pattern, you may lose a lot of your sample. For instance, there could be a question asking if the respondent had tried crack cocaine, and if yes, then there are a few more follow-up questions. If you have picked one of those follow-up questions, a large share of the values will be missing.
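One way to check for this is to compute the missing rate twice: once over everybody, and once over only those respondents who were supposed to be asked the question. The sketch below uses invented rows and invented column names (ever_used, age_first_use); substitute the gate question and follow-up from your own questionnaire.

```python
# Invented survey rows; None marks a blank cell in the data file.
rows = (
    [{"ever_used": "no", "age_first_use": None}] * 8    # skipped by design
    + [{"ever_used": "yes", "age_first_use": 19}]       # asked and answered
    + [{"ever_used": "yes", "age_first_use": None}]     # asked, no answer
)


def missing_rate(records, field):
    """Share of records where `field` is blank."""
    return sum(r[field] is None for r in records) / len(records)


# Missing rate over everyone, which mixes "not applicable" with "no answer".
overall_missing = missing_rate(rows, "age_first_use")

# Missing rate among those who passed the gate question.
eligible = [r for r in rows if r["ever_used"] == "yes"]
eligible_missing = missing_rate(eligible, "age_first_use")

print(f"missing overall:        {overall_missing:.0%}")
print(f"missing among eligible: {eligible_missing:.0%}")
```

A large gap between the two rates suggests that most of the "missing" values are really skips built into the questionnaire.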
Depending on the nature of the missingness, you can address it differently in your report. But how to describe this problematic missing rate, and what to say about it, would depend on your study and the questions.
Best Answer
You've used the data-visualization tag, and that's certainly a place you can start. You might do a pairs plot (scatterplot matrix), which can sometimes give an indication of nonlinearity. However, sometimes it is hard to see clearly because of the effects of the other variables.
Another alternative is to fit linear models to both and check added-variable plots (or perhaps partial residual plots). Nonlinearity should be easier to see there, but it may take adjusting for the nonlinearity in some of the other variables before some of the nonlinearity is clear.
By contrast, the one that's actually linear should not show nonlinearity in added-variable plots.
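In case it helps to see what an added-variable plot is made of, here is a minimal standard-library Python sketch with two predictors and made-up data in which y depends on the square of x2. The points to plot for x2 are the residuals of y on x1 against the residuals of x2 on x1. All names and numbers are my own, and with more predictors you would residualize on all of the others, not just one.

```python
def fit_line(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Residuals from the least-squares line of y on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up data: linear in x1, but quadratic in x2.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [1 + a + b ** 2 for a, b in zip(x1, x2)]

# Added-variable coordinates for x2: remove x1 from both x2 and y.
e_x2 = residuals(x1, x2)
e_y = residuals(x1, y)
av_points = list(zip(e_x2, e_y))  # scatter these to get the plot

# The line through these points has the same slope as the coefficient
# of x2 in the regression of y on x1 and x2 together.
_, av_slope = fit_line(e_x2, e_y)

# What the straight line leaves behind; systematic curvature here is
# the visual cue for nonlinearity in x2.
leftover = residuals(e_x2, e_y)

print(f"slope in added-variable plot: {av_slope:.3f}")
print("largest leftover:", round(max(abs(r) for r in leftover), 2))
```

For a response that really is linear in x2, the points would scatter around the line with no pattern in the leftovers.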