For the standard linear regression model, the absolute value of the coefficient estimates and the p-values are not related in the way you describe. It is entirely possible to have coefficients that are large in absolute value yet insignificant, and coefficients that are small in absolute value yet highly significant. What you're missing in your interpretation is the effect of the coefficient estimates' standard errors.
The coefficients R reports (let's call them $b_1,b_2,b_3,\dots,b_k$) are the best linear unbiased estimators of the true parameters $\beta_1,\beta_2,\beta_3,\dots,\beta_k$, in that they minimize the sum of squared errors; formally:
$$
\{b_1,b_2,\dots,b_k\} = \underset{\alpha_1,\dots,\alpha_k}{\operatorname{argmin}}\left\{ \sum_{i=1}^{n}\left(y_i-\alpha_1x_{i,1}-\dots-\alpha_kx_{i,k}\right)^2\right\}
$$
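To make the argmin concrete, here is a minimal R sketch (the `mtcars` data and the particular predictors are chosen purely for illustration) comparing the `lm()` coefficients to a direct numerical minimization of the sum of squared errors:

```r
# Sum of squared errors for candidate coefficients a, response y, design matrix X
sse <- function(a, y, X) sum((y - X %*% a)^2)

X <- cbind(1, mtcars$wt, mtcars$hp)     # intercept plus two predictors
y <- mtcars$mpg

# Minimize the SSE numerically and compare with lm()
opt <- optim(par = c(0, 0, 0), fn = sse, y = y, X = X, method = "BFGS")
opt$par                                 # numerically minimized coefficients
coef(lm(mpg ~ wt + hp, data = mtcars))  # essentially the same values
```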
The p-value that R reports for the $i^{\text{th}}$ coefficient is the result of the following hypothesis test:
$H_0: \beta_i = 0$
$H_A: \beta_i \neq 0$
Assuming the regression is properly specified, it can be shown via the central limit theorem that each $b_i$ is (approximately) a normally distributed random variable with mean $\beta_i$ and some standard deviation $\sigma_i$, also called its standard error. This is because the $b$'s are estimated from a random sample, so they too are random variables (roughly speaking).
What determines the $i^{\text{th}}$ p-value is where 0 "lands" in the normal distribution $N(\beta_i,\sigma_i^2)$ (technically the test is done using a t-distribution, but the difference is not so important for addressing your question). If zero is in the tails of $N(\beta_i,\sigma_i^2)$, the p-value is low; if it is closer to the middle, the p-value is high.
So given two estimates $b_i$ and $b_j$, where $b_i$ is "super far away" from zero and $b_j$ is "super close" to zero, the p-value for $b_i$ would be lower than that for $b_j$, assuming $\sigma_i = \sigma_j$. The part you are missing in your interpretation is that $\sigma_i$ and $\sigma_j$ can be very different. Essentially, if $b_i$ is "huge" but $\sigma_i$ is also "huge", you can get a high p-value; conversely, for a "small" $b_i$ with a "super small" $\sigma_i$, you can get a low p-value.
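Here is a small simulated R sketch of that point (the data and variable names are made up purely for illustration): `x1` gets a "huge" coefficient with a huge standard error, while `x2` gets a "small" coefficient with a tiny standard error:

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n, sd = 0.01)   # barely varies -> large standard error for its coefficient
x2 <- rnorm(n, sd = 10)     # varies a lot  -> small standard error for its coefficient
y  <- 5 * x1 + 0.2 * x2 + rnorm(n, sd = 1)

summary(lm(y ~ x1 + x2))
# Typically: the estimate for x1 is large in absolute value but insignificant
# (its standard error is around 10), while the estimate for x2 is small (~0.2)
# but highly significant (its standard error is around 0.01).
```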
I hope that helps!
> If we have one independent variable, then the given regression coefficient $\beta$ = Pearson's $r$.
This is not correct. A simple argument is that regression coefficients are not bounded to $[-1,1]$ (e.g. $\beta=15$ is nothing extraordinary), while the correlation coefficient $r$ is. In simple regression the slope is $b = r\,(s_y/s_x)$, so it equals $r$ only when the two variables have equal standard deviations (e.g. when both are standardized).
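A quick illustrative R sketch (simulated data): rescaling the predictor changes the slope arbitrarily, while the correlation stays inside $[-1,1]$:

```r
set.seed(2)
x <- rnorm(50)
y <- 3 * x + rnorm(50)

cor(x, y)                    # bounded in [-1, 1]
coef(lm(y ~ x))["x"]         # roughly 3, already outside [-1, 1]
coef(lm(y ~ I(x / 100)))[2]  # rescaling x inflates the slope ~100-fold; r is unchanged
```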
> If we have multiple independent variables, how can we calculate Pearson's $r$ for each variable if only the $\beta$ values are given for each independent variable?
You cannot. The $\beta$s do not determine the $r$s, i.e. for a fixed set of $\beta$s you may have different $r$s.
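For instance, here is an illustrative R sketch (simulated data) in which the true coefficients are held fixed while the noise level, and hence the correlations, change:

```r
set.seed(3)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
make_y <- function(noise_sd) 2 * x1 + 2 * x2 + rnorm(n, sd = noise_sd)

y_low  <- make_y(0.5)   # little noise
y_high <- make_y(10)    # lots of noise

coef(lm(y_low  ~ x1 + x2))  # slopes near the true value 2
coef(lm(y_high ~ x1 + x2))  # slopes still near 2, just estimated less precisely
cor(x1, y_low)              # fairly strong correlation
cor(x1, y_high)             # much weaker correlation, same coefficients
```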
The coefficient of determination is defined for the model as a whole, not for individual variables. However, there is a technique called ANOVA (analysis of variance) which can roughly be thought of as breaking $R^2$ into contributions from each variable.
Recall that the coefficient of determination is defined in terms of sums of squares:
$$ \begin{align} R^2 & = 1 - {SS_\text{res}\over SS_\text{tot}} \\ SS_\text{tot} & =\sum_{i=1}^n (\bar{y} - y_i)^2 \\ SS_\text{res} & =\sum_{i=1}^n (\hat{y}_i-y_i)^2 \end{align}$$
where $\hat{y}$ is the vector of the model's predictions. Since we can't compute $\hat{y}$ without considering all of the variables in the model at once, $R^2$ is inherently a property of the whole model.
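As a quick illustration (using the built-in `mtcars` data, chosen arbitrarily), $R^2$ computed directly from these sums of squares matches what `summary()` reports:

```r
fit   <- lm(mpg ~ wt + hp, data = mtcars)
y     <- mtcars$mpg
y_hat <- fitted(fit)

ss_tot <- sum((mean(y) - y)^2)
ss_res <- sum((y_hat - y)^2)

1 - ss_res / ss_tot     # same value as the next line
summary(fit)$r.squared
```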
But look at the equation for $SS_\text{tot}$ again. It has exactly the same form as $SS_\text{res}$ for a trivial model with only an intercept term; such a model would predict $\hat{y}_i = \bar{y}$ for all $i$. This suggests that we are not comparing one model to some platonic ideal, but actually comparing two different models. This insight can be generalized into a chain of models: start from the intercept-only model, add one variable at a time, and write $SS_j$ for the residual sum of squares of the model containing the first $j$ variables (so $SS_0 = SS_\text{tot}$ and $SS_k = SS_\text{res}$). Then
$$ \frac{SS_0 - SS_1}{SS_\text{tot}} + \frac{SS_1 - SS_2}{SS_\text{tot}} + \cdots + \frac{SS_{k-1} - SS_k}{SS_\text{tot}} + \frac{SS_k}{SS_\text{tot}} = 1 $$
Each term $\frac{SS_{j-1} - SS_j}{SS_\text{tot}}$ can be interpreted as the amount of variance explained by the $j$-th variable. As a concrete example, consider the output of the `anova()` function on the built-in `airquality` dataset (a sketch of the call is given below). This is called the "sequential" analysis of variance. The `Sum Sq` column sums to the total sum of squares of the entire dataset, and from it we can see that `Wind` explains twice as much variance as `Temp`. This interpretation is subject to many caveats: it is sensitive to the order in which the variables are added, the F statistics and associated p-values are only meaningful for a purely linear model, and so on. Nevertheless, if we take the `Sum Sq` column and divide it by the total sum of squares, we get a table in which every line item is roughly analogous to the "$R^2$" for each variable (plus one line item for the unexplained residual), although that terminology is never used as far as I know; people talk about the proportion of variance explained instead.
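Here is a minimal R sketch of the calls described above (the particular model, `Ozone` regressed on `Wind` and `Temp` from `airquality`, is an assumed choice for illustration):

```r
fit <- lm(Ozone ~ Wind + Temp, data = airquality)

tab <- anova(fit)   # sequential ("Type I") analysis of variance
tab

# Divide each Sum Sq entry by the total sum of squares to get the
# per-variable proportion of variance explained (plus the residual share).
tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
```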