Regression Coefficient – Why It Is Covariance/Variance

I'd like to understand the intuition behind how regression coefficients are calculated, and why $$\frac{cov(x,y)}{var(x)}$$ gives the regression coefficient for dependent variable $y$ and independent variable $x$. This feels like an elementary question, but I have looked around the internet and can't find anything answering it specifically.

Tags: covariance, descriptive statistics, regression, regression coefficients, variance
Related Solutions
That's all you need to know. The rest of this answer elaborates on it, for those who have not read the link, and then it supplies a formal demonstration of the claim in that link: coloring rectangles in a scatterplot really does give the correct covariance in all cases.
The figure shows two indicator variables $X$ and $Y$: $(X,Y)=(1,0)$ with probability $p_1$, $(X,Y)=(0,1)$ with probability $p_2$, and otherwise $(X,Y)=(0,0)$. The probabilities are indicated by sets of points, where a proportion $p_1$ of them all are located close to $(1,0)$ (but spread about so you can see each of them), $p_2$ are located close to $(0,1)$, and the remaining fraction $1-p_1-p_2$ around $(0,0)$. All possible rectangles that use some two of these points have been drawn. As explained in the linked post, rectangles are positive (and drawn in red) when the points are at the upper right and lower left and otherwise they are negative (and drawn in cyan).
It is always the case that many of the rectangles cannot be seen because their width, their height, or both are zero. In the present situation, many of the rest are extremely slender because of the slight spreads of the points: they really should be invisible, too. The ones that can be seen all use one point near $(1,0)$ and one point near $(0,1)$. That makes them all negative, explaining the overall cyan cast to the picture.
Solution
A fraction $p_1$ of all rectangles have a corner at $(1,0)$. Independently of that, the proportion of those with another corner at $(0,1)$ is $p_2$. When the locations are not spread out, all such rectangles have unit width $1-0$ and unit height $1-0$ and they are negative. Therefore the covariance is
$$\text{Cov}(X,Y) = p_1 p_2 (1-0)(1-0)(-1) = -p_1p_2,$$
QED.
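As a quick numerical sanity check of this result (a minimal sketch; the values $p_1 = 0.3$ and $p_2 = 0.5$ are arbitrary choices, not taken from the figure), one can enumerate the three support points and confirm that the covariance comes out to $-p_1 p_2$:

```python
# Exact check that Cov(X, Y) = -p1 * p2 for the two indicator variables.
# Support: (1, 0) with prob p1, (0, 1) with prob p2, (0, 0) otherwise.
p1, p2 = 0.3, 0.5  # arbitrary example probabilities

support = [(1, 0), (0, 1), (0, 0)]
probs = [p1, p2, 1 - p1 - p2]

EX = sum(p * x for (x, y), p in zip(support, probs))
EY = sum(p * y for (x, y), p in zip(support, probs))
EXY = sum(p * x * y for (x, y), p in zip(support, probs))

cov = EXY - EX * EY
print(cov, -p1 * p2)  # both print -0.15
```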
Mathematical Proof
The question asks for a proof.
To get started, let's establish the notation. Suppose $X$ is a discrete random variable that takes on the values $x_1,x_2,\ldots,x_k$ with probabilities $p_1,p_2,\ldots,p_k$, respectively. Let $Y_i$ be the indicator of $x_i$; that is, $Y_i = 1$ when $X = x_i$ and otherwise $Y_i=0$. Let $i\ne j$. The chance that $(Y_i,Y_j)=(1,0)$, which corresponds to $X=x_i$, is $p_i$; and the chance that $(Y_i,Y_j)=(0,1)$, which corresponds to $X=x_j$, is $p_j$. Since it is impossible for $(Y_i,Y_j)=(1,1)$, the chance that $(Y_i,Y_j)=(0,0)$ must be $1-p_i-p_j$, corresponding to $X\ne x_i$ and $X\ne x_j$. (The vector-valued random variable $(Y_1,Y_2,\ldots,Y_k)$ has a Multinomial$(1;p_1,p_2,\ldots,p_k)$ distribution.)
The question asks for the covariances of $Y_i$ and $Y_j$ for any indexes $i$ and $j$ in $1,2,\ldots, k$.
The proof uses two ideas. Their demonstrations are simple and easy.
Let $(X,Y)$ be any bivariate random variable. Suppose $(X^\prime, Y^\prime)$ is another random variable with the same distribution but is independent of $(X,Y)$. Then
$$\text{Cov}(X,Y) = \frac{1}{2}\mathbb{E}((X-X^\prime)(Y-Y^\prime)).$$
To see why this is so, note that the right hand side remains unchanged when $X$ and $X^\prime$ are shifted by the same amount and also when $Y$ and $Y^\prime$ are shifted by some common amount. We may therefore apply suitable shifts to make the expectations all zero. In this situation
$$\eqalign{\text{Cov}(X,Y) &= \mathbb{E}(XY)\\ & = \frac{1}{2}\mathbb{E}(XY + X^\prime Y^\prime)\\ & = \frac{1}{2}\mathbb{E}(XY + X^\prime Y^\prime) - \frac{1}{2}\mathbb{E}(X)\mathbb{E}(Y^\prime) - \frac{1}{2}\mathbb{E}(X^\prime)\mathbb{E}(Y) \\ & = \frac{1}{2}\mathbb{E}(XY + X^\prime Y^\prime) - \frac{1}{2}\mathbb{E}(XY^\prime) - \frac{1}{2}\mathbb{E}(X^\prime Y) \\ &= \frac{1}{2}\mathbb{E}((X-X^\prime)(Y-Y^\prime)). }$$
Those extra terms like $\mathbb{E}(X)\mathbb{E}(Y^\prime)$ could be freely subtracted in the middle step because they are all zero. The equalities of the form $\mathbb{E}(X)\mathbb{E}(Y^\prime) = \mathbb{E}(X Y^\prime)$ in the following step result from the independence of $X$ and $Y^\prime$ and of $X^\prime$ and $Y$.
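Here is a small Monte Carlo illustration of this identity (a sketch, not part of the original answer; the correlated Gaussian distribution and its parameters are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of Cov(X, Y) = (1/2) E[(X - X')(Y - Y')],
# where (X', Y') is an independent copy of (X, Y).
rng = np.random.default_rng(0)
n = 200_000
cov_true = [[2.0, 0.7], [0.7, 1.0]]            # arbitrary example covariance
xy = rng.multivariate_normal([1.0, -2.0], cov_true, size=n)
xy_prime = rng.multivariate_normal([1.0, -2.0], cov_true, size=n)  # independent copy

lhs = np.cov(xy[:, 0], xy[:, 1])[0, 1]         # ordinary (sample) covariance
rhs = 0.5 * np.mean((xy[:, 0] - xy_prime[:, 0]) * (xy[:, 1] - xy_prime[:, 1]))
print(lhs, rhs)                                 # both are close to 0.7
```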
Where did that factor of $1/2$ go in the crayon calculation? When $(X,Y)$ has a discrete distribution with values $(x_i,y_i)$ and associated probabilities $\pi_{i}$,
$$\eqalign{ \frac{1}{2}\mathbb{E}((X-X^\prime)(Y-Y^\prime)) &= \frac{1}{2}\sum_{i,j=1}^k (x_i-x_j)(y_i-y_j)\pi_{i}\pi_{j} \\ &= \frac{1}{2}\left(\sum_{i\gt j} + \sum_{i \lt j} + \sum_{i=j}\right)(x_i-x_j)(y_i-y_j)\pi_{i}\pi_{j} \\ &= \frac{1}{2}\left(2\sum_{i\gt j} (x_i-x_j)(y_i-y_j)\pi_{i}\pi_{j}\right) + \sum_{i=j} 0 \\ &= \sum_{i \gt j} (x_i-x_j)(y_i-y_j)\pi_{i}\pi_{j}. }$$
In words: the expectation averages over ordered pairs of indices, causing each non-empty rectangle to be counted twice. That's why the factor of $1/2$ is needed in the formula but does not need to be used in the crayon calculation, which counts each distinct rectangle just once.
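The same bookkeeping can be seen in code (a sketch using an arbitrary small discrete distribution): summing over unordered pairs $i \gt j$, with each rectangle counted once and no factor of $1/2$, reproduces the covariance.

```python
import numpy as np

# A small discrete bivariate distribution: values (x_i, y_i) with probabilities pi_i.
xs = np.array([0.0, 1.0, 2.0, 4.0])
ys = np.array([1.0, 3.0, 2.0, 0.0])
pi = np.array([0.1, 0.4, 0.3, 0.2])

# Direct covariance: E[XY] - E[X]E[Y].
cov_direct = np.sum(pi * xs * ys) - np.sum(pi * xs) * np.sum(pi * ys)

# "Rectangle" sum over unordered pairs i > j, each rectangle counted once.
cov_rect = sum(
    (xs[i] - xs[j]) * (ys[i] - ys[j]) * pi[i] * pi[j]
    for i in range(len(xs)) for j in range(i)
)
print(cov_direct, cov_rect)  # both print -1.02
```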
Applying these two ideas to the bivariate $(Y_i,Y_j)$ in the question, which takes on only four possible values $(0,0),(1,0),(0,1),(1,1)$ with probabilities $1-p_i-p_j, p_i, p_j$, and $0$, gives a sum with only one nonzero term, arising from $(1,0)$ and $(0,1)$, equal to
$$\text{Cov}(Y_i,Y_j) = p_i p_j (1-0)(0-1) = -p_i p_j,$$
QED.
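This can also be checked by simulation (a sketch; the probability vector below is an arbitrary choice): draw the indicator vector from a Multinomial$(1; p_1,\ldots,p_k)$ distribution and compare the empirical covariances with $-p_i p_j$.

```python
import numpy as np

# Simulate (Y_1, ..., Y_k) ~ Multinomial(1; p_1, ..., p_k) and compare
# empirical covariances of the indicators with the theoretical -p_i * p_j.
rng = np.random.default_rng(1)
p = np.array([0.2, 0.5, 0.3])                  # arbitrary example probabilities
Y = rng.multinomial(1, p, size=500_000)         # each row is one indicator vector

emp_cov = np.cov(Y, rowvar=False)               # empirical covariance matrix
print(emp_cov[0, 1], -p[0] * p[1])              # both close to -0.10
print(emp_cov[1, 2], -p[1] * p[2])              # both close to -0.15
```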
"Confers an increased risk of ...." or "Increases the risk of ...." (If you are claiming a causal relationship.)
"Is associated with an increased risk of ...." (If you are claiming merely a statistical relationship.)
Avoid "implies an increased risk of ....", because "implies" can mean both "suggests" and "necessitates".
Best Answer
A linear regression coefficient tells us: If predictor variable $x$ increases by 1, what is the expected increase in outcome variable $y$?
The answer to this question depends in large part on the scales on which $x$ and $y$ are measured. E.g., if $x$ is a measure of length, imagine measuring in millimeters or centimeters: the variance of measurements in millimeters will be $10^2$ times the variance of the same measurements in centimeters, and the covariance will be multiplied by 10. Note that $cov(x,y)$ is determined by three things:

1) the strength of the linear association between $x$ and $y$;

2) the scale on which $x$ is measured;

3) the scale on which $y$ is measured.
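A quick numerical sketch of points 2) and 3) (the simulated lengths and coefficients below are arbitrary, used only to illustrate the unit change):

```python
import numpy as np

# Changing units from centimeters to millimeters multiplies x by 10:
# its variance is multiplied by 10^2, and cov(x, y) by 10.
rng = np.random.default_rng(2)
x_cm = rng.normal(170, 10, size=100_000)        # lengths in centimeters
y = 0.5 * x_cm + rng.normal(0, 5, size=100_000)

x_mm = 10 * x_cm                                # the same lengths in millimeters
print(np.var(x_mm) / np.var(x_cm))              # 100 (up to floating point)
print(np.cov(x_mm, y)[0, 1] / np.cov(x_cm, y)[0, 1])  # 10
```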
Because of 2) and 3), I would call the covariance an unstandardized measure of association. Its value is difficult to interpret, because what would be a large and what would be a small value depends on the scales of $x$ and $y$. The correlation coefficient, however, gives us a standardized measure of association: It is 'corrected' for the scales on which $x$ and $y$ are measured:
$$cor(x,y)=\frac{cov(x,y)}{\sqrt{var(x) \cdot var(y)}}$$
The correlation coefficient tells us: If $x$ increases by $\sqrt{var(x)}$ (one standard deviation of $x$), by how many standard deviations ($\sqrt{var(y)}$) is $y$ expected to increase? Thus, with a correlation coefficient of 1, an increase of 1 SD in $x$ is associated with an increase of 1 SD in $y$.
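Equivalently (a small sketch with arbitrary simulated data, not part of the original answer), the least-squares slope computed on z-scores equals the correlation coefficient:

```python
import numpy as np

# After standardizing both variables, the least-squares slope equals cor(x, y).
rng = np.random.default_rng(3)
x = rng.normal(size=50_000)
y = 2.0 * x + rng.normal(size=50_000)

zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

slope_std = np.polyfit(zx, zy, 1)[0]            # slope of zy regressed on zx
r = np.corrcoef(x, y)[0, 1]
print(slope_std, r)                              # the two values agree
```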
Now, the regression coefficient quantifies the expected increase in $y$, when $x$ increases by 1. We thus need to 'correct' the covariance between $x$ and $y$ for the scale of $x$. We can do that by simply dividing:
$$\frac{cov(x,y)}{var(x)}$$
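As a quick sanity check (a minimal numpy sketch with simulated data; np.polyfit is used only as an independent reference for the least-squares fit), this ratio matches the fitted slope of $y$ on $x$:

```python
import numpy as np

# Check: cov(x, y) / var(x) equals the ordinary least-squares slope of y on x.
rng = np.random.default_rng(4)
x = rng.normal(size=50_000)
y = 1.5 * x + rng.normal(size=50_000)

slope_formula = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # cov and var with matching ddof
slope_ols = np.polyfit(x, y, 1)[0]                       # least-squares fit of y on x
print(slope_formula, slope_ols)                           # the two values agree
```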
Note that if we 'reverse' the problem and ask: if $y$ increases by 1, what is the expected increase in $x$? then we 'correct' the covariance for the scale of $y$ instead and compute the answer as follows:
$$\frac{cov(x,y)}{var(y)}$$
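And the corresponding check for the reversed regression (same style of simulated-data sketch as above, not part of the original answer):

```python
import numpy as np

# Check: cov(x, y) / var(y) equals the least-squares slope of x regressed on y.
rng = np.random.default_rng(5)
x = rng.normal(size=50_000)
y = 1.5 * x + rng.normal(size=50_000)

slope_reverse_formula = np.cov(x, y)[0, 1] / np.var(y, ddof=1)
slope_reverse_ols = np.polyfit(y, x, 1)[0]      # note: x regressed on y
print(slope_reverse_formula, slope_reverse_ols)  # the two values agree
```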