I'm not sure that multicollinearity is what's going on here. It certainly could be, but from the information given I can't conclude that, and I don't want to start there. My first guess is that this might be a multiple comparisons issue. That is, if you run enough tests, something will show up, even if there's nothing there.
One of the issues that I harp on is that the problem of multiple comparisons is always discussed in terms of examining many pairwise comparisons—e.g., running t-tests on every unique pairing of levels. (For a humorous treatment of multiple comparisons, look here.) This leaves people with the impression that this is the only place the problem shows up. But this is simply not true—the problem of multiple comparisons shows up everywhere. For instance, if you run a regression with 4 explanatory variables, the same issue exists. In a well-designed experiment, IVs can be orthogonal, but people routinely worry about using Bonferroni corrections on sets of a priori, orthogonal contrasts, and don't think twice about factorial ANOVAs. To my mind this is inconsistent.
The global F test is what's called a 'simultaneous' test: it tests the null hypothesis that all of your predictors are jointly unrelated to the response variable. The simultaneous test provides some protection against the problem of multiple comparisons without having to go the power-losing Bonferroni route. Unfortunately, my interpretation of what you report is that you have a null finding.
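To make that concrete, here is a small simulation (my own illustrative sketch, not from the question): with 4 null predictors, the chance that at least one individual t-test comes up 'significant' is roughly $1-.95^4\approx .19$, while the global F test stays near its nominal 5% level.

```r
## Sketch: familywise error of individual t-tests vs. the global F test
## when the response is unrelated to all 4 predictors. Names are illustrative.
set.seed(1)
n.sims <- 5000
n      <- 43                               # sample size similar to the question's
any.t.sig <- logical(n.sims)
F.sig     <- logical(n.sims)
for (s in 1:n.sims) {
  X <- matrix(rnorm(n * 4), ncol = 4)      # 4 explanatory variables
  y <- rnorm(n)                            # response unrelated to all of them
  fit   <- lm(y ~ X)
  coefs <- summary(fit)$coefficients
  any.t.sig[s] <- any(coefs[-1, 4] < 0.05) # any individual slope 'significant'?
  Fstat    <- summary(fit)$fstatistic
  F.sig[s] <- pf(Fstat[1], Fstat[2], Fstat[3], lower.tail = FALSE) < 0.05
}
mean(any.t.sig)  # roughly 1 - .95^4 ~ .19, well above .05
mean(F.sig)      # close to the nominal .05
```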
Several things militate against this interpretation. First, with only 43 data points, you almost certainly don't have much power. It's quite possible that there is a real effect, but you just can't resolve it without more data. Second, like both @andrea and @Dimitriy, I worry about the appropriateness of treating 4-level categorical variables as numeric. This may well not be appropriate, and could have any number of effects, including diminishing your ability to detect what is really there. Lastly, I'm not sure that significance testing is quite as important as people believe. A $p$ of $.11$ is kind of low; is there really something going on there? Maybe! Who knows? There's no 'bright line' at $.05$ that demarcates real effects from mere appearance.
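On the categorical-versus-numeric point, one way to see what the numeric coding costs you is to fit the variable both ways and compare the nested fits. The sketch below uses made-up data and illustrative names (`x4lev`, `y`):

```r
## Sketch: treating a 4-level categorical predictor as numeric forces a
## single linear trend; coding it as a factor lets each level have its
## own mean. The nested-model comparison shows what the numeric coding hides.
set.seed(2)
x4lev <- sample(1:4, 43, replace = TRUE)       # a 4-level variable
y     <- c(0, 1.5, 1, 0.2)[x4lev] + rnorm(43)  # non-monotone level effects
fit.numeric <- lm(y ~ x4lev)                   # levels treated as numbers
fit.factor  <- lm(y ~ factor(x4lev))           # levels treated as categories
anova(fit.numeric, fit.factor)  # does the factor coding fit significantly better?
```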
You don't compare the individual points to conclude a treatment effect. You see whether the lines for the treatment and control are different.
In some circumstances, the fitted lines might be parallel, and just the difference in intercept is of interest. In others, both the intercept and slope might differ, and any difference would be of interest.
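One standard way to test both possibilities at once is a model with a group-by-slope interaction; the sketch below uses simulated data and illustrative names (`dat`, `group`):

```r
## Sketch: compare treatment and control lines in one model. 'group' is a
## two-level factor; the x:group interaction lets the slopes differ.
set.seed(3)
dat <- data.frame(x = rnorm(80),
                  group = gl(2, 40, labels = c("control", "treatment")))
dat$y <- with(dat, x + 0.5 * (group == "treatment") + rnorm(80, sd = 0.5))
fit.common <- lm(y ~ x, data = dat)          # one line for everyone
fit.groups <- lm(y ~ x * group, data = dat)  # separate intercepts and slopes
summary(fit.groups)   # 'grouptreatment' = intercept difference;
                      # 'x:grouptreatment' = slope difference
anova(fit.common, fit.groups)  # joint test: are the lines different at all?
```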
Testing point vs line in ordinary regression (not errors-in-variables, which is more complicated):
It's not correct to check whether the data values from the other sample fall inside a confidence interval for the fitted line, because the data values themselves contain noise.
Call the first sample $(\underline{x}_1,\underline{y}_1)$, and the second one $(\underline{x}_2,\underline{y}_2)$. Your model for the first sample is $y_{1,i} = \alpha_1 + \beta_1 x_{1,i} + \varepsilon_{1,i}$, with the usual iid $N(0,\sigma^2)$ assumption on the errors.
You want to see if a particular point $(x_{2,j},y_{2,j})$ is consistent with the first sample. Equivalently, you want to check whether an interval for $y_{2,j} - \left(\alpha_1 + \beta_1 x_{2,j}\right)$ includes 0 (notice that the point comes from the second sample, while the line is fitted to the first).
The usual way to obtain such a CI would be to construct a pivotal quantity, though one could simulate or bootstrap as well.
However, since in this illustration we're doing it for a single point, under normal assumptions and ordinary regression conditions we can save some effort: this is a solved problem. Assuming sample 1 and sample 2 have a common population variance, it corresponds to checking whether one of the sample-2 observations lies within a prediction interval based on sample 1, rather than a confidence interval.
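In R, that check is a single call to `predict` with `interval = "prediction"`; the sketch below uses hypothetical data frames `s1` and `s2` for the two samples:

```r
## Sketch: does the j-th point of sample 2 fall inside a 95% prediction
## interval from the line fitted to sample 1? Data here are simulated.
set.seed(4)
s1 <- data.frame(x = rnorm(30)); s1$y <- 2 + 3 * s1$x + rnorm(30)
s2 <- data.frame(x = rnorm(10)); s2$y <- 2 + 3 * s2$x + rnorm(10)
j  <- 1                                   # index of the sample-2 point to check
fit1 <- lm(y ~ x, data = s1)              # line fitted to sample 1 only
pred <- predict(fit1, newdata = s2[j, ],
                interval = "prediction", level = 0.95)
# consistent with the sample-1 line if the observed y lies in [lwr, upr]
s2$y[j] >= pred[, "lwr"] & s2$y[j] <= pred[, "upr"]
```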
Best Answer
This type of situation can be handled by a standard F-test for nested models. Since you want to test both of the parameters against a null model with fixed parameters, your hypotheses are:
$$H_0: \boldsymbol{\beta} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad \quad \quad H_A: \boldsymbol{\beta} \neq \begin{bmatrix} 0 \\ 1 \end{bmatrix} .$$
The F-test involves fitting both models and comparing their residual sum-of-squares, which are:
$$SSE_0 = \sum_{i=1}^n (y_i-x_i)^2 \quad \quad \quad SSE_A = \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
The test statistic is:
$$F \equiv F(\mathbf{y}, \mathbf{x}) = \frac{n-2}{2} \cdot \frac{SSE_0 - SSE_A}{SSE_A}.$$
The corresponding p-value is:
$$p \equiv p(\mathbf{y}, \mathbf{x}) = \int \limits_{F(\mathbf{y}, \mathbf{x}) }^\infty \text{F-Dist}(r | 2, n-2) \ dr.$$
Implementation in R: Suppose your data is in a data frame called `DATA` with variables called `y` and `x`. The F-test can be performed manually with code along the lines shown below. In the simulated mock data I have used, you can see that the estimated coefficients are close to the ones in the null hypothesis, and the p-value of the test shows no significant evidence to falsify the null hypothesis that the true regression function is the identity function. The `summary` output and `plot` for this data can then be inspected in the usual way.
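A minimal sketch of that manual computation, using freshly simulated mock data (the data and object names such as `MODEL`, `F.stat`, and `p.val` are illustrative):

```r
## Sketch of the manual F-test from the formulas above, with mock data
## generated so that the true regression function is the identity.
set.seed(123)
n <- 50
x <- rnorm(n)
y <- x + rnorm(n)
DATA <- data.frame(x = x, y = y)

# Fit the alternative model and compute both residual sums of squares
MODEL <- lm(y ~ x, data = DATA)
SSE0  <- sum((DATA$y - DATA$x)^2)    # null model: intercept 0, slope 1
SSEA  <- sum(residuals(MODEL)^2)     # fitted model

# F statistic and p-value, per the formulas above
F.stat <- ((n - 2) / 2) * (SSE0 - SSEA) / SSEA
p.val  <- pf(F.stat, df1 = 2, df2 = n - 2, lower.tail = FALSE)

summary(MODEL)                       # estimates should be near 0 and 1
c(F = F.stat, p = p.val)
plot(y ~ x, data = DATA)
abline(a = 0, b = 1, lty = 2)        # the hypothesised identity line
abline(MODEL)                        # the fitted line
```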