[Math] Finding p value matlab

MATLAB, statistics

I am wondering how to solve the following problem with MATLAB. I tried applying ttest and ttest2 to the data, but my answer is not correct and I can't figure out why. Help would be much appreciated. Thanks!

Microbial activity was measured on 32 samples, giving the following measurements (in mg per sample)

[229, 250, 202, 251, 287, 193, 294, 221, 219, 245, 291, 258, 180, 163, 162, 224, 245, 161, 226, 160, 250, 219, 237, 199, 260, 295, 196, 200, 178, 181, 236, 152]

Then a lime treatment was applied to each of the 32 samples (to balance soil pH), and microbial activity was measured again on the same 32 samples, giving the following measurements (in mg per sample):

[226, 250, 221, 267, 304, 205, 313, 236, 206, 254, 284, 259, 179, 164, 169, 215, 264, 166, 233, 170, 270, 215, 238, 186, 274, 297, 187, 198, 194, 169, 254, 137]

Use Matlab to test for evidence that the lime treatment increased microbial activity, at the 0.1 significance level.

Find a P-value for this test (to 3 decimal places)

Best Answer

I do not have access to Matlab, so I can't help you with syntax for that package.

P-value of one-sided paired t test.

However, assuming the data are nearly normal, I agree with the comment by @Raskolnikov that you need a paired t test. Specifically, this is a paired test of $H_0: \mu_a - \mu_b = \mu_D = 0$ against the one-sided alternative $H_a: \mu_D > 0,$ where $a$ denotes the measurements after the lime treatment and $b$ those before. Results from R statistical software (slightly edited for relevance) are as follows:

x.a = c(226, 250, 221, 267, 304, 205, 313, 236, 206, 254, 284, 259, 
        179, 164, 169, 215, 264, 166, 233, 170, 270, 215, 238, 186, 
        274, 297, 187, 198, 194, 169, 254, 137)
x.b = c(229, 250, 202, 251, 287, 193, 294, 221, 219, 245, 291, 258, 
        180, 163, 162, 224, 245, 161, 226, 160, 250, 219, 237, 199, 
        260, 295, 196, 200, 178, 181, 236, 152)
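d = x.a - x.b                         # paired differences: after minus before
t.test(d, alternative = "greater")    # one-sided, one-sample t test on the differences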

        One Sample t-test

data:  d
t = 2.2296, df = 31, p-value = 0.01658
alternative hypothesis: true mean is greater than 0
sample estimates:
mean of d 
    4.375

So the p-value is 0.01658. The test statistic is $T = \frac{\bar D - 0}{S_D/\sqrt{32}} = 2.2296,$ and the (one-sided) p-value 0.01658 is the probability to the right of $T$ under the density curve of Student's t distribution with 31 degrees of freedom. Intermediate computations:

mean(d);  sd(d)
## 4.375      # sample mean of differences
## 11.09999   # sample SD of differences

mean(d)/(sd(d)/sqrt(32))
## 2.229619   # test statistic

1 - pt(2.2296, 31)
## 0.01657741 # p-value
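
Since the question asks for MATLAB, here is a minimal sketch of the same intermediate computations. I have not run it myself; it assumes the Statistics and Machine Learning Toolbox is available for tcdf, and the variable names (before, after, dbar, sD) are my own.

```matlab
% Data from the question: activity before and after the lime treatment.
before = [229 250 202 251 287 193 294 221 219 245 291 258 180 163 162 224 ...
          245 161 226 160 250 219 237 199 260 295 196 200 178 181 236 152];
after  = [226 250 221 267 304 205 313 236 206 254 284 259 179 164 169 215 ...
          264 166 233 170 270 215 238 186 274 297 187 198 194 169 254 137];

d    = after - before;            % paired differences (after minus before)
n    = numel(d);                  % 32 pairs
dbar = mean(d);                   % sample mean of differences (4.375 in the R output)
sD   = std(d);                    % sample SD with n-1 denominator (11.09999 in R)
T    = dbar / (sD / sqrt(n));     % test statistic (2.2296 in R)
p    = tcdf(T, n - 1, 'upper');   % right-tail area of Student's t with 31 df

fprintf('T = %.4f, one-sided p = %.5f\n', T, p);
```

The 'upper' option asks tcdf for the right-tail probability directly, which plays the role of 1 - pt(...) in the R code.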

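Because the p-value is below 0.1, you reject $H_0$ at the 0.1 significance level stated in the question and conclude that there is evidence that the lime treatment increases microbial activity. (You could also reject at the 5% level, but not at the 1% level, because the p-value exceeds 0.01.)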
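
For the MATLAB side of the question, the paired one-sided test can probably be done in one call to ttest. This is a sketch I have not run, assuming the Statistics and Machine Learning Toolbox; 'Alpha', 0.1 matches the level in the question and only affects h and the confidence interval, not the p-value.

```matlab
% Data from the question: activity before and after the lime treatment.
before = [229 250 202 251 287 193 294 221 219 245 291 258 180 163 162 224 ...
          245 161 226 160 250 219 237 199 260 295 196 200 178 181 236 152];
after  = [226 250 221 267 304 205 313 236 206 254 284 259 179 164 169 215 ...
          264 166 233 170 270 215 238 186 274 297 187 198 194 169 254 137];

% Paired t test of H0: mean(after - before) = 0 against Ha: mean > 0.
% Passing two vectors of equal length makes ttest work on their differences.
[h, p, ci, stats] = ttest(after, before, 'Tail', 'right', 'Alpha', 0.1);

fprintf('t = %.4f, df = %d, one-sided p = %.3f, reject H0 = %d\n', ...
        stats.tstat, stats.df, p, h);
```

Two likely reasons for a "wrong" answer, offered as guesses: ttest is two-sided by default, which would double the p-value to about 0.033, and ttest2 treats the two samples as independent rather than paired. Swapping the argument order while keeping 'Tail', 'right' would test the opposite direction.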

In the figure below, the p-value is the area under the curve to the right of the vertical broken line.

Notes: (1) There is some indication that the differences may not be normal. They barely fail a Shapiro-Wilk test (p-value $\approx 0.05$) and a normal probability plot reveals that the sample is more short-tailed than normal. Even so, for a sample size as large as $n = 32,$ the t test should be reliable.
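
A rough MATLAB counterpart to this normality check, as an untested sketch. As far as I know the Statistics and Machine Learning Toolbox has no built-in Shapiro-Wilk test, so this swaps in the Lilliefors test (lillietest), which is a different test and need not give the same p-value. The vector d holds the after-minus-before differences computed from the question's data.

```matlab
% Paired differences (after minus before), computed from the question's data.
d = [-3 0 19 16 17 12 19 15 -13 9 -7 1 -1 1 7 -9 ...
     19 5 7 10 20 -4 1 -13 14 2 -9 -2 16 -12 18 -15];

normplot(d);                    % normal probability plot of the differences
[hL, pL] = lillietest(d);       % Lilliefors test, NOT Shapiro-Wilk

fprintf('Lilliefors: reject normality = %d, p = %.3f\n', hL, pL);
```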

(2) The traditional alternative to a t test, a Wilcoxon signed-rank test, runs into some difficulty because of tied observations among the differences. But its one-sided p-value is approximately $0.02,$ which is above $0.01$ and below $0.1$: at the 1% level it does not reject the null hypothesis that the median difference is 0 (against the alternative that it is greater than 0), whereas at the 0.1 level of the question it does.
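
In MATLAB the signed-rank test is available as signrank. The following is an untested sketch assuming the Statistics and Machine Learning Toolbox; with ties and a zero difference in the data, the reported p-value is an approximation and may differ slightly from what R gives.

```matlab
% Paired differences (after minus before), computed from the question's data.
d = [-3 0 19 16 17 12 19 15 -13 9 -7 1 -1 1 7 -9 ...
     19 5 7 10 20 -4 1 -13 14 2 -9 -2 16 -12 18 -15];

% H0: median difference = 0, Ha: median difference > 0.
[pW, hW] = signrank(d, 0, 'tail', 'right', 'alpha', 0.1);

fprintf('signed-rank one-sided p = %.3f, reject H0 at 0.1 = %d\n', pW, hW);
```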

(3) A one-sided simulated permutation test on paired differences gives a p-value of about 0.017, essentially the same as the paired t test.
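
One way such a permutation test might be simulated in MATLAB, using only base functions: under $H_0$ the sign of each paired difference is equally likely to be positive or negative, so the signs are flipped at random many times and the observed mean is compared with the resulting means. The seed and the number of replicates B are arbitrary choices of mine, and the result varies slightly from run to run.

```matlab
% Paired differences (after minus before), computed from the question's data.
d = [-3 0 19 16 17 12 19 15 -13 9 -7 1 -1 1 7 -9 ...
     19 5 7 10 20 -4 1 -13 14 2 -9 -2 16 -12 18 -15];

rng(1);                                  % arbitrary seed, for reproducibility only
B = 100000;                              % number of random sign assignments
n = numel(d);

signs     = 2 * (rand(B, n) > 0.5) - 1;  % B-by-n matrix of random +1/-1
permMeans = (signs * d(:)) / n;          % mean difference for each sign assignment
pPerm     = mean(permMeans >= mean(d));  % one-sided simulated p-value

fprintf('permutation p-value is about %.3f\n', pPerm);
```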

Related Solutions

[Math] Matlab: least square method

The Symbolic Math Toolbox is not the usual way to do least squares in MATLAB. The most commonly used functions are polyfit and polyval: polyfit returns the coefficients $\{a_k\}$ of the fitting polynomial $$ p_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0. $$ Simply type

a = polyfit(x,y,1); % fitting polynomial has degree 1

and you will find that a = [-0.54668 -0.38939], which coincides with what you gave.

If you use a second-degree fit, a = polyfit(x,y,2);, a will be [-0.074948 0.033439 -1.234].

For the second question, to evaluate $\displaystyle\sum\limits_{i=0}^{n}[p(x_{i})-y_{i}]^2$, say you have two row vectors xi and yi of length $n+1$. Then, supposing p is given as above, the most vectorized way to compute the sum explicitly is:

p = @(x)-0.5467*x-0.3894
S = sum((p(xi)-yi).^2,2)

Note the dot before the caret: it makes the power an element-wise (vectorized) operation in MATLAB. Alternatively, use the built-in Euclidean norm function norm, which returns the $\ell^2$-norm of a vector:

S = norm(p(xi)-yi); 
S = S^2;

will give you the same result.
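
Putting the pieces together, here is a small sketch that avoids retyping rounded coefficients by evaluating the fit with polyval. The data are hypothetical (the original x and y are not shown here), so the numbers will not match the coefficients quoted above.

```matlab
% Hypothetical data, for illustration only.
xi = [0 1 2 3 4];
yi = [1.1 2.9 5.2 7.1 8.8];

a   = polyfit(xi, yi, 1);      % a(1) is the slope, a(2) the intercept
res = polyval(a, xi) - yi;     % residuals p(x_i) - y_i

S1 = sum(res.^2);              % explicit sum of squared residuals
S2 = norm(res)^2;              % same quantity via the Euclidean norm

fprintf('S1 = %.6f, S2 = %.6f\n', S1, S2);
```

Using polyval with the full-precision coefficients avoids the small rounding error introduced by hard-coding -0.5467 and -0.3894 in an anonymous function.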

[Math] Matlab, finding the variance given a probability distribution

Use $f(x)$ to obtain a large sample of randomly generated values $x_i$ that follow this distribution. In the case of the normal distribution, one could use s = normrnd(mu,sigma,n,m) to create such a sample (an n-by-m array). Given a vector $s$ containing the sample, you can estimate the variance with var(s). If $f(x)$ is difficult to sample from, you could use a rejection sampler.

Example: s = normrnd(0,1,1,10000) creates a large sample (n=10000) for $X\sim N(0,1)$. One such run gave var(s)=1.0053.
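
A minimal sketch of the rejection-sampler idea, using base MATLAB only. The density $f(x) = 6x(1-x)$ on $[0,1]$ is a hypothetical example of mine (it is the Beta(2,2) density, whose exact variance is $1/20$), chosen so the estimate can be checked; the bound M and sample size N are likewise my own choices.

```matlab
rng(0);                          % arbitrary seed, for reproducibility only

f = @(x) 6 .* x .* (1 - x);      % hypothetical target density on [0,1]
M = 1.5;                         % upper bound on f (its maximum, at x = 0.5)
N = 200000;                      % number of proposals

x = rand(N, 1);                  % proposals from the uniform density on [0,1]
u = rand(N, 1);                  % uniform heights used for accept/reject
s = x(M .* u <= f(x));           % keep x with probability f(x)/M

v = var(s);                      % sample variance of the accepted values
fprintf('accepted %d of %d, var(s) = %.4f (exact 0.05)\n', numel(s), N, v);
```

About two thirds of the proposals should be accepted here, since the acceptance rate is $1/M$ for a uniform proposal on $[0,1]$.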
