Common practice is to compare the p-value with three significance levels: 0.05, 0.01 and 0.001. Since your p-value is below all three, report it against the smallest one, so you should conclude that the differences are significant at p<0.001. Roughly speaking, the smaller the p-value, the more significant the differences are (that is, the stronger the evidence against the null hypothesis of no difference).
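The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `significance_label` and the example p-values are made up for this sketch, and the three levels are the conventional ones mentioned above.

```python
def significance_label(p_value, levels=(0.001, 0.01, 0.05)):
    """Return the strictest conventional level that p_value falls below.

    `significance_label` is a hypothetical helper written for this sketch,
    not part of any statistics library.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Check the smallest level first, so the strictest one wins.
    for level in sorted(levels):
        if p_value < level:
            return f"p<{level}"
    return "not significant at the listed levels"


# Illustrative p-values (assumed, not taken from the question)
print(significance_label(0.0004))  # below all three levels
print(significance_label(0.03))    # below 0.05 only
print(significance_label(0.2))     # above every level
```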
Since we do not know the distribution of your data, we also cannot say which test you should use. But your sample is fairly large, so there is a good chance that a parametric test (the t-test for paired data) is appropriate.
Yes, this is possible and even fairly easy, but additional information is required. Specifically, we have to make an assumption about what the correlation between the two observations in each pair is.
The effect size as a difference in standard deviation units is usually referred to as $d$. We can apply a correction factor to $d$ to incorporate the information about the aforementioned correlation, and then we can use our standard power formulae with this corrected $d$ (making sure to also mind the change in degrees of freedom associated with moving to the paired design) to compute power. The corrected $d$ is
$$
d_o = \frac{d}{\sqrt{1-r}},
$$
where $r$ is the correlation. I have called this $d_o$ because this is sometimes referred to as the "operative effect size."
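As a quick numerical check of the correction above, here is a minimal Python sketch. The function name `operative_effect_size` and the example correlations are assumptions chosen for illustration; only the formula $d_o = d/\sqrt{1-r}$ comes from the text.

```python
import math


def operative_effect_size(d, r):
    """Apply the correction d_o = d / sqrt(1 - r) from the formula above.

    d: effect size in standard deviation units
    r: assumed correlation between the two observations in each pair
    (hypothetical helper name, written for this sketch)
    """
    if not -1.0 <= r < 1.0:
        raise ValueError("r must satisfy -1 <= r < 1")
    return d / math.sqrt(1.0 - r)


# With d = 2, a higher assumed correlation gives a larger corrected effect size
for r in (0.0, 0.5, 0.75):
    print(r, round(operative_effect_size(2.0, r), 3))
```

With $r=0$ the correction leaves $d$ unchanged, and the corrected value grows as $r$ approaches 1, which is why fewer pairs are needed when the pairs are strongly correlated.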
Here is a little R routine that computes a table of the minimum number of PAIRS as a function of the assumed correlation and the desired power level, with $d=2$ assumed.
```r
library(pwr) # package for pwr.t.test() function
# may need to install first with install.packages("pwr")

# define a function to get the minimum number of pairs
# for a given correlation and desired power level
getN <- function(r, p) {
  unlist(mapply(pwr.t.test, d = 2 / sqrt(1 - r), power = p,
                MoreArgs = list(n = NULL, sig.level = .05, type = "paired"))["n", ])
}

# apply this function to all combinations of the parameters below
tab <- outer(seq(0, .95, .05), c(.7, .8, .9, .95, .99, .999), "getN")
dimnames(tab) <- list("Correlation" = seq(0, .95, .05),
                      "DesiredPower" = c(.7, .8, .9, .95, .99, .999))
tab
```
Which returns the following:
```
           DesiredPower
Correlation      0.7      0.8      0.9     0.95     0.99    0.999
       0    3.767546 4.220731 4.912411 5.544223 6.888820 8.656788
       0.05 3.691858 4.126240 4.787326 5.389850 6.669683 8.350091
       0.1  3.615930 4.031562 4.662220 5.235637 6.451021 8.044096
       0.15 3.539645 3.936653 4.537050 5.081483 6.232774 7.738792
       0.2  3.462940 3.841433 4.411750 4.927417 6.014903 7.434270
       0.25 3.385708 3.745774 4.286234 4.773338 5.797404 7.130529
       0.3  3.307922 3.649640 4.160447 4.619143 5.580267 6.827580
       0.35 3.229382 3.552889 4.034209 4.464751 5.363362 6.525430
       0.4  3.149970 3.455310 3.907393 4.309986 5.146613 6.224026
       0.45 3.069435 3.356743 3.779777 4.154653 4.929824 5.923282
       0.5  2.987581 3.256903 3.651065 3.998456 4.712773 5.623032
       0.55 2.904079 3.155423 3.520841 3.841020 4.495111 5.323066
       0.6  2.818472 3.051834 3.388672 3.681805 4.276260 5.022875
       0.65 2.730145 2.945449 3.253781 3.520048 4.055501 4.721751
       0.7  2.638237 2.835369 3.115118 3.354639 3.831565 4.418560
       0.75 2.541442 2.720152 2.971074 3.183823 3.602697 4.111397
       0.8  2.437713 2.597460 2.819127 3.004879 3.365682 3.796879
       0.85 2.323340 2.463226 2.654597 2.812710 3.114890 3.468745
       0.9  2.190677 2.309002 2.467897 2.596901 2.838233 3.113596
       0.95 2.018024 2.110699 2.231866 2.327720 2.501567 2.692358
```
Note that $d=2$ is considered quite a large effect size in many fields, so the resulting minimum numbers of pairs are all quite low. Since the number of pairs must be a whole number, round each value up in practice.
Best Answer
The function `ttost` is not a t-test and is therefore not suitable for your purposes. TOST is an equivalence test (its null hypothesis is non-equivalence): it employs two one-sided t-tests to check whether the two samples are equivalent. Please have a look at the function documentation.

There is also the `ttest_mean` function in the `statsmodels` package. However, it does not indicate whether the test is conducted with paired samples or not. Thus, I recommend that you use the `scipy.stats` t-test.

And about your last question:
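The paired t-test removes intersubject variability from the comparison, because each subject serves as its own control. Thus, when the paired observations are positively correlated, it is theoretically more powerful than the unpaired t-test.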
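To make the power argument concrete, here is a small sketch that computes both t statistics by hand on the same data, using only the Python standard library. The helper names `paired_t` and `unpaired_t` and the before/after measurements are invented for illustration. In `scipy.stats`, the corresponding library functions are `ttest_rel` (paired) and `ttest_ind` (unpaired), which also return p-values.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic of the paired t-test: mean difference over its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def unpaired_t(a, b):
    """t statistic of the two-sample t-test with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


# Hypothetical measurements on the same five subjects (made-up numbers):
# subjects differ a lot from each other, but each changes by a similar amount.
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 13.5, 15.0, 17.5, 19.0]

print("paired t:  ", paired_t(before, after))
print("unpaired t:", unpaired_t(before, after))
```

In this made-up data set the paired statistic is much larger in magnitude, because the large spread between subjects cancels out when taking within-subject differences but stays in the denominator of the unpaired statistic.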