Miller and Freund actually specify that their $m$ is "the number of quantities obtained from the observed data that are needed to calculate the expected frequencies" (8th ed., p. 296). The difference from what you attributed to them is critical, since the total count is itself a quantity calculated from the observed data.
Now if you look at their examples, the total count is quite explicitly included in $m$ (there's an example on the very same page where they define $m$).
The books that specify $k-m-1$ instead define $m$ in a way that doesn't include the total count.
Which is to say, when you look carefully, everyone agrees: the two definitions of $m$ differ by 1 in just the right way that both formulas give the same result. [I'd never read the text before now; what they say is easy to find, and easy to verify by working through one of their examples.]
So all you need to do now is figure out how many parameters you estimate in each case, then include the 1 in the appropriate place for whichever formula you use. Note that the number of estimated parameters is NOT always the same even when you test for the same family of distributions: testing against a Poisson(10) is not the same as testing against a Poisson with unspecified $\lambda$. Just count how many parameters you estimate, and add 1 to that when the expected counts are matched to the observed total.
So for example, if you estimate both parameters of the normal, you'd normally subtract 3 d.f. from the number of groups. If both parameters are specified, you'd only subtract 1. If you estimate one parameter, you'd subtract 2, and so on.
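The bookkeeping above can be sketched as a tiny helper (the function name and interface are my own, not from any package):

```python
def chisq_gof_df(k, n_estimated_params, total_matched=True):
    """Degrees of freedom for a chi-square goodness-of-fit test.

    k                  -- number of groups (bins)
    n_estimated_params -- parameters estimated from the data
    total_matched      -- True when the expected counts are scaled to the
                          observed total (the usual distributional-fit case);
                          False when they come from external rates
    """
    return k - n_estimated_params - (1 if total_matched else 0)

# Normal with both parameters estimated, 10 groups: 10 - 2 - 1 = 7
print(chisq_gof_df(10, 2))    # -> 7
# Fully specified distribution, e.g. Poisson(10): 10 - 0 - 1 = 9
print(chisq_gof_df(10, 0))    # -> 9
```
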
[Why do Miller and Freund give the formula differently from almost everyone else? I'd say it's because there are goodness-of-fit cases where the expected counts are not matched to the total sample size, and so you don't subtract that 1. One goodness-of-fit example (though not a goodness-of-distributional-fit example) occurs when you test whether a standard mortality table applies to some sample, where the expected count is determined by the mortality rates and the exposure. By Miller and Freund's approach, you wouldn't be led to erroneously subtract an extra 1 from the d.f.]
However - and this is a pretty big caveat, which quite a few books get wrong - those formulas actually only apply when the parameters are estimated from the grouped data. If you estimate the parameters from the ungrouped data (e.g. you calculate the mean and variance of a supposedly normal sample, then split it into bins to test for normality), then the statistic doesn't have a $\chi^2$ distribution at all.
On the other hand, the distribution function of the statistic will lie between those of a $\chi^2_{k-1}$ and a $\chi^2_{k-m-1}$ (where here $m$ doesn't include the total count), so you can at least get bounds on the p-value; alternatively, you could use simulation to get a p-value. In R, the pearson.test function in the nortest package offers both bounds ($k-m-1$ is the default, but you can get the other by changing a default argument).
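A stdlib-only sketch of the simulation route (function names are mine; this is not the nortest implementation). The key point that makes the simulation valid is that when $\mu$ and $\sigma$ are re-estimated from each sample, the statistic's null distribution doesn't depend on the true parameter values, so simulating from a standard normal suffices:

```python
import math
import random

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def chisq_normality_stat(sample, k=10):
    """Chi-square GOF statistic for normality, with mu and sigma estimated
    from the *ungrouped* sample and k equal-probability bins under the
    fitted normal."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / (n - 1))

    def qnorm(p):  # inverse CDF by bisection (stdlib only)
        lo, hi = mu - 10 * sigma, mu + 10 * sigma
        for _ in range(80):
            mid = (lo + hi) / 2
            if normal_cdf(mid, mu, sigma) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    edges = [qnorm(i / k) for i in range(1, k)]
    counts = [0] * k
    for x in sample:
        counts[sum(1 for e in edges if x > e)] += 1
    expected = n / k
    return sum((o - expected) ** 2 / expected for o in counts)

def sim_pvalue(sample, k=10, nsim=200, seed=1):
    """Monte Carlo p-value: re-estimate the parameters within every
    simulated sample, exactly as was done for the observed one."""
    rng = random.Random(seed)
    n = len(sample)
    observed = chisq_normality_stat(sample, k)
    exceed = sum(
        chisq_normality_stat([rng.gauss(0, 1) for _ in range(n)], k) >= observed
        for _ in range(nsim)
    )
    return (exceed + 1) / (nsim + 1)

# Example: a sample that really is normal
rng = random.Random(0)
sample = [rng.gauss(5, 2) for _ in range(100)]
print(sim_pvalue(sample))
```
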
[On the other, other hand, it would seem crazy to throw away information by estimating the parameters badly just in order to get a chi-square; better to go to the small effort of simulating. On the gripping hand, nobody has any business testing goodness of fit using a chi-square test in any case so this is all moot. As I said in comments, its power is terrible. Indeed in many practical cases the bias is so bad you'd often be better off rolling a 20-sided die and rejecting the null on a 1. I wish that were a joke. Even where the power exceeds the significance level, the power may often still be so low you might as well stick with the die roll.]
I'm going to steer you away from power calculations so that you can reconsider your procedure. Your situation involves two independent variables, day and time, so you need to replace the chi-square test with a two-factor model. Which type you use depends on how you operationalize your dependent variable (DV).
Rather than response rate, which applies to groups, your DV, if measured at the individual level, would be response/nonresponse, and to test for days and times related to it, logistic regression would be a natural choice. As part of this you could include a day*time interaction term, i.e., test for specific combinations of day and time with especially high or low response.
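To make the interaction idea concrete, here is a sketch of how day, time, and day*time indicator columns would enter a logistic regression design matrix (the factor levels and coding scheme are hypothetical; in practice software such as R's glm() builds this for you):

```python
days = ["Mon", "Tue", "Wed"]   # hypothetical factor levels
times = ["AM", "PM"]

def design_row(day, time):
    """Dummy coding with the first level of each factor as reference,
    plus day*time interaction columns."""
    day_d = [1 if day == d else 0 for d in days[1:]]
    time_d = [1 if time == t else 0 for t in times[1:]]
    inter = [a * b for a in day_d for b in time_d]
    return [1] + day_d + time_d + inter   # leading 1 = intercept

# One such row per individual; logistic regression then models
# P(response) = 1 / (1 + exp(-x . beta)).
print(design_row("Tue", "PM"))   # -> [1, 1, 0, 1, 1, 0]
```
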
If you prefer or need to measure response at the group level via response rate, then I'm guessing loglinear modeling would be your choice.
Best Answer
Does R fall into your idea of opaque software? If you are interested in this sort of calculation, I would strongly recommend you use a stats package of some sort, probably R in particular. In R, the pwr package provides the function pwr.chisq.test, which answers your question for you.
It isn't quite as simple as asking "for each sample size, what is the power of my test?", because besides the sample size there is the question of the size of the effect in the underlying population you are inferring to. E.g., if there is a massive effect, then even a very small sample has high power; as the effect gets smaller, you need a bigger sample for the same power.
The documentation for pwr.chisq.test refers to Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Also, a quick Google search turns up this reference (lecture 25), which shows that under the alternative hypothesis the test statistic has (asymptotically) a non-central chi-square distribution, and which provides a way to estimate the non-centrality parameter for a given alternative hypothesis.
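That estimate is straightforward to compute by hand: the non-centrality parameter is $\lambda = n \sum_i (p_{1i} - p_{0i})^2 / p_{0i}$, or equivalently $\lambda = n w^2$ where $w$ is the Cohen effect size used by pwr.chisq.test. A stdlib-only sketch, with a brute-force simulation of the power as a check (function names are mine):

```python
import random

def noncentrality(n, p0, p1):
    """lambda = n * sum((p1_i - p0_i)^2 / p0_i); Cohen's w = sqrt(lambda / n)."""
    return n * sum((b - a) ** 2 / a for a, b in zip(p0, p1))

def sim_power(n, p0, p1, crit, nsim=1000, seed=1):
    """Monte Carlo power: draw multinomial samples under p1 and run the
    chi-square test against p0 at critical value crit."""
    rng = random.Random(seed)
    k = len(p0)
    cum = [sum(p1[:i + 1]) for i in range(k)]
    reject = 0
    for _ in range(nsim):
        counts = [0] * k
        for _ in range(n):
            u = rng.random()
            counts[next((i for i, c in enumerate(cum) if u <= c), k - 1)] += 1
        stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, p0))
        if stat > crit:
            reject += 1
    return reject / nsim

p0 = [0.25] * 4                 # null: uniform over 4 cells
p1 = [0.4, 0.2, 0.2, 0.2]       # alternative
print(round(noncentrality(100, p0, p1), 6))   # -> 12.0
# df = 3; the 0.95 chi-square critical value is about 7.815
print(sim_power(100, p0, p1, crit=7.815))
```
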