The most common convention in statistics is to use Greek letters for parameters ($\mu, \sigma$ for normal distributions, $\lambda$ for Poisson, $\beta$ when parameterizing the mean in regression and GLMs, etc.). I'll assert this without any attempt to offer evidence.
You can define your notation in almost any convenient way as long as it's clear, but the Greek letter $\nu$ is probably the most traditional/widely used for the $t$ and $\chi^2$ distributions at least.
Where feasible, I think conventional notation is better, since it's likely to agree with more sources, and the Greek-letters-for-parameters is pretty well established.
[In no way should this be construed as me saying that any choice is 'right' or 'wrong'. The reason I mostly advocate following convention is because clearer/less ambiguous communication is facilitated. In a situation where there are larger benefits to choosing some other notation, convention be hanged.]
If you're using a text, I'd suggest that unless there's a good reason to do otherwise, you just use what the text uses. It will save some effort.
Presumably the intent in using $\nu$ is the same as the reason we often use $\text{n}$ in our notation when dealing with sample size (it presumably stands for number), but transliterated to Greek since it's a parameter.
I expect $\text{v}$ mostly arises because some people are simply unaware that $\nu$ isn't $\text{v}$. In a few cases it could occur because people want to type $\nu$ but either can't or don't-know-how-to get it, and use $\text{v}$ as a visual approximation.
Is the first version only true when the noise on the data is Poisson, and thus $\sigma^2_i = E_i$?
Not quite; for example, it works for the multinomial (see Pearson 1900); $E_i$ is no longer the variance of the $i$th count, but the dependence between cells in the multinomial exactly compensates for it; see also the test of independence.
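To make that concrete, here's a minimal sketch in Python (the counts and null cell probabilities are invented for illustration): with multinomial counts, the statistic $\sum_i (O_i - E_i)^2/E_i$ is referred to a $\chi^2_{k-1}$, which is exactly what `scipy.stats.chisquare` computes.

```python
import numpy as np
from scipy import stats

# Hypothetical multinomial counts and null-hypothesis cell probabilities
observed = np.array([18, 25, 32, 25])
p_null = np.array([0.2, 0.25, 0.3, 0.25])
expected = observed.sum() * p_null

# Pearson's statistic: sum (O_i - E_i)^2 / E_i, with df = k - 1
stat = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1
p_value = stats.chi2.sf(stat, df)

# scipy.stats.chisquare computes the same statistic and p-value
stat2, p2 = stats.chisquare(observed, f_exp=expected)
```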
Since the examples I found are always for counts in some categories, is the chi-squared test even the right one to measure the goodness of a fit for, e.g., a voltage-vs.-time signal?
Under some very particular assumptions, perhaps, including a conditionally independent Gaussian response and known $\sigma_i$. I often see it applied where it clearly doesn't apply (e.g. where there's substantial observation error in the $x$'s and it's a situation where an errors-in-variables model would apply, or where the supposedly-known $\sigma$ values are clearly inconsistent with the spread of the data around the fit).
To my recollection it doesn't actually apply to nonlinear models when estimating parameters (except approximately/asymptotically).
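As a sketch of the setting where the test does apply (simulated data, invented for illustration: a straight-line fit with conditionally independent Gaussian errors and *known* $\sigma_i$), the statistic $\sum_i (y_i - \hat y_i)^2/\sigma_i^2$ is referred to a $\chi^2$ with $n - p$ degrees of freedom:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated example: straight-line fit with known per-point noise s.d.
x = np.linspace(0, 10, 20)
sigma = np.full_like(x, 0.5)          # assumed known, not estimated
y = 2.0 + 0.7 * x + rng.normal(0, sigma)

# Weighted least squares fit (p = 2 parameters: intercept and slope)
X = np.column_stack([np.ones_like(x), x])
W = np.diag(1 / sigma**2)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
fitted = X @ beta

# Chi-squared goodness-of-fit statistic, df = n - p
chi2_stat = np.sum((y - fitted) ** 2 / sigma**2)
df = len(x) - 2
p_value = stats.chi2.sf(chi2_stat, df)
```

If the assumed $\sigma_i$ were badly inconsistent with the actual spread around the fit, this p-value would be meaningless, which is exactly the misuse described above.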
Miller and Freund actually specify that their $m$ is "the number of quantities obtained from the observed data that are needed to calculate the expected frequencies" (8th ed, p. 296). The difference from what you said they say is critical, since the total count is itself something you calculate from the data.
Now if you look at their examples, the total count is included in $m$ quite explicitly (there's an example on the very same page they define their $m$ on).
The ones that specify $k-m-1$ define $m$ in a way that doesn't include the total count.
Which is to say, when you look properly, everyone agrees: their definitions of $m$ differ by 1 in just the right way that both formulas give the same result. [I'd never read the text before just now; it's easy to find what they say and verify it by reading an example to check.]
So all you need to do now is figure out how many parameters you estimate in each case and then include the 1 in the appropriate place for whichever formula you use (and that number of parameters is NOT always the same even if you test for the same distribution; testing a Poisson(10) is not the same as testing a Poisson with unspecified $\lambda$). Just count how many parameters you estimate, then add 1 when you use the total count.
So for example, if you estimate both parameters of the normal, you'd normally subtract 3 d.f. from the number of groups. If both parameters are specified, you'd only subtract 1. If you estimate one parameter, you'd subtract 2, and so on.
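The bookkeeping above can be sketched as a small helper (`gof_df` is a hypothetical name, just an illustration of the counting rule, valid only when parameters are estimated from the grouped data as discussed below):

```python
def gof_df(k, n_estimated_params):
    """Degrees of freedom for a chi-squared goodness-of-fit test:
    k bins, n_estimated_params parameters estimated from the grouped data,
    with the 1 for the total count subtracted explicitly."""
    return k - n_estimated_params - 1

# Testing normality with both mu and sigma estimated: subtract 3 in total
df_both_estimated = gof_df(k=10, n_estimated_params=2)   # 10 - 2 - 1 = 7
# Testing a fully specified distribution, e.g. N(0, 1): subtract only 1
df_fully_specified = gof_df(k=10, n_estimated_params=0)  # 10 - 0 - 1 = 9
```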
[Why do Miller and Freund give the formula differently from almost everyone else? I'd say it's because there are cases for goodness of fit where you don't match on the total sample size to estimate parameters, and so you don't subtract that 1. One goodness of fit example (but not a goodness-of-distributional-fit example) occurs when you test whether a standard mortality table applies to some sample, where the expected count is determined by the mortality rates and the exposure. By Miller and Freund's approach, you wouldn't be led to erroneously subtract an extra 1 from the d.f.]
However - and this is a pretty big caveat, which quite a few books get wrong - those formulas actually only apply when the parameters are estimated from the grouped data. If you estimate parameters from ungrouped data (e.g. you calculate mean and variance of a supposedly normal sample, then split it into bins for testing for normality) then you don't have a $\chi^2$ distribution at all.
On the other hand, the distribution function will lie between that of a $\chi^2_{k-1}$ and a $\chi^2_{k-m-1}$ (where here $m$ doesn't include the total count), so you can at least get bounds on the p-value; alternatively you could use simulation to get a p-value. In R, the `pearson.test` function in the package `nortest` offers both ($k-m-1$ is the default, but you can get the other bound by changing a default argument).

[On the other, other hand, it would seem crazy to throw away information by estimating the parameters badly just in order to get a chi-square; better to go to the small effort of simulating. On the gripping hand, nobody has any business testing goodness of fit using a chi-square test in any case, so this is all moot. As I said in comments, its power is terrible. Indeed in many practical cases the bias is so bad you'd often be better off rolling a 20-sided die and rejecting the null on a 1. I wish that were a joke. Even where the power exceeds the significance level, it may often still be so low that you might as well stick with the die roll.]
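A sketch of the bound calculation (the statistic value and bin count here are invented): for an observed Pearson statistic $X^2$ with $k$ bins and $m$ parameters estimated from ungrouped data, the p-value is bracketed by the two tail areas.

```python
from scipy import stats

# Hypothetical: k = 10 bins, m = 2 parameters (mu, sigma) estimated from
# UNGROUPED data, and an observed Pearson statistic X2
k, m, X2 = 10, 2, 14.2

# The true null distribution lies between chi2_{k-1} and chi2_{k-m-1},
# so the p-value is bracketed by the two tail probabilities
p_lower = stats.chi2.sf(X2, k - m - 1)  # fewer df: smaller tail area
p_upper = stats.chi2.sf(X2, k - 1)      # more df: larger tail area
```

If the whole interval `[p_lower, p_upper]` falls on one side of your significance level, the bounds settle the decision; otherwise you'd need simulation.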