For AIC/BIC selection it doesn't really matter whether you choose to count the variance parameter, as long as you are consistent across models, because inference based on information-theoretic criteria depends only on the differences between the values for different models. Adding 2 (for an additional parameter) across the board therefore doesn't change any of the $\Delta$IC values, which are all you use to choose models (or do model averaging, compute model weights, etc.).
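To make this concrete, here is a minimal sketch (Python/NumPy on simulated data; `gaussian_aic` is just an illustrative helper) showing that counting $\sigma^2$ shifts every model's AIC by the same 2 units, so the difference between models is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)      # x2 is irrelevant to y

def gaussian_aic(y, X, count_sigma):
    """AIC of an OLS fit; k counts the coefficients, plus 1 if sigma is counted."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * rss / len(y)) + 1)  # ML variance = rss/n
    k = X.shape[1] + (1 if count_sigma else 0)
    return -2 * loglik + 2 * k

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])
for count_sigma in (False, True):
    delta = gaussian_aic(y, X_big, count_sigma) - gaussian_aic(y, X_small, count_sigma)
    print(count_sigma, delta)                # the same delta-AIC either way
```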
(However, you do have to be careful if you're going to compare models fitted with different procedures or different software packages, because they may count parameters in different ways.)
It does matter if you are going to use AICc or some other finite-sample-corrected criterion, because the correction depends on the number of parameters relative to the sample size (the denominator of the correction term is $n-k-1$), so adding a constant to every model's $k$ no longer cancels out of the differences. The question you then have to ask is whether a nuisance parameter such as the residual variance, which can be computed from the residuals without modifying the estimation procedure, should be included in $k$. I wrote in this r-sig-mixed-models post that I'm not sure about the right procedure here. However, looking quickly at Hurvich and Tsai's original paper (Hurvich, Clifford M., and Chih-Ling Tsai. 1989. "Regression and Time Series Model Selection in Small Samples." Biometrika 76 (2): 297–307. doi:10.1093/biomet/76.2.297, http://biomet.oxfordjournals.org/content/76/2/297.abstract), it does appear that they include the variance parameter, i.e. they use $k=m+1$ for a linear model with $m$ linear coefficients ...
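A small numerical illustration of why (hypothetical $n$ and parameter counts), using the standard AICc correction $2k(k+1)/(n-k-1)$: because the correction is nonlinear in $k$, adding 1 to every model's $k$ changes the differences between models.

```python
# AICc correction term, 2*k*(k+1)/(n - k - 1); nonlinear in k, so a constant
# shift of k does not cancel out of the between-model differences.
def aicc_correction(n, k):
    return 2 * k * (k + 1) / (n - k - 1)

n, k_small, k_big = 20, 2, 5
print(aicc_correction(n, k_big) - aicc_correction(n, k_small))          # sigma not counted in k
print(aicc_correction(n, k_big + 1) - aicc_correction(n, k_small + 1))  # sigma counted: a different delta
```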
I would further quote Press et al. (Numerical Recipes in C):

"We might also comment that if the difference between $N$ and $N-1$ ever matters to you, then you are probably up to no good anyway - e.g. trying to substantiate a questionable hypothesis with marginal data."
(They are discussing the bias correction term in the sample variance calculation, but the principle applies here as well.)
I'm not sure exactly how to get the AIC here (i.e., which term to add), but in many cases you can use the likelihood ratio $\chi^2$ statistic minus twice the degrees of freedom as a substitute; for a Gaussian model, $\chi^{2} = -n \log(1 - R^{2})$.
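As a quick numerical check of that identity (simulated data, NumPy only): the likelihood-ratio statistic comparing a fitted Gaussian linear model to the intercept-only model reproduces $-n\log(1-R^{2})$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 0.5 + 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta) ** 2)        # residual SS of the fitted model
tss = np.sum((y - y.mean()) ** 2)        # residual SS of the intercept-only model
r2 = 1 - rss / tss

def loglik(ss, n):                        # Gaussian log-likelihood at the ML variance ss/n
    return -0.5 * n * (np.log(2 * np.pi * ss / n) + 1)

lr_chi2 = 2 * (loglik(rss, n) - loglik(tss, n))
print(lr_chi2, -n * np.log(1 - r2))       # the two values agree
```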
Best Answer
As mugen mentioned, $k$ represents the number of parameters estimated. In other words, it is the number of additional quantities you need to know in order to fully specify the model. In the simple linear regression model $$y=ax+b$$ you can estimate $a$, $b$, or both; whichever quantities you don't estimate, you must fix. There is no such thing as "ignoring" a parameter in the sense that you don't know it and don't care about it. The most common model that doesn't estimate both $a$ and $b$ is the no-intercept model, where we fix $b=0$; it has one parameter. You could just as easily fix $a=2$ or $b=1$ if you had some reason to believe that reflects reality. (Fine point: $\sigma$ is also a parameter in a simple linear regression, but since it appears in every model you can drop it without affecting comparisons of AIC.)
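Here is a small sketch of that counting (Python/NumPy, simulated data; `ols_aic` is just an illustrative helper), with $\sigma$ dropped from $k$ in both models as the fine point above allows:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)          # the true intercept happens to be 0

def ols_aic(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1]                         # count only the estimated coefficients
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * rss / len(y)) + 1)
    return -2 * loglik + 2 * k

aic_no_intercept = ols_aic(y, x[:, None])                          # y = ax,     k = 1 (b fixed at 0)
aic_with_intercept = ols_aic(y, np.column_stack([np.ones(n), x]))  # y = ax + b, k = 2
print(aic_no_intercept, aic_with_intercept)
```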
If your model is $$y=af(c,x)+b$$ the number of parameters depends on whether you fix any of these values, and on the form of $f$. For example, if we want to estimate $a, b, c$ and know that $f(c,x)=x^c$, then when we write out the model we have $$y=ax^c+b$$ with three unknown parameters. If, however, $f(c,x)=cx$, then we have the model $$y=acx+b$$ which really only has two parameters: $ac$ and $b$.
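A tiny check of that last point (made-up numbers): because only the product $ac$ enters the mean function, different $(a, c)$ pairs with the same product give identical predictions, so the data cannot tell them apart.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)
b = 0.5
y1 = 2.0 * 3.0 * x + b        # a = 2, c = 3
y2 = 6.0 * 1.0 * x + b        # a = 6, c = 1
print(np.allclose(y1, y2))    # True: only a*c and b are identifiable
```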
It is crucial that $f(c,x)$ be a family of functions indexed by $c$. If all you know is that $f$ is continuous and depends on $c$ and $x$, you're out of luck: the space of such functions is infinite-dimensional, so there is no finite parameter count to use in the AIC.