All that matters is the difference between two AIC (or, better, AICc) values, representing the fit to two models. The actual value of the AIC (or AICc), and whether it is positive or negative, means nothing. If you simply changed the units the data are expressed in, the AIC (and AICc) would change dramatically. But the difference between the AIC of the two alternative models would not change at all.
Bottom line: Ignore the actual value of AIC (or AICc) and whether it is positive or negative. Ignore also the ratio of two AIC (or AICc) values. Pay attention only to the difference.
The form of $AICc$ of
$$
AICc = AIC + \frac{2k(k+1)}{n-k-1}
$$
was proposed by
Hurvich, C. M.; Tsai, C.-L. (1989), "Regression and time series model selection in small samples", Biometrika 76: 297–307
specifically for a linear regression model with normally distributed errors. For different models, a different correction will need to be derived.
These derivations are often difficult and the resulting correction may be challenging to calculate. For instance
Hurvich, Clifford M., Jeffrey S. Simonoff, and Chih‐Ling Tsai. "Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60, no. 2 (1998): 271-293.
propose a correction to be used in the case of nonparametric regression models which takes the form
$$
AICc = -2ln(L) + n^2\int_0^1(1-t)^{r/2-2}\prod_{j=1}^{r}(1-t+2d_j)^{-1/2}dt+n\int_0^{\infty}\sum_{i=1}^n\frac{c_{ii}}{1+2d_it}\prod_{i=1}^n(1+2d_it)^{-1/2}dt
$$
I will not go into the details here as they are largely irrelevant but I wanted to illustrate the complexity involved. Actual calculation of this value involves eigen-analysis and numerical integration.
For reasons like this, many authors such as
Burnham, K. P.; Anderson, D. R. (2002), Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.), Springer-Verlag, ISBN 0-387-95364-7
suggest to use the form
$$
AICc = AIC + \frac{2k(k+1)}{n-k-1}
$$
regardless of model. Even Hurvich et al. (1998) despite deriving their complicated $AICc$ for nonparametric regression ultimately conclude that you might as well use the much simpler version for linear regression.
Generally this advice seems to work well, giving practically useful results. However there are circumstances, such as the one you've highlighted where it doesn't work. You would need to find an appropriate $AICc$ for k-means, or derive one yourself, or simply use $AIC$ which is more generally applicable.
Best Answer
I believe I have found my own answer. It sounds like using AIC to select from a large set of conceivable variables can result in models that are "excessively" tailored to the data at hand, ie. data dredging. However, if selecting from a limited set of a priori models, then it should be fine.
Reference:
Question 4 from Joseph E. Cavanaugh's slides at: http://myweb.uiowa.edu/cavaaugh/ms_lec_14_ho.pdf