There is, as far as I know, no taxonomy of variables that captures all the contrasts that might be important for some theoretical or practical purpose, even for statistics alone. If such a taxonomy existed, it would probably be too complicated to be widely acceptable.
It is best to focus on examples rather than give numerous definitions. Number of days is a counted variable. It qualifies as discrete rather than continuous, and the discreteness may well matter, particularly if most values are small. Some statistically minded people might want to insist that only models that apply to discrete variables should be used for such a variable.
At the same time, models and methods often treat such a variable as approximately continuous. Population size is an even more obvious example: human populations run into the billions, and many procedures effectively treat such variables as continuous, despite the familiar fact that people are individuals.
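A small sketch makes the contrast concrete (the simulated data and the use of the statsmodels library are my own illustration, not anything from the discussion above): the same small counts fitted once with a Poisson model that respects the discreteness, and once with ordinary least squares, which treats them as approximately continuous.

```python
# Sketch (simulated data, statsmodels): two attitudes toward a small count.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))   # small counts, so discreteness matters

X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # discrete model
ols_fit = sm.OLS(y, X).fit()                                    # "approximately continuous"

print(poisson_fit.params)   # coefficients on the log scale
print(ols_fit.params)       # coefficients on the raw scale
```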
In contrast, a variable such as temperature is in principle continuous, but as a matter of convention temperatures may only be reported to the nearest degree or tenth of a degree, so the number of possible values may be rather small in practice. This does not usually worry anyone; it would certainly be perverse to call such a variable categorical. There are some contexts in which the discreteness of reported temperature is important: in reading mercury thermometers by eye and guessing at the last digit, people show idiosyncratic preferences for or against certain digits of the ten possibilities 0 to 9.
Also, what do we do with categories? Answer: we count them. We count males, females; unemployed, employed, retired, students; whatever. So, often we are modelling category counts.
In short, discrete counts are a common kind of variable, alongside continuous and categorical ones.
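A toy sketch (made-up data) of categories becoming the counts that get modelled:

```python
# Categories in, counts out: the counts are typically what we model.
from collections import Counter

status = ["employed", "unemployed", "employed", "retired",
          "student", "employed", "student"]
print(Counter(status))   # e.g. Counter({'employed': 3, 'student': 2, ...})
```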
The problem that you're running into is multicollinearity in the input matrix for your regression. The matrix is 'ill-conditioned', meaning that small errors in the input lead to large errors in the output. The condition number of a matrix is $\|A\|\,\|A^{-1}\|$; for the usual 2-norm this equals the ratio of the largest to the smallest singular value, which for a symmetric positive definite matrix is the ratio of the largest to the smallest eigenvalue, $\lambda_{\max}/\lambda_{\min}$.

The normal equations (the equations used to solve for the betas of the regression) are $\hat{\beta} = (A^TA)^{-1}A^Ty$. So if you have a matrix with a large condition number (which your program is telling you that you do), the normal equations make things worse: forming $A^TA$ squares the condition number of $A$. This multicollinearity is what's causing your $R^2$ and your betas to have "messed up" values (remember: small errors in the inputs lead to large errors in the output).

Now, what can you do about this? A large condition number also comes up with very high-dimensional data, but for you it seems to come from the fact that your predictor variables are strongly related. Two options (with a sketch of the diagnosis after them):
(1) You can figure out which of your variables is causing the problem and remove it from the model.
(2) You can consider methods like ridge regression.
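Either way, it's worth confirming the diagnosis first. Here is a minimal sketch (NumPy only, with made-up near-collinear predictors, not your actual data) showing how the condition number blows up and how forming $A^TA$ squares it:

```python
# Two nearly identical predictors give the design matrix a huge condition
# number, and forming A^T A roughly squares it.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=1e-4, size=100)   # almost a copy of x1
A = np.column_stack([np.ones(100), x1, x2])  # intercept + two collinear columns

print(np.linalg.cond(A))        # very large: the matrix is ill-conditioned
print(np.linalg.cond(A.T @ A))  # roughly cond(A) squared: the normal equations are worse
```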
What does ridge regression do? It adds a small perturbation $\lambda I$ to $A^TA$ in the normal equations, so the solution becomes $\hat{\beta} = (A^TA + \lambda I)^{-1}A^Ty$, where $\lambda$ controls the size of the perturbation and $I$ is the identity matrix (zeroes everywhere except for ones on the diagonal). This reduces your problem with multicollinearity, but at the expense of adding some bias to the model; a sketch of the closed-form solution is below. I'd suggest reading up on ridge regression or the lasso before just jumping in. I've always found "The Elements of Statistical Learning" to be a good reference, and it's free as a PDF online. Good luck.
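Here is that sketch (NumPy only; the data and the value of $\lambda$ are made up for illustration, not taken from your problem):

```python
# Closed-form ridge solution beta = (A^T A + lambda I)^{-1} A^T y,
# compared against plain OLS on near-collinear predictors.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=1e-4, size=100)   # nearly collinear predictors again
A = np.column_stack([np.ones(100), x1, x2])
y = 1.0 + 2.0 * x1 + rng.normal(size=100)

lam = 0.1                                    # perturbation strength (arbitrary here)
p = A.shape[1]

beta_ols = np.linalg.solve(A.T @ A, A.T @ y)                      # unstable estimates
beta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)  # stabilized, but biased
print(beta_ols)
print(beta_ridge)
```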
The word numerical means 'consisting of numbers' ('expressed in or counted by numbers' in one dictionary). Counts are clearly numerical. Indeed they have a meaningful zero, and 6 is literally twice as much as 3 and three times as much as 2, and so forth (3 bricks + 3 bricks = 6 bricks, so 6 bricks is twice as many bricks as 3 bricks), so if you're considering Stevens' typology, counts are arguably on a ratio scale to boot.
What matters more is how you see it coming into your model.
If you're reading a book that tells you how to treat variables based only on whether they are 'categorical' or not, you may sometimes be led into poor choices of analysis.
You can bin variables. You can forget the bin boundaries and make them into (ordered) categories. You can even ignore the ordering. Every one of those steps results in loss of information, and in many cases the introduction of bias, so the larger question is not whether you can, but whether you should (the mechanics of each step are sketched below). If you step back from "must use clustering" for a moment, what are you trying to achieve with it?
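To make those steps concrete, here is a small sketch (pandas, with made-up values; the bin count and labels are arbitrary choices of mine):

```python
# Bin a numeric variable, keep the bins as ordered categories,
# then throw the ordering away too. Each step discards information:
# first the exact values, then the boundaries, then the order.
import pandas as pd

values = pd.Series([1.2, 3.5, 2.8, 7.1, 4.4, 6.0])

binned = pd.cut(values, bins=3)                                   # ordered intervals
labelled = pd.cut(values, bins=3, labels=["low", "mid", "high"])  # ordered categories
unordered = labelled.cat.as_unordered()                           # ordering discarded

print(binned.head())
print(unordered.head())
```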