Maybe it's too late, but I'll add my answer anyway...
It depends on what you intend to do with your data: if you are interested in showing that scores differ across groups of participants (gender, country, etc.), you may treat your scores as numeric values, provided they fulfill the usual assumptions about variance (or shape) and sample size. If you are instead interested in highlighting how response patterns vary across subgroups, then you should treat item scores as discrete choices among a set of answer options and look at log-linear modeling, ordinal logistic regression, item-response models, or any other statistical model that can cope with polytomous items.
As a rule of thumb, one generally considers that having 11 distinct points on a scale is sufficient to approximate an interval scale (for interpretation purposes, see @xmjx's comment). Likert items may be regarded as truly ordinal scales, but they are often used as numeric, and we can compute their mean or SD. This is often done in attitude surveys, although it is wise to report both mean/SD and the % of responses in, e.g., the two highest categories.
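As a concrete sketch of the reporting suggested above, with made-up responses to a single 5-point item:

```python
import numpy as np

# Hypothetical responses to one 5-point Likert item
# (1 = strongly disagree ... 5 = strongly agree)
responses = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5])

mean = responses.mean()
sd = responses.std(ddof=1)                          # sample SD
top_two = np.isin(responses, [4, 5]).mean() * 100   # % in the two highest categories

print(f"mean = {mean:.2f}, SD = {sd:.2f}, top-two box = {top_two:.1f}%")
```

Reporting both the mean/SD and the top-two-box percentage guards against the mean hiding a skewed response distribution.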
When using summated scale scores (i.e., we add up the scores on each item to compute a "total score"), the usual statistics may be applied, but you have to keep in mind that you are now working with a latent variable, so the underlying construct should make sense! In psychometrics, we generally check that (1) unidimensionality of the scale holds and (2) scale reliability is sufficient. When comparing two such scale scores (from two different instruments), we might even consider using attenuation-corrected correlation measures instead of the classical Pearson correlation coefficient.
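The correction for attenuation mentioned above divides the observed correlation by the square root of the product of the two scales' reliabilities; a minimal sketch with made-up numbers:

```python
import numpy as np

# Hypothetical values: observed correlation between two scale scores,
# and each scale's reliability (e.g., Cronbach's alpha)
r_observed = 0.45
rel_x, rel_y = 0.80, 0.70

# Classical (Spearman) correction for attenuation: estimates the
# correlation between the latent constructs, free of measurement error
r_corrected = r_observed / np.sqrt(rel_x * rel_y)
print(f"corrected correlation = {r_corrected:.3f}")
```

The corrected value is necessarily larger than the observed one, since measurement error always attenuates the observed correlation.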
Classical textbooks include:
1. Nunnally, J.C. and Bernstein, I.H. (1994). Psychometric Theory (3rd ed.). McGraw-Hill Series in Psychology.
2. Streiner, D.L. and Norman, G.R. (2008). Health Measurement Scales: A practical guide to their development and use (4th ed.). Oxford University Press.
3. Rao, C.R. and Sinharay, S., Eds. (2007). Handbook of Statistics, Vol. 26: Psychometrics. Elsevier Science B.V.
4. Dunn, G. (2000). Statistics in Psychiatry. Hodder Arnold.
You may also have a look at Applications of latent trait and latent class models in the social sciences, from Rost & Langeheine, and W. Revelle's website on personality research.
When validating a psychometric scale, it is important to look at so-called ceiling/floor effects (large asymmetry resulting from participants scoring at the lowest/highest response category), which may seriously affect any statistics computed when treating the scores as numeric variables (e.g., country aggregation, t-test). This raises specific issues in cross-cultural studies, since it is known that the overall response distribution in attitude or health surveys differs from one country to another; e.g., Chinese respondents and those from Western countries tend to show distinct response patterns, the former generally giving more extreme scores at the item level; see, e.g., Song, X.-Y. (2007). Analysis of multisample structural equation models with applications to Quality of Life data. In Lee, S.-Y. (Ed.), Handbook of Latent Variable and Related Models, pp. 279-302, North-Holland.
More generally, if you are interested in measurement issues you should look at the psychometric literature, which makes extensive use of Likert items. Various statistical models have been developed for such data and are now grouped under the Item Response Theory framework.
If you designed your questionnaire correctly, you have formed hypotheses about relations between constructs. For example, you might have implementation success as a Y variable dependent on manager involvement.
Now, you most probably have questions in your questionnaire which measure those variables somehow.
When you have the results of your questions, you can build a scale (or index, for that matter) to measure your construct. You can do this using a few methods (be sure to invert negatively worded questions on the scale).
When, for example, you measure implementation success with 5 questions, you may take the mean of those scores (again, rescale negative questions!), assuming they have equal weights in determining the construct. Alternatively, you could assign weights by guesstimate.
What's more, you could extract factor scores for a less arbitrary weighting. All of these methods have their (dis)advantages over the others.
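A minimal sketch of the two weighting approaches just described (equal weights via the mean, and data-driven weights via the first principal component as a stand-in for factor scores), using randomly generated placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 respondents x 5 Likert items (1-5); item 4 negatively worded
X = rng.integers(1, 6, size=(100, 5)).astype(float)

# 1) Reverse-score the negatively worded item on a 1-5 scale: x -> 6 - x
X[:, 3] = 6 - X[:, 3]

# 2) Equal-weight composite: the mean of the item scores
mean_score = X.mean(axis=1)

# 3) Less arbitrary weights: loadings of the first principal component
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
weights = vt[0] * np.sign(vt[0].sum())   # orient so weights are mostly positive
pc_score = Xc @ weights

print(mean_score[:3], pc_score[:3])
```

With real data the two composites are usually highly correlated; the PCA weighting mainly matters when some items carry much more of the common variance than others.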
Checking whether the questions measure the same thing can be done using a reliability coefficient such as Cronbach's alpha, but be sure you know what it does. Also be sure to use other metrics, measures, and tests, plus a healthy dose of face validity and common sense.
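Cronbach's alpha can be computed directly from the respondent-by-item score matrix; a small self-contained sketch (the data are invented):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the summated score
    return k / (k - 1) * (1 - item_vars / total_var)

# Toy data: 5 respondents, 3 items that mostly agree with each other
X = np.array([[1, 2, 2],
              [2, 3, 2],
              [3, 3, 4],
              [4, 5, 4],
              [5, 4, 5]], dtype=float)

alpha = cronbach_alpha(X)
print(f"alpha = {alpha:.3f}")  # high, since the items track each other closely
```

Remember that alpha measures internal consistency, not unidimensionality: a multidimensional scale can still show a high alpha.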
Finally you investigate the relationships between your constructs (measured by your developed scales) with for example correlations or regressions.
I hope this helps, good luck!
I don't think you should be multiplying them in this way without a lot of thought.
In fact, I'd go further and say that you shouldn't even ask the question this way. Rather, it would be better to have people rate the risk of each type of injury. After all, if a person falls 10 feet, he MIGHT have no injuries, he MIGHT die - there are certainly examples of both. So, I might say there is a very slight chance of no injuries, a much higher chance of the middle 3 levels and a very slight chance of fatality. And "possible fatal" is a bad choice of words. What does "very likely possible fatal injuries" mean?
If you've already gathered data... well... Clearly "no injuries" and "never" should be 0, not 1. Then you need to consider each combination and whether they are equivalent. Is "unlikely fatal" (2*5 = 10) the same as "very likely slight" (5*2 = 10)? I don't think so...
The second scale could probably be made numeric fairly easily: Never = 0, Very likely = 0.9 and the others are intermediate with some reasonable choices. The first scale will be very hard to make numeric. I would do sensitivity analysis with different choices.
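The sensitivity analysis could be as simple as re-running the summary under a few candidate mappings; the mappings below are illustrative assumptions, not recommendations:

```python
import numpy as np

# Hypothetical likelihood codes 1-5 (Never ... Very likely) for a few observations
codes = np.array([1, 2, 3, 4, 5])

# Several plausible category-to-probability mappings, anchored at 0 and 0.9
mappings = {
    "linear":      {1: 0.0, 2: 0.225, 3: 0.45, 4: 0.675, 5: 0.9},
    "log-spaced":  {1: 0.0, 2: 0.05,  3: 0.15, 4: 0.45,  5: 0.9},
    "pessimistic": {1: 0.0, 2: 0.10,  3: 0.30, 4: 0.60,  5: 0.9},
}

results = {}
for name, m in mappings.items():
    probs = np.array([m[c] for c in codes])
    results[name] = probs.mean()
    print(name, results[name])
```

If your substantive conclusion survives all reasonable mappings, the arbitrariness of the coding matters much less.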
Further, you don't avoid any of these issues by making the product ordinal; if anything, you make them worse, because you are then saying that ALL the combinations in a particular ordinal level are the same. For example, with both items coded 1-5, your 10-15 category includes (2, 5), (5, 2), (3, 4), (4, 3), (3, 5), and (5, 3).
That can't be right!
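A quick enumeration (assuming both items are coded 1-5, likelihood times severity) makes the heterogeneity of one band explicit:

```python
# All (likelihood, severity) pairs coded 1-5 whose product lands in the 10-15 band
pairs = [(l, s) for l in range(1, 6) for s in range(1, 6) if 10 <= l * s <= 15]
print(pairs)  # (2, 5) "unlikely fatal" sits next to (5, 2) "very likely slight"
```

Collapsing these pairs into one ordinal level treats a rare catastrophe and a frequent minor injury as the same risk.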
Another choice, if you've already got data, is to not multiply the values at all but to use them as separate independent variables; but I am not sure that gets at what you want.