Continuous Dependent Variable – Handling Continuous Dependent and Ordinal Independent Variables

lasso, ordinal-data, predictor, r, regression

Given a continuous dependent variable y and independent variables including an ordinal variable X1, how do I fit a linear model in R? Are there papers about this type of model?

Best Answer

@Scortchi's got you covered with this answer on "Coding for an ordered covariate". I've repeated the recommendation in my answer to "Effect of two demographic IVs on survey answers (Likert scale)". Specifically, the recommendation is to use Gertheiss' (2013) ordPens package, and to refer to Gertheiss and Tutz (2009a) for theoretical background and a simulation study.

The specific function you probably want is ordSmooth. This essentially smooths the dummy coefficients across levels of an ordinal variable so that coefficients for adjacent ranks differ less, which reduces overfitting and improves predictions. It generally performs as well as or (sometimes much) better than maximum likelihood (i.e., ordinary least squares in this case) estimation of a regression model for continuous (or in their terms, metric) data when the data are actually ordinal. It appears compatible with all sorts of generalized linear models, and allows you to enter nominal and continuous predictors as separate matrices.
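To make the smoothing idea concrete, here is a minimal base-R sketch of a quadratic penalty on differences between adjacent dummy coefficients, with the first level as reference category. It is only an illustration of the principle, not the ordPens implementation; the simulated data, the variable names, the helper `fit_smooth`, and the choice `lambda = 10` are all assumptions made for the example.

```r
# Simulated data (assumed for illustration): one ordinal predictor with K levels
set.seed(1)
n <- 200
K <- 6
x1 <- sample(rep(1:K, length.out = n))         # coded 1..K, every level present
true_effect <- c(0, 0.5, 1.0, 1.2, 1.3, 2.0)   # hypothetical level effects
y <- 1 + true_effect[x1] + rnorm(n)

# Dummy coding with level 1 as the reference category
Z <- sapply(2:K, function(k) as.numeric(x1 == k))
X <- cbind(1, Z)                               # intercept + K - 1 dummies

# First-difference matrix: rows give beta_2 - 0, beta_3 - beta_2, ...
D <- diag(K - 1)
D[cbind(2:(K - 1), 1:(K - 2))] <- -1

# Penalty matrix; the intercept (first row/column) is left unpenalized
P <- matrix(0, K, K)
P[-1, -1] <- crossprod(D)

# Penalized least squares: minimize ||y - X b||^2 + lambda * ||D b[-1]||^2
fit_smooth <- function(lambda) {
  drop(solve(crossprod(X) + lambda * P, crossprod(X, y)))
}

beta_ols <- fit_smooth(0)    # lambda = 0 reproduces ordinary least squares
beta_pen <- fit_smooth(10)   # lambda = 10 is an arbitrary illustrative value

# Roughness of the level effects under each fit
roughness <- function(b) sum((D %*% b[-1])^2)
print(rbind(ols = beta_ols, penalized = beta_pen))
print(c(ols = roughness(beta_ols), penalized = roughness(beta_pen)))
```

Larger values of `lambda` pull adjacent level effects closer together, and `lambda = 0` gives the unpenalized dummy-variable fit. In practice the penalty weight would be chosen by something like cross-validation, and ordSmooth would handle the fitting.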

Several additional references from Gertheiss, Tutz, and colleagues are available and listed below. Some of these may contain alternatives – even Gertheiss and Tutz (2009a) discuss ridge reroughing as another alternative. I haven't dug through it all yet myself, but suffice it to say this solves @Erik's problem of too little literature on ordinal predictors!

References

- Gertheiss, J. (2013, June 14). ordPens: Selection and/or smoothing of ordinal predictors, version 0.2-1. Retrieved from http://cran.r-project.org/web/packages/ordPens/ordPens.pdf.
- Gertheiss, J., Hogger, S., Oberhauser, C., & Tutz, G. (2011). Selection of ordinally scaled independent variables with applications to international classification of functioning core sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 60(3), 377–395.
- Gertheiss, J., & Tutz, G. (2009a). Penalized regression with ordinal predictors. International Statistical Review, 77(3), 345–365. Retrieved from http://epub.ub.uni-muenchen.de/2100/1/tr015.pdf.
- Gertheiss, J., & Tutz, G. (2009b). Supervised feature selection in mass spectrometry-based proteomic profiling by blockwise boosting. Bioinformatics, 25(8), 1076–1077.
- Gertheiss, J., & Tutz, G. (2009c). Variable scaling and nearest neighbor methods. Journal of Chemometrics, 23(3), 149–151.
- Gertheiss, J., & Tutz, G. (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4, 2150–2180.
- Hofner, B., Hothorn, T., Kneib, T., & Schmid, M. (2011). A framework for unbiased model selection based on boosting. Journal of Computational and Graphical Statistics, 20(4), 956–971. Retrieved from http://epub.ub.uni-muenchen.de/11243/1/TR072.pdf.
- Oelker, M.-R., Gertheiss, J., & Tutz, G. (2012). Regularization and model selection with categorial predictors and effect modifiers in generalized linear models. Department of Statistics: Technical Reports, No. 122. Retrieved from http://epub.ub.uni-muenchen.de/13082/1/tr.gvcm.cat.pdf.
- Oelker, M.-R., & Tutz, G. (2013). A general family of penalties for combining differing types of penalties in generalized structured models. Department of Statistics: Technical Reports, No. 139. Retrieved from http://epub.ub.uni-muenchen.de/17664/1/tr.pirls.pdf.
- Petry, S., Flexeder, C., & Tutz, G. (2011). Pairwise fused lasso. Department of Statistics: Technical Reports, No. 102. Retrieved from http://epub.ub.uni-muenchen.de/12164/1/petry_etal_TR102_2011.pdf.
- Rufibach, K. (2010). An active set algorithm to estimate parameters in generalized linear models with ordered predictors. Computational Statistics & Data Analysis, 54(6), 1442–1456. Retrieved from http://arxiv.org/pdf/0902.0240.pdf?origin=publication_detail.
- Tutz, G. (2011, October). Regularization methods for categorical data. Munich: Ludwig-Maximilians-Universität. Retrieved from http://m.wu.ac.at/it/departments/statmath/resseminar/talktutz.pdf.
- Tutz, G., & Gertheiss, J. (2013). Rating scales as predictors—The old question of scale level and some answers. Psychometrika, 1–20.