@Scortchi's got you covered with this answer on Coding for an ordered covariate. I've repeated the recommendation in my answer to Effect of two demographic IVs on survey answers (Likert scale). Specifically, the recommendation is to use Gertheiss' (2013) ordPens package and to refer to Gertheiss and Tutz (2009a) for theoretical background and a simulation study.
The specific function you probably want is ordSmooth. It essentially smooths the dummy coefficients across levels of an ordinal variable so that coefficients for adjacent ranks differ less, which reduces overfitting and improves prediction. It generally performs as well as, and sometimes much better than, maximum likelihood (i.e., ordinary least squares in this case) estimation of a regression model that treats the data as continuous (or in their terms, metric) when the data are actually ordinal. It appears compatible with all sorts of generalized linear models, and it allows you to enter nominal and continuous predictors as separate matrices.
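To make the smoothing idea concrete, here is a minimal base-R sketch of the *principle* behind ordSmooth (not the package's actual code, and simulated data): dummy-code the ordinal predictor and add a ridge-type penalty on the squared differences between coefficients of adjacent levels.

```r
# Sketch of the penalty behind ordSmooth: shrink *differences* between
# coefficients of adjacent ordinal levels (simulated data, not the real API).

set.seed(1)
K <- 6                                  # number of ordinal levels
x <- sample(1:K, 100, replace = TRUE)   # ordinal predictor, coded 1..K
y <- 0.5 * x + rnorm(100)               # true effect is monotone

X <- model.matrix(~ factor(x) - 1)      # one dummy column per level
D <- diff(diag(K))                      # first differences: D %*% b = b[k+1] - b[k]

smooth_fit <- function(lambda) {
  # minimizes ||y - X b||^2 + lambda * ||D b||^2
  solve(crossprod(X) + lambda * crossprod(D), crossprod(X, y))
}

b_ols    <- smooth_fit(0)    # unpenalized: one free coefficient per level
b_smooth <- smooth_fit(10)   # penalized: adjacent levels pulled together

# Adjacent-level jumps shrink under the penalty:
sum(diff(as.vector(b_smooth))^2) < sum(diff(as.vector(b_ols))^2)
```

The penalty leaves the overall level of the coefficients free (a constant shift is not penalized) and only discourages jagged level-to-level jumps, which is exactly why it helps when the levels carry an ordering.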
Several additional references by Gertheiss, Tutz, and colleagues are listed below. Some of these may contain alternatives – even Gertheiss and Tutz (2009a) discuss ridge reroughing as another option. I haven't dug through it all myself yet, but suffice it to say this addresses @Erik's complaint that there is too little literature on ordinal predictors!
References
- Gertheiss, J. (2013, June 14). ordPens: Selection and/or smoothing of ordinal predictors, version 0.2-1. Retrieved from http://cran.r-project.org/web/packages/ordPens/ordPens.pdf.
- Gertheiss, J., Hogger, S., Oberhauser, C., & Tutz, G. (2011). Selection of ordinally scaled independent variables with applications to international classification of functioning core sets. Journal of the Royal Statistical Society: Series C (Applied Statistics), 60(3), 377–395.
- Gertheiss, J., & Tutz, G. (2009a). Penalized regression with ordinal predictors. International Statistical Review, 77(3), 345–365. Retrieved from http://epub.ub.uni-muenchen.de/2100/1/tr015.pdf.
- Gertheiss, J., & Tutz, G. (2009b). Supervised feature selection in mass spectrometry-based proteomic profiling by blockwise boosting. Bioinformatics, 25(8), 1076–1077.
- Gertheiss, J., & Tutz, G. (2009c). Variable scaling and nearest neighbor methods. Journal of Chemometrics, 23(3), 149–151.
- Gertheiss, J., & Tutz, G. (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4(4), 2150–2180.
- Hofner, B., Hothorn, T., Kneib, T., & Schmid, M. (2011). A framework for unbiased model selection based on boosting. Journal of Computational and Graphical Statistics, 20(4), 956–971. Retrieved from http://epub.ub.uni-muenchen.de/11243/1/TR072.pdf.
- Oelker, M.-R., Gertheiss, J., & Tutz, G. (2012). Regularization and model selection with categorial predictors and effect modifiers in generalized linear models. Department of Statistics: Technical Reports, No. 122. Retrieved from http://epub.ub.uni-muenchen.de/13082/1/tr.gvcm.cat.pdf.
- Oelker, M.-R., & Tutz, G. (2013). A general family of penalties for combining differing types of penalties in generalized structured models. Department of Statistics: Technical Reports, No. 139. Retrieved from http://epub.ub.uni-muenchen.de/17664/1/tr.pirls.pdf.
- Petry, S., Flexeder, C., & Tutz, G. (2011). Pairwise fused lasso. Department of Statistics: Technical Reports, No. 102. Retrieved from http://epub.ub.uni-muenchen.de/12164/1/petry_etal_TR102_2011.pdf.
- Rufibach, K. (2010). An active set algorithm to estimate parameters in generalized linear models with ordered predictors. Computational Statistics & Data Analysis, 54(6), 1442–1456. Retrieved from http://arxiv.org/pdf/0902.0240.pdf?origin=publication_detail.
- Tutz, G. (2011, October). Regularization methods for categorical data. Munich: Ludwig-Maximilians-Universität. Retrieved from http://m.wu.ac.at/it/departments/statmath/resseminar/talktutz.pdf.
- Tutz, G., & Gertheiss, J. (2013). Rating scales as predictors—The old question of scale level and some answers. Psychometrika, 1–20.
Best Answer
I think you may be bundling several different issues together. First things first: as @whuber has just pointed out, it is not clear whether your DV is really ordinal, or only your IVs. Those are very different scenarios. For instance, if your DV is also ordinal, you could (and maybe even should) go for an ordinal logistic regression, just as @Scortchi mentioned. The skewed distribution of such a DV would be much less of a problem in that framework.
To be precise, the distribution of even a continuous DV is, by itself, not the real issue. What you should worry about is the distribution of the residuals of your whole model, not of the variables themselves. Paying attention to the distribution of the variables rather than the residuals is a common mistake.
So, if your DV is continuous and the distribution of the residuals is heavily skewed, first try transforming your DV (the log transformation is the standard choice, but in some cases you may need to raise the DV to a power or take its square root). If no transformation of the DV works, try also transforming any continuous IVs. If you still have no luck, I would go for robust regression, which tolerates much less well-behaved residual distributions. It is fairly useful, for instance, when the residuals are heavy-tailed on both sides (a scenario that may be very hard to fix by transforming variables). In R, this can be accomplished very well with the lmrob function from the robustbase package.
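The diagnose-then-transform workflow above can be sketched as follows. The data are simulated (log-normal DV by construction), and since lmrob() lives in the CRAN package robustbase, the robust step is illustrated with MASS::rlm(), which ships with R, to keep the example self-contained:

```r
# Check the *residuals*, not the raw DV; here log(y) is linear by construction.
set.seed(7)
x <- runif(200, 1, 10)
y <- exp(0.3 * x + rnorm(200))          # heavily skewed DV

fit_raw <- lm(y ~ x)
fit_log <- lm(log(y) ~ x)               # log transform: the usual first try

# Residual skewness drops sharply after the transform:
skew <- function(r) mean((r - mean(r))^3) / sd(r)^3
skew(resid(fit_raw))
skew(resid(fit_log))

# If no transformation helps, a robust fit downweights extreme residuals.
# robustbase::lmrob() is the recommendation above; MASS::rlm() (shipped
# with R) stands in for it here.
library(MASS)
fit_rob <- rlm(y ~ x)
```

The point of the skew() comparison is exactly the one made above: judge the model by its residuals, and only reach for robust methods once transformations have failed.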
But anyway, at the end of your question you mention that one third of the DV observations equal zero. This suggests you may have a zero-inflated scenario and thus may need models that account for this kind of issue, such as zero-inflated regression models.
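A quick informal check for zero inflation, assuming a count-like DV (simulated here with structural zeros built in), is to compare the observed share of zeros with what a plain Poisson model of the same mean would predict:

```r
# Informal zero-inflation check on simulated count data:
# roughly one third of observations are structural zeros.
set.seed(3)
y <- ifelse(runif(500) < 1/3, 0, rpois(500, 4))

observed_zeros <- mean(y == 0)
expected_zeros <- dpois(0, lambda = mean(y))   # P(Y = 0) under plain Poisson

c(observed = observed_zeros, expected = expected_zeros)
```

A large excess of observed zeros over the Poisson expectation points toward zero-inflated models, e.g. zeroinfl() in the CRAN package pscl (not part of base R).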
Hope this helps, but any further insights depend on your clarifying the nature of your variables and the specification of your models.