Solved – Is it possible to directly read CSV columns as categorical data?

categorical data, data transformation, r

I need to use R to analyze data from a medical survey (100+ coded columns) that comes as a CSV file. I will use rattle for some initial analysis, but behind the scenes it's still R.

If I read.csv() the file, columns with numerical codes are treated as numerical data. I'm aware I could create categorical columns from them with factor() but doing it for 100+ columns is a pain.

I hope there is a better way to tell R to import the columns directly as factors, or at least to convert them in place afterwards.

Thank you!

Best Answer

You can use the colClasses argument to specify the classes of your data columns. For example:

data <- read.csv('foo.csv', colClasses=c('numeric', 'factor', 'factor'))

will assign numeric to the first column, factor to the second and third. Since you have so many columns, a shortcut might be:

data <- read.csv('foo.csv', colClasses=c('numeric', rep('factor', 37), 'character'))

or some such variation (i.e. assign numeric to first column, factor to next 37 columns, then character to the last one).
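
If counting columns by hand is awkward, here is a minimal sketch of two alternatives. It assumes that colClasses accepts a named vector (documented for read.table) and that the coded columns can be picked out by name. The file contents and the column names (id, sex, smoker, note) are made up for illustration and are not from the question.

```r
# Build a small example file so the sketch is self-contained.
# The column names (id, sex, smoker, note) are hypothetical.
path <- tempfile(fileext = ".csv")
writeLines(c("id,sex,smoker,note",
             "1,1,0,ok",
             "2,2,1,recheck",
             "3,1,1,ok"), path)

# Option 1: read only the header, then build colClasses to match it.
# Default every column to factor and override the exceptions by name.
header <- names(read.csv(path, nrows = 1))
classes <- rep("factor", length(header))
names(classes) <- header
classes["id"] <- "numeric"
classes["note"] <- "character"
survey <- read.csv(path, colClasses = classes)

# Option 2: read normally, then convert the coded columns in place.
survey2 <- read.csv(path)
coded <- c("sex", "smoker")
survey2[coded] <- lapply(survey2[coded], factor)

# Inspect the resulting column types.
str(survey)
str(survey2)
```

Option 1 keeps everything in the import step, which helps when most columns are codes and only a few are not. Option 2 also covers the "convert in place afterwards" part of the question, and the coded vector could be built with grep() on the column names if they follow a pattern.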