I have a time-series dataset holding stock data for a large set of companies. Assume the following subset, where obsDay is the observation day (148 days in reality) and weekDay the day of the week (1 = Monday, 2 = Tuesday, …):
company sector obsDay weekDay stockPrice
-------------------------------------------
1 15 1 3 10.40
1 15 2 4 9.42
1 15 3 5 9.66
1 15 4 1 11.00
1 15 5 2 10.21
2 10 1 3 43.55
2 10 2 4 43.50
2 10 3 5 40.31
2 10 4 1 48.43
2 10 5 2 43.00
3 20 1 3 10.00
3 20 2 4 11.00
3 20 3 5 12.00
3 20 4 1 13.00
3 20 5 2 14.00
In an OLS regression, I would like to see the effect of, for instance, day of the week and industry sector on stockPrice. It has been suggested that I would then have to create dummies for each variable, as follows (for sector):
company sector sector15 sector10 sector20 obsDay weekDay stockPrice
------------------------------------------------------------------------
1 15 1 0 0 1 3 10.40
1 15 1 0 0 2 4 9.42
(and the same for weekDay). But with the many variables I have, this is making the dataset hard to read. Can't SPSS create dummy variables from the sector and weekDay categories on the fly when performing the OLS?
Best Answer
If you have the advanced statistics package that allows you to estimate generalized linear models (see the menu Analyze -> Generalized Linear Models or the GENLIN command), you can have SPSS generate the dummy variables on the fly. Given your data, it may also be worth checking whether some of the newer mixed-model commands can estimate autoregressive components for panel data.

Alternatively, you can use the DO REPEAT syntax to efficiently generate your dummy variables for use in regression equations. For instance, for your weekDay variable you would create one dummy per day of the week. As long as your variables are in a sequential list of integer values, the DO REPEAT command will work (if they aren't, see the AUTORECODE command). Then, in the linear regression command, you can use the TO operator to specify a list of variables that are in sequential order in your dataset (note that TO depends on the order of the variables in the dataset, not on their names).
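A minimal sketch of the DO REPEAT approach for the sample data in the question. The dummy names wd1 TO wd5 and sec1 TO sec3, the recoded variable sectorCode, and the stand-in names dum and val are all assumptions made for illustration. The ranges 1 TO 5 and 1 TO 3 match only the subset shown, so adjust them to the number of weekdays and sectors in the full dataset.

```spss
* Sketch only, all new variable names below are hypothetical.
* weekDay dummies, the sample data uses codes 1 to 5.
DO REPEAT dum = wd1 TO wd5 / val = 1 TO 5.
COMPUTE dum = (weekDay = val).
END REPEAT.

* The sector codes 10, 15 and 20 are not consecutive, so recode them to 1, 2, 3 first.
AUTORECODE VARIABLES=sector /INTO sectorCode.

DO REPEAT dum = sec1 TO sec3 / val = 1 TO 3.
COMPUTE dum = (sectorCode = val).
END REPEAT.
EXECUTE.

* Leave out one dummy per set (here wd1 and sec1) as the reference category.
REGRESSION
  /DEPENDENT stockPrice
  /METHOD=ENTER wd2 TO wd5 sec2 TO sec3.
```

The logical expression in each COMPUTE evaluates to 1 when the condition holds and 0 otherwise (and to missing when the source variable is missing). One dummy from each set is left out of the regression because including all of them alongside the intercept would make the predictors perfectly collinear.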
Another option with more flexibility for writing dummy variables is the extension command SPSSINC_CREATE_DUMMIES (written in Python), which is on the developerWorks site (but I have not used it). Also, one of the members here, ttnphns, has some tools to accomplish similar tasks on his site. Given your example, though, a few DO REPEAT commands should be sufficient.
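For the first option mentioned above (letting GENLIN build the dummies itself), a sketch might look like the following. It assumes weekDay and sector are to be treated as categorical factors and that a normal distribution with an identity link is an acceptable stand-in for the OLS model. GENLIN fits by maximum likelihood, so the coefficients should match a linear regression, but the standard errors may differ slightly.

```spss
* Sketch only, variables listed after BY are treated as categorical factors.
GENLIN stockPrice BY weekDay sector
  /MODEL weekDay sector INTERCEPT=YES DISTRIBUTION=NORMAL LINK=IDENTITY
  /PRINT SOLUTION.
```

With this form no dummy variables are added to the dataset. The parameter estimates table lists one coefficient per factor level, with one level of each factor set aside as the reference category.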