Solved – SPSS dummy variables in OLS

categorical data, least squares, regression, spss

I have a time-series dataset holding stock data for a large set of companies. Assume the following subset, where obsDay is the observation day (148 days in reality) and weekDay is the day of the week (1 = Monday, 2 = Tuesday, …):

company  sector  obsDay weekDay  stockPrice
-------------------------------------------
1        15      1      3        10.40
1        15      2      4         9.42
1        15      3      5         9.66
1        15      4      1        11.00
1        15      5      2        10.21
2        10      1      3        43.55
2        10      2      4        43.50
2        10      3      5        40.31
2        10      4      1        48.43
2        10      5      2        43.00
3        20      1      3        10.00
3        20      2      4        11.00
3        20      3      5        12.00
3        20      4      1        13.00
3        20      5      2        14.00

In an OLS regression, I would like to see the effect of, for instance, day of the week and industry sector on stock price. It has been suggested that I would then have to create dummies for each variable, as follows (for sector):

company  sector  sector15  sector10  sector20 obsDay weekDay  stockPrice
------------------------------------------------------------------------
1        15      1         0         0        1      3        10.40
1        15      1         0         0        2      4         9.42

(and the same for weekDay). But with the many variables I have, this makes the dataset hard to read. Can't SPSS create dummy variables from the sector and weekDay categories on the fly when performing the OLS?

Best Answer

If you have the Advanced Statistics module, which lets you estimate generalized linear models (see the menu Analyze -> Generalized Linear Models or the GENLIN command), you can have SPSS generate the dummy variables on the fly. Given your data, it may also be worth checking whether some of the newer mixed-model commands can estimate autoregressive components for panel data.
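As a rough sketch of the GENLIN route, the syntax below declares sector and weekDay as factors after BY, so that SPSS builds the indicator columns internally. It assumes the variable names from the question and a normal distribution with identity link, which corresponds to an ordinary linear model. The coefficients should agree with OLS, although the standard errors may differ slightly because GENLIN estimates by maximum likelihood.

```spss
* Linear model with sector and weekDay treated as categorical factors.
* With ORDER=ASCENDING the highest category of each factor acts as the reference.
GENLIN stockPrice BY sector weekDay (ORDER=ASCENDING)
  /MODEL sector weekDay INTERCEPT=YES DISTRIBUTION=NORMAL LINK=IDENTITY
  /PRINT FIT SUMMARY SOLUTION.
```

The parameter estimates table then shows one row per non-reference category, without any dummy columns being added to the dataset.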

Alternatively, you can use the DO REPEAT syntax to generate your dummy variables efficiently for use in regression equations. For instance, for your weekDay variable it would be:

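* Declare seven new numeric variables, weekDay_Dummy1 to weekDay_Dummy7.
VECTOR weekDay_Dummy(7,F1.0).
* Set each dummy to 1 when weekDay equals its index i and to 0 when it does not.
DO REPEAT weekDay_Dummy = weekDay_Dummy1 to weekDay_Dummy7 /i = 1 to 7.
    DO IF weekDay = i.
        COMPUTE weekDay_Dummy = 1.
    ELSE IF weekDay <> i.
        COMPUTE weekDay_Dummy = 0.
    END IF.
END REPEAT.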

As long as your variable takes sequential integer values, the DO REPEAT command will work (if it does not, see the AUTORECODE command). In the linear regression command you can then use the TO operator to specify a list of variables that are adjacent in your dataset. Note that TO depends on the order of the variables in the dataset, not on their names.
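For sector, whose codes (10, 15, 20) are not sequential, the AUTORECODE step could look roughly like this. It is a minimal sketch: sectorCode and sector_D1 to sector_D3 are hypothetical names chosen for illustration, and the count of three categories is taken from the sample data only.

```spss
* Map the sector codes 10, 15 and 20 to the consecutive integers 1, 2 and 3.
AUTORECODE VARIABLES=sector /INTO sectorCode.
* DO REPEAT creates sector_D1 to sector_D3, one per recoded category.
* The logical expression evaluates to 1 when true and 0 when false.
DO REPEAT d = sector_D1 TO sector_D3 /i = 1 TO 3.
    COMPUTE d = (sectorCode = i).
END REPEAT.
EXECUTE.

* Leave out sector_D1 so that the first sector serves as the reference category.
REGRESSION
  /DEPENDENT stockPrice
  /METHOD=ENTER sector_D2 TO sector_D3.
```

Because the new dummies are appended to the dataset in order, sector_D2 TO sector_D3 picks up exactly the adjacent variables between those two names.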

Below is a full example.

data list free / company  sector  obsDay weekDay  stockPrice.
begin data
1        15      1      3        10.40
1        15      2      4         9.42
1        15      3      5         9.66
1        15      4      1        11.00
1        15      5      2        10.21
2        10      1      3        43.55
2        10      2      4        43.50
2        10      3      5        40.31
2        10      4      1        48.43
2        10      5      2        43.00
3        20      1      3        10.00
3        20      2      4        11.00
3        20      3      5        12.00
3        20      4      1        13.00
3        20      5      2        14.00
end data.
dataset name examp.

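* Declare seven new numeric variables, weekDay_Dummy1 to weekDay_Dummy7.
VECTOR weekDay_Dummy(7,F1.0).
* Set each dummy to 1 when weekDay equals its index i and to 0 when it does not.
DO REPEAT weekDay_Dummy = weekDay_Dummy1 to weekDay_Dummy7 /i = 1 to 7.
    DO IF weekDay = i.
        COMPUTE weekDay_Dummy = 1.
    ELSE IF weekDay <> i.
        COMPUTE weekDay_Dummy = 0.
    END IF.
END REPEAT.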

* weekDay_Dummy1 (Monday) is left out below and so acts as the reference category.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN 
  /DEPENDENT stockPrice
  /METHOD=ENTER weekDay_Dummy2 to weekDay_Dummy5.

Another option with more flexibility for creating dummy variables is the extension command SPSSINC_CREATE_DUMMIES (written in Python), which is on the developerWorks site, though I have not used it. One of the members here, ttnphns, also has some tools for similar tasks on his site. Given your example, though, a few DO REPEAT commands should be sufficient.
