Let's think about regular linear regression, and to make it concrete, let's say we are trying to predict people's heights. When you regress height against just an intercept term and no predictors, the intercept will be the height averaged over all the people in your sample. Let's call this term $\beta_0^{\text{no predictor}}$.
Now, we want to add a predictor for sex, so we create an indicator variable that takes the value 0 when the sampled person is male and 1 when the person is female. When we fit this model, we get estimates for an intercept term, $\beta_0^{\text{male reference}}$, and a coefficient for the sex variable, $\beta_1^{\text{male reference}}$. The estimated intercept is no longer the average height of everybody but the average height of males, and the coefficient of the sex variable is the difference in average height between females and males.
Now consider coding the indicator variable the other way, so that the sex variable takes the value 0 if the person is female and 1 if the person is male. In this specification of the model we get estimates of the intercept and coefficient $\beta_0^{\text{female reference}}$ and $\beta_1^{\text{female reference}}$. Now the intercept $\beta_0^{\text{female reference}}$ is the average height of females, and the coefficient is the difference in average height between males and females. So
$$
\begin{align}
\beta_1^{\text{male reference}} &= -\beta_1^{\text{female reference}}\\
\beta_0^{\text{male reference}} + \beta_1^{\text{male reference}} &= \beta_0^{\text{female reference}}\\
\beta_0^{\text{female reference}} + \beta_1^{\text{female reference}} &= \beta_0^{\text{male reference}}
\end{align}
$$
So, by changing how we coded the indicator variable we changed both the value of the intercept term and of the coefficient term, and this is exactly what we should want. When we have a categorical variable with more than two levels, you will see the same kinds of changes as you specify different reference levels, i.e., different levels at which all the indicators take the value 0.
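To see these identities numerically, here is a minimal sketch using simulated heights (all numbers are hypothetical) and statsmodels' OLS; swapping the coding of the indicator flips the slope's sign and moves the intercept to the other group's mean, exactly as in the equations above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
female = rng.integers(0, 2, size=200)           # 1 = female, 0 = male
height = 175 - 12 * female + rng.normal(0, 6, size=200)

# Male as the reference level (indicator = 1 for females)
fit_male_ref = sm.OLS(height, sm.add_constant(female)).fit()

# Female as the reference level (indicator = 1 for males)
fit_female_ref = sm.OLS(height, sm.add_constant(1 - female)).fit()

b0_m, b1_m = fit_male_ref.params
b0_f, b1_f = fit_female_ref.params
print(b1_m, -b1_f)          # equal: the slopes are sign-flipped
print(b0_m + b1_m, b0_f)    # equal: both are the female group mean
```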
In the binary indicator case the p-value of the $\beta_1$ term does not change depending on how we code, but in the multi-level case it will, because the p-value is a function of the size of the effect, and the average difference between a group and the reference group will generally change with the choice of reference group. For example, suppose we have three groups: babies, teenagers, and adults. The average height difference between adults and teenagers is smaller than that between adults and babies, so the p-value for the coefficient comparing adults to teenagers should be larger than for the coefficient comparing adults to babies.
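A hedged illustration of this point with simulated data for the three groups: refitting with `baby` and then `teen` as the treatment-coded reference level changes the p-value attached to the adult contrast.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["baby", "teen", "adult"], 100),
    "height": np.concatenate([
        rng.normal(70, 5, 100),    # babies (cm)
        rng.normal(165, 8, 100),   # teenagers
        rng.normal(172, 8, 100),   # adults
    ]),
})

# Same data, two different reference levels for the treatment coding
for ref in ["baby", "teen"]:
    fit = smf.ols(f"height ~ C(group, Treatment('{ref}'))", data=df).fit()
    print(ref, fit.pvalues.filter(like="adult"))  # adult-vs-reference p-value
```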
First off, are your two independent variables entered as factors or as numerically coded responses, and is there an interaction term between the two? The reason I ask is that the test of proportional odds is very sensitive to small cell counts. For this reason, I often find it justifiable to enter input variables as their ordinal numeric codes (1: poor, 2: fair-to-poor, etc.). Doing so allows information to be borrowed across groups, and proportionality is then assessed so that the difference in the odds of a more favorable response, comparing units differing by 1 in the predictor, is consistent with the odds of an even more favorable response (a rough and contrived interpretation of the test of proportional odds).
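As a sketch of what I mean by entering the ordinal codes numerically, here is a cumulative logit fit with statsmodels' `OrderedModel`; the variable names and simulated data are purely illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(2)
n = 300
rating = rng.integers(1, 5, n)                  # 1: poor ... 4: excellent, numeric codes
latent = 0.8 * rating + rng.logistic(size=n)
outcome = pd.Series(pd.cut(latent, bins=[-np.inf, 1, 2.5, 4, np.inf],
                           labels=["low", "mid", "high", "top"]))

# One slope per unit step in the predictor instead of one slope per level
fit = OrderedModel(outcome, pd.DataFrame({"rating": rating}),
                   distr="logit").fit(method="bfgs")
print(fit.summary())
```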
If the numeric coding still fails to give valid proportionality, it is possible to get consistent cumulative odds ratio estimates by collapsing adjacent categories, such as the two bottom-box responses.
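For example, collapsing could look like the following pandas sketch (the categories and labels are hypothetical):

```python
import pandas as pd

# Hypothetical 5-point responses; merge the two bottom boxes into one level
resp = pd.Series(["poor", "fair", "good", "vgood", "excellent"] * 20)
collapsed = resp.replace({"poor": "poor/fair", "fair": "poor/fair"})
collapsed = pd.Categorical(collapsed,
                           categories=["poor/fair", "good", "vgood", "excellent"],
                           ordered=True)
```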
Thirdly, another adequately powered test of association between an ordinal response and two ordinal factors is a plain old linear regression model. Using robust standard errors, you get valid confidence intervals regardless of the distribution of the errors. This tends to be less powerful than categorical methods, but with fewer pitfalls due to zero cell counts.
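A minimal sketch of this approach, treating the ordinal response as numeric and requesting heteroskedasticity-robust (HC3) standard errors from statsmodels; the data and names are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({"x1": rng.integers(1, 5, 200),
                   "x2": rng.integers(1, 4, 200)})
df["y"] = (df.x1 + 0.5 * df.x2 + rng.normal(0, 1, 200)).round().clip(1, 7)

# Robust (sandwich) covariance gives valid CIs despite non-normal errors
fit = smf.ols("y ~ x1 + x2", data=df).fit(cov_type="HC3")
print(fit.summary())
```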
Lastly, as a comment, robust standard errors allow consistent estimation of the mean model in most circumstances. I'm not sure whether these are implemented in SPSS, but R and SAS use them frequently. As with the proportional hazards assumption in the Cox model, when this "model-based assumption check" fails it does not mean the model results are entirely invalid; it's just that the effect estimates are "averaged" over their inconsistent proportionality. For instance, if a proportional odds model has excessive numbers of respondents giving top-box responses, and a predictor shows a large association for the top-box response but a smaller association for the other cumulative measures, then you'll find that the cumulative odds ratio is a weighted combination of the several thresholded odds ratios, with a higher weight placed on the top-box OR.
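One way to see this averaging, as a sketch: fit a separate binary logit at each threshold of a (simulated) ordinal response and compare the per-threshold odds ratios that the single cumulative OR blends together.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
y = np.clip(np.round(2 + x + rng.logistic(size=n)), 1, 4).astype(int)

# One binary logit per threshold y > k; under proportional odds the ORs match
for k in [1, 2, 3]:
    fit = sm.Logit((y > k).astype(int), sm.add_constant(x)).fit(disp=0)
    print(f"OR for y > {k}: {np.exp(fit.params[1]):.2f}")
```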
You shouldn't use stepwise selection for any kind of model building. Stepwise results have parameter estimates biased away from 0, standard errors and p-values that are too small, and models that are too complex - all in ways that are difficult, if not impossible, to control.
In ordinal regression, as in any other type, the best way to build a model is to use substantive knowledge. Barring that, for the main effects and the interactions you should look at effect sizes (with categorical IVs you can do this effectively by outputting the predicted value for each combination of IVs; see the sketch below).
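For instance, a sketch of that predicted-value table with two hypothetical factors `f1` and `f2` on simulated data:

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({"f1": rng.choice(["a", "b"], 200),
                   "f2": rng.choice(["x", "y", "z"], 200)})
df["y"] = rng.normal(size=200) + (df.f1 == "b") + 2 * (df.f2 == "z")

fit = smf.ols("y ~ C(f1) * C(f2)", data=df).fit()
grid = pd.DataFrame(list(itertools.product(df.f1.unique(), df.f2.unique())),
                    columns=["f1", "f2"])
grid["predicted"] = fit.predict(grid)   # cell means: effect sizes at a glance
print(grid)
```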
In some cases, model averaging can be a good solution (especially if your main goal is prediction rather than explanation). If you must use an automatic procedure, use one that penalizes for the complexity of the model (e.g., the lasso or LAR).
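As a sketch of the penalized route, here is scikit-learn's cross-validated lasso on simulated data (for LAR, `LassoLarsCV` is the analogous class):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)   # only 2 true predictors

fit = LassoCV(cv=5).fit(X, y)
print(fit.coef_)   # irrelevant coefficients are shrunk to exactly zero
```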