First off, are your two independent variables entered as factors or as numerically coded predictors, and is there an interaction term between them? I ask because the test of proportional odds becomes very sensitive when cell counts are small. For this reason, I often find it justifiable to enter ordinal predictors as their numeric codes (1: poor, 2: fair-to-poor, etc.). Doing so allows information to be borrowed across groups, and proportionality is then assessed by asking whether the difference in the odds of a more favorable response, comparing units differing by 1 in the predictor, is consistent with the difference in the odds of an even more favorable response (the rough and somewhat contrived interpretation of the test of proportional odds).
Secondly, if your numeric coding still fails to give valid proportionality, it is often possible to obtain consistent cumulative odds ratio estimates by collapsing adjacent categories, such as the two bottom-box responses.
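To make the collapsing idea concrete, here is a minimal pure-Python sketch (the counts and the function name `cumulative_odds_ratios` are made up for illustration) that computes the cumulative odds ratio at each threshold, before and after merging the two bottom-box categories:

```python
def cumulative_odds_ratios(a, b):
    """Odds ratio of responding above each threshold, group a vs. group b.

    a and b are counts over ordered categories (lowest first)."""
    ors = []
    for k in range(1, len(a)):
        a_hi, a_lo = sum(a[k:]), sum(a[:k])
        b_hi, b_lo = sum(b[k:]), sum(b[:k])
        ors.append((a_hi / a_lo) / (b_hi / b_lo))
    return ors

# Hypothetical counts over categories 1 (poor) .. 4 (excellent):
treated = [5, 15, 30, 50]
control = [20, 30, 30, 20]
ors_full = cumulative_odds_ratios(treated, control)       # three thresholds

# Collapse the two bottom-box categories into one:
treated_c = [treated[0] + treated[1]] + treated[2:]
control_c = [control[0] + control[1]] + control[2:]
ors_collapsed = cumulative_odds_ratios(treated_c, control_c)  # two thresholds
```

With these particular counts the bottom-box threshold gives a somewhat different odds ratio than the others; after collapsing, the remaining thresholds agree, which is the situation in which a single cumulative odds ratio is a sensible summary.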
Thirdly, another well-powered test of association between an ordinal response and two ordinal factors is a plain old linear regression model. Using robust standard errors, you get valid confidence intervals regardless of the distribution of the errors. This tends to be less powerful than categorical methods, but with fewer pitfalls due to zero cell counts.
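As a sketch of how the robust-variance idea works in the single-predictor case, the following pure-Python snippet computes an HC0 ("sandwich") standard error for a simple regression slope (the data and the function name are made up; in practice you would use your package's robust-variance option rather than coding this by hand):

```python
import math

def ols_robust(x, y):
    """Simple linear regression slope with an HC0 sandwich standard error.

    HC0 replaces the constant-variance assumption with the squared
    residuals, so the SE stays valid under heteroscedastic errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # HC0: sum of (x_i - xbar)^2 * e_i^2, divided by Sxx^2
    var_robust = sum((xi - xbar) ** 2 * ei ** 2
                     for xi, ei in zip(x, resid)) / sxx ** 2
    return slope, math.sqrt(var_robust)

# Ordinal outcome coded 1..4 against an ordinal predictor coded 1..3
# (made-up data, treating the ordinal codes as numeric):
x = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [1, 2, 1, 2, 3, 2, 3, 4, 4]
slope, se = ols_robust(x, y)
```

The slope is then interpreted as the mean shift in the (coded) response per unit increase in the coded predictor, with an interval that does not lean on normal, homoscedastic errors.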
Lastly, as a comment, robust standard errors allow consistent estimation of the mean model in most circumstances. I'm not sure whether they are implemented in SPSS, but R and SAS use them frequently. As with the proportional hazards assumption in the Cox model, when this "model-based assumption check" fails, it does not mean the model results are entirely invalid; it just means the effect estimates are "averaged" over their inconsistent proportionality. For instance, if a proportional odds model has an excessive number of respondents giving top-box responses, and a predictor shows a large association for the top-box response but smaller associations for the other cumulative measures, then you'll find that the cumulative odds ratio is a weighted combination of the several thresholded odds ratios, with the highest weight placed on the top-box OR.
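That weighting behavior can be illustrated numerically. The sketch below (hypothetical counts; the crude inverse-variance weights are only a stand-in for whatever weighting a fitted model implies) computes the thresholded log odds ratios and a weighted average of them; the heavily populated top-box threshold carries the largest weight and pulls the pooled estimate toward it:

```python
import math

def threshold_log_ors(a, b):
    """Per-threshold cumulative log odds ratios with rough inverse-variance
    weights 1 / sum(1/cell) from the implied 2x2 table at each cut."""
    out = []
    for k in range(1, len(a)):
        cells = [sum(a[k:]), sum(a[:k]), sum(b[k:]), sum(b[:k])]
        log_or = math.log((cells[0] / cells[1]) / (cells[2] / cells[3]))
        weight = 1.0 / sum(1.0 / c for c in cells)
        out.append((log_or, weight))
    return out

# Made-up counts with a pile-up in the top box (category 4):
treated = [10, 10, 10, 70]
control = [20, 20, 20, 40]
pairs = threshold_log_ors(treated, control)

# Weighted average of the thresholded log ORs; the top-box threshold
# gets the largest weight, so the pooled OR is pulled toward its OR.
pooled = sum(l * w for l, w in pairs) / sum(w for _, w in pairs)
```

Here the three thresholded odds ratios differ (the top-box one being largest), and the pooled log odds ratio lands between the extremes, closer to the heavily weighted top-box value.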
Best Answer
It's true, in general, that "calibration" tests behave differently under different sets of regression adjustments. A lot of statisticians begin and end fishing expeditions by choosing the magic set of adjustments that leads to a clean pass on model-based assumption checks. But adjustments also change the scientific question, and they shouldn't be chosen on the basis of meeting "calibration" requirements. The proportional odds test is just such a goodness-of-fit test. Here I mean "calibration" as a rough counterpart to goodness of fit, where significance means rejection of a model-based assumption. Of course, in large samples almost all calibration tests fail, not because the model is invalid but because the test is overpowered and a 0.05 significance level is arbitrary and unhelpful.
With a larger number of adjustment variables in the model, the proportional odds model tends toward a saturated model in which each stratum-specific probability is matched nearly or exactly by its fitted value. This is a problem when the data structure is sparse.
One way around the issue of non-proportional odds is simply to fit a log-linear model. This is the most general form of analysis for categorical data. From it you can derive any number of summaries: the multinomial model, or the odds ratios of a cumulatively higher response for each reference category.
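As a minimal illustration of the log-linear approach, the sketch below fits the independence log-linear model to a made-up two-way table in pure Python: the expected counts are the products of the margins over the grand total, and G² gives the likelihood-ratio test of fit against the saturated (multinomial) model:

```python
import math

def loglinear_independence(table):
    """Fit the log-linear independence model to a two-way table.

    Expected counts are (row total * column total) / grand total; G^2 is
    the likelihood-ratio statistic with (R-1)(C-1) degrees of freedom."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    expected = [[ri * cj / n for cj in cols] for ri in rows]
    g2 = 2 * sum(
        o * math.log(o / e)
        for row_o, row_e in zip(table, expected)
        for o, e in zip(row_o, row_e)
        if o > 0
    )
    return expected, g2

# Hypothetical 2 x 4 table: two groups by four ordered response categories.
obs = [[20, 30, 30, 20],
       [10, 20, 30, 40]]
expected, g2 = loglinear_independence(obs)
```

Interaction terms added to this baseline model recover the association structure without imposing proportionality, which is what makes the log-linear formulation so general.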