Solved – Looking for ways to compare between coxph models

aic, epidemiology, r, survival

I'm running a Cox proportional hazards regression in R, and would like to try converting one of my continuous variables into a factor (I'm aware that this loses information; I'm just checking).

Another thing that I'd like to check is putting the difference between two continuous variables into the regression instead of including both of them.
[I'm actually testing whether pulse pressure, i.e. the difference between systolic and diastolic blood pressure, is more significant than each of them separately.]

My question is: what is the best way to compare the different variations of the regression? (Let's assume that I'll use step() in each of the attempts.) There are no missing values in the data frame.

I'm pretty confused about the differences between AIC, R2 (of coxph) and the concordance of coxph.
Can anyone clear things up for me? Is there any other way to compare different models on the same data?

Thanks!

Best Answer

I certainly wouldn't use step() here (actually, I would strongly prefer not to use it anywhere). Here I would force entry of all the variables.

Then I would use predict to get predicted values from each model and compare them to the actual values, probably through a scatter plot. I might also look at the differences between the predicted and actual values, see the range for each model, perhaps make parallel box plots, then maybe a t-test or something like that.
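As a rough sketch of how the candidate models could be fitted with all variables forced in and then summarised side by side, the following uses the survival package's built-in lung data as a stand-in, since the blood-pressure data from the question is not shown. The variable roles, the cut points for age_group and the names fit_both, fit_diff and fit_cat are assumptions made for illustration. AIC and the concordance are just one possible pair of summaries, not necessarily the comparison described above.

```r
library(survival)

# Stand-in data (assumption): ph.karno and pat.karno play the role of the
# two continuous variables; rows with missing values are dropped so that
# every model is fitted to exactly the same observations.
d <- na.omit(lung[, c("time", "status", "age", "ph.karno", "pat.karno")])
d$karno_diff <- d$pat.karno - d$ph.karno                  # analogue of pulse pressure
d$age_group  <- cut(d$age, breaks = c(-Inf, 60, 70, Inf)) # arbitrary cut points

# All variables are forced in; no stepwise selection.
fit_both <- coxph(Surv(time, status) ~ age + ph.karno + pat.karno, data = d)
fit_diff <- coxph(Surv(time, status) ~ age + karno_diff, data = d)
fit_cat  <- coxph(Surv(time, status) ~ age_group + ph.karno + pat.karno, data = d)

fits <- list(both = fit_both, difference = fit_diff, categorized = fit_cat)
comparison <- data.frame(
  AIC         = sapply(fits, AIC),
  concordance = sapply(fits, function(f) unname(summary(f)$concordance[1]))
)
print(comparison)

# The difference model is nested in the model with both variables
# (it forces equal and opposite coefficients), so a likelihood ratio
# test between these two is also available.
lrt <- anova(fit_diff, fit_both)
print(lrt)
```

Lower AIC and higher concordance both point towards the better-fitting model, but they measure different things (penalised likelihood versus rank discrimination), so they need not agree.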

Related Solutions

Solved – Creating formula object for coxph()

Your issue arises from how formula.data.frame (the formula method for data frames) works and how data.frame(cbind(...)) strips the Surv class attribute from the Surv object.

What you want is

 mod.los <- coxph(Surv(length_of_stay, exited_care) ~ gender, data = mydata)

Or perhaps

 mod.los <- coxph(Surv(time, status) ~ gender, data = data_frame_for_formula)
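A small sketch of the class stripping described above, using the survival package's built-in lung data as a stand-in (the original data frame is not shown, so the names s, wrapped and fit are assumptions for illustration):

```r
library(survival)

# Stand-in data (assumption): the built-in `lung` data set.
s <- with(lung, Surv(time, status))
print(class(s))   # the object carries the "Surv" class here

# cbind() flattens the Surv object into plain numeric columns, so the
# class attribute is already gone by the time data.frame() sees it.
wrapped <- data.frame(cbind(s, sex = lung$sex))
str(wrapped)

# Rebuilding the Surv object inside the formula works on the flattened columns.
fit <- coxph(Surv(time, status) ~ sex, data = wrapped)
print(fit)
```

Building the Surv object inside the formula, as in the two calls above, avoids the problem because the response is constructed when the model is fitted rather than stored in the data frame.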

R Wilcoxon Signed Rank Test – Understanding One-Tailed Wilcoxon Signed Rank Test Output in R

What does having infinity as the upper bound of a confidence interval mean? Is this because I'm using the one-tailed version of the test?

Yes, it's because you're doing a one-tailed version of the test; no matter how far the sample location is in the 'wrong' direction (i.e. the direction inconsistent with the alternative), it's still consistent with the null - so you're only considering one-sided bounds.

Would that mean I would be justified in saying "with 95% confidence x[,5]'s mean will be within -72 of x[,6]'s"?

No, it wouldn't justify that statement. For starters, you're not testing means at all unless you make some additional assumptions under which the difference in means coincides with the population equivalent of the location-shift estimate for the test.

In the second place, the location-difference could be in the 'wrong' direction, so 'within' doesn't quite work either.

In the third place, two locations aren't normally considered to be 'within' a negative distance of each other.

You could say something like "the estimated improvement from the first to the second algorithm was 21" (and then give the units!). Note that I said 21 and not 72. If you explain to the reader what the pseudo-median of the differences is, you can give more detail about what this difference is measuring.
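To make the pseudo-median and the one-sided interval concrete, here is a minimal sketch on made-up paired data (the vectors x and y are hypothetical and are not the asker's columns). It assumes a small sample without ties or zero differences, so that wilcox.test uses its exact method.

```r
# Hypothetical paired measurements (made up for illustration).
x <- c(12.1, 9.8, 15.3, 11.0, 8.7, 14.2, 10.5, 13.9)
y <- c(10.4, 11.9, 12.2, 11.6, 9.9, 10.1, 12.8, 9.0)

res <- wilcox.test(x, y, paired = TRUE, alternative = "greater",
                   conf.int = TRUE)
print(res$conf.int)   # one-sided interval: only the lower bound is finite
print(res$estimate)   # the (pseudo)median of the differences

# Pseudo-median by hand: the median of all pairwise averages of the
# differences, where each difference is also paired with itself.
d     <- x - y
avg   <- outer(d, d, "+") / 2
walsh <- avg[upper.tri(avg, diag = TRUE)]
print(median(walsh))
```

The hand-computed median of the pairwise averages should match the estimate reported by wilcox.test, which is why the estimate generally differs from the difference of the two sample medians.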

What does the V value mean with regard to my data?

It's the value of the Signed Rank statistic. Check the references mentioned below for how it's calculated (particularly Hollander & Wolfe if you can find it, since that's the reference given in the R help, so the statistic is sure to correspond).

Specifically, the two main definitions that I've seen are either that all signed ranks are added (this is the version on the Wikipedia page), OR that only the positive-signed ranks are added. It looks like R uses the second one. That is, if $x$ and $y$ are the two paired samples, so the differences $x-y$ are tested, then

 sum(rank(abs(x-y))[x>y])

should give the same statistic as R. Like so:

> sum(rank(abs(x[,5]-x[,6]))[x[,5]>x[,6]])
[1] 22
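The same check can be run end to end on made-up data. The sketch below uses hypothetical vectors x and y (not the asker's columns, and chosen to have no ties or zero differences) and compares the hand-computed sum of positive ranks with the statistic that wilcox.test reports, alongside the all-signed-ranks version for contrast.

```r
# Hypothetical paired measurements (made up for illustration).
x <- c(12.1, 9.8, 15.3, 11.0, 8.7, 14.2, 10.5, 13.9)
y <- c(10.4, 11.9, 12.2, 11.6, 9.9, 10.1, 12.8, 9.0)

d <- x - y
r <- rank(abs(d))

v_positive <- sum(r[d > 0])        # only the positive-signed ranks
w_signed   <- sum(sign(d) * r)     # all signed ranks added together

v_r <- unname(wilcox.test(x, y, paired = TRUE)$statistic)

print(c(positive_ranks = v_positive, all_signed = w_signed, wilcox.test = v_r))
```

The two definitions carry the same information: with n pairs and no zero differences, the signed version equals twice the positive-rank sum minus n(n+1)/2.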

From what I can see it is the difference between median(x[,5]) and median(x[,6]).

It isn't. Well, they might coincide occasionally (as with your sample) but that's not what is going on. You should probably start by reading up about how the statistic works. I'd suggest something like Conover's Practical Nonparametric Statistics. Or, ideally, you could check the Signed Rank Test reference in the R help on wilcox.test (Hollander & Wolfe).

The actual value of the statistic isn't usually of interest. The estimate of the size of the location-shift would be relevant (and doesn't depend on which definition of the statistic is used). That is, the fact that 0 is inside the interval matters a lot, the "-21" matters somewhat, the "-72" might matter, the "22" probably doesn't (though there's little harm in quoting it if the definition of the statistic is clear to the reader).
