It seems to me that a first step would be to try to create some models of how tcp header data might relate to your categories. That is, do you have any theories?
If you do, it might turn out that you need to preprocess your packet info: for example, using the window size of the previous packet rather than the current one, or using the day of the week instead of the day of the month.
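As a minimal sketch of that kind of preprocessing in R (the `packets` data frame and its column names are invented for illustration):

```r
# Hypothetical packet data; the column names are assumptions, not your schema
packets <- data.frame(
  timestamp   = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + (0:4) * 3600,
  window_size = c(65535, 32768, 65535, 16384, 65535)
)

# Day of week instead of day of month
packets$weekday <- weekdays(packets$timestamp)

# Window size of the previous packet rather than the current one (lag by 1)
packets$prev_window <- c(NA, head(packets$window_size, -1))
```

The first row's lagged value is `NA`, so you would drop or impute it before modelling.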
Then you need to look carefully at your inputs and outputs. Are they categorical ("car", "truck"), ordered categorical ("small", "medium", "large"), etc? Your linear regression is probably treating your categories like they're continuous (1..N) and your plot shows there's no such linear relationship -- and there's probably no reason to expect there should be.
Once you have an idea of models that might make sense, have meaningful variables, and know the types of these variables, methods will naturally fall into place. (For example, continuous variables in and binary category out naturally suggests logistic regression.)
EDIT: In terms of logistic regression, it can be used with multiple outcomes. Look for multinomial logistic regression.
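A short sketch of multinomial logistic regression using `nnet::multinom` (the built-in `iris` data stands in for your packet features and categories):

```r
library(nnet)  # ships with base R as a recommended package

# Multinomial logistic regression: continuous predictors in,
# a multi-level category out
fit <- multinom(Species ~ Sepal.Length + Sepal.Width,
                data = iris, trace = FALSE)

preds <- predict(fit, newdata = iris)
in_sample_acc <- mean(preds == iris$Species)
```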
In terms of validation, you train your model with your training set then predict on the validation data and see how accurate you are. Obviously, if you look at your accuracy on your training data, it'll tend to overestimate your accuracy since it's what you tuned your model to. A better test of how you'll do in the real world is to use data that your tuning (training) process never used.
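A sketch of that train/validate split in R, again using `iris` as a stand-in for your data:

```r
set.seed(42)  # for a reproducible split
n <- nrow(iris)
train_idx <- sample(n, size = floor(0.7 * n))
train <- iris[train_idx, ]
valid <- iris[-train_idx, ]

# Fit on the training set only
fit <- nnet::multinom(Species ~ ., data = train, trace = FALSE)

# Training accuracy will tend to be optimistic;
# validation accuracy is the more honest estimate
train_acc <- mean(predict(fit, train) == train$Species)
valid_acc <- mean(predict(fit, valid) == valid$Species)
```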
I believe this is one source of confusion:

"... and there is no point at which the CI excludes 0, suggesting the means are the same at every point"
On the face of it, the data and smooths suggest that the difference of means is not exactly zero. Put another way, whatever difference in means there is between the two groups is small relative to the uncertainty in the estimates of those means. You observed a difference, but you cannot say with any level of confidence that this observed difference reflects a real difference in the population of subjects you might have sampled.
Why is each of the model terms significant while the difference is not? This comes from propagating the uncertainty in all of the things you estimated into the computation of the difference of means as a function of age.
Controlling for the smooth effect of Age by sex, you detect a small difference between the two groups in the mean response. The two smooth terms were assessed to be unlikely to have arisen if the true functions were flat, constant functions.
Computing a difference of means as a function of age requires you to bring together all these estimated effects and propagate their uncertainties into your estimates of the difference in the response between the two groups as a function of age.
You could try a more direct estimation of the thing you are interested in by using the ordered-factor method.
data <- transform(data, oSex = ordered(Sex))
m <- gam(Weight ~ oSex + s(Age) + s(Age, by = oSex) + s(Subject, bs = "re"),
         data = data, method = "REML")

where s(Age) is the smooth effect of Age for the reference level of Sex, while s(Age, by = oSex) is the smooth difference between the smooth effect of Age in the reference level and the other level of Sex.
This shouldn't change the outcome much though; I suspect you'll see a small effect of Sex
overall once you condition on the smooth effects of Age in both groups, but you won't find large ("significant") differences when you ask the more specific question of "At what ages are the groups different?".
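A self-contained sketch of this ordered-factor setup on simulated data (the variable names and the data-generating process are invented, and the Subject random effect is omitted since there are no repeated measures in the simulation):

```r
library(mgcv)
set.seed(1)

# Simulated stand-in for the real data: a small, constant Sex effect
# plus a shared smooth trend in Age
n <- 200
Age <- runif(n, 5, 18)
Sex <- factor(rep(c("F", "M"), each = n / 2))
Weight <- 20 + 2 * Age + 0.5 * (Sex == "M") + rnorm(n)
d <- data.frame(Weight, Age, oSex = ordered(Sex))

# Ordered-factor parameterisation: s(Age) is the reference-level smooth,
# s(Age, by = oSex) is the smooth *difference* for the other level
m <- gam(Weight ~ oSex + s(Age) + s(Age, by = oSex),
         data = d, method = "REML")

summary(m)                              # parametric oSex term plus the two smooths
plot(m, pages = 1, seWithMean = TRUE)   # second panel shows the difference smooth
```

If the difference smooth's credible interval includes zero everywhere, that is consistent with a small, roughly constant group difference being absorbed by the parametric oSex term.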
Best Answer
Here is some code for how you could do it in R:

[code block not preserved in this excerpt]

Which then generates the following plot:

[plot not preserved in this excerpt]

You could obviously add your own aesthetic touches, but this should give you the general idea.