If you are writing a paper, the journal probably has a style guide, and style guides often show what to report. (This assumes your question is mostly about which statistics to report and how to report them, not how to do the analysis.)
There's a difference between predicting a variable and measuring correlation.
Logistic regression is a predictor, more specifically a binary classifier. "Classifier" means that it tries to assign a class to every observation; "binary" means that there are exactly two classes. Moreover, logistic regression produces the probability with which each observation belongs to each class.
If you want to predict extroversion/introversion, you have two options:
- Use each of them as a class and give a binary answer. This is simple: each person is assigned either the "extrovert" or the "introvert" label.
- Use fuzzy logic. Logistic regression gives you a number between 0 and 1: the estimated probability that the person belongs to the class coded as 1. E.g. if you code introversion as 0 and extroversion as 1 and logistic regression returns 0.7, the model gives a 70% probability of extroversion and 30% of introversion; loosely, you can read this as the person being 70% extrovert and 30% introvert. This reading is good for capturing things like ambiversion.
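To make the two options concrete, here is a minimal sketch that fits a one-predictor logistic regression by plain gradient descent (chosen only to avoid dependencies) and then reports both the probability and a hard label. The data, the predictor `x`, and the 0.5 threshold are all hypothetical; in practice you would likely use a library such as scikit-learn, whose `LogisticRegression` exposes `predict_proba` (option 2) and `predict` (option 1).

```python
import math

def sigmoid(z: float) -> float:
    """Inverse logit: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=20000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of the log-loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical data: one continuous score x; label 1 = extrovert, 0 = introvert.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)

for x in [2.0, 4.5, 7.0]:
    p = sigmoid(b0 + b1 * x)                          # option 2: keep the probability
    label = "extrovert" if p >= 0.5 else "introvert"  # option 1: hard label at a 0.5 cutoff
    print(f"x={x}: P(extrovert)={p:.2f} -> {label}")
```

The 0.5 cutoff is only a convention; moving it trades false "extrovert" labels against false "introvert" labels.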
Logistic regression works with both continuous and categorical predictors (the latter encoded as dummy variables), so you can run logistic regression directly on your dataset.
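Dummy encoding can be sketched as below: one 0/1 column per category, with one reference level dropped so the columns are not perfectly collinear with the model's intercept. The helper `dummy_encode` and the `education` values are hypothetical; with pandas, `pandas.get_dummies(..., drop_first=True)` does the same job.

```python
def dummy_encode(values, reference=None):
    """Return (column_names, rows): one 0/1 column per non-reference category."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # drop one level to avoid collinearity with the intercept
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows

# Hypothetical categorical predictor.
education = ["highschool", "bachelor", "master", "bachelor"]
cols, rows = dummy_encode(education)
print(cols)  # ['highschool', 'master']  ('bachelor' is the dropped reference level)
for v, r in zip(education, rows):
    print(v, r)
```

Each coefficient on a dummy column is then read relative to the reference level.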
Pearson's coefficient, on the other hand, measures correlation. Correlation is simply normalized covariance, and covariance measures how two random variables co-vary, that is, how a change in one variable is related to a change in the other.
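The "normalized covariance" idea, $r = \operatorname{cov}(X, Y) / (\sigma_X \sigma_Y)$, fits in a few lines. This is a plain-Python sketch with made-up data; `scipy.stats.pearsonr` or `numpy.corrcoef` would be the usual library calls.

```python
import math

def covariance(xs, ys):
    """Sample covariance: average co-movement of x and y around their means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Pearson's r: covariance divided by both standard deviations, so -1 <= r <= 1."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(x, [2 * v + 1 for v in x]))  # 1.0: perfect positive linear relation
print(pearson(x, [-v for v in x]))         # -1.0: perfect negative linear relation
```

Dividing by the standard deviations is what makes $r$ unit-free, unlike the raw covariance.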
Strictly speaking, Pearson correlation is not designed for categorical variables (mostly because categorical variables don't have a notion of a mean, which Pearson is based on). However, with only two binary variables you can treat them as continuous (coded as 1 and 0) and calculate a kind of correlation; in that case Pearson's r equals the phi coefficient of the 2×2 table. It may look like a hack, but it should work for simple exploratory analysis.
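A quick sanity check of the binary case: computing Pearson's r on two 0/1 variables gives the same number as the phi coefficient computed from their 2×2 contingency table. The variables `a` and `b` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson's r from sums of centered products (the n-1 factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def phi_from_table(xs, ys):
    """Phi coefficient from the 2x2 contingency table of two 0/1 variables."""
    pairs = list(zip(xs, ys))
    n11 = pairs.count((1, 1))
    n10 = pairs.count((1, 0))
    n01 = pairs.count((0, 1))
    n00 = pairs.count((0, 0))
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den

# Hypothetical binary variables coded 0/1.
a = [1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 0, 0, 0, 1, 1, 1, 0]
print(pearson(a, b), phi_from_table(a, b))  # both 0.5
```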
1) A logistic regression calculates the probability of an event happening based on the factors you feed into your model: it models the logit (log-odds) of that probability as a linear function of those factors and converts it back into a probability with the inverse logit. (I will assume that you know this type of regression quite well, so I will not go into much detail.)
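For reference, the standard form of the model, writing $p$ for the event probability, $\beta_0$ for an intercept, and $\beta$, $X$ for the coefficients and explanatory variables (the same symbols the Cox formula uses):

$$\log\frac{p}{1-p} = \beta_0 + \beta^T X \quad\Longleftrightarrow\quad p = \frac{1}{1 + \exp\left(-(\beta_0 + \beta^T X)\right)}$$

Note that time $t$ does not appear: the model gives one probability per subject, regardless of when the event happens.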
A Cox regression (or Cox proportional hazards model) is quite different. It is used to explore the relationship between the 'survival' of a subject and the explanatory variables. It operates somewhat like a linear regression, except that the response variable $Y$ is the hazard function at a given time $t$ (more precisely, the explanatory variables act linearly on the log of the hazard). The model takes the form:
$$\lambda_i(t) = \lambda_0(t)\exp(\beta^T X_i)$$
where $\lambda_0(t)$ is the baseline hazard, which is analogous to the intercept term in linear regression: it is the hazard (the instantaneous rate) of your 'event' occurring at time $t$ when all of the explanatory variables are zero. The explanatory variables and regression coefficients enter through the exponential term $\exp(\beta^T X_i)$, where $\beta$ is the vector of coefficients and $X_i$ is the vector of explanatory variables for person $i$.
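The formula can be evaluated directly once a baseline hazard is chosen. The Cox model itself leaves $\lambda_0(t)$ unspecified, so the Weibull-shaped baseline, the coefficients `beta`, and the covariate vectors below are purely hypothetical, picked only to show how the pieces combine.

```python
import math

def baseline_hazard(t, shape=1.5, scale=10.0):
    """Hypothetical Weibull baseline hazard lambda_0(t); Cox regression does not fix its form."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def hazard(t, beta, x):
    """lambda_i(t) = lambda_0(t) * exp(beta^T x_i)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

beta = [0.7, -0.2]   # hypothetical coefficients
x_zero = [0.0, 0.0]  # all explanatory variables zero: hazard equals the baseline
x_i = [1.0, 3.0]     # hypothetical person i
for t in [1.0, 5.0, 10.0]:
    print(t, hazard(t, beta, x_zero), hazard(t, beta, x_i))
```

For `x_zero` the exponential term is $\exp(0) = 1$, which is the "intercept-like" role of the baseline hazard described above.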
The proportionality of hazards in this model comes from the assumption of a consistent relationship between the dependent and explanatory variables: the hazard functions for any two individuals are proportional at every point in time. For example, if subject A has a risk of the 'event' twice as high as subject B at time $t$, then subject A keeps that same ratio at all later times.
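The proportionality follows from the formula: in the ratio $\lambda_A(t) / \lambda_B(t)$ the baseline hazard cancels, leaving $\exp(\beta^T (X_A - X_B))$, which has no $t$ in it. The sketch below checks this numerically for the "twice as high" example, assuming a hypothetical time-varying baseline and a single covariate with coefficient $\log 2$.

```python
import math

def baseline_hazard(t):
    """Hypothetical time-varying baseline hazard lambda_0(t); any positive function works."""
    return 0.05 + 0.02 * t

def hazard(t, beta, x):
    """lambda(t) = lambda_0(t) * exp(beta^T x)."""
    return baseline_hazard(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

beta = [math.log(2.0)]  # hypothetical: one unit of x doubles the hazard
x_a, x_b = [1.0], [0.0]
ratios = [hazard(t, beta, x_a) / hazard(t, beta, x_b) for t in [0.5, 2.0, 10.0, 50.0]]
print(ratios)  # every entry is 2.0 up to floating-point rounding
```

Both hazards grow over time here, yet their ratio stays fixed; that is exactly what the proportional hazards assumption asserts.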
As you can see, unlike logistic regression, this model explicitly depends on time: the hazard of an 'event' happening can change with time through the baseline hazard $\lambda_0(t)$.