I would say that there is no single measure you should take into account.
The last time I did probabilistic classification, I used the R package ROCR together with explicit cost values for false positives and false negatives.
I considered all cutoff points from 0 to 1 and used several measures, such as the expected cost, when selecting the cutoff point. Of course I also had the AUC as a general measure of classification accuracy, but for me it was not the only possibility.
The cost values for the FP and FN cases must come from outside your particular model; perhaps they can be provided by a subject-matter expert.
For example, in customer churn analysis it may be expensive to incorrectly infer that a customer is not churning, but it is also expensive to offer a general price reduction on services without the accuracy to target it at the right groups.
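The cutoff sweep described above can be sketched in a few lines. The scores, labels, and cost ratio below are invented purely for illustration (the original workflow used the ROCR package in R):

```python
# Hypothetical churn scores and true labels; the cost values are assumptions.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2]
labels = [0, 0, 1, 1, 1, 0]
COST_FP, COST_FN = 1.0, 5.0  # missing a churner assumed 5x worse than a wasted offer

def expected_cost(t):
    """Total misclassification cost when predicting positive for score >= t."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    return COST_FP * fp + COST_FN * fn

# Sweep cutoff points from 0 to 1 and keep the cheapest one
cutoffs = [i / 100 for i in range(101)]
best_t = min(cutoffs, key=expected_cost)
```

With a different cost ratio, the sweep can select a very different cutoff, which is exactly why the costs have to come from outside the model.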
-Analyst
Definitions
Suppose this is a binary classification task, where your model estimates $\mathbb{P}(y_i=1 \mid x_i)$ for $y_i\in\{0,1\}$ and $x_i \in \mathbb{R}^p$.
Sensitivity and specificity characterize the true positive rate and true negative rate at some threshold $t$. This means that if you choose a different $t$, you'll have a different sensitivity & specificity.
The $c$-statistic is also known as the area under the ROC curve. The ROC curve plots the true positive rate on the vertical axis and the false positive rate on the horizontal axis. In other words, the ROC curve is a plot where each point is an estimate $\text{sensitivity}(t)$ and $1 - \text{specificity}(t)$ for all values $t$. More compactly, we could write that the curve is constructed from tuples $(\text{FPR}(t), \text{TPR}(t))$, which also emphasizes the dependence on $t$. A useful property of the $c$-statistic is that it estimates the probability that a randomly-selected positive has a higher score than a randomly-selected negative.
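That rank interpretation can be checked directly by counting positive–negative pairs; the scores below are made up for illustration (ties counted as $1/2$):

```python
from itertools import product

# Made-up model scores for three positives and four negatives
pos_scores = [0.9, 0.6, 0.55]
neg_scores = [0.5, 0.3, 0.2, 0.6]

# c-statistic = P(randomly chosen positive outscores a randomly chosen negative),
# counting ties as 1/2
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
           for p, n in product(pos_scores, neg_scores))
c_stat = wins / (len(pos_scores) * len(neg_scores))
```

Note that no threshold $t$ appears anywhere in this computation: the $c$-statistic depends only on how the scores rank the two classes.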
Accuracy is the fraction of correctly-classified instances at some threshold $t$, so accuracy likewise varies with the choice of $t$! In the binary case, it's common for people to arbitrarily pick $t=0.5$ but this is exactly that -- arbitrary.
Inferences
These statistics measure different things, so it is not necessarily surprising that one model can score higher in one respect and lower in another.
For some toy examples, the effect can be wildly counter-intuitive. Suppose that your sample is balanced and your model gives estimates for all positives at 0.49 and all negatives at 0.48. All positives are ranked higher than negatives, so the $c$-statistic (ROC AUC) is 1.0. But the accuracy at $t=0.5$ is 0.5 because the sample is balanced, and only the negatives are correctly classified. Moreover, if you change the class composition (but the scores for the classes stay the same), you can arbitrarily change the accuracy, but the AUC will still be 1.0!
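The toy example is easy to reproduce; the sample size below is chosen arbitrarily:

```python
# Balanced toy sample: every positive scores 0.49, every negative 0.48
scores = [0.49] * 5 + [0.48] * 5
labels = [1] * 5 + [0] * 5

# c-statistic: every positive outranks every negative
pairs = [(p, n) for p, yp in zip(scores, labels) if yp == 1
                for n, yn in zip(scores, labels) if yn == 0]
auc = sum(p > n for p, n in pairs) / len(pairs)

# Accuracy at t = 0.5: every instance is predicted negative
preds = [1 if s >= 0.5 else 0 for s in scores]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

Here `auc` comes out to 1.0 while `accuracy` is only 0.5, and shifting the class balance changes `accuracy` without touching `auc`.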
Moreover, the sensitivity and specificity statistics just characterize performance at single choices of threshold. Different choices of threshold achieve different trade-offs, so they might be preferable for some particular circumstance.
Experimental Results
Model 1 has higher accuracy at $t_0$ and $c$-statistic than model 2.
Model 2 has higher PPV, NPV and sensitivity and specificity at $t_1$ than model 1.
Is $t_0 = t_1$? That information is not stated. But we do know that the ROC curve for Model 2 has a point with the specified sensitivity and 1 - specificity for the threshold $t_1$. On the other hand, the total area of that ROC curve is smaller than for Model 1. How is this possible? Just draw monotonic curves passing through the three points we know must be on the ROC curve: $(0,0)$, $(1,1)$, and $(\text{FPR}(t_1), \text{TPR}(t_1))$. You can make the curve have a large or small area, depending on your choice.
What does all of this mean? You'll have to make a choice about what kind of trade-offs you're willing to accept. Do you care about sensitivity and specificity at $t_1$? At $t_0$? Or would you prefer the TPR and FPR at a different value $t$ altogether? I can't answer that.
Best Answer
You generally know TP, FN, FP, and TN, so based on this wiki:
Po = (TP + TN) / (TP + TN + FP + FN),
Pe = ((TP + FN) * (TP + FP) + (FP + TN) * (FN + TN)) / (TP + TN + FP + FN)^2
Kappa = (Po - Pe) / (1 - Pe)
Our friend Wolfram can then help to simplify this, leading to:
Kappa = 2 * (TP * TN - FN * FP) / (TP * FN + TP * FP + 2 * TP * TN + FN^2 + FN * TN + FP^2 + FP * TN)
So in R, the function would be:
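A direct translation of the formulas above is a minimal sketch (the function name `cohens_kappa` is just a placeholder):

```r
cohens_kappa <- function(TP, FN, FP, TN) {
  n  <- TP + TN + FP + FN
  po <- (TP + TN) / n
  pe <- ((TP + FN) * (TP + FP) + (FP + TN) * (FN + TN)) / n^2
  (po - pe) / (1 - pe)
}
```

For example, `cohens_kappa(20, 5, 10, 65)` gives 0.625, and Wolfram's simplified form above agrees: 2 * (20 * 65 - 5 * 10) / 4000 = 0.625.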