This is only really a problem if you compute the precision and recall first, then plug them in.

One can also compute the $F_1$ score as
$$F_1 = \frac{2 \cdot \textrm{True Positive}}{2 \cdot \textrm{True Positive} + \textrm{False Positive} + \textrm{False Negative}}$$

Plugging in your numbers, you'll arrive at an $F_1$ score of zero, which seems appropriate since your classifier is just guessing the majority class.
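As a minimal sketch of the difference between the two routes, the snippet below computes $F_1$ both ways. The counts are hypothetical (a classifier that never predicts the positive class, with 10 actual positives), not the numbers from the question, and the function names are made up for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 computed directly from confusion-matrix counts."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        raise ValueError("F1 is undefined when TP, FP and FN are all zero")
    return 2 * tp / denom


def f1_from_precision_recall(tp, fp, fn):
    """F1 via precision and recall; breaks when either one is 0/0."""
    precision = tp / (tp + fp)  # 0/0 if nothing is predicted positive
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: always predict the majority (negative) class.
tp, fp, fn = 0, 0, 10

print(f1_from_counts(tp, fp, fn))  # 0.0

try:
    f1_from_precision_recall(tp, fp, fn)
except ZeroDivisionError:
    print("precision is 0/0, so the two-step route fails")
```

The count-based formula only fails when there are no positives at all, neither predicted nor actual.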

There is an information-theoretic measure called **proficiency** that might be of interest if you are working with fairly unbalanced data sets. The idea is that the measure remains sensitive to both classes as either the number of true positives or the number of true negatives approaches zero. It's essentially
$$\frac{I(\textrm{predicted labels}; \textrm{actual labels})}{H(\textrm{actual labels})}$$

See pages 5--7 of White et al. (2004) for more details about its calculation and interpretation.
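A rough sketch of that ratio for a binary confusion matrix follows. It implements only the mutual-information-over-entropy expression shown above, so the paper's exact calculation may differ in detail. The function names and the counts are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def proficiency(tp, fp, fn, tn):
    """I(predicted; actual) / H(actual) from confusion-matrix counts."""
    n = tp + fp + fn + tn
    # Rows: actual positive, actual negative.
    # Columns: predicted positive, predicted negative.
    joint = [[tp / n, fn / n],
             [fp / n, tn / n]]
    p_actual = [sum(row) for row in joint]
    p_pred = [joint[0][j] + joint[1][j] for j in range(2)]

    h_actual = entropy(p_actual)
    if h_actual == 0:
        raise ValueError("undefined when only one actual class is present")

    mutual_info = 0.0
    for i in range(2):
        for j in range(2):
            p = joint[i][j]
            if p > 0:
                mutual_info += p * math.log2(p / (p_actual[i] * p_pred[j]))
    return mutual_info / h_actual


# Hypothetical unbalanced data: 10 positives, 90 negatives.
print(proficiency(tp=0, fp=0, fn=10, tn=90))   # majority-class guesser
print(proficiency(tp=10, fp=0, fn=0, tn=90))   # perfect classifier
```

A majority-class guesser carries no information about the actual labels and scores zero, while a perfect classifier scores one (up to floating-point error).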

You don't seem to want logistic regression at all. What you say is "I would like to maximize the difference between true positives and false positives." That is a fine objective function, but it is not logistic regression. Let's see what it is.

First, some notation. The dependent variable is going to be $Y_i$:

\begin{align}
Y_i &= \left\{ \begin{array}{l}
1 \qquad \textrm{Purchase $i$ was profitable}\\
0 \qquad \textrm{Purchase $i$ was unprofitable}
\end{array}
\right.
\end{align}

The independent variables (the stuff you use to try to predict whether you should buy) are going to be $X_i$ (a vector). The parameter you are trying to estimate is going to be $\beta$ (a vector). For observation $i$, you predict buy when $X_i\beta>0$, that is, when the indicator function $\mathbf{1}_{X_i\beta>0}=1$.

A true positive happens on observation $i$ when both $Y_i=1$ and $\mathbf{1}_{X_i\beta>0}=1$. A false positive on observation $i$ happens when $Y_i=0$ and $\mathbf{1}_{X_i\beta>0}=1$. You wish to find the $\beta$ which maximizes true positives minus false positives, or:
\begin{equation}
\max_\beta \; \sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0} - \sum_{i=1}^N (1-Y_i)\cdot\mathbf{1}_{X_i\beta>0}
\end{equation}

This is not an especially familiar objective function for estimating a discrete response model, but bear with me while I do a little algebra on the objective function:
\begin{align}
&\sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0} - \sum_{i=1}^N (1-Y_i)\cdot\mathbf{1}_{X_i\beta>0}\\
= &\sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0} - \sum_{i=1}^N \mathbf{1}_{X_i\beta>0}
+ \sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0}\\
= &\sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0} - \sum_{i=1}^N \mathbf{1}_{X_i\beta>0}
+ \sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0} \\
& \qquad + \sum_{i=1}^N 1 - \sum_{i=1}^N 1 + \sum_{i=1}^N Y_i - \sum_{i=1}^N Y_i\\
= &\sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0} + \sum_{i=1}^N (1-Y_i)(1-\mathbf{1}_{X_i\beta>0}) - \sum_{i=1}^N 1 + \sum_{i=1}^N Y_i
\end{align}

OK, now notice that the last two terms in that sum are not functions of $\beta$, so we can ignore them in the maximization. So we have just shown that the problem you want to solve, "maximize the difference between true positives and false positives", is the same as this problem:
\begin{equation}
\max_\beta \; \sum_{i=1}^N Y_i\cdot\mathbf{1}_{X_i\beta>0} + \sum_{i=1}^N (1-Y_i)(1-\mathbf{1}_{X_i\beta>0})
\end{equation}

Now, that estimator has a name! It is named the maximum score estimator. It is a very intuitive way to estimate the parameter of a discrete response model. The parameter is chosen so as to maximize the number of correct predictions. The first term is the number of true positives, and the second term is the number of true negatives.
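The equivalence can be checked numerically. The sketch below uses simulated data (the data-generating process, the sample size, and the random candidate $\beta$s are all arbitrary assumptions) and confirms that "true positives minus false positives" and "number of correct predictions" differ only by a constant that does not depend on $\beta$, so they pick the same maximizer.

```python
import random


def predict(x, beta):
    """Indicator 1{x . beta > 0}: 1 means 'buy', 0 means 'don't buy'."""
    return 1 if sum(xj * bj for xj, bj in zip(x, beta)) > 0 else 0


def tp_minus_fp(X, Y, beta):
    preds = [predict(x, beta) for x in X]
    tp = sum(y * p for y, p in zip(Y, preds))
    fp = sum((1 - y) * p for y, p in zip(Y, preds))
    return tp - fp


def n_correct(X, Y, beta):
    """Maximum score objective: true positives plus true negatives."""
    preds = [predict(x, beta) for x in X]
    return sum(y * p + (1 - y) * (1 - p) for y, p in zip(Y, preds))


# Simulated data, purely for illustration.
rng = random.Random(0)
N = 200
X = [(1.0, rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
Y = [1 if x[1] - 0.5 * x[2] + rng.gauss(0, 1) > 0 else 0 for x in X]

const = sum(Y) - N  # the two terms dropped from the maximization

candidates = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(500)]
for beta in candidates:
    assert tp_minus_fp(X, Y, beta) == n_correct(X, Y, beta) + const

best_a = max(candidates, key=lambda b: tp_minus_fp(X, Y, b))
best_b = max(candidates, key=lambda b: n_correct(X, Y, b))
print(best_a == best_b)  # True
```

The random search over candidates is only there to compare the two objectives; it is not a serious way to compute the estimator.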

This is a pretty good way to estimate a (binary) discrete response model. The estimator is consistent, for example (Manski, 1985, Journal of Econometrics). There are some oddities to this estimator, though. First, it is not unique in small samples. Once you have found one $\beta$ which solves the maximization, any other $\beta$ which makes the exact same predictions in your dataset will also solve it, so there are infinitely many solutions close to the one you found. Also, the estimator is not asymptotically normal, and it converges more slowly than typical maximum likelihood estimators: cube root $N$ instead of root $N$ convergence (Kim and Pollard, 1990, Annals of Statistics). Finally, you can't use bootstrapping to do inference on it (Abrevaya & Huang, 2005, Econometrica). There are some papers using this estimator though. A fun one is about predicting results in the NCAA basketball tournament: Caudill, International Journal of Forecasting, April 2003, v. 19, iss. 2, pp. 313-17.
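The non-uniqueness is easy to see on a toy example. In the sketch below (four made-up observations with an intercept and one predictor), a slightly shifted $\beta$ and a rescaled $\beta$ make exactly the same predictions as the original, so all three attain the same score.

```python
def predict(x, beta):
    """Indicator 1{x . beta > 0}."""
    return 1 if sum(xj * bj for xj, bj in zip(x, beta)) > 0 else 0


def n_correct(X, Y, beta):
    """Maximum score objective: number of correct predictions."""
    return sum(1 for x, y in zip(X, Y) if predict(x, beta) == y)


# Toy data (hypothetical): first column is the intercept.
X = [(1.0, 2.0), (1.0, -1.0), (1.0, 0.5), (1.0, -3.0)]
Y = [1, 0, 1, 0]

beta = (0.0, 1.0)
nearby = (0.1, 1.0)   # small shift in the intercept
scaled = (0.0, 5.0)   # positive rescaling of beta

print([n_correct(X, Y, b) for b in (beta, nearby, scaled)])  # [4, 4, 4]
```

The rescaling case is why a scale normalization on $\beta$ is needed; the shifted case shows that the maximizer is a set rather than a point even after normalizing.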

An estimator that overcomes most of these problems is Horowitz's smoothed maximum score estimator (Horowitz, 1992, Econometrica; Horowitz, 2002, Journal of Econometrics). It gives an asymptotically normal, unique estimator which is amenable to bootstrapping. It converges faster than cube root $N$, and under sufficiently strong smoothness assumptions its rate can be made close to root $N$, though it does not quite reach it. Horowitz provides example code to implement his estimator on his webpage.
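To give a feel for the smoothing idea, here is a minimal sketch, not Horowitz's code. It replaces the indicator $\mathbf{1}_{X_i\beta>0}$ with a smooth function of $X_i\beta/h$. The choice of the standard normal CDF as the smoothing function, the bandwidth values, and the toy data are all assumptions made for illustration; the papers cited above specify the actual conditions on the smoothing function and bandwidth.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, used here as an assumed smoothing function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def index(x, beta):
    return sum(xj * bj for xj, bj in zip(x, beta))


def hard_objective(X, Y, beta):
    """(1/N) * sum (2*Y_i - 1) * 1{X_i beta > 0}, i.e. (TP - FP) / N."""
    total = sum((2 * y - 1) * (1 if index(x, beta) > 0 else 0)
                for x, y in zip(X, Y))
    return total / len(Y)


def smoothed_objective(X, Y, beta, h):
    """Same objective with the indicator replaced by a smooth CDF."""
    total = sum((2 * y - 1) * normal_cdf(index(x, beta) / h)
                for x, y in zip(X, Y))
    return total / len(Y)


# Toy data (hypothetical): first column is the intercept.
X = [(1.0, 2.0), (1.0, -1.0), (1.0, 0.5), (1.0, -3.0)]
Y = [1, 0, 1, 0]
beta = (0.0, 1.0)

print(hard_objective(X, Y, beta))
for h in (1.0, 0.1, 0.01):
    print(h, smoothed_objective(X, Y, beta, h))
```

As the bandwidth `h` shrinks, the smoothed objective approaches the step-function objective, but for any fixed `h` it is differentiable in $\beta$, which is what makes standard optimization and asymptotic theory workable.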

## Best Answer

I think you've discovered that the F-score is not a very good way to evaluate a classification scheme. From the Wikipedia page you linked, there is a simplification of the formula for the F-score:

$$ F_1 = \frac{2\,TP}{2\,TP + FP + FN} $$

where $TP,FP,FN$ are numbers of true positives, false positives, and false negatives, respectively.

You will note that the number of true negative cases (equivalently, the total number of cases) is not considered at all in the formula. Thus you can have the same F-score whether you have a very high or a very low number of true negatives in your classification results. If you take your case 1, "# of predicted healthy patients over # of actual healthy patients", the "true negatives" are those who were correctly classified as having cancer, yet that success in identifying patients with cancer doesn't enter into the F-score. If you take case 2, "# of predicted cancer patients over # of actual cancer patients", then the number of patients correctly classified as not having cancer is ignored. Neither seems like a good choice in this situation.
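A small numerical illustration, with made-up counts: changing the number of true negatives by a factor of a thousand leaves the F-score untouched, while swapping which class is labeled "positive" changes it substantially.

```python
def f1_score(tp, fp, fn, tn):
    """F1 from confusion-matrix counts; tn is accepted but never used."""
    return 2 * tp / (2 * tp + fp + fn)


def swap_positive_class(tp, fp, fn, tn):
    """Relabel the classes: old negatives become the new positives."""
    return {"tp": tn, "fp": fn, "fn": fp, "tn": tp}


# Hypothetical results that differ only in the number of true negatives.
few_tn = {"tp": 40, "fp": 10, "fn": 10, "tn": 40}
many_tn = {"tp": 40, "fp": 10, "fn": 10, "tn": 40000}

print(f1_score(**few_tn))   # 0.8
print(f1_score(**many_tn))  # 0.8

# Same results, but scored with the other class treated as "positive".
print(f1_score(**swap_positive_class(**few_tn)))   # 0.8
print(f1_score(**swap_positive_class(**many_tn)))  # about 0.99975
```

So the F-score reported for one and the same set of predictions depends on which class you decide to call positive.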

If you look at any of my favorite easily accessible references on classification and regression, *An Introduction to Statistical Learning*, *Elements of Statistical Learning*, or Frank Harrell's *Regression Modeling Strategies* and associated course notes, you won't find much if any discussion of F-scores. What you will often find is a caution against evaluating classification procedures based simply on $TP$, $FP$, $FN$, and $TN$ values. You are much better off focusing on an accurate assessment of likely disease status with an approach like logistic regression, which in this case would relate the probability of having cancer to the values of the predictors that you included in your classification scheme. Then, following Harrell's discussion on page 258 of *Regression Modeling Strategies*, 2nd edition, a good model of the probability of being a member of a class, in this case of having cancer, is much more useful than any particular classification scheme.