[Math] Can someone please explain 3-CNF for me

conjunctive-normal-formlogicpropositional-calculus

I was reading a textbook today and stumbled upon 3-CNF: 3-conjunctive normal form.
Apparently, it's a conjunctive form, where every OR clause has at most 3 variables.

Could someone please explain

  • What is the usage of 3CNF?
  • How can one convert CNF to 3CNF? How can one convert "normal" formula to 3CNF?
  • Why does 3CNF has at most 24 times as many occurrences of variables as original formula?

I tried googling but only found some articles dealing with 3CNF in the context of SAT-solving.

Thanks in advance and sorry for any grammar mistakes I made.

Best Answer

The point of 3CNF is that it is a "normal form" for formulas: as you mention, every formula is equivalent, up to a quadratic (linear?) blowup, to a 3CNF formula. 3CNF formulas are "simple", and so more easy to deal with. In particular, if you ever read about NP-completeness, you will find out that we want our to put our "challenging problems" in as simple a form as possible. This makes it easier both to design and analyze algorithms solving these problems, and to prove that other problems are also difficult by reducing 3CNF to them (showing how to solve 3CNF using an algorithm for them).

We care specifically about 3CNF for historical reasons, it was the first (or one of the first) NP-complete problems (check out Cook's paper or Karp's paper on their respective pages). Also, 2CNF is not "complete" (arbitrary formulas cannot are not equivalent to a 2CNF), and it is easy to determine whether they are satisfiable or not (google if interested).

The conversion from CNF to 3CNF is best explained by an example. We convert each clause separately. The clause $A \lor B \lor C \lor D \lor E$ is equivalent to the 3CNF $$(A \lor B \lor x_1) \land (\lnot x_1 \lor C \lor x_2) \land (\lnot x_2 \lor D \lor E), $$ in the sense that the original formula (in this case, a single clause) is satisfiable iff the converted formula is. You convert each of the clauses, and take their conjunction.

Suppose you have an arbitrary formula using the connectives $\lnot,\lor,\land$. Picture it as a tree, where inner nodes are labeled with $\lnot,\lor,\land$, and each node has either one ($\lnot$) or two ($\lor,\land$) children. I'm sorry I can't provide a picture. The first step is "pushing the negations to the leaves" using de Morgan's rules (q.v.). That rids us of all $\lnot$ nodes, but now leaves may be literals (variables or negations of variables). Now we convert the formula to a CNF recursively.

Denote by $\varphi$ the function we're constructing, that takes a formula as above (with $\land,\lor$ and possibly negated variables) and returns a CNF. For the base case, we have $\varphi(x) = x$ and $\varphi(\lnot x) = \lnot x$. For a formula of the form $A \land B$, we don't have to work hard: we define $\varphi(A \land B) = \varphi(A) \land \varphi(B)$. The hardest case is formulas of the form $A \lor B$: a somewhat economical choice is $$\varphi(A \lor B) = (y \lor \varphi(A)) \land ((\lnot y) \lor \varphi(B)), $$ where $y$ is a new variable, and $y \lor \varphi(A)$ means adding $y$ to all the clauses.

General formulas with arbitrary connectives can be handled by expressing the connectives using $\lor,\land,\lnot$, for example by writing a truth-table; this can be quite wasteful, though.

The entire construction results in a quadratic blowup in formula size (your "occurrences of variables"; there are many other ways to define formula size). However, the result you're quoting is only a linear blowup (indeed, by a factor of $24$). It is entirely possible that such a construction exists, but I'm not aware of it; perhaps one of the readers is. The SAT to 3SAT part has a linear blowup with a factor of $3$.


If you do not allow adding new variables, then no simple conversion is possible. While it is always possible to convert an arbitrary formula into a CNF (using the "truth table" approach), not every formula is convertible to a 3CNF without adding new variables. An example is parity on $4$ variables. Also, the conversion from an arbitrary formula to a CNF can result in exponential blow-up, for example for parity on $n$ variables we go from $\Theta(n^2)$ to $\Theta(n2^n)$.