[Math] Tseytin transformation example

boolean-algebraconjunctive-normal-formlogicpropositional-calculus

I am trying to understand Tseytin transformation and I can't really find any reliable info on the internet.
I pretty much understand the steps until I get to the point I have to convert all substitutions to CNF.

If I have this formula (wikipedia example)
$\phi :=((p\lor q)\land r)\to (\neg s)$

then substitued formula would be:
${T(\phi ):=(x1\leftrightarrow \neg s)\land (x2\leftrightarrow p\lor q) \land (x3\leftrightarrow x2\land r)\land (x4\leftrightarrow x3\to x1) \land x4}$

What I don't understand is, that when converting this to CNF, do I keep the new variables? Would the resulting formula in CNF be this?
${T(\phi ):=(p \lor q \lor \neg x1) \land (p \lor \neg q \lor x1) \land (\neg p \lor q \lor \neg x1) \land (\neg p \lor \neg q \lor \neg x1) \land (x1 \lor r \lor \neg x2) \land (x1 \lor \neg r \lor x2) \land (\neg x1 \lor r \lor \neg x2) \land (\neg x1 \lor \neg r \lor \neg x2) \land (s \lor \neg x3) \land (\neg s \lor x3) \land (x2 \lor x3 \lor \neg x4) \land (x2 \lor \neg x3 \lor x4) \land (\neg x2 \lor x3 \lor \neg x4) \land (\neg x2 \lor \neg x3 \lor \neg x4) \land x4}$

Is this final CNF formula? Don't I need to resolve introduced variables x(1..4) back to variables from original formula?

Best Answer

Your expression can be depicted as a logical circuit similar to this one:

enter image description here

The four new auxiliary variables help to directly derive the clauses of the CNF:

$x1 = \lnot s$

$$(s \lor x1) \land (\lnot s \lor \lnot x1)$$

$x2 = p \lor q$

$$(\lnot p \lor x2) \land (\lnot q \lor x2) \land (p \lor q \lor \lnot x2)$$

$x3 = r \land x2$

$$(r \lor \lnot x3) \land (x2 \lor \lnot x3) \land (\lnot r \lor \lnot x2 \lor x3)$$

$x4 = x1 \land \lnot x3$

$$(x1 \lor \lnot x4) \land (\lnot x3 \lor \lnot x4) \land (\lnot x1 \lor x3 \lor x4)$$

As witten in the question, the final clause to tie $\phi$ to true, just consists of the single literal $x4$.

If you translate the CNF clauses in DIMACS format and feed this collection of clauses to a SAT solver, it will return either UNSAT (= no solution exists), UNKNOWN (= no solution found) or a number of solutions. Each solution assigns either true or false to any of the variables. As you are just interested in the primary input variables, you can ignore the assignments to the four switching variables.

Note:
For your example expression, Tseytin transformation does not really pay off. The expression can be expressed as CNF with just two clauses:

$$(\lnot p \lor \lnot r \lor \lnot s \lor \lnot \phi) \land (\lnot q \lor \lnot r \lor \lnot s \lor \lnot \phi)$$