[Math] How to break XOR cipher with repeating key

cryptography

I need to crack a stream cipher with a repeating key.

The length of the key is definitely 16. Each key can be any of the characters numbered 32-126 in ASCII.

The algorithm goes like this:

Let's say you have a plain text:

"Welcome to Q&A for people studying math at any level and professionals in related fields."

Let's say that the password is:

"0123456789abcdef"

Then, to encrypt the plaintext, just XOR them together. If the key isn't long enough, just repeat it. e.g.,

Welcome to Q&A for people studying math at any level and professionals in related fields.

                                 XOR

0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789ab

I have 2 English messages encrypted with the above algorithm and with the same key.
I know about the communicative property of xor and that it can be exploited for the example above.
I've read that this is a pretty weak cipher and it has been cracked. However, I have no idea how to do it. So, where can I find a cryptanalysis tool to do it for me?

Best Answer

Take the two encrypted messages and XOR them with each other. You'll get the XOR of the two original, unencrypted messages since the identical keys cancel out. Deciphering this just requires patience and a good understanding of the encoding (what exactly is being XORed - the ASCII values of the letters? Some other binary encoding? Are spaces retained?).

You can try some common words and letter combinations and look for places where XORing the word with the ciphertext yields an English-looking string (which would then be the corresponding word or letter combination at the other text). Also, the ciphertext will likely have numerous 0 values corresponding to places where both messages have the same letter. Such matches are more likely to be a pair of E's or a pair of Spaces than a pair of Q's or X's.

Having said all that, I don't know where you can find a tool to do it for you, and if you find one it'll be very sensitive to the particular encoding. It's much more fun to do this yourself.

EDIT: in the comments you're mentioning that it's indeed ASCII encoding. This makes the task much easier since many 8-bit sequences don't correspond to any legal ASCII character at all. Make a table of all the possible XORs of legal ASCII characters in your plain text and this will tell you, for each position in your text, what are the possible pairs of letters from the two messages.