Friday, September 5, 2014

Encryption 101: Substitution ciphers

Jack Hammond
Junior Developer
Egress Software Technologies Ltd.
So far in this blog series, we’ve mainly focused on ciphers that encrypt their messages either by swapping each letter for another according to a fixed rule, as in the Caesar and Atbash ciphers, or by ‘jumbling them up’ in some way that makes discerning their true meaning difficult, à la the Columnar Transposition Cipher.

The simple substitution cipher

The basic idea of a substitution cipher is a simple one: take one letter in your message, let’s say ‘A’, and replace it with a different letter, such as ‘E’.
Sound familiar?
Both the Atbash and Caesar ciphers used this basic principle; however, they share one weakness: predictability. Figure out how a handful of letters have been encrypted and you can pretty much break the entire message. (Learn more about how these ciphers work in my previous post: Encryption 101: Back to basics.)
The substitution cipher, however, takes this idea to the next level by using a ‘random’ alphabet to encrypt the message. In other words, rather than every letter following one rule (such as ‘shift by three’), each letter is assigned its own independently chosen substitute.
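As a rough illustration of what a ‘random’ alphabet means, the Python sketch below shuffles the 26 letters to produce a cipherbet. The function name `random_cipherbet` is my own hypothetical choice, and I’ve assumed that `secrets.SystemRandom`, which draws on the operating system’s randomness source, is a sensible way to pick the shuffle.

```python
import secrets
import string


def random_cipherbet() -> str:
    """Return a random ordering of A-Z to use as a substitution key (hypothetical helper)."""
    letters = list(string.ascii_uppercase)
    # SystemRandom uses the operating system's randomness source, which is
    # better suited to key material than the default pseudo-random generator.
    rng = secrets.SystemRandom()
    rng.shuffle(letters)
    return "".join(letters)


if __name__ == "__main__":
    print(string.ascii_uppercase)  # the plaintext alphabet
    print(random_cipherbet())      # a freshly shuffled ciphertext alphabet
```

Every run prints a different ciphertext alphabet, and each one is a valid key.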
The table below displays an alphabet that I chose at random, simply placing letters in different locations until it was complete.

Plaintext alphabet:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext alphabet: D H C L P F S V J Y U O B R N T Z K I X W E Q M G A
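To make the table concrete, here is a minimal Python sketch that encrypts and decrypts with the cipherbet above. The names `encrypt` and `decrypt`, and the choice to output lowercase ciphertext and uppercase plaintext, are my own assumptions; spaces and punctuation are passed through untouched.

```python
import string

PLAIN = string.ascii_uppercase
# The cipherbet from the table above: A->D, B->H, C->C, ..., Z->A
CIPHER = "DHCLPFSVJYUOBRNTZKIXWEQMGA"

# str.maketrans builds a character lookup table for str.translate.
ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER.lower())
DECRYPT_TABLE = str.maketrans(CIPHER.lower(), PLAIN)


def encrypt(plaintext: str) -> str:
    """Replace each letter with its cipherbet partner; other characters are unchanged."""
    return plaintext.upper().translate(ENCRYPT_TABLE)


def decrypt(ciphertext: str) -> str:
    """Undo the substitution by reading the table in the opposite direction."""
    return ciphertext.lower().translate(DECRYPT_TABLE)


if __name__ == "__main__":
    secret = encrypt("Meet me at noon")
    print(secret)           # bppx bp dx rnnr
    print(decrypt(secret))  # MEET ME AT NOON
```

Because the decryption table is just the encryption table read backwards, the same cipherbet serves as both the encryption and the decryption key.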

This new alphabet makes the relationship between the plaintext and the ciphertext much harder to figure out, because the confusion that the cipher provides has increased. The diffusion, however, is still fairly low: changing one letter in the plaintext still changes only one letter in the ciphertext. That won’t really change until we start looking at more modern examples.
However, while the Atbash cipher had just one key and the Caesar cipher had 25, the substitution cipher has 26! (26 factorial) unique keys: there are 26 choices for the letter that replaces A, 25 remaining choices for B, 24 for C, and so on down to 1 for Z. This works out to 403,291,461,126,605,635,584,000,000 different ways to write the alphabet!
As you can see, the number of keys grows rapidly as the ciphers become more advanced.
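To see the low diffusion concretely, the sketch below encrypts two plaintexts that differ in a single letter, using the table’s cipherbet, and counts how many ciphertext letters differ. The helper names `encrypt` and `count_differences` and the sample messages are my own, for illustration only.

```python
import string

# Cipherbet from the table above (A->D, B->H, ..., Z->A).
CIPHER = "DHCLPFSVJYUOBRNTZKIXWEQMGA"
TABLE = str.maketrans(string.ascii_uppercase, CIPHER)


def encrypt(plaintext: str) -> str:
    """Substitute each letter using the cipherbet."""
    return plaintext.upper().translate(TABLE)


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


p1, p2 = "ATTACKATDAWN", "ATTACKATDOWN"  # plaintexts differ in one letter
c1, c2 = encrypt(p1), encrypt(p2)

print(c1, c2)                                                # DXXDCUDXLDQR DXXDCUDXLNQR
print(count_differences(p1, p2), count_differences(c1, c2))  # 1 1
```

A cipher with good diffusion would instead change many ciphertext letters in response to that single-letter edit.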
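The key counts above are easy to check. This short sketch simply restates them in Python; the constant names are my own.

```python
import math

ATBASH_KEYS = 1                         # only one way to reverse the alphabet
CAESAR_KEYS = 25                        # shifts of 1 to 25 (a shift of 26 changes nothing)
SUBSTITUTION_KEYS = math.factorial(26)  # every possible ordering of the 26 letters

print(f"{SUBSTITUTION_KEYS:,}")    # 403,291,461,126,605,635,584,000,000
print(f"{SUBSTITUTION_KEYS:.2e}")  # 4.03e+26
# Size of the key space expressed in bits:
print(round(math.log2(SUBSTITUTION_KEYS), 1))  # 88.4
```

Expressed in bits, that is a key of roughly 88 bits, although, as the next section shows, the size of the key space alone says little about how hard a cipher is to break.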

More keys = More secure?

While one might think that having a vast number of keys to choose from is a good security metric (after all, what attacker is going to sit there writing out every possible permutation of the alphabet, running your ciphertext through each one to see whether it breaks the encryption?), substitution ciphers still suffer from the same inherent weakness as the ciphers before them: letter frequency analysis. Because each plaintext letter is always replaced by the same ciphertext letter, the frequency of every letter survives encryption intact, just under a different label. (I discussed this topic in further detail when looking at weaknesses in the Caesar cipher.)
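Here is a minimal sketch of why the huge key space doesn’t help. Using the table’s cipherbet, it counts the letters in a plaintext and in its ciphertext: the counts are identical, just relabelled, so an attacker can match the most common ciphertext letters against the most common letters in English. The sample sentence and the helper name `letter_counts` are my own.

```python
import string
from collections import Counter

CIPHER = "DHCLPFSVJYUOBRNTZKIXWEQMGA"  # cipherbet from the table above
TABLE = str.maketrans(string.ascii_uppercase, CIPHER)


def letter_counts(text: str) -> Counter:
    """Count only the letters A-Z, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)


plaintext = "THE SECRET MEETING WILL TAKE PLACE AT THE OLD BRIDGE AT SEVEN"
ciphertext = plaintext.translate(TABLE)

plain_counts = letter_counts(plaintext)
cipher_counts = letter_counts(ciphertext)

print(plain_counts.most_common(1))   # [('E', 11)]
print(cipher_counts.most_common(1))  # [('P', 11)], so 'P' is standing in for 'E'

# The set of frequencies is unchanged by the substitution...
print(sorted(plain_counts.values()) == sorted(cipher_counts.values()))  # True
# ...and each letter's count moves, intact, to its substitute.
print(all(cipher_counts[p.translate(TABLE)] == n for p, n in plain_counts.items()))  # True
```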

Defeating letter frequency analysis

Letter frequency analysis has so far proven to be a very powerful cryptanalysis method, so you would be forgiven for thinking that eventually all ciphers would be cracked by it.
As part of this Encryption 101 series, however, we will move on to the Vigenère cipher and to Substitution-Permutation Networks, which try to increase the diffusion of the encryption process so that the relationship between plaintext and ciphertext is harder to discern. We’ll also take a look at the One-Time Pad, which some argue is the only form of ‘perfect’ cryptography we’ve ever created. However, nothing is perfect in the world of cryptography, and even this ‘perfect’ cipher has its drawbacks.

Your turn to crack the code (try these at your desk!)

For these examples, we’ll be using the substituted alphabet that we created earlier in the blog to encrypt and decrypt some messages. You’ll also be asked to carry out a letter frequency analysis on a piece of ciphertext to see whether you can uncover the ‘cipherbet’ used to encrypt it.

  1. Using the cipherbet, encrypt the following phrase:
    • ‘You either die a hero, or live long enough to see yourself become the villain’
    • Answer: gnw pjxvpk ljp d vpkn, nk ojep onrs prnwsv xn ipp gnwkipof hpcnbp xvp ejoodjr
  2. Using the cipherbet, recover the plaintext message from the ciphertext:
    • ‘hpcdwip vp'i xvp vpkn snxvdb lpipkepi, hwx rnx xvp nrp jx rppli kjsvx rnq’
    • Answer: Because he's the hero Gotham deserves, but not the one it needs right now
  3. Now for something a little harder. Using the website and ciphertext below, see whether you can use letter frequency analysis to recover not only the plaintext message but also the cipherbet used to encrypt it: