Writing in code is an old art. Apparently human beings have had secrets and intrigue right from the start. There is a large-scale fascination with cryptography and cryptanalysis, and the root of this fascination lies in the fact that codes are secrets and secrets were made to be discovered. The breaking of codes without the help of a computer or a savant has always seemed impossible to me. As with Columbus and his egg, I just can’t comprehend how it can be done until someone shows me the way.
I have always loved codes. In my box of mementos I still have the collections of fabulous substitution ciphers my friends and I devised to send each other our classroom secrets. There is one cipher that is made up of a particularly elaborate system of symbols, and another that looks like the love-child of shorthand and hieroglyphics. The hours we put into devising our codes and encrypting our (not overly exciting) messages!
When we were feeling less artistic or inventive, my friends and I often reverted to the simple alphabet shift ciphers (or Caesar Shifts, as I have since learnt they are called). Though these codes still appeared difficult, I could see how, with a bit of perseverance, they could be broken. We just trusted that, if our notes were found, none of our teachers or classmates would invest the time required to crack them. The substitution codes, though, with their captivatingly confusing array of dots and lines and squiggles, were, we thought, much more sophisticated and impossible to break.
However, in the course of writing my current novel, I’ve done a bit of research on codes, and it would appear that these substitution ciphers are amongst the easiest to crack. The undoing of these codes is down to the Muslim scholars of the Islamic Golden Age (which lasted from about 750 CE to the thirteenth century). They realised that the letters of any language appear with regular and reliable frequency, and so they developed a technique for cracking substitution ciphers called Frequency Analysis. This process was explained beautifully in my go-to code resource: Codebreaker: The History of Secret Communication by Stephen Pincock and Mark Frary. I’ve used it as a reference for an earlier blog post, Chapel of Secrets, and as a guide for the secret writings in my novel. Below I’ve summarised the method of breaking a substitution cipher.
As explained above, each letter of the English alphabet turns up with a fairly predictable frequency in a body of text. E appears most often, making up roughly 12 percent of the letters in a typical piece of writing, followed by T at about 9 percent and A at about 8 percent. The expected relative frequencies of the whole English alphabet are shown in the table below.
Now, if you have a piece of text that has been encrypted using a substitution cipher, you can make your own record of how often each letter or symbol appears, using a chart similar to the one above. Say, for example, we have the text (encrypted using www.braingle.com):
Znoy oy znk vuckx, otjkkj znk cnurk vuotz ul zngz zoxkj urj (haz tkbkxznkrkyy zxak) sgtzxg ‘ynuc jut’z zkrr’. Ol eua igt ynuc payz ktuamn yu zngz euax gajoktik corr lorr ot znk mgvy znksykrbky, znke corr zgqk ut g qotj ul uctkxynov ul euax yzuxe. Euax gajoktik hkiusky otbkyzkj ot g cge zngz qkkvy znks iruyk zu znk yzuxe, grsuyz g vgxz ul oz.
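If you would rather not count by hand, the tally can be sketched in a few lines of Python. This is only a minimal sketch, not the method from the book: the names letter_counts and letter_percentages are made up for illustration, and it assumes the message uses the plain letters A to Z, so it ignores case, spaces and punctuation.

```python
from collections import Counter

def letter_counts(text):
    """Tally how often each letter A-Z appears, ignoring case and non-letters."""
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")

def letter_percentages(text):
    """Return {letter: percent of all letters}, most frequent letter first."""
    counts = letter_counts(text)
    total = sum(counts.values())
    return {letter: 100 * n / total for letter, n in counts.most_common()}

# The encrypted example text from this post.
ciphertext = (
    "Znoy oy znk vuckx, otjkkj znk cnurk vuotz ul zngz zoxkj urj "
    "(haz tkbkxznkrkyy zxak) sgtzxg ‘ynuc jut’z zkrr’. Ol eua igt ynuc "
    "payz ktuamn yu zngz euax gajoktik corr lorr ot znk mgvy znksykrbky, "
    "znke corr zgqk ut g qotj ul uctkxynov ul euax yzuxe. Euax gajoktik "
    "hkiusky otbkyzkj ot g cge zngz qkkvy znks iruyk zu znk yzuxe, "
    "grsuyz g vgxz ul oz."
)

# Print a rough text bar chart: one '#' per percentage point.
for letter, pct in letter_percentages(ciphertext).items():
    print(f"{letter} {pct:4.1f}% {'#' * round(pct)}")
```

Each row of the output is one letter, its share of the text and a bar of matching length, so the longest bars are the letters worth comparing with E, T and A.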
Then the corresponding graph would look like this:
It has a similar shape to our original expected frequencies graph, only shifted. Comparing the two graphs, it would seem that the letter K has been substituted for the letter E. K sits six places after E in the alphabet, which means that G has been substituted for A, H for B and so on. Following this through, you get the decrypted text (from last week’s blog post), reading:
This is the power, indeed the whole point of that tired old (but nevertheless true) mantra ‘show don’t tell’. If you can show just enough so that your audience will fill in the gaps themselves, they will take on a kind of ownership of your story. Your audience becomes invested in a way that keeps them close to the story, almost a part of it.
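The shifting step itself can also be sketched in Python. Again, this is a rough illustration under stated assumptions rather than the book's procedure: caesar_shift and guess_shift are hypothetical names, the code assumes a plain alphabet shift (a Caesar Shift) over the letters A to Z, and guess_shift simply assumes that the most common letter in the message stands for E.

```python
from collections import Counter
import string

def caesar_shift(text, shift):
    """Move every letter `shift` places along the alphabet, wrapping at Z.

    Case is kept; anything that is not A-Z or a-z passes through unchanged.
    A negative shift moves letters backwards, which undoes an encryption.
    """
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            result.append(ch)
    return "".join(result)

def guess_shift(ciphertext, assumed_plain="E"):
    """Guess the shift, assuming the most common letter stands for E."""
    counts = Counter(ch for ch in ciphertext.upper()
                     if ch in string.ascii_uppercase)
    if not counts:
        raise ValueError("no letters to analyse")
    peak = counts.most_common(1)[0][0]
    return (ord(peak) - ord(assumed_plain)) % 26

# In the example, K took the place of E, so the shift is six places.
shift = (ord("K") - ord("E")) % 26
print(shift)                                      # 6
print(caesar_shift("Znoy oy znk vuckx", -shift))  # This is the power
```

Treat guess_shift as a starting point rather than an answer. In a short message the most common letter is not always E, so if the result reads as gibberish, try assuming the peak stands for T or A instead.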
So there you have it. Quite simple once you know how, isn’t it? Of course, this technique only works for monoalphabetic ciphers, which are ciphers that use a single substitution for each letter of the alphabet. Polyalphabetic ciphers, in which each letter of the alphabet can be represented by a number of different letters, numerals or symbols, are a different story. I’ll tell you more about them some other day.
In case you’re itching to have a go at this, I’ll leave you with this coded piece of text from one of my favourite authors. Let me know the name of the author in the comments below if you manage to break it (but leave the text a secret so that everyone can enjoy the fun). NB: I’ve taken out all the punctuation for this piece to make it more realistic.
ro cdyzzon k zkccsxq qekbn led nsnxd nkbo woxdsyx zvkdpybw xsxo kxn drboo aekbdobc dro qekbn rkn xofob rokbn yp ryqgkbdc kxn grox rkbbi myevnxd ofox dovv rsw grsmr zkbd yp dro myexdbi sd gkc sx ro cdkbdon dy qod kxxyion kc dryeqr rkbbi gkc losxq cdezsn yx zebzyco
If you’d like a peek into the world of my writing, reading and mothering, you can find me most days on Twitter, Facebook or Instagram. Until next week, happy cryptanalysis!