Frequency Analysis?
Forum: National Cipher Challenge 2018: The Kompromat Files
Tagged: Confused
This topic has 6 replies from 4 voices and was last updated by Anonymous.
17 Oct 18 at 8:26 pm #38979 Anonymous
Hello everybody,
Just created this to see who else, apart from us, doesn't (well, didn't before!) understand how frequency analysis works. Please tell me if you are as confused as our group was. We asked a teacher and even they were confused and couldn't help us!
[EDIT, Harry: Dear CipherGirl1, frequency analysis is not that scary! The idea is that some letters of the alphabet appear a lot more often than others. In particular, because E is a vowel and we use the word “the” a lot, you tend to find that the most common letter in a longish bit of English text is E. Not always: sometimes T beats it, but those two are usually the most common. If you have a ciphertext encrypted so that the letter e is replaced by X, then there is a pretty good chance that X will be the most common letter in the ciphertext. If you count how often each letter appears and see that X is the most common, that is a strong hint that it might be the replacement for e, so you can assume that and see what else you can deduce. If you see a lot of three-letter words ending in X, you can guess that they are all copies of the word “the”, and that helps you to see how t and h are encrypted. This is particularly useful if you are trying to decrypt a substitution cipher with the word shapes (the breaks between words) left in! If you look this up on the web you will find loads about it, and we talk about it in the beginners' guide too.
Good luck,
Harry]
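Harry's counting trick is easy to try on a computer. The Python below is a minimal sketch, not anything from the competition itself: the function names and the sample sentence are made up, and it builds its test ciphertext with a Caesar shift of 19 simply because that shift happens to send e to X, as in Harry's example. It counts the letters, takes the most common one as the guess for e, and lists three-letter words ending in that letter as candidates for “the”.

```python
import re
import string
from collections import Counter

def letter_frequencies(text):
    """Count each letter A-Z, ignoring case, spaces and punctuation."""
    return Counter(c for c in text.upper() if c in string.ascii_uppercase)

def three_letter_words_ending_in(text, letter):
    """Count three-letter words ending in `letter` (candidates for "the")."""
    words = re.findall(r"[A-Z]+", text.upper())
    return Counter(w for w in words if len(w) == 3 and w.endswith(letter))

# Hypothetical test message, encrypted with a Caesar shift of 19 (A->T, E->X).
plain = "the secret meeting is at the embassy where the agent will leave the files"
upper = string.ascii_uppercase
shifted = upper[19:] + upper[:19]
cipher = plain.upper().translate(str.maketrans(upper, shifted))

counts = letter_frequencies(cipher)
top_letter, top_count = counts.most_common(1)[0]
print(top_letter, top_count)                             # X 15
print(three_letter_words_ending_in(cipher, top_letter))  # Counter({'MAX': 4})
```

With this sentence X comes out on top (15 times) and MAX turns up four times, which suggests that t is encrypted as M and h as A. On a real ciphertext the top letter is only a good guess, so you would still check it against the rest of the text.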
29 Oct 18 at 12:20 pm #39249 Anonymous
Thanks, Harry
29 Oct 18 at 12:21 pm #39250 Anonymous
Hi Harry,
Sorry, it's just that I wanted to help, not saying it's totally a nightmare. Sorry I forgot to mention that at the end of my first post.
02 Nov 18 at 3:20 pm #39375 Anonymous (The Encryptic Enterprise)
It's also helpful to consider frequency analysis alongside its cousins: bigram, trigram and quadgram analysis. These give progressively more useful ways of scoring how English-like a text is, and they are good for very short sections or for sections that aren't quite perfect.
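Counting bigrams, trigrams and quadgrams works just like counting single letters, except that you slide a window of n letters along the text. A minimal Python sketch (the function name and example text are hypothetical):

```python
import string
from collections import Counter

def ngram_counts(text, n):
    """Count every overlapping run of n letters, ignoring spaces and punctuation."""
    letters = "".join(c for c in text.upper() if c in string.ascii_uppercase)
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

text = "the theory of the thing"
print(ngram_counts(text, 2).most_common(3))  # [('TH', 4), ('HE', 3), ('ET', 2)]
print(ngram_counts(text, 3).most_common(1))  # [('THE', 3)]
```

Note that the window runs straight across word breaks (ET comes from "the theory"), which is also how ciphertexts with the spaces removed behave.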
Bigram, trigram, and quadgram frequency text files can be found here:
http://practicalcryptography.com/media/cryptanalysis/files/english_bigrams.txt
http://practicalcryptography.com/media/cryptanalysis/files/english_trigrams.txt.zip
http://practicalcryptography.com/media/cryptanalysis/files/english_quadgrams.txt.zip
(http://practicalcryptography.com/media/cryptanalysis/files/english_quintgrams.txt.zip)
(the last one is barely more useful than quadgrams and is certainly slower, so I wouldn't recommend it)
05 Nov 18 at 10:07 am #39385 Anonymous
Hi, thanks! We will certainly consider that!
05 Nov 18 at 10:07 am #39392 Anonymous (The Encryptic Enterprise)
The simplest way to use single-letter (monogram) frequencies is just to line up the most frequent letter of the ciphertext with the most frequent letter of English, the second most frequent with the second, and so on. But as Harry pointed out, this is unreliable: E and T sometimes swap first place, and the other letters shuffle around too.
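A sketch of that simplest approach in Python, mainly to show why it is fragile. It assumes one commonly quoted English order, ETAOINSHRDLCUMWFGYPBVKJXQZ; other sources order the less common letters slightly differently. The function names are hypothetical.

```python
import string
from collections import Counter

# Assumed English order, most to least frequent (sources vary a little).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def naive_rank_key(ciphertext):
    """Map the k-th most common cipher letter to the k-th most common English letter."""
    counts = Counter(c for c in ciphertext.upper() if c in string.ascii_uppercase)
    # sorted() is stable, so tied and unused letters stay in alphabetical order.
    ranked = sorted(string.ascii_uppercase, key=lambda c: -counts[c])
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_key(ciphertext, key):
    """Swap each cipher letter for its guessed plaintext letter."""
    return "".join(key.get(c, c) for c in ciphertext.upper())

key = naive_rank_key("XXXX MMM AA")
print(apply_key("XMA", key))  # ETA
```

On a real ciphertext this usually gets a few of the top letters right and many of the rest wrong, which is exactly the problem described above.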
The second easiest thing to do with single-letter frequencies is to take the dot product of the letter frequencies in your candidate plaintext with the letter frequencies of English. If you don't know what a dot product is, your math teacher should. Basically: decrypt with a candidate key to get a candidate plaintext, find the frequency of each letter in it, then compute the sum freq(A in candidate) × freq(A in English) + freq(B in candidate) × freq(B in English) + … The candidate with the highest sum is the winner. This works very well for Caesar shift and affine ciphers: by looping over all possible keys and keeping the decryption with the highest sum, you can crack one in a split second.
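Here is a minimal Python sketch of the dot-product method applied to a Caesar shift. The English frequency table holds approximate, commonly published percentages (an assumption; any reasonable table works about as well), and the function names and test sentence are made up for illustration.

```python
import string

ALPHABET = string.ascii_uppercase

# Approximate English letter frequencies in percent, A to Z (assumed values).
ENGLISH_FREQ = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]

def caesar_decrypt(text, shift):
    """Shift each letter back by `shift`; non-letters pass through unchanged."""
    out = []
    for c in text.upper():
        if c in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(c) - shift) % 26])
        else:
            out.append(c)
    return "".join(out)

def caesar_encrypt(text, shift):
    # Encrypting is just decrypting with the opposite shift.
    return caesar_decrypt(text, -shift)

def letter_freqs(text):
    """Fraction of the letters in `text` that are A, B, ..., Z."""
    letters = [c for c in text.upper() if c in ALPHABET]
    total = len(letters) or 1  # avoid dividing by zero on empty input
    return [letters.count(ch) / total for ch in ALPHABET]

def dot_score(text):
    """freq(A in text) * freq(A in English) + ... over all 26 letters."""
    return sum(f * e for f, e in zip(letter_freqs(text), ENGLISH_FREQ))

def crack_caesar(ciphertext):
    """Try all 26 keys and keep the one whose decryption scores highest."""
    best = max(range(26), key=lambda k: dot_score(caesar_decrypt(ciphertext, k)))
    return best, caesar_decrypt(ciphertext, best)

plain = ("the enemy agents will meet at the station at ten tonight "
         "so keep the cipher safe and tell no one about the plan")
cipher = caesar_encrypt(plain, 7)
print(crack_caesar(cipher))  # recovers key 7 and the sentence in capitals
```

An affine cipher can be attacked the same way by looping over its 312 keys (12 valid multipliers × 26 shifts) instead of 26.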
For polyalphabetic ciphers, or for a general substitution cipher that is not as constrained as Caesar or affine, you need to use di-, tri- or tetragram frequencies to decide whether a candidate is good. But the idea is similar.
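One common way to turn n-gram frequencies into a score (not necessarily the way the poster does it) is to add up the logarithms of the probabilities of every overlapping quadgram in the candidate; the higher (less negative) the total, the more English-like the text. A minimal Python sketch, assuming the frequency files have one n-gram and its count per line separated by a space; the names are hypothetical and the floor value for unseen quadgrams is an arbitrary choice.

```python
import math
import string
from collections import Counter

def load_ngram_counts(lines):
    """Parse lines of the assumed form 'NGRAM COUNT' into a dict."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0].upper()] = int(parts[1])
    return counts

class NgramScorer:
    """Sum of log10 probabilities of every overlapping n-gram in a text."""

    def __init__(self, counts):
        self.n = len(next(iter(counts)))  # assumes all keys have the same length
        total = sum(counts.values())
        self.logp = {g: math.log10(c / total) for g, c in counts.items()}
        # Unseen n-grams get a small floor value instead of log10(0);
        # 0.01 / total is an arbitrary but common choice.
        self.floor = math.log10(0.01 / total)

    def score(self, text):
        letters = "".join(c for c in text.upper() if c in string.ascii_uppercase)
        return sum(self.logp.get(letters[i:i + self.n], self.floor)
                   for i in range(len(letters) - self.n + 1))

# Toy counts from one made-up sentence. For real work you would instead load
# the unzipped quadgram file (file name assumed), e.g.
#   with open("english_quadgrams.txt") as f: counts = load_ngram_counts(f)
sample = "we will attack at dawn and then retreat to the north at noon"
letters = "".join(c for c in sample.upper() if c in string.ascii_uppercase)
toy_counts = Counter(letters[i:i + 4] for i in range(len(letters) - 3))

scorer = NgramScorer(toy_counts)
print(scorer.score("attack at dawn") > scorer.score("xqzv kwjp yfbg"))  # True
```

Logs are used so that the many small probabilities can be added rather than multiplied, which avoids numbers too tiny for the computer to handle.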
05 Nov 18 at 10:41 am #39422 Anonymous
For polyalphabetic ciphers you will also need to look at the distances between repeated n-grams in the ciphertext to find the key length.
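This is the idea behind the Kasiski examination: when the same chunk of plaintext lines up with the same part of the key, it encrypts to the same chunk of ciphertext, so the distances between repeats tend to be multiples of the key length. A minimal Python sketch; the Vigenère helper, the sample message and the key KEY are made up just to produce a test ciphertext.

```python
import string
from collections import defaultdict

ALPHABET = string.ascii_uppercase

def vigenere_encrypt(plaintext, key):
    """Hypothetical helper used only to make a test ciphertext (letters only)."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(key[i % len(key)])) % 26]
        for i, p in enumerate(letters)
    )

def repeat_distances(ciphertext, n=3):
    """Distances between consecutive occurrences of each repeated n-gram."""
    letters = "".join(c for c in ciphertext.upper() if c in ALPHABET)
    positions = defaultdict(list)
    for i in range(len(letters) - n + 1):
        positions[letters[i:i + n]].append(i)
    distances = []
    for pos in positions.values():
        distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return distances

def factor_counts(distances, max_len=20):
    """For each candidate key length, how many distances it divides."""
    return {k: sum(d % k == 0 for d in distances) for k in range(2, max_len + 1)}

cipher = vigenere_encrypt("attack at dawn attack at dusk", "KEY")
distances = repeat_distances(cipher)
print(cipher)     # KXRKGIKXBKALKXRKGIKXBEWI
print(distances)  # [12, 12, 12, 12, 12, 12, 12]
```

Here every repeat is 12 letters apart, and the real key length, 3, divides all of them. But so do 2, 4, 6 and 12, so in practice you gather as many repeats as you can and then test the most likely lengths, for example with the frequency methods above.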