GeoCache Agent - It appears you require a refresher course in Cryptography. The following lessons should prove invaluable in your quest for the cache.

Remember, there are variations of the below mentioned cipher's. You may encounter one of these variation's during your mission.



The Caesar Cipher

One of the simplest examples of a substitution cipher is the Caesar cipher, which is said to have been used by Julius Caesar to communicate with his army. Caesar is considered to be one of the first persons to have ever employed encryption for the sake of securing messages. Caesar decided that shifting each letter in the message would be his standard algorithm, and so he informed all of his generals of his decision, and was then able to send them secured messages. Using the Caesar Shift (3 to the right), the message,

"RETURN TO ROME"

would be encrypted as,

"UHWXUA WR URPH"

In this example, 'R' is shifted to 'U', 'E' is shifted to 'H', and so on. Now, even if the enemy did intercept the message, it would be useless, since only Caesar's generals could read it.

Thus, the Caesar cipher is a shift cipher since the ciphertext alphabet is derived from the plaintext alphabet by shifting each letter a certain number of spaces. For example, if we use a shift of 19, then we get the following pair of ciphertext and plaintext alphabets:

 

Plaintext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z





Ciphertext: T U V W X Y Z A B C D E F G H I J K L M N O P Q R S





To encipher a message, we perform a simple substitution by looking up each of the message's letters in the top row and writing down the corresponding letter from the bottom row. For example, the message

THE FAULT, DEAR BRUTUS, LIES NOT IN OUR STARS BUT IN OURSELVES.

would be enciphered as

MAX YTNEM, WXTK UKNMNL, EBXL GHM BG HNK LMTKL UNM BG HNKLXEOXL.

Essentially, each letter of the alphabet has been shifted nineteen places ahead in the alphabet, wrapping around the end if necessary. Notice that punctuation and blanks are not enciphered but are copied over as themselves.

Breaking a Caesar Cipher by Hand

Can a computer guess what shift was used in creating a Caesar cipher? The answer, of course, is yes. But how does it work?

The unknown shift is one of 26 possible shifts. One technique might be to try each of the 26 possible shifts and check which of these resulted in readable English text. But this approach has limitations. For one thing how would the computer recognize "readable English text?" For another, what if a multiple Caesar shift was used, as is the case for a Vigenere cipher , where each letter of the keyword provides the basis for a Caesar shift. That is, if the key word is bam, then every third letter of the plaintext starting at the first would be shifted by 'b' (=1) and every third letter beginning at the second would be shifted by 'a' (=0) and every third letter beginning at the third would be shifted by 'm' (=12). Obviously we can't depend on obtaining readable English text here.

A better approach makes use of statistical data about English letter frequencies. It is known that in a text of 1000 letters of various English alphabet occur with about the following relative frequencies:

 
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
73 9 30 44 130 28 16 35 74 2 3 35 25 78 74 27 3 77 63 93 27 13 16 5 19 1
This information can be useful in deciding the most likely shift used on a given enciphered message. Suppose the enciphered message is:

K DKVO DYVN LI KX SNSYD, PEVV YP CYEXN KXN PEBI, CSQXSPISXQ XYDRSXQ.

We can tally the frequencies of the letters in this enciphered message, thus

 
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 4 3 0 0 0 3 0 4 1 0 4 1 4 3 1 6 0 0 4 0 7 4 0

Now we can now shift the two tallies so that the large and small frequencies from each frequency distribution match up roughly. For example, if we try a shift of ten on the previous example, we get the following correspondence between English language frequencies and the letter frequencies in the message.

English Language Frequencies
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
73 9 30 44 130 28 16 35 74 2 3 35 25 78 74 27 3 77 63 93 27 13 16 5 19 1

Enciphered Message Frequencies
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
4 1 0 4 1 4 3 1 6 0 0 4 0 7 4 0 0 1 2 4 3 0 0 0 3 0

Note that in this case the large frequencies for cipher X and Y correspond to large for English N and O, the bare spots for cipher T and U correspond to bare spots for English J and K. Also, an isolated large frequency for cipher S corresponds to a similar one for English I. In view of this evidence we needn't even worry too much about the drastic mismatch for English E, which is usually the most frequent letter in a random sample of English text.

If we now apply this substitution to the message we get:

A TALE TOLD BY AN IDIOT, FULL OF SOUND AND FURY, SIGNIFYING NOTHING.

 

Using the Chi-square statistic

The chi-square statistic allows compare how closely a shift of the English frequency distribution matches the frequency distribution of the secret message. Here's an algorithm for computing the chi-square statistic:
  • Let ef(c) stand for the English frequency of some letter of the alphabet
  • Let mf(c) stand for the frequency of some letter of the message
  • For each possible shift s between 0 and 25:
  • For each letter c of the alphabet
  • Compute the sum of squares of mf((c + s) mod 26) divided by ef(c)
That is, for a given character, say 'a', we compute the square of the frequency of that character shifted by one of the possible Caesar shifts and then divide it by the English frequency of that character. For a given shift, say 5, we do this for each of the 26 letters in the alphabet. We thereby get 26 different chi-square values. The shift s for which the number ChiSquare( s ) is smallest is the most likely candidate for the shift that was used to encipher the message.

The Vigenere Cipher -- A Polyalphabetic Cipher

One of the main problems with simple substitution ciphers is that they are so vulnerable to frequency analysis. Given a sufficiently large ciphertext, it can easily be broken by mapping the frequency of its letters to the know frequencies of, say, English text. Therefore, to make ciphers more secure, cryptographers have long been interested in developing enciphering techniques that are immune to frequency analysis. One of the most common approaches is to suppress the normal frequency data by using more than one alphabet to encrypt the message. A polyalphabetic substitution cipher involves the use of two or more cipher alphabets. Instead of there being a one-to-one relationship between each letter and its substitute, there is a one-to-many relationship between each letter and its substitutes.

The Vigenere Tableau

The Vigenere Cipher , proposed by Blaise de Vigenere from the court of Henry III of France in the sixteenth century, is a polyalphabetic substitution based on the following tableau:
    	    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z



	A   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

	B   B C D E F G H I J K L M N O P Q R S T U V W X Y Z A 

	C   C D E F G H I J K L M N O P Q R S T U V W X Y Z A B

	D   D E F G H I J K L M N O P Q R S T U V W X Y Z A B C 

	E   E F G H I J K L M N O P Q R S T U V W X Y Z A B C D 

	F   F G H I J K L M N O P Q R S T U V W X Y Z A B C D E 

	G   G H I J K L M N O P Q R S T U V W X Y Z A B C D E F 

	H   H I J K L M N O P Q R S T U V W X Y Z A B C D E F G 

	I   I J K L M N O P Q R S T U V W X Y Z A B C D E F G H 

	J   J K L M N O P Q R S T U V W X Y Z A B C D E F G H I 

	K   K L M N O P Q R S T U V W X Y Z A B C D E F G H I J 

	L   L M N O P Q R S T U V W X Y Z A B C D E F G H I J K 

	M   M N O P Q R S T U V W X Y Z A B C D E F G H I J K L 

	N   N O P Q R S T U V W X Y Z A B C D E F G H I J K L M 

	O   O P Q R S T U V W X Y Z A B C D E F G H I J K L M N 

	P   P Q R S T U V W X Y Z A B C D E F G H I J K L M N O 

	Q   Q R S T U V W X Y Z A B C D E F G H I J K L M N O P 

	R   R S T U V W X Y Z A B C D E F G H I J K L M N O P Q 

	S   S T U V W X Y Z A B C D E F G H I J K L M N O P Q R  

	T   T U V W X Y Z A B C D E F G H I J K L M N O P Q R S 

	U   U V W X Y Z A B C D E F G H I J K L M N O P Q R S T 

	V   V W X Y Z A B C D E F G H I J K L M N O P Q R S T U

	W   W X Y Z A B C D E F G H I J K L M N O P Q R S T U V 

	X   X Y Z A B C D E F G H I J K L M N O P Q R S T U V W 

	Y   Y Z A B C D E F G H I J K L M N O P Q R S T U V W X 

	Z   Z A B C D E F G H I J K L M N O P Q R S T U V W X Y 

The first row is a shift of 0; the second is a shift of 1; and the last is a shift of 25.

The Vigenere cipher uses this table together with a keyword to encipher a message. For example, suppose we wish to encipher the plaintext message:

TO BE OR NOT TO BE THAT IS THE QUESTION
using the keyword RELATIONS. We begin by writing the keyword, repeated as many times as necessary, above the plaintext message. To derive the ciphertext using the tableau, for each letter in the plaintext, one finds the intersection of the row given by the corresponding keyword letter and the column given by the plaintext letter itself to pick out the ciphertext letter.
	Keyword:	RELAT IONSR ELATI ONSRE LATIO NSREL

	Plaintext:	TOBEO RNOTT OBETH ATIST HEQUE STION

	Ciphertext:	KSMEH ZBBLK SMEMP OGAJX SEJCS FLZSY

Decipherment of an encrypted message is equally straightforward. One writes the keyword repeatedly above the message:
	Keyword:	RELAT IONSR ELATI ONSRE LATIO NSREL

	Ciphertext:	KSMEH ZBBLK SMEMP OGAJX SEJCS FLZSY

	Plaintext:	TOBEO RNOTT OBETH ATIST HEQUE STION

This time one uses the keyword letter to pick a column of the table and then traces down the column to the row containing the ciphertext letter. The index of that row is the plaintext letter.

The strength of the Vigenere cipher against frequency analysis can be seen by examining the above ciphertext. Note that there are 7 'T's in the plaintext message and that they have been encrypted by 'H,' 'L,' 'K,' 'M,' 'G,' 'X,' and 'L' respectively. This successfully masks the frequency characteristics of the English 'T.' One way of looking at this is to notice that each letter of our keyword RELATIONS picks out 1 of the 26 possible substitution alphabets given in the Vigenere tableau. Thus, any message encrypted by a Vigenere cipher is a collection of as many simple substitution ciphers as there are letters in the keyword.

Although the Vigenere cipher has all the features of a useful field cipher -- i.e., easily transportable key and tableau, requires no special apparatus, easy to apply, etc. -- it did not catch on its day. A variation of it, known as the Gronsfeld cipher , did catch on in Germany and was widely used in Central Europe. The Gronsfeld variant used the digits of a keynumber instead of a the letters of keyword, but remained unchanged in all other respects. So in fact the Gronsfeld is a weaker technique than Vigenere since it only uses 10 substitute alphabets (one per digit 0..9) instead of the 26 used by Vigenere.

 

Solving the Vigenere Cipher : The Kasiski/Kerckhoff Method

Vigenere-like substitution ciphers were regarded by many as practically unbreakable for 300 years. In 1863, a Prussian major named Kasiski proposed a method for breaking a Vigenere cipher that consisted of finding the length of the keyword and then dividing the message into that many simple substitution cryptograms. Frequency analysis could then be used to solve the resulting simple substitutions.

Kasiski's technique for finding the length of the keyword was based on measuring the distance between repeated bigrams in the ciphertext. Note that in the above cryptogram the plaintext bigram 'TO' occurs twice in the message at position 0 and 9 and in both cases it lines up perfectly with the first two letters of the keyword. Because of this it produces the same ciphertext bigram, 'KS.' The same can be said of plaintext 'BE' which occurs twice starting at positions 2 and 11, and also is encrypted with the same ciphertext bigram, 'ME.' In fact, any message encrypted with a Vigenere cipher will produce many such repeated bigrams. Although not every repeated bigram will be the result of the encryption of the same plaintext bigram, many will, and this provides the basis for breaking the cipher. By measuring and factoring the distances between recurring bigrams -- in this case the distance is 9 -- Kasiski was able to guess the length of the keyword. For this example,

	Location	01234 56789 01234 56789 01234 56789

	Keyword:	RELAT IONSR ELATI ONSRE LATIO NSREL

	Plaintext: 	TOBEO RNOTT OBETH ATIST HEQUE STION

	Ciphertext:	KSMEH ZBBLK SMEMP OGAJX SEJCS FLZSY

the Kasiski method would create something like the following list:

 
Repeated Bigram Location Distance Factors
KS 9 9 3, 9
SM 10 9 3, 9
ME 11 9 3, 9
...

Factoring the distances between repeated bigrams is a way of identifying possible keyword lengths, with those factors that occur most frequently being the best candidates for the length of the keyword. Note that in this example since 3 is also a factor of 9 (and any of its multiples) both 3 and 9 would be reasonable candidates for keyword length. Although in this example we don't have a clear favorite, we've narrowed down the possibilities to a very small list. Note also that if a longer ciphertext were encrypted with the same keyword ('RELATIONS'), we would expect to find repeated bigrams at multiples of 9 -- i.e., 18, 27, 81, etc. These would also have 3 as a factor. Kasiski's important contribution is to note this phenomenon of repeated bigrams and propose a method -- factoring of distances -- to analyze it.

Once the length of the keyword is known, the ciphertext can be broken up into that many simple substitution cryptograms. That is, for a keyword of length 9, every 9-th letter in the ciphertext was encrypted with the same keyword letter. Given the structure of the Vigenere tableau, this is equivalent to using 9 distinct simple substitution ciphers, each of which was derived from 1 of the 26 possible Caesar shifts given in the tableau. The pure Kasiski method proceeds by analyzing these simple substitution cryptograms using frequency analysis and the other standard techniques. A variant of this method, proposed by the French cryptographer Kerckhoff, is based on discovering the keyword itself and then using it to decipher the cryptogram. In Kerckhoff's method, after the message has been separated into several columns, corresponding to the simple substitution cryptograms, one tallies the frequencies in each column and then uses frequency and logical analysis to construct the key. For example, suppose the most frequent letter in the first column is 'K'. We would hypothesize that 'K' corresponds to the English 'E'. If we consult the Vigenere tableau at this point, we can see that if English 'E' were enciphered into 'K' then row G of the table must have been the alphabet used for the first letter of the keyword. This implies that the first letter of the keyword is 'G'.

The problem with this "manual" approach is that for short messages there are often several good candidates for English 'E' in each column. This requires the testing of multiple hypotheses, which can get quite tedious and involved. Therefore we need a more sensitive test to discover the alphabet used by each letter of the keyword. Recalling that each row of the Vigenere tableau is one of the 26 Caesar shifts, we can use the chi-square test to determine which of the 26 possible shifts was used for each letter of the keyword. This modern day version of the Kerckhoff method turns out to be very effective.


The Gronsfeld Cipher -- A Variant of Vigenere

This page describes a method for attacking a Gronsfeld cipher. It is based on the approach described in F. Pratt, Secret and Urgent, NY: Bobbs-Merrill, 1939.

Here's a message written in a Gronsfeld Cipher.

cjifk qywtj ioipo wovlh ncxlo peosg gxrkx baiiq caguy rxrlq klcoy vewql nhsut oiddg qdrap dnfwk owpgw gzlsk xlt

The Gronsfeld cipher is a variation of the Vigenere cipher in which a key number is used instead of a keyword, e.g., 14965. Usually the key does not contain repeated digits.

For this problem, I've simplified things as follows: we allow only the digits between 0-5 (a-e) to be used in the key. The method for attacking a Gronsfeld cipher involves the following steps:

Step 1. Write the first line of the message, and then write under each of its letter, the letters that precede it in the alphabet. Since we know that this version of Gronsfeld uses only numbers between 0-5, (a-e), we need 6 rows. I've numbered the rows and columns so that we can refer to them.

	   0  1  2  3  4  5  6  7

	0  c  j  i  f  k  q  y  w  tj ioipo wovlh ncxlo peosg  gxrkx  (Message)

	1  b  i  h  e  j  p  x  v  si hnhon vnukg mbwkn odnrf  fwqjw

	2  a  h  g  d  i  o  w  u  rh gmgnm umtjf lavjm ncmqe  evpiv

	3  z  g  f  c  h  n  v  t  qg flfml tlsie kzuil mblpd  duohu

	4  y  f  e  b  g  m  u  s  pf ekelk skrhd jythk lakoc  ctngt

	5  x  e  d  a  f  l  t  r  oe djdkj rjqgc ixsgj kzjnb  bsmfs



  • Step 2. Construct all reasonable trigrams using combinations of letters from the first three columns -- i.e., columns 0-2 -- taking 1 letter from each column. For example, we can get the trigram 'ahe' by picking from rows 2,2,3. We would say that the number code for 'ahe' is 223. Since this represents the first word of the message, the trigrams formed should be possible ways to start a word or phrase. In this case, 'ahe' could be the start of 'ahead.' Actually, it's not a very likely trigram, since it repeats the number 2. Make a table of the trigrams, their number codes (which represent a portion of the possible key number) and their frequencies, from Table XII in Pratt.
    	Trigram		Code		Frequency (Table XII in Pratt)
    
    
    
    	aid		215		24 ******** 
    
    	age		234		20 ********
    
    	aff		243		9
    
    	ahe		224		2
    
    	agi		230		3
    
    	agg		232		3
    
    	big		114		4
    
    	chi		010		22 ******** repeated numbers
    
    	che		024		27 ********
    
    	cei		050 052		13 
    
    	bed		155		2
    
    	bee		154		32 ********
    
    	bei		150		19 ********
    
    	bef		153		8
    
    	beg		152		5
    
    
  • Step 3. Pick the most reasonable looking trigrams from the list in step 2. In this case we've picked the following entries:
    	aid		215		24 ******** 
    
    	age		234		20 ********
    
    	bee 		154		32 ********
    
    	bei 		150		19 ********
    
    	che 		024		27 ********
    
    

    They are all relatively frequent trigrams. They could be used as the prefix of the first word. None of them involves a repeated digit in its number code, which rules out 'chi.'

     

  • Step 4. For each of the likely trigrams, apply the number formulas to each succeeding trigram in the message. For example, if we apply 024, to the letters in columns 1,2,3 we get the trigram, 'jgb'; if we apply it to the letters in columns 2,3,4 we get 'idg,' and so on. A partial table has been constructed below. Impossible trigrams are marked with (*). Filling in the rows for 'aid' and 'age' is left as an exercise.
    Column  	1	2	3	4	5
    
    
    
    aid 215
    
    age 234
    
    bee 154 	idb    	hag 	efm* 	jlu* 	pts
    
    bei 150 	idf    	hak 	efq* 	jly* 	ptw
    
    che 024 	jgb* 	idg	fim    	kou    qws*
    
    

     

  • Step 5. Note that in the table above, some of the trigrams for 'bee' and 'bei' are reasonable looking, but they don't combine well with the assumption that 'bee' or 'bei' form the first three letters of the message. For example, we can get 'bee--pts' by combining 'bee' with the trigram that starts in column 5, the first column that has a possible trigram, since 'efm' and 'jlu' are impossible. Similarly, we can get 'bei--ptw' by combining 'bei' and 'ptw', which also starts in column 5. Neither of these strings ('bee--pts' or 'bei--ptw') look very promising as the start of the clear message. On the other hand, combining 'che' as the prefix with the trigram that begins at column 4 ('kou'), gives the following partial string: 'che-kou.' That looks pretty promising. So let's work on it.

     

  • Step 6. Now, working with our partial solution, that begins, che-kou, replace the blank with each of the 6 letters from column 3 of the table in step 1. This gives us all possible trigrams for columns 2-3-4 that are consistent with che and kou. This list consists of:

    efk, eek, edk, eck, ebk, eak

    We want to eliminate 'efk,' 'edk,' and 'ebk' from this list, leaving Ôeek,Õ ÔeckÕ and Ôeak.Õ If we make these substitutions we get the following candidates for partial solutions:

    	Candidate	Number Code	Comment	
    
    	
    
    	cheekou  	0241024 		Possibly cheek our or cheek out
    
    	checkou  	0243024 		Possible check out or check our
    
    	cheakou  	0245024		Not very likely
    
    

    Notice that a cycle is beginning to appear that goes 024-024 and we now have two candidates 02410241 and 02430243. If we replace the 7th letter for each of these candidates we get:

    	02410241 = cheekouw		Impossible
    
    	02430243 = checkout  		********* Solution!!!! ***********
    
    
  • Step 7. To complete the decipherment, apply the key number 0243 to the rest of the cipher text. You can use CryptoTool to complete this if you use the keyword 'aced.'

 

This site hosted by