title | filename | chapternum |
---|---|---|
Introduction | lec_01_introduction | 1 |
Additional reading: Sections 2.1 (Introduction) and 2.2 (Shannon ciphers and perfect security) in the Boneh-Shoup book. Chapters 1 and 2 of the Katz-Lindell book.1
Ever since people started to communicate, there have been messages that they wanted kept secret. Thus cryptography has an old though arguably undistinguished history. For a long time cryptography shared similar features with alchemy, as a domain in which many otherwise smart people would be drawn into making fatal mistakes. Indeed, the history of cryptography is littered with the figurative corpses of cryptosystems believed secure and then broken, and sometimes with the actual corpses of those who have mistakenly placed their faith in these cryptosystems. The definitive text on the history of cryptography is David Kahn's "The Codebreakers", whose title already hints at the ultimate fate of most cryptosystems.2 (See also "The Code Book" by Simon Singh.)
We recount below just a few stories to get a feel for this field. But before we do so, we should introduce the cast of characters. The basic setting of "encryption" or "secret writing" is the following: one person, whom we will call Alice, wishes to send another person, whom we will call Bob, a secret message. Since Alice and Bob are not in the same room (perhaps because Alice is imprisoned in a castle by her cousin the queen of England), they cannot communicate directly and need to send their message in writing. Alas, there is a third person, whom we will call Eve, who can see their message. Therefore Alice needs to find a way to encode or encrypt the message so that only Bob (and not Eve) will be able to understand it.
In 1586, Mary, Queen of Scots and heir to the throne of England, wanted to arrange the assassination of her cousin, Queen Elizabeth I of England, so that she could ascend to the throne and finally escape the house arrest under which she had been held for the last 18 years. As part of this complicated plot, she sent a coded letter to Sir Anthony Babington.
{#maryscottletterfig .margin }
Mary used what's known as a substitution cipher where each letter is transformed into a different obscure symbol (see maryscottletterfig{.ref}). At first look, such a letter might seem rather inscrutable: a meaningless sequence of strange symbols. However, after some thought, one might recognize that these symbols repeat several times and moreover that different symbols repeat with different frequencies. Now it doesn't take a large leap of faith to assume that perhaps each symbol corresponds to a different letter and the more frequent symbols correspond to letters that occur in the language with higher frequency. From this observation, there is a short gap to completely breaking the cipher, which was in fact done by Queen Elizabeth's spies, who used the decoded letters to learn of all the co-conspirators and to convict Queen Mary of treason, a crime for which she was executed. Trusting in superficial security measures (such as using "inscrutable" symbols) is a trap that users of cryptography have been falling into again and again over the years. (As in many things, this is the subject of a great XKCD cartoon, see XKCDnavajofig{.ref}.)
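The frequency-counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the random key, the sample plaintext, and the helper names (`make_substitution_key`, `substitute`, `symbol_frequencies`) are all hypothetical, and the "symbols" are simply letters of the Latin alphabet rather than Mary's actual nomenclator.

```python
from collections import Counter
import random
import string

def make_substitution_key(rng):
    """Return a random permutation of the alphabet as a dict (the secret key)."""
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))

def substitute(text, table):
    """Apply a letter-for-letter substitution, leaving other characters alone."""
    return "".join(table.get(ch, ch) for ch in text.lower())

def symbol_frequencies(ciphertext):
    """Count ciphertext letters, most frequent first (Eve's first step)."""
    counts = Counter(ch for ch in ciphertext if ch in string.ascii_lowercase)
    return counts.most_common()

rng = random.Random(0)  # fixed seed only so that the demo is repeatable
key = make_substitution_key(rng)
inverse = {v: k for k, v in key.items()}

plaintext = "meet me at the eastern gate when the evening bell tolls three times"
ciphertext = substitute(plaintext, key)

# The most frequent ciphertext symbol is the image of the most frequent
# plaintext letter, which in typical English text is "e".
top_symbol, top_count = symbol_frequencies(ciphertext)[0]
print(ciphertext)
print(top_symbol, "stands for", inverse[top_symbol], "and occurs", top_count, "times")
```

Eve of course does not have `inverse`; it is used here only to confirm that her guess "the most common symbol is e" would have been right. With a longer ciphertext the same counting idea extends to the remaining letters.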
The Vigenère cipher is named after Blaise de Vigenère who described it in a book in 1586 (though it was invented earlier by Bellaso).
The idea is to use a collection of substitution ciphers - if there are
The Enigma cipher was a mechanical cipher (looking like a typewriter, see enigmafig{.ref}) where each letter typed would get mapped into a different letter depending on the (rather complicated) key and current state of the machine, which had several rotors that rotated at different paces. An identically wired machine at the other end could be used to decrypt. Like many ciphers in history, Enigma was believed by the Germans to be "impossible to break", and even quite late in the war they refused to believe it was broken despite mounting evidence to that effect. (In fact, some German generals refused to believe it was broken even after the war.) Breaking Enigma was a heroic effort which was initiated by the Poles and then completed by the British at Bletchley Park, with Alan Turing (of Turing machine fame) playing a key role. As part of this effort the Brits built arguably the world's first large scale mechanical computation devices (though they looked more similar to washing machines than to iPhones). They were also helped along the way by some quirks and errors of the German operators. For example, the fact that their messages ended with "Heil Hitler" turned out to be quite useful.
Here is one entertaining anecdote: the Enigma machine would never map a letter to itself. In March 1941, Mavis Batey, a cryptanalyst at Bletchley Park, received a very long message that she tried to decrypt. She then noticed a curious property: the message did not contain the letter "L".3 She realized that the probability that no "L" appears in such a long message is too small for this to happen by chance. Hence she surmised that the original message must have been composed only of L's. That is, it must have been the case that the operator, perhaps to test the machine, had simply sent out a message where he repeatedly pressed the letter "L". This observation helped her decode the next message, which revealed a planned Italian attack and helped secure a resounding British victory in what became known as "the Battle of Cape Matapan". Mavis also helped break another Enigma machine. Using the information she provided, the Brits were able to feed the Germans the false information that the main Allied invasion would take place at Pas de Calais rather than in Normandy.
In the words of General Eisenhower, the intelligence from Bletchley Park was of "priceless value". It made a huge difference for the Allied war effort, thereby shortening World War II and saving millions of lives. See also this interview with Sir Harry Hinsley.
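To get a feel for why a long message with no "L" is suspicious, here is a rough calculation in Python. It is a sketch under a simplifying assumption that is not part of the anecdote: each ciphertext letter is modeled as independent and uniform over 26 letters. The function name `prob_letter_absent` and the message lengths are chosen only for illustration.

```python
def prob_letter_absent(length, alphabet_size=26):
    """Probability that one fixed letter never appears among `length`
    independent, uniformly random letters (a modeling assumption)."""
    return ((alphabet_size - 1) / alphabet_size) ** length

# The probability drops exponentially with the message length.
for length in (50, 200, 1000):
    print(length, prob_letter_absent(length))
```

For a short message the absence of one letter is unremarkable, but the probability shrinks by a factor of 25/26 with every additional letter, which is why the length of the message mattered to the inference.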
Many of the troubles that cryptosystem designers faced over history (and still face!) can be attributed to not properly defining or understanding the goals they want to achieve in the first place.
We now turn to actually defining what is an encryption scheme. Clearly we can encode every message as a string of bits, i.e., an element of
This motivates the following definition which attempts to capture what it means for an encryption scheme to be valid or "make sense", regardless of whether or not it is secure:
::: {.definition title="Valid encryption scheme" #encryptiondef}
Let
We will often write the first input (i.e., the key) to the encryption and decryption as a subscript and so can write eqvalidenc{.eqref} also as
The validity condition implies that for any fixed
::: {.remark title="A note on notation, and comparison with Katz-Lindell, Boneh-Shoup, and other texts." #notation}
A note on notation: We will always use
The number
We often use
We will use
For simplicity, we denote the space of possible keys as
encryptiondef{.ref} says nothing about security and does not rule out trivial "encryption" schemes such as the scheme
These considerations led Auguste Kerckhoffs in 1883 to state the following principle:
A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.^[The actual quote is "Il faut qu’il n’exige pas le secret, et qu’il puisse sans inconvénient tomber entre les mains de l’ennemi" loosely translated as "The system must not require secrecy and can be stolen by the enemy without causing trouble". According to Steve Bellovin the NSA version is "assume that the first copy of any device we make is shipped to the Kremlin".]
Why is it OK to assume the key is secret and not the algorithm? Because we can always choose a fresh key. But of course that won't help us much if our key is "1234" or "passw0rd!". In fact, if you use any deterministic algorithm to choose the key then eventually your adversary will figure this out. Therefore for security we must choose the key at random and can restate Kerckhoffs's principle as follows:
There is no secrecy without randomness
This is such a crucial point that it is worth repeating:
There is no secrecy without randomness
At the heart of every cryptographic scheme there is a secret key, and the secret key is always chosen at random. A corollary of that is that to understand cryptography, you need to know some probability theory. Fortunately, we don't need much probability: only probability over finite spaces, and basic notions such as expectation, variance, concentration and the union bound suffice for most of what we need. In fact, understanding the following two statements will already get you much of what you need for cryptography:

- For every fixed string $x\in\{0,1\}^n$, if you toss a coin $n$ times, the probability that the heads/tails pattern will be exactly $x$ is $2^{-n}$.

- A probability of $2^{-128}$ is really really small.
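Both statements can be checked numerically. The sketch below verifies the first by brute-force enumeration for a small $n$, and gives a sense of scale for the second under an assumed (hypothetical) guessing rate of $10^{12}$ attempts per second; that rate and the function name `prob_exact_pattern` are illustration choices, not figures from the text.

```python
from fractions import Fraction
from itertools import product

def prob_exact_pattern(n):
    """Exact probability that n fair coin tosses equal one fixed pattern,
    computed by enumerating all 2**n equally likely outcomes."""
    target = tuple([0] * n)  # any fixed string x gives the same answer
    outcomes = list(product((0, 1), repeat=n))
    hits = sum(1 for outcome in outcomes if outcome == target)
    return Fraction(hits, len(outcomes))

assert prob_exact_pattern(10) == Fraction(1, 2**10)

# How small is 2^{-128}? An event of that probability per attempt is expected
# to take about 2^127 attempts; at the assumed rate below that is a very long wait.
tries_per_second = 10**12            # hypothetical attacker speed
seconds_per_year = 365 * 24 * 3600
expected_years = 2**127 / (tries_per_second * seconds_per_year)
print(f"{expected_years:.2e} years")
```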
How do we actually get random bits in actual systems? The main idea is to use a two stage approach. First we need to get some data that is unpredictable from the point of view of an attacker on our system.
Some sources for this could be measuring latency on the network or hard drives (getting harder with solid state disks), user keyboard and mouse movement patterns (problematic when you need fresh randomness at boot time), clock drift, and more; other sources include audio, video, and network activity. All of these can be problematic, especially for servers or virtual machines, and so hardware based random number generators based on phenomena such as thermal noise or nuclear decay are becoming more popular. Once we have some data
One of the first attacks was on the SSL implementation of Netscape (the dominant browser at the time). Netscape used the following "unpredictable" information: the time of day and a process ID, both of which turned out to be quite predictable (who knew attackers have clocks too?). Netscape relied on "security through obscurity" by not releasing the source code for its pseudorandom generator, but it was reverse engineered by Ian Goldberg and David Wagner (Ph.D. students at the time) who demonstrated this attack.
In 2006 a programmer removed a line of code from the procedure to generate entropy in the OpenSSL package distributed by Debian, since it caused a warning in some automatic verification code. As a result, for two years (until this was discovered) all the randomness generated by this procedure used only the process ID as an "unpredictable" source. This means that all communication done by users in that period is fairly easily breakable (and in particular, if some entities recorded that communication they could also break it retroactively). This caused a huge headache and a worldwide regeneration of keys, though it is believed that many of the weak keys are still used. See XKCD's take on that incident.
In 2012 two separate teams of researchers scanned a large number of RSA keys on the web and found out that about 4 percent of them are easy to break. The main culprits were devices such as routers, internet-connected printers and the like. These devices sometimes run variants of Linux, a desktop operating system, but without a hard drive, mouse or keyboard, they don't have access to many of the entropy sources that desktops have. Coupled with some good old fashioned ignorance of cryptography and software bugs, this led to many keys that are downright trivial to break; see this blog post and this web page for more details.
After the entropy is collected and then "purified" or "extracted" to a uniformly random string that is, say, a few hundred bits long, we often need to "expand" it into a longer string that is also uniform (or at least looks like that for all practical purposes). We will discuss how to go about that in the next lecture. This step has its weaknesses too, and in particular the Snowden documents, combined with observations of Shumow and Ferguson, strongly suggest that the NSA has deliberately inserted a trapdoor in one of the pseudorandom generators published by the National Institute of Standards and Technology (NIST). Fortunately, this generator wasn't widely adopted, but apparently the NSA did pay 10 million dollars to RSA Security so the latter would make this generator the default option in their products.
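To see why a seed built from the time of day and a process ID is weak, consider the following toy sketch. It is emphatically not Netscape's actual generator: `weak_keygen`, the way the seed is combined, and the search bounds are all hypothetical, and Python's `random` module stands in for "some deterministic generator" (it is not suitable for cryptography).

```python
import random

def weak_keygen(timestamp, pid, nbytes=16):
    """Hypothetical flawed key generator: the only 'entropy' is time and PID."""
    rng = random.Random(timestamp * 100_000 + pid)  # NOT a secure generator
    return bytes(rng.randrange(256) for _ in range(nbytes))

def recover_seed(key, approx_time, window, max_pid):
    """Eve's search: try every (second, pid) pair near the time she observed."""
    for t in range(approx_time - window, approx_time + window + 1):
        for pid in range(max_pid):
            if weak_keygen(t, pid, len(key)) == key:
                return t, pid
    return None

secret_time, secret_pid = 1_000_000_000, 1234
key = weak_keygen(secret_time, secret_pid)

# Eve only knows the time to within a few seconds and that PIDs are small.
found = recover_seed(key, approx_time=secret_time + 3, window=5, max_pid=4096)
print(found)
```

The key looks like 128 random bits, but Eve only has to search a few tens of thousands of candidate seeds. The size of the space the seed was drawn from, not the length of the key, is what bounds her work.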
Defining the secrecy requirement for an encryption is not simple. Over the course of history, many smart people got it wrong and convinced themselves that ciphers were impossible to break. The first person to truly ask the question in a rigorous way was Claude Shannon in 1945 (though a partial version of his manuscript was only declassified in 1949). Simply by asking this question, he made an enormous contribution to the science of cryptography and practical security. We now will try to examine how one might answer it.
Let me warn you ahead of time that we are going to insist on a mathematically precise definition of security. That means that the definition must capture security in all cases, and the existence of a single counterexample, no matter how "silly", would make us rule out a candidate definition. This exercise of coming up with "silly" counterexamples might seem, well, silly. But in fact it is this method that led Shannon to formulate his theory of secrecy, which (after much followup work) eventually revolutionized cryptography, and brought this science to a new age where Edgar Allan Poe's maxim no longer holds, and we are able to design ciphers which human (or even nonhuman) ingenuity cannot break.
The most natural way to attack an encryption is for Eve to guess all possible keys.
In many encryption schemes this number is enormous and this attack is completely infeasible.
For example, the theoretical number of possibilities in the Enigma cipher was about
Since it is possible to recover the key with some tiny probability (e.g. by guessing it at random), perhaps one way to define security of an encryption scheme is that an attacker can never recover the key with probability significantly higher than that. Here is an attempt at such a definition:
An encryption scheme
::: { .pause }
When you see a mathematical definition that attempts to model some real-life phenomenon such as security, you should pause and ask yourself:

1. Do I understand mathematically what the definition is stating?

2. Is it a reasonable way to capture the real-life phenomenon we are discussing?

One way to answer question 2 is to try to think of both examples of objects that satisfy the definition and examples of objects that violate it, and see if this conforms to your intuition about whether these objects display the phenomenon we are trying to capture. Try to do this for securefirstattemptdef{.ref}.
:::
You might wonder if securefirstattemptdef{.ref} is not too strong.
After all, how are we ever going to prove that Eve cannot recover the secret key no matter what she does? Edgar Allan Poe would say that there can always be a method that we overlooked. However, in fact this definition is too weak! Consider the following encryption: the secret key
Let
This follows because
The math behind the above argument is very simple, yet I urge you to read and re-read the last two paragraphs until you are sure that you completely understand why this encryption is in fact secure according to the above definition. This is a "toy example" of the kind of reasoning that we will be employing constantly throughout this course, and you want to make sure that you follow it.
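As a hypothetical toy illustration of how a scheme can hide its key perfectly while hiding nothing about the message, the sketch below uses an "encryption" that ignores the key altogether. The scheme, the key length `n = 4`, and the sample strategy `eve` are assumptions made for this example; the point is only that the ciphertext is independent of the key, so any guess at the key is right with probability exactly $2^{-n}$.

```python
from fractions import Fraction
from itertools import product

n = 4
KEYS = list(product((0, 1), repeat=n))  # all 2**n keys, each equally likely

def encrypt(key, message):
    """The 'silly' scheme: ignore the key and send the message in the clear."""
    return message

def decrypt(key, ciphertext):
    return ciphertext

def key_recovery_probability(eve, message):
    """Exact probability, over a uniform key, that eve(ciphertext) equals the key."""
    wins = sum(1 for k in KEYS if eve(encrypt(k, message)) == k)
    return Fraction(wins, len(KEYS))

def eve(ciphertext):
    # One arbitrary strategy: guess a key derived from the ciphertext.
    # Since the ciphertext does not depend on the key, this is a fixed guess.
    return tuple(ciphertext[:n])

message = (1, 0, 1, 1)
print(key_recovery_probability(eve, message))  # Eve learns nothing about the key...
print(encrypt(KEYS[0], message) == message)    # ...yet she reads the whole message
```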
So, trivialsec{.ref} is true, but one might question its meaning. Clearly this silly example was not what we meant when stating this definition. However, as mentioned above, we are not willing to ignore even silly examples and must amend the definition to rule them out. One obvious objection is that we don't care about hiding the key; it is the message that we are trying to keep secret. This suggests the next attempt:
An encryption scheme
Now this seems like it captures our intended meaning. But remember that we are being pedantic, and truly insist that the definition holds as stated, namely that for every plaintext message
So, if before the definition was too weak, the new definition is too strong and is impossible to achieve. The problem is that of course we could guess a fixed message with probability one, so perhaps we could try to consider a definition with a random message. That is:
An encryption scheme
This weakened definition can in fact be achieved, but we have again weakened it too much.
Consider an encryption that hides the last
So far all of our attempts at definitions oscillated between being too strong
(and hence impossible) or too weak (and hence not guaranteeing actual
security).
The key insight of Shannon was that in a secure encryption scheme
the ciphertext should not reveal any additional information about the
plaintext. So, if for example it was a priori possible for Eve to guess the
plaintext with some probability
::: {.definition title="Perfect secrecy" #perfectsecrecydef}
An encryption scheme
In particular, if we encrypt either "Yes" or "No" with probability
::: {.theorem title="Two to many theorem" #twotomanythm}
An encryption scheme
::: {.proof data-ref="twotomanythm"}
The "only if" direction is obvious--- this condition is a special case of the perfect secrecy condition for a set
The "if" direction is trickier. We need to show that if there is
some set
Let's fix the message
We can also write eqhitcipher{.eqref} as
$$
\E_{x_1 \leftarrow_R \{0,1\}^n} \Pr[ Eve(E_k(x_0))=x_1] \leq 1/|M|
$$
and so in particular, due to linearity of expectation, there exists
some
The proof of twotomanythm{.ref} is not trivial, and is worth reading again and making sure you understand it.
An excellent exercise, which I urge you to pause and do now is to prove the following:
::: {.solvedexercise title="Perfect secrecy, equivalent definition" #perfectsecrecyequiv}
Prove that a valid encryption scheme
- $Y$ is obtained by sampling $k\sim \{0,1\}^n$ and outputting $E_k(x)$.

- $Y'$ is obtained by sampling $k\sim \{0,1\}^n$ and outputting $E_k(x')$.
:::
::: {.solution data-ref="perfectsecrecyequiv"}
We only sketch the proof. The condition in the exercise is equivalent to perfect secrecy with
We summarize the equivalent definitions of perfect secrecy in the following theorem, whose (omitted) proof follows from twotomanythm{.ref} and perfectsecrecyequiv{.ref} as well as similar proof ideas.
::: {.theorem title="Perfect secrecy equivalent conditions" #perfectsecrecythm}
Let
- $(E,D)$ is perfectly secret as per perfectsecrecydef{.ref}.

- For every pair of messages $x_0,x_1 \in \{0,1\}^{L(n)}$, the distributions $\{ E_k(x_0) \}_{k \sim \{0,1\}^n}$ and $\{ E_k(x_1) \}_{k \sim \{0,1\}^n}$ are identical.

- (Two-message security: Eve can't guess which one of two messages was encrypted with success better than half.) For every function $Eve:\{0,1\}^{C(n)} \rightarrow \{0,1\}^{L(n)}$ and pair of messages $x_0,x_1 \in \{0,1\}^{L(n)}$,

- (Arbitrary prior security: Eve can't guess which message was encrypted with success better than her prior information.) For every distribution $\mathcal{D}$ over $\{0,1\}^{L(n)}$, and $Eve:\{0,1\}^{C(n)} \rightarrow \{0,1\}^{L(n)}$,

where we denote $\max(\mathcal{D}) = \max_{x^* \in \{0,1\}^{L(n)}} \Pr_{x \sim \mathcal{D}}[x=x^*]$ to be the largest probability of any element under
So, perfect secrecy is a natural condition, and does not seem to be too weak for applications, but can it actually
be achieved? After all, the condition that two different plaintexts are mapped to the same distribution seems somewhat at odds
with the condition that Bob would succeed in decrypting the ciphertexts and find out if the plaintext was in fact
In fact, this can be generalized to any number of bits:^[The one-time pad is typically credited to Gilbert Vernam of Bell and Joseph Mauborgne of the U.S. Army Signal Corps, but Steve Bellovin discovered an earlier inventor, Frank Miller, who published a description of the one-time pad in 1882. However, it is unclear if Miller realized that the security of this system can be mathematically proven, and so the theorem below should probably still be credited to Vernam and Mauborgne.]
There is a perfectly secret valid encryption scheme
Our scheme is the one-time pad also known as the "Vernam Cipher", see onetimepadfig{.ref}.
The encryption is exceedingly simple: to encrypt a message
::: {.proof data-ref="onetimepad"}
For two binary strings
To analyze the perfect secrecy property, we claim that for every
The argument above is quite simple but is worth reading again. To understand why the one-time pad is perfectly secret, it is useful to envision it as a bipartite graph as we've done in onetimepadtwofig{.ref}. (In fact the encryption scheme of onetimepadtwofig{.ref} is precisely the one-time pad for
So, does onetimepad{.ref} give the final word on cryptography, and mean that we can all communicate with perfect secrecy and live happily ever after?
No it doesn't.
While the one-time pad is efficient, and gives perfect secrecy, it has one glaring disadvantage: to communicate
This is not just a theoretical issue. The Soviets used the one-time pad for their confidential communication since before the 1940's. In fact, even before Shannon's work, U.S. intelligence already knew in 1941 that the one-time pad is in principle "unbreakable" (see page 32 in the Venona document). However, it turned out that the hassle of manufacturing so many keys for all the communication took its toll on the Soviets and they ended up reusing the same keys for more than one message. They did try to use them for completely different receivers in the (false) hope that this wouldn't be detected. The Venona Project of the U.S. Army was founded in February 1943 by Gene Grabeel (see genegrabeelfig{.ref}), a former home economics teacher from Madison Heights, Virginia, and Lt. Leonard Zubko. In October 1943, they had their breakthrough when it was discovered that the Russians were reusing their keys. In the 37 years of its existence, the project resulted in a treasure chest of intelligence, exposing hundreds of KGB agents and Russian spies in the U.S. and other countries, including Julius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry Dexter White and many others.
Unfortunately it turns out that such long keys are necessary for perfect secrecy:
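The danger of key reuse is easy to see in code. Below is a minimal sketch of the one-time pad over bytes (XOR with a uniformly random key as long as the message), followed by what happens when the same key encrypts two messages. The helper names and the sample messages are illustrative assumptions; `secrets.token_bytes` is used as the source of a fresh random key.

```python
import secrets

def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def otp_encrypt(key, message):
    return xor_bytes(key, message)

def otp_decrypt(key, ciphertext):
    return xor_bytes(key, ciphertext)  # XOR with the same key undoes itself

m1 = b"ATTACK AT DAWN"
m2 = b"RETREAT AT TEN"
key = secrets.token_bytes(len(m1))     # fresh uniform key, as long as the message

c1 = otp_encrypt(key, m1)
assert otp_decrypt(key, c1) == m1

# Reusing the key for a second message: the key cancels out of c1 XOR c2,
# so Eve obtains the XOR of the two plaintexts without knowing the key.
c2 = otp_encrypt(key, m2)
leak = xor_bytes(c1, c2)
assert leak == xor_bytes(m1, m2)
```

Once Eve holds `m1 XOR m2`, knowing or guessing part of one message immediately reveals the corresponding part of the other, which is the kind of foothold a key-reuse attack exploits.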
For every perfectly secret encryption scheme
The idea behind the proof is illustrated in longkeygraphfig{.ref}. We define a graph between the plaintexts and ciphertexts, where we put an edge between plaintext
::: {.proof data-ref="longkeysthm"}
Let
We choose
We will show the following claim:
Claim I: There exists some
Claim I implies that the string
::: {.remark title="Adding probability into the picture" #addingprobrem}
There is a sense in which both our secrecy and our impossibility results might not be fully convincing, and that is that we did not explicitly consider algorithms that use randomness. For example, maybe Eve can break a perfectly secret encryption if she is not modeled as a deterministic function
For the former, note that a probabilistic process can be thought of as a distribution over functions, in the sense that we have a collection of functions
A similar (though more involved) argument shows that the impossibility result showing that the key must be at least as long as the message still holds even if the encryption and decryption algorithms are allowed to be probabilistic processes as well (working this out is a great exercise).
:::
longkeysthm{.ref} implies that for every encryption scheme
::: {.theorem title="Short keys imply high probability attack" #longkeyhighprob}
Let
::: {.proof data-ref="longkeyhighprob"}
As in the proof of longkeysthm{.ref}, let
We show this by arguing that this bound holds for every fixed
Now, for every
- Input: A ciphertext $y\in \{0,1\}^*$

- Operation: If $y\in S_0$, output $x_0$, otherwise output $x_1$.
The probability that
$$ \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot \left( 1-2^{-t} \right) = 1 - 2^{-t-1} \;. $$
:::
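The attack "output $x_0$ if $y \in S_0$, otherwise $x_1$" can be run exhaustively on a toy example. The sketch below makes several assumptions that are not in the text: keys have $n=3$ bits, messages have $n+t=5$ bits, and the scheme XORs the message with a pad derived from the key (a hypothetical stand-in for any valid scheme whose key is $t$ bits shorter than its message). It takes $S_0$ to be the set of all encryptions of $x_0$ and picks $x_1$ by brute force.

```python
from fractions import Fraction

n, t = 3, 2                        # toy sizes: n-bit keys, (n+t)-bit messages
KEYS = range(2**n)
MESSAGES = range(2**(n + t))

def encrypt(k, x):
    """Hypothetical short-key scheme: XOR the message with a pad derived from k."""
    return x ^ ((k * 11) % 2**(n + t))

x0 = 0
S0 = {encrypt(k, x0) for k in KEYS}   # every possible encryption of x0

def hit_probability(x):
    """Probability, over a uniform key, that an encryption of x lands in S0."""
    return Fraction(sum(encrypt(k, x) in S0 for k in KEYS), len(KEYS))

# Averaging argument: since |S0| <= 2^n, some message has hit probability <= 2^-t.
x1 = min(MESSAGES, key=hit_probability)

def eve(y):
    return x0 if y in S0 else x1

# Exact success probability when the message is x0 or x1 with probability 1/2 each
# and the key is uniform.
wins = sum(eve(encrypt(k, xb)) == xb for xb in (x0, x1) for k in KEYS)
success = Fraction(wins, 2 * len(KEYS))

assert hit_probability(x1) <= Fraction(1, 2**t)
assert success >= 1 - Fraction(1, 2**(t + 1))
print(x1, success)
```

Eve is always right when $x_0$ was encrypted, and wrong on $x_1$ only when its ciphertext happens to fall in $S_0$, which matches the two terms of the displayed sum.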
Much of this text is shared with my Introduction to Theoretical Computer Science textbook.
Shannon's manuscript was written in 1945 but was classified, and a partial version was only published in 1949. Still it has revolutionized cryptography, and is the forerunner to much of what followed.
The Venona project's history is described in this document. Aside from Grabeel and Zubko, credit to the discovery that the Soviets were reusing keys is shared by Lt. Richard Hallock, Carrie Berry, Frank Lewis, and Lt. Karl Elmquist, and there are others that have made important contribution to this project. See pages 27 and 28 in the document.
In a 1955 letter to the NSA that only recently came forward, John Nash proposed an "unbreakable" encryption scheme. He wrote "I hope my handwriting, etc. do not give the impression I am just a crank or circle-squarer.... The significance of this conjecture [that certain encryption schemes are exponentially secure against key recovery attacks] .. is that it is quite feasible to design ciphers that are effectively unbreakable." John Nash made seminal contributions in mathematics and game theory, and was awarded both the Abel Prize in mathematics and the Nobel Memorial Prize in Economic Sciences. However, he struggled with mental illness throughout his life. His biography, A Beautiful Mind, was made into a popular movie. It is natural to compare Nash's 1955 letter to the NSA to the 1956 letter by Kurt Gödel to John von Neumann. From the theoretical computer science point of view, the crucial difference is that while Nash informally talks about exponential vs polynomial computation time, he does not mention the term "Turing Machine" or other models of computation, and it is not clear whether he was aware that his conjecture can be made mathematically precise (assuming a formalization of "sufficiently complex types of enciphering").
Footnotes
1. In the current state of these lecture notes, almost all references and credits are omitted unless the name has become standard in the literature, or I believe that the story of some discovery can serve a pedagogical point. See the Katz-Lindell book for historical notes and references. This lecture shares a lot of text with (though is not identical to) my lecture on cryptography in the introduction to theoretical computer science lecture notes.

2. Traditionally, cryptography was the name for the activity of making codes, while cryptanalysis is the name for the activity of breaking them, and cryptology is the name for the union of the two. These days cryptography is often used as the name for the broad science of constructing and analyzing the security of not just encryptions but many schemes and protocols for protecting the confidentiality and integrity of communication and computation.

3. Here is a nice exercise: compute (up to an order of magnitude) the probability that a 50-letter long message composed of random letters will end up not containing the letter "L".

4. There are about $10^{68}$ atoms in the galaxy, so even if we assumed that each one of those atoms was a computer that can process say $10^{21}$ decryption attempts per second (as the speed of light is $10^9$ meters per second and the diameter of an atom is about $10^{-12}$ meters), then it would still take $10^{113-89} = 10^{24}$ seconds, which is about $10^{17}$ years to exhaust all possibilities, while the sun is estimated to burn out in about 5 billion years.