Modern Mathematics and the Evolution of Cryptography
Alin Galatan
1 History
Since Antiquity, various civilizations have realized the need to encrypt their messages, especially because they had to
send messages to military leaders on the battlefront. It is no surprise that one of the first encryption algorithms is
called Caesar’s Algorithm. But this was as easy and rudimentary as the mathematics of its time. Say someone (whom
we will refer to as Alice) wants to send the message “ABZ” to a second person (whom we will refer to as Bob). We
will use the English alphabet, for simplicity. Alice first chooses a private key, which is a number between 1 and
25. Say she chooses 2. Alice then shifts every letter forward by 2 positions, wrapping around at the end of the alphabet:
A → C, B → D, Z → B
Hence ABZ becomes CDB, which Bob can very easily decrypt: knowing the key, he just cycles backwards by 2.
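The shift described above can be sketched in a few lines of Python. This is only an illustration: the function name `caesar_shift` is a hypothetical choice, and the sketch assumes uppercase English letters, leaving every other character unchanged.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26-letter English alphabet


def caesar_shift(text: str, key: int) -> str:
    """Shift every letter of `text` forward by `key` positions, wrapping after Z."""
    shifted = []
    for ch in text:
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            shifted.append(ch)  # leave spaces and punctuation untouched
    return "".join(shifted)


ciphertext = caesar_shift("ABZ", 2)       # Alice encrypts with the key 2
plaintext = caesar_shift(ciphertext, -2)  # Bob cycles backwards with the same key
print(ciphertext, plaintext)              # CDB ABZ
```

Since there are only 25 possible keys, anyone who knows the algorithm can simply try them all, which is why the cipher offers no real protection.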
As society evolved, mathematics and cryptographic algorithms evolved in parallel. But they were all based on a
private key, which Alice chooses at the beginning, and which Bob needs in order to decrypt. But how can one send
the key to Bob securely? At this point, the problem is almost equivalent to the original one. Hence, private-key
algorithms have a significant security flaw: if both the algorithm and the key are found out by a third party (whom we will
refer to as Oscar), then he can easily decrypt the messages between Alice and Bob.
During WW2, Nazi Germany relied on the Enigma encryption machine. It used private keys (strings of characters)
which had to be delivered to the front line (a new key was usually used every day, and Berlin sent out notebooks of new
keys every month). The Enigma Machine used a complex system of rotors that encrypted messages using keys, and
it was hard, but not impossible, to backtrack the algorithm and decrypt the messages. In order to do so, the Allies needed two
things: the Enigma Machine (which, if carefully analyzed, would provide the algorithm) and a way of intercepting the
keys.
In 1942, the British forces captured a German submarine which had an Enigma Machine. The machines placed
on submarines were very special: because of their location, submarines could not communicate that often with the mainland
in a safe way, and therefore they had to be provided with keys for more than one month. The mathematical genius
Alan Turing analyzed the Enigma Machine and the keys that were captured, and developed an electromechanical
code-breaking machine, a forerunner of the first computers (called the Bombe), that was used to decrypt other messages,
even the ones encrypted with the land-based Enigma machines. But the Allies still needed the private keys, which were
changed daily. The private keys were obtained in various ways: spies, intercepting Morse code, etc.
2 Key exchange protocols and Public-key encryption
In the 1970s, Whitfield Diffie and Martin Hellman, using ideas of Ralph Merkle, had a breakthrough: they developed
the first published method of securely exchanging cryptographic keys over a public channel.¹ Soon after, Ron Rivest, Adi Shamir
and Leonard Adleman created the first public-key encryption algorithm (now called the RSA algorithm).
To understand public-key encryption, let’s imagine this scenario:
Suppose Alice wants to send a message to Bob. Then Bob generates a public key (which can be a number, a
string, or any mathematical object) which he shares with everyone. He also generates a private key but, contrary to
the algorithms mentioned before, he keeps it to himself. Hence Oscar, the man in the middle, can see the public key,
but this information should be useless to him: Oscar can’t derive the private key from it.
Intuitively, what happens is this: Bob has a mailbox which he leaves open. He also has a lock for the mailbox,
which is the public key, and a key to the lock: the private key. Bob then leaves the open lock next to the mailbox, and
everyone has access to the lock and mailbox. If Alice wants to send him a message she goes to his mailbox, puts the
message there and locks it with the lock Bob left. The trick is this: only Bob, using the private key, can open it.
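To make the mailbox picture concrete, here is a toy numeric sketch of RSA-style locking and unlocking. The text presents RSA only intuitively, so everything below is an assumption chosen for illustration: the tiny primes 61 and 53, the public exponent 17, and the message 65. Real keys use primes hundreds of digits long, and real systems add padding that is omitted here.

```python
# Toy RSA with tiny primes (illustrative values only, far too small to be secure).
p, q = 61, 53                # Bob's secret primes
n = p * q                    # 3233: part of the public key (the "lock")
phi = (p - 1) * (q - 1)      # 3120: kept secret by Bob
e = 17                       # public exponent, chosen coprime to phi
d = pow(e, -1, phi)          # private exponent: inverse of e modulo phi (Python 3.8+)

message = 65                       # Alice's message, encoded as a number below n
ciphertext = pow(message, e, n)    # Alice locks it with the public key (n, e)
recovered = pow(ciphertext, d, n)  # only Bob, who knows d, can unlock it
print(d, ciphertext, recovered)    # 2753 2790 65
```

Oscar sees `n`, `e` and the ciphertext, but computing `d` from them requires factoring `n`, which is believed to be infeasible when the primes are large.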
The idea is very simple and was definitely used by people before, without realizing its importance. To implement
it, mathematics came to the rescue. And it was some very advanced mathematics, especially for that time. Both the
Diffie-Hellman and the RSA algorithms use big prime numbers and a mathematical object: the group. I will define
the concept below and use it to construct the DH key exchange algorithm.
¹ In 1997 the British Government declassified old documents, which revealed that it had these algorithms before 1975 but kept
them secret.
3 Diffie-Hellman’s algorithm and The Discrete Logarithm
I will now try to describe one of the most complex and beautiful mathematical objects and a centerpiece of modern
mathematics: the group.
Roughly speaking, a group is a set of abstract elements together with a binary operation that is associative, has a
neutral element, and gives each element an inverse.
Examples:
1. The integers $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ form a group under the addition operation. The neutral element is
0, because $n + 0 = 0 + n = n$ for any $n \in \mathbb{Z}$. The inverse (with respect to addition) of any element $n \in \mathbb{Z}$ is $-n$
(so the inverse of 7 is −7).
The rational numbers (fractions) and the real numbers with the + operation also form groups.
2. The real numbers $\mathbb{R}$ without 0 (which I will call $\mathbb{R}^*$) form a group under multiplication. The neutral
element is 1, because $x \cdot 1 = 1 \cdot x = x$, and the inverse of $x$ is $1/x$ (this is the reason why 0 needs to be thrown
away: it has no inverse).
3. The cyclic groups $\mathbb{Z}_n$, where $n$ is a natural number, with addition as binary operation. We are already used to
them: $\mathbb{Z}_{24}$ is the clock with 24 hours. We never say the current hour is 24, but we go back to 0. That is, we
take the remainder modulo 24. Every number below stands for its remainder modulo $n$, which is what gives the group its cyclic structure.
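A few cyclic groups

$\mathbb{Z}_2 = \{0, 1\}$
+ | 0 1
--+----
0 | 0 1
1 | 1 0

$\mathbb{Z}_3 = \{0, 1, 2\}$
+ | 0 1 2
--+------
0 | 0 1 2
1 | 1 2 0
2 | 2 0 1

$\mathbb{Z}_4 = \{0, 1, 2, 3\}$
+ | 0 1 2 3
--+--------
0 | 0 1 2 3
1 | 1 2 3 0
2 | 2 3 0 1
3 | 3 0 1 2

$\mathbb{Z}_5 = \{0, 1, 2, 3, 4\}$
+ | 0 1 2 3 4
--+----------
0 | 0 1 2 3 4
1 | 1 2 3 4 0
2 | 2 3 4 0 1
3 | 3 4 0 1 2
4 | 4 0 1 2 3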
Observe that every cyclic group $\mathbb{Z}_n$ has a generator: 1 (meaning that if you add 1 to itself enough times, you span the
whole group). But it might not be the only generator.
For example, in $\mathbb{Z}_4$ the element 3 is also a generator: the list of elements
$3,\ 3 + 3,\ 3 + 3 + 3,\ 3 + 3 + 3 + 3$
gives the elements
$3 \xrightarrow{+3} 2 \xrightarrow{+3} 1 \xrightarrow{+3} 0$
Hence we found all of $\mathbb{Z}_4$. On the other hand, in $\mathbb{Z}_4$ the element 2 is not a generator:
$2,\ 2 + 2,\ 2 + 2 + 2,\ 2 + 2 + 2 + 2$
gets stuck in a smaller cycle (a subgroup):
$2 \xrightarrow{+2} 0 \xrightarrow{+2} 2 \xrightarrow{+2} 0$
In general, an element $g$ generates $\mathbb{Z}_n$ exactly when $g$ and $n$ have no common factor other than 1. For example,
in $\mathbb{Z}_{10}$ the generators are 1, 3, 7, 9, and none of the other elements can be a generator.
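The generator test for the additive groups can be checked mechanically. The sketch below uses hypothetical helper names (`additive_orbit`, `additive_generators`) and simply repeats the "keep adding g" walk from the examples above until it returns to 0.

```python
from math import gcd


def additive_orbit(g: int, n: int) -> list:
    """Return g, g+g, g+g+g, ... modulo n, stopping when the walk reaches 0."""
    orbit = []
    current = g % n
    while True:
        orbit.append(current)
        if current == 0:
            return orbit
        current = (current + g) % n


def additive_generators(n: int) -> list:
    """Elements of Z_n whose walk visits all n elements."""
    return [g for g in range(n) if len(additive_orbit(g, n)) == n]


print(additive_orbit(3, 4))     # [3, 2, 1, 0]
print(additive_orbit(2, 4))     # [2, 0]
print(additive_generators(10))  # [1, 3, 7, 9]

# Cross-check: the generators are exactly the elements sharing no factor with n.
assert additive_generators(10) == [g for g in range(10) if gcd(g, 10) == 1]
```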
The Diffie-Hellman algorithm is based on the following groups: Choose a prime number $p$. Then, in the same way
as in example 2 above, we can throw away the element 0 from $\mathbb{Z}_p$ and change the internal operation from addition
to multiplication. This new set will be denoted $\mathbb{Z}_p^*$. It is not that obvious that what’s left, with multiplication, is a
group. Primality of $p$ is crucial for the existence of the inverses. The truly remarkable thing is the following:
Wedderburn’s Theorem:² Let $p$ be a prime. Then the group $\mathbb{Z}_p^*$ is cyclic. That is, there exists an element
$x \in \mathbb{Z}_p^*$ which generates the group.
For example, let’s choose $p = 7$. Then $\mathbb{Z}_7^* = \{1, 2, 3, 4, 5, 6\}$. The binary operation is multiplication this
time, not addition!
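² Joseph Henry Maclagan Wedderburn (1882-1948) was a Scottish mathematician and professor at Princeton. The above statement is just
a particular case of a more general statement: every finite subgroup of the multiplicative group of a field is cyclic. For a prime $p$,
the existence of such generators (called primitive roots) is usually credited to Gauss.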
The multiplication table of this group is the following:
· | 1 2 3 4 5 6
--+------------
1 | 1 2 3 4 5 6
2 | 2 4 6 1 3 5
3 | 3 6 2 5 1 4
4 | 4 1 5 2 6 3
5 | 5 3 1 6 4 2
6 | 6 5 4 3 2 1
Observe the computational difficulty in constructing the multiplication table
for this group, compared to its additive version. Multiplication itself is harder
to manage and, combined with taking the remainder modulo $p$, it makes the entries
much harder to predict when $p$ is very big. For example, $5 \cdot 6 = 2$ in $\mathbb{Z}_7^*$ because $5 \cdot 6 = 30$
and 30 divided by 7 gives remainder 2. The same strategy holds for all the other entries.
The apparently random character of this group, i.e. the fact that it is so hard
to predict what will happen (if $p$ is huge), makes it useful for cryptography. In
the same way the German Enigma machine was meant to “mix” the input in a
way hard to predict, the multiplication here mixes the elements of the group.
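The table above can be regenerated with a short sketch; the function name `multiplication_table` is a hypothetical choice, and the code assumes $p$ is prime so that the entries really form a group.

```python
def multiplication_table(p: int) -> list:
    """Rows of the multiplication table of Z_p^*, each entry reduced modulo p."""
    elements = range(1, p)
    return [[(a * b) % p for b in elements] for a in elements]


table = multiplication_table(7)
for row in table:
    print(*row)

# The entry for 5 · 6 sits in row 5, column 6 (index 4 and 5 when counting from 0).
print(table[4][5])  # 2
```

Every row is a rearrangement of 1, ..., 6, which reflects the fact that each element has an inverse.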
We will start searching for a generator (more than one might exist, as before). We know at least one has to exist, but we
have no clue right now which one it is. We will test the elements one by one. Once again, $p = 7$.
• Test 1: It is easy to see that by repeatedly multiplying 1 by itself, you get stuck from the very beginning:
$1 \xrightarrow{\cdot 1} 1 \xrightarrow{\cdot 1} 1$
so you always get 1.
• Test 2: Starting the generation process and remembering we always take the remainder modulo $p = 7$, we obtain
$2 \xrightarrow{\cdot 2} 4 \xrightarrow{\cdot 2} 1 \xrightarrow{\cdot 2} 2$
and we see that we return to 2 after visiting only 2, 4 and 1, hence we can’t span the whole group $\mathbb{Z}_7^*$.
• Test 3: Starting the generation process and remembering we always take the remainder modulo $p = 7$, we obtain:
$3 \xrightarrow{\cdot 3} 2 \xrightarrow{\cdot 3} 6 \xrightarrow{\cdot 3} 4 \xrightarrow{\cdot 3} 5 \xrightarrow{\cdot 3} 1$
and we spanned the whole group.
Thus, we found a generator of $\mathbb{Z}_7^*$: the element 3.
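The three tests above follow one recipe, which can be sketched as a brute-force search. The helper names are hypothetical, and the sketch assumes $p$ is prime and $1 \le g \le p - 1$, so that the walk is guaranteed to come back to 1.

```python
def multiplicative_orbit(g: int, p: int) -> list:
    """Return g, g*g, g*g*g, ... modulo p, stopping when the walk reaches 1."""
    orbit = []
    current = g % p
    while True:
        orbit.append(current)
        if current == 1:
            return orbit
        current = (current * g) % p


def generators(p: int) -> list:
    """Elements of Z_p^* whose powers visit all p - 1 elements."""
    return [g for g in range(1, p) if len(multiplicative_orbit(g, p)) == p - 1]


print(multiplicative_orbit(2, 7))  # [2, 4, 1]
print(multiplicative_orbit(3, 7))  # [3, 2, 6, 4, 5, 1]
print(generators(7))               # [3, 5]
```

This search takes on the order of $p^2$ steps, so it is only practical for small primes.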
We can now describe the key-exchange algorithm, using the group $\mathbb{Z}_p^*$. Alice and Bob want to be able to
securely decide on a key. The key will be an element of $\mathbb{Z}_p^*$. In practice $p$ is a very big prime number. I will say
more about big prime numbers in the next section.
1. Alice and Bob agree on a prime number $p$ and a generator of $\mathbb{Z}_p^*$, call it $g$. These need not be protected, meaning
that Oscar can have this information.
2. Alice chooses a secret natural number $a$ and sends to Bob the element $g^a$ (i.e. she computes $g^a$ and takes the
remainder modulo $p$). Oscar can intercept this.
3. Bob chooses a secret natural number $b$ and sends to Alice the element $g^b$ (i.e. he computes $g^b$ and takes the
remainder modulo $p$). Oscar can intercept this.
4. Alice receives the element $g^b$ and she computes $(g^b)^a$.
5. Bob receives the element $g^a$ and he computes $(g^a)^b$.
6. Now both Alice and Bob end up with the same number, but Oscar does not have enough information to easily
compute this number. We will see below why.
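The six steps above can be sketched as follows. This is a minimal illustration, not a secure implementation: the function names are hypothetical, the parameters $p = 23$ and $g = 5$ are toy-sized, and real deployments use vetted libraries with primes of 2048 bits or more.

```python
import secrets


def dh_keypair(p: int, g: int):
    """Pick a secret exponent in 1..p-2 and return (secret, g^secret mod p)."""
    secret = secrets.randbelow(p - 2) + 1
    return secret, pow(g, secret, p)


def dh_shared_key(their_public: int, my_secret: int, p: int) -> int:
    """Raise the other party's public value to our own secret exponent, modulo p."""
    return pow(their_public, my_secret, p)


p, g = 23, 5                       # step 1: public parameters, known to Oscar
a, A = dh_keypair(p, g)            # step 2: Alice keeps a, sends A = g^a
b, B = dh_keypair(p, g)            # step 3: Bob keeps b, sends B = g^b
key_alice = dh_shared_key(B, a, p) # step 4: Alice computes (g^b)^a
key_bob = dh_shared_key(A, b, p)   # step 5: Bob computes (g^a)^b

# Step 6: both sides hold the same element g^(ab) of Z_p^*.
assert key_alice == key_bob == pow(g, a * b, p)
```

The built-in three-argument `pow` reduces modulo `p` at every step, so no intermediate number ever grows beyond `p * p`.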
A worked example for $p = 23$.
1. Say Alice and Bob decide to use $p = 23$. The element $g = 5$ is a generator of $\mathbb{Z}_{23}^*$. Oscar can intercept these.
2. Say Alice chooses $a = 8$, which she doesn’t send to anyone. She then computes
$5^8 = 390{,}625 = 16$
(because $390{,}625 = 23 \cdot 16{,}983 + 16$, hence the remainder of 390,625 when divided by 23 is 16). Hence
$\text{Alice} \xrightarrow{16} \text{Bob}$, and Oscar can intercept this.
3. Say Bob chooses $b = 3$, which he doesn’t send to anyone. He then computes
$5^3 = 125 = 10$
(because $125 = 23 \cdot 5 + 10$, hence the remainder of 125 when divided by 23 is 10). Hence
$\text{Bob} \xrightarrow{10} \text{Alice}$, and Oscar can intercept this.
4. Alice receives the number 10 and then she computes
$10^8 = 100{,}000{,}000 = 2$
(because $100{,}000{,}000 = 23 \cdot 4{,}347{,}826 + 2$, hence the remainder of 100,000,000 when divided by 23 is 2).
5. Bob receives the number 16 and then he computes
$16^3 = 4{,}096 = 2$
(because $4{,}096 = 23 \cdot 178 + 2$, hence the remainder of 4,096 when divided by 23 is 2).
6. They obtain the same number at the end: 2. This is the key they will use for encryption.
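The worked example computes full powers such as 100,000,000 before reducing, which stops being feasible once $p$ has hundreds of digits. A common alternative, not described in the text and shown here only as a sketch, is square-and-multiply: reduce modulo $p$ after every multiplication. The function name `power_mod` is a hypothetical choice; Python’s built-in `pow(base, exponent, modulus)` does the same job.

```python
def power_mod(base: int, exponent: int, modulus: int) -> int:
    """Square-and-multiply: base**exponent % modulus with small intermediate values."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # lowest remaining bit of the exponent is 1
            result = (result * base) % modulus
        base = (base * base) % modulus      # square for the next bit
        exponent >>= 1
    return result


print(power_mod(5, 8, 23))   # 16  (Alice's public value)
print(power_mod(5, 3, 23))   # 10  (Bob's public value)
print(power_mod(10, 8, 23))  # 2   (Alice's key)
print(power_mod(16, 3, 23))  # 2   (Bob's key)
```

The loop runs once per bit of the exponent, so even exponents with thousands of bits need only a few thousand multiplications.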
It is not hard to see that this will always happen, because both Alice and Bob compute $g^{ab}$: indeed $(g^b)^a = g^{ab} = (g^a)^b$. But let’s analyze
what Oscar has: He has $p = 23$, $g = 5$ and the numbers 16 and 10 which Alice and Bob exchanged on the public
channel of communication. It is hard for him to obtain the key 2 (in practice the prime numbers used are very big).
Oscar does not have the numbers $a = 8$ and $b = 3$. He only knows that
$5^a = 16, \qquad 5^b = 10$
Solving for $a$ or $b$ is known as the Discrete Logarithm problem.
In analogy with the group of positive real numbers $\mathbb{R}^*_+$ (ordinary numbers, not remainders), where the logarithm function log satisfies
(for example)
$\log_5(125) = 3 \iff 5^3 = 125$
we have that in the finite group $\mathbb{Z}_{23}^*$ the discrete logarithm dlog satisfies (for example)
$\operatorname{dlog}_5(16) = 8 \iff 5^8 = 16$
4 Attacks on Diffie-Hellman
The brute-force attack that Oscar can try is to test all the numbers $1 \le i \le 22$ (in general, from 1 to
$p - 1$): for each $i$ he computes $5^i$ (that is, he computes $5^i$ and takes the remainder modulo $p = 23$) and stops
when the result equals the value he intercepted. This is actually why the element
$g$ should be a generator. Observe that the previous scheme would still work if $g$ is not a generator, but it is unsafe: a
non-generator $g$ generates only a subgroup, which might be much smaller than the original $\mathbb{Z}_p^*$,
so Oscar has far fewer values to test.
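Oscar’s brute-force attack on the toy example, and the danger of a non-generator, can both be sketched in a few lines. The function names are hypothetical, and the code assumes $p$ is prime.

```python
def discrete_log(g: int, target: int, p: int):
    """Smallest i in 1..p-1 with g**i % p == target, or None if no such i exists."""
    value = 1
    for i in range(1, p):
        value = (value * g) % p
        if value == target:
            return i
    return None


def subgroup_size(g: int, p: int) -> int:
    """Number of distinct values taken by g, g^2, g^3, ... modulo the prime p."""
    return len({pow(g, i, p) for i in range(1, p)})


a = discrete_log(5, 16, 23)   # recovers Alice's secret from her public value
b = discrete_log(5, 10, 23)   # recovers Bob's secret from his public value
print(a, b, pow(10, a, 23))   # 8 3 2

print(subgroup_size(5, 23))   # 22: the generator reaches the whole group
print(subgroup_size(2, 23))   # 11: half of the group
print(subgroup_size(22, 23))  # 2: only two possible keys
```

With $p = 23$ the search ends after at most 22 steps; the running time grows in proportion to the size of the subgroup generated by $g$, which is why both a large prime and a well-chosen $g$ matter.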
As we can see, it is very important to have very big prime numbers. The same holds true for the RSA algorithm,
which I only presented intuitively. Like Diffie-Hellman, RSA also requires a group that involves prime numbers. As
computational power increases, for Alice and Bob but also for Oscar (the hacker), there is a need for bigger
and bigger prime numbers every day. The table below lists some of the largest prime numbers found in recent years; the
second part of the table consists of prime numbers that need to be double checked (computation in progress).
Discovery date      Number of digits   Location                         Processor
September 4, 2006   9,808,358          University of Central Missouri   Pentium 4 (3 GHz)
September 6, 2008   11,185,272         UCLA                             Intel Core 2 Duo E6600 CPU (2.4 GHz)
August 23, 2008     12,978,189         UCLA                             Intel Core 2 Duo E6600 CPU (2.4 GHz)
April 12, 2009      12,837,064         Melhus, Norway                   Intel Core 2 Duo (3 GHz)
January 25, 2013    17,425,170         University of Central Missouri   Intel Core 2 Duo E8400 (3.00 GHz)
January 7, 2016     22,338,618         University of Central Missouri   Intel Core i7-4790
This doesn’t mean that we know all the prime numbers up to the last one found. There are many other primes,
much smaller, that haven’t been discovered yet. Understanding the density of prime numbers is one of mathematics’
biggest problems. For instance, UCLA Professor Terence Tao was awarded the Fields Medal in 2006 for work that includes a theorem that
sheds light on the distribution of primes. But his proof uses a completely different type of mathematics than Number
Theory: Fourier/Harmonic Analysis. Having a profound, rigorous mathematical understanding of the distribution of
prime numbers is crucial in order to keep up with the ever increasing power of cloud computing, which can even break
the algorithm for big primes $p$.
Research published in 2015 indicated that the NSA could break the Diffie-Hellman algorithm as used for VPN traffic and some HTTPS
and SSH connections. This happened in part because of the poor application of mathematics. For example, as shown
earlier, it is important to choose a generator $g$ of $\mathbb{Z}_p^*$, and not just any element, because there is a risk of creating a
very small subgroup, in which the brute-force attack can solve the Discrete Logarithm extremely fast. But the lack of
insight, and perhaps an unwillingness to change the way things are done, leads to implementing the protocol incorrectly.
Implementing it correctly requires searching for a generator, and this is no easy task. A good understanding of Wedderburn’s Theorem is
needed to be able to obtain a generator in a decent amount of time. This, combined with the choice of some “bad and small”³
prime numbers, led to the NSA’s ability to crack the algorithm. Soon after this was revealed, Internet Explorer,
Chrome and Firefox switched from 512-bit to 1024-bit prime numbers, but this is not enough. They
(and not only they) were caught with their guard down, and were not prepared with the correct prime numbers
and generators, which had to be big enough in size and in quantity so that the same primes and
generators (and sometimes, even non-generators) are not reused over and over again. Making this fundamental mistake gives Oscar a
big advantage.
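One standard way to test a candidate generator without walking through the whole group is to use the prime factors of $p - 1$: an element $g$ generates $\mathbb{Z}_p^*$ exactly when $g^{(p-1)/q} \neq 1$ for every prime $q$ dividing $p - 1$. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, $p$ is assumed prime, $g$ is assumed to lie between 1 and $p - 1$, and trial division is only adequate for small $p$ (for cryptographic sizes, $p$ is chosen so that the factorization of $p - 1$ is known in advance).

```python
def prime_factors(n: int) -> set:
    """Distinct prime factors of n by trial division (adequate only for small n)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_generator(g: int, p: int) -> bool:
    """True if g generates Z_p^*: g^((p-1)/q) != 1 mod p for every prime q | p-1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


print([g for g in range(1, 7) if is_generator(g, 7)])
# [3, 5]
print([g for g in range(1, 23) if is_generator(g, 23)])
# [5, 7, 10, 11, 14, 15, 17, 19, 20, 21]
```

For $p = 23$ we have $p - 1 = 2 \cdot 11$, so each candidate needs only two modular exponentiations instead of a walk through all 22 elements.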
More advanced key-exchange and public-key algorithms exist, for example by replacing the cyclic groups in DH
and RSA with groups of elliptic curves. But they are also harder to understand and implement. If we don’t update
our algorithms by using what modern mathematics has to offer, we will return to the 1960s, to the old private-key
methods, when one could see traveling agents with briefcases (containing the keys for encryption between two
institutions) handcuffed to their wrists. This would turn ATMs, PayPal and online payments into a distant memory.
It is no surprise that the Riemann Hypothesis, “the Holy Grail of Mathematics”, is one of the Millennium Prize
Problems, worth $1,000,000, as one of its consequences is a better characterization of the distribution of prime numbers.
Although almost everyone believes the hypothesis is true, and some encryption algorithms start under the assumption
that it is true... what if it’s not? What if someone knows this, but keeps it a secret? It would not be the first time someone
kept things secret: we just saw that the NSA did this, and also the British Government, when it hid its key-exchange
algorithm before 1975.
Because of this, we should always adapt and modernize our algorithms, and not rely on the belief that they can’t
be broken just because the numbers look big enough and look unbreakable to us. As we can see, mathematics
has always given new tools to a new generation of algorithms. Mathematical groups, such as cyclic groups, groups of
elliptic curves and braid groups, which are seemingly more complicated than the (apparently) simple ones presented
earlier, were always the answer. S.P. Novikov constructed a special class of infinite groups, for which a property is
NP-hard to compute. Intuitively speaking, a problem is NP-hard if it is APPARENTLY very hard to compute, even
with today’s computers. As of today, we don’t know if NP problems are P problems (that is, if they are ACTUALLY
easy to compute). The P=NP problem is another $1,000,000 Millennium Prize Problem.
³ Some primes are better than others.
5

More Related Content

Viewers also liked

Catcom | Kỹ năng trả lời phỏng vấn xin việc
Catcom | Kỹ năng trả lời phỏng vấn xin việcCatcom | Kỹ năng trả lời phỏng vấn xin việc
Catcom | Kỹ năng trả lời phỏng vấn xin việcCatcom VN
 
Executive Summary
Executive SummaryExecutive Summary
Executive SummarySudharsan M
 
Periférico de procesamiento de datos
Periférico de procesamiento de datosPeriférico de procesamiento de datos
Periférico de procesamiento de datosRamonJimenezmejia
 
Aprendizaje autónomo
Aprendizaje autónomoAprendizaje autónomo
Aprendizaje autónomoMarkoLandin23
 
MSomozaResume 12.4.2016
MSomozaResume 12.4.2016MSomozaResume 12.4.2016
MSomozaResume 12.4.2016Mario Somoza
 
TA Corp Brochure_FINAL
TA Corp Brochure_FINALTA Corp Brochure_FINAL
TA Corp Brochure_FINALHeidi Winyard
 
Marca Personal: Andrés Álvarez
Marca Personal: Andrés ÁlvarezMarca Personal: Andrés Álvarez
Marca Personal: Andrés ÁlvarezAndrés Álvarez
 

Viewers also liked (10)

Catcom | Kỹ năng trả lời phỏng vấn xin việc
Catcom | Kỹ năng trả lời phỏng vấn xin việcCatcom | Kỹ năng trả lời phỏng vấn xin việc
Catcom | Kỹ năng trả lời phỏng vấn xin việc
 
Quadratic equations
Quadratic equationsQuadratic equations
Quadratic equations
 
EL NARCOTRÁFICO EN MÉXICO
EL NARCOTRÁFICO EN MÉXICOEL NARCOTRÁFICO EN MÉXICO
EL NARCOTRÁFICO EN MÉXICO
 
Executive Summary
Executive SummaryExecutive Summary
Executive Summary
 
Periférico de procesamiento de datos
Periférico de procesamiento de datosPeriférico de procesamiento de datos
Periférico de procesamiento de datos
 
Aprendizaje autónomo
Aprendizaje autónomoAprendizaje autónomo
Aprendizaje autónomo
 
Jueves
JuevesJueves
Jueves
 
MSomozaResume 12.4.2016
MSomozaResume 12.4.2016MSomozaResume 12.4.2016
MSomozaResume 12.4.2016
 
TA Corp Brochure_FINAL
TA Corp Brochure_FINALTA Corp Brochure_FINAL
TA Corp Brochure_FINAL
 
Marca Personal: Andrés Álvarez
Marca Personal: Andrés ÁlvarezMarca Personal: Andrés Álvarez
Marca Personal: Andrés Álvarez
 

Modern Mathematics and the Evolution of Cryptography

  • 1. Modern Mathematics and the Evolution of Cryptography Alin Galatan 1 History Since Antiquity, various civilizations realized the need to encrypt their messages, especially because they had to send messages to military leaders on the battlefront. It is no surprise that one of the first encryption algorithms is called Caesar’s Algorithm. But this was as easy and rudimentary as the mathematics of its time. Say someone (which we will refer to as Alice) wants to send the message ”ABZ” to a second person (which we will refer to as Bob). We will use the English alphabet, for simplicity. Then Alice would choose a private-key, which is a number between 1 and 25. Say she chooses 2. Alice would then cycle through the alphabet such that: A −→ C, B −→ D, Z −→ B Hence ABZ becomes CDB, which Bob can very easily decrypt - he just cycles backwards. As society evolved, mathematics and cryptographic algorithms evolved in parallel. But they were all based on a private-key, which Alice chooses at the beginning, and which Bob needs, in order to decrypt. But how can one send the key to Bob? At this point, the problem is almost equivalent to the original one. Hence, some significant security flaws exist in private-key algorithms. If both the algorithm and the key are found out by a third party (which we will refer to as Oscar), then he can easily decrypt the messages between Alice and Bob. During WW2, the Nazis developed the Enigma encryption machine. It used private keys (strings of characters) which had to be delivered to the front line (usually using a new key every day, Berlin was sending notebooks of new keys every month). The Enigma Machine used a complex system of rotors that encrypted messages using keys, and it was hard to backtrack the algorithm and to decrypt it, but not impossible. In order to do so, the Allies needed 2 things: the Enigma Machine (which if carefully analyzed, would provide the algorithm) and a way of intercepting the keys. 
In 1942, the British forces captured a German Submarine which had an Enigma Machine. These machines placed on submarines were very special, because of their location they could not communicate that often with the mainland in a safe way and therefore they had to be provided with keys for more than one month. The mathematical genius Alan Turing analyzed the Enigma Machine and the keys that were captured, and developed one of the first computers (called the Bomb) that was used to decrypt other messages, even the ones encrypted with the land based Enigma machines. But they still needed the private-keys, which were changed daily. The private-keys were obtained in various ways: spies, intercepting Morse code, etc. 2 Key exchange protocols and Public-key encryption In the 70s, Whitfield Diffie and Martin Hellman, using ideas of Ralph Merkle, had a breakthrough: they developed the first method of securely exchanging cryptographic keys over a public channel1 . Soon after, Ron Rivest, Adi Shamir and Leonard Adleman created the first public-key encryption algorithm (now called the RSA algorithm). To understand the RSA, let’s imagine this scenario: Suppose Alice wants to send a message to Bob. Then Bob generates a public-key (which can be a number, a string, or any mathematical object) which he shares with everyone. He also generates a private-key, but contrary to the before mentioned algorithms, he keeps it to himself. Hence Oscar, the man in the middle can see the public-key, but this information should be useless for him. Oscar can’t get the private-key. Intuitively, what happens is this: Bob has a mailbox which he leaves open. He also has a lock to the mailbox which is the public-key, and a key to the lock: the private-key. Bob then leaves the lock next to the mailbox, and everyone has access to the lock and mailbox. If Alice wants to send him a message she goes to his mailbox, puts the message there and locks it with the lock Bob left. 
The trick is this: only Bob, using the private-key, can open it. The idea is very simple and was definitely used by people before, without realizing its importance. To implement it, mathematics came to the rescue. And it was some very advanced mathematics, especially for that time. Both the Diffie-Hellman and the RSA algorithm use big prime numbers and a Mathematical object: The Group. I will define the concept below and use it to construct the DH key exchange algorithm. 1In 1997 the British Government declassified old documents, which revealed that they had these algorithms before 1975, but were kept secret 1
  • 2. 3 Diffie-Hellman’s algorithm and The Discrete Logarithm I will now try to describe one of the most complex and beautiful mathematical objects and a centerpiece of modern mathematics: The Group. Roughly speaking, a group is a set of abstract elements, which admit a binary operation, a neutral element, and each element has an inverse. Examples: 1. The integers Z = {. . . , −2, −1, 0, 1, 2, . . .} form a group under the addition operation. The neutral element is 0, because n + 0 = 0 + n = n for any n ∈ Z. The inverse (with respect to addition) of any element n ∈ Z is −n (so the inverse of 7 is -7). Rational numbers (fractions) and real numbers with the + operation also form a group. 2. The real numbers R, but without 0 (which I will call R∗ ) with multiplication form a group. The neutral element is 1, because x · 1 = x · 1 = x and the inverse of x is 1/x (this is the reason why 0 needs to be thrown away). 3. The cyclic groups Zn where n is a natural number, with addition as binary operation. We are already used to them: Z24 is the clock with 24 hours. We never say the current hour is 24, but we go back to 0. That is, we take the remainder modulo 24. I will put a hat over the numbers, to emphasize the cyclic structure. A few cyclic groups Z2 = 0, 1 Z3 = 0, 1, 2 Z4 = 0, 1, 2, 3 Z5 = 0, 1, 2, 3, 4 + 0 1 0 0 1 1 1 0 + 0 1 2 0 0 1 2 1 1 2 0 2 2 0 1 + 0 1 2 3 0 0 1 2 3 1 1 2 3 0 2 2 3 0 1 3 3 0 1 2 + 0 1 2 3 4 0 0 1 2 3 4 1 1 2 3 4 0 2 2 3 4 0 1 3 3 4 0 1 2 4 4 0 1 2 3 Observe that all the cyclic groups Zn have a generator: 1 (meaning that if you add 1 enough times, you span the whole group). But he might not be the only generator. For example, in Z4 the element 3 is also a generator: the list of elements 3, 3 + 3, 3 + 3 + 3, 3 + 3 + 3 + 3 gives the elements 3 +3 −−−−→ 2 +3 −−−−→ 1 +3 −−−−→ 0 Hence we found all of Z4. 
On the other hand, in Z4 the element 2 is not a generator: 2, 2 + 2, 2 + 2 + 2, 2 + 2 + 2 + 2 gets stuck in a smaller cycle (a subgroup): 2 +2 −−−−→ 0 +2 −−−−→ 2 +0 −−−−→ 0 For example, in Z10, the generators are 1, 3, 7, 9 and all the others can’t be generators. The Diffie-Hellman algorithm is based on the following groups: Choose a prime number p. Then in the same way as in example 2 above, we can throw away the element 0 from Zp and change the internal operation from addition to multiplication. This new set will be denoted Z∗ p. It is not that obvious that what’s left, with multiplication, is a group. Primality of p is crucial for the existence of the inverses. The truly remarkable thing is the following: Wedderburn’s Theorem:2 Let p be a prime. Then the groups Z∗ p are cyclic. That is, there exists an element x ∈ Z∗ p which generates the group. For example, let’s choose p = 7. Then Z∗ 7 = 1, 2, 3, 4, 5, 6 . The binary operation is multiplication this time, not addition! 2Joseph Henry Maclagan Wedderburn (1882-1948) was a Scottish Mathematician, Professor at Princeton. The above statement is just a particular case of a more general statement. 2
  • 3. The multiplication table of this group is the following: · 1 2 3 4 5 6 1 1 2 3 4 5 6 2 2 4 6 1 3 5 3 3 6 2 5 1 4 4 4 1 5 2 6 3 5 5 3 1 6 4 2 6 6 5 4 3 2 1 Observe the computational difficulty in constructing the multiplication table for this group, compared to its additive version. Multiplication itself is harder to manage, and combined with taking the remainder modulo p = 7 complicates the problem very much, if p is very big. For example, 5·6 = 2 because 5·6 = 30 and 30 divided by 7 gives remainder 2. Same strategy holds for all of them. The apparent random character of this group, i.e. the fact that it is so hard to predict what will happen (if p is huge) makes it useful for cryptography. In the same way the German Enigma machine was meant to ”mix” the input in a way hard to predict, same analogy holds here. We will start searching for a generator (more might exist, as before). We know at least one has to exist, but we have no clue right now which one it is. We will test them one by one. Once again, p = 7. • Test 1: It is easy to see that by multiplying him, you get stuck from the very beginning: 1 ·1 −−−−−→ 1 ·1 −−−−−→ 1 so you always get 1. • Test 2: Starting the generation process and remembering we always take the remainder modulo p = 7, we obtain 2 ·2 −−−−−→ 4 ·2 −−−−−→ 1 ·2 −−−−−→ 2 and we see that we return back to 2, hence we can’t span the whole group Z∗ 7. • Test 3: Starting the generation process and remembering we always take the remainder modulo p = 7, we obtain: 3 ·3 −−−−−→ 2 ·3 −−−−−→ 6 ·3 −−−−−→ 4 ·3 −−−−−→ 5 ·3 −−−−−→ 1 and we spanned the whole group. Thus, we found a generator in Z7, the element 3. We can now describe the key-exchange algorithm, using the group Z∗ p. So Alice and Bob want to be able to securely decide on a key. The key will be a an element of Z∗ p. In practice p is a very big prime number. I will say more about big prime numbers in the next section. 1. 
Alice and Bob agree on a prime number p and a generator of Z∗ p, call it g. These need not be protected, meaning that Oscar can have this information. 2. Alice chooses a secret natural number a and sends to Bob the element ga (i.e. she computes ga and takes the remainder modulo p). Oscar can intercept this. 3. Bob chooses a secret natural number b and sends to Alice the element gb (i.e. he computes gb and takes the remainder modulo p). Oscar can intercept this. 4. Alice receives the element gb and she computes gb a 5. Bob receives the element ga and he computes ga b 6. Now both Alice and Bob end up with the same number, but Oscar does not have enough information to easily compute this number. We will see below why. A worked example for p = 23. 1. Say Alice and Bob decide to use p = 23. The element g = 5 is a generator of Z∗ 23. Oscar can intercept these. 2. Say Alice chooses a = 8, which she doesn’t send to anyone. She then computes 58 = 390, 625 = 16 (because 390, 625 = 23 · 16, 983 + 16 hence the remainder of 390,625 when divided by 23 is 16). Hence Alice 16 −−→ Bob and Oscar can intercept this. 3
  • 4. 3. Say Bob chooses b = 3, which he doesn’t send to anyone. He then computes 53 = 125 = 10 (because 125 = 23 · 5 + 10 hence the remainder of 125 when divided by 23 is 10). Hence Bob 10 −−→ Alice and Oscar can intercept this. 4. Alice receives the number 10 and then she computes 10 8 = 100, 000, 000 = 2 (because 100, 000, 000 = 23 · 4347826 + 2 hence the remainder of 100,000,000 when divided by 23 is 2). 5. Bob receives the number 16 and then he computes 16 3 = 4, 096 = 2 (because 4, 096 = 23 · 178 + 2 hence the remainder of 4,096 when divided by 23 is 2). 6. They obtain the same number at the end: 2. This is the key they will use for encryption. It is not hard to see that this will always happen, because both Alice and Bob compute gab . But let’s analyze what Oscar has: He has p = 23, g = 5 and the numbers 16 and 10 which Alice and Bob exchanged on the public channel of communication. It is hard for him to obtain the key 2 (in practice the prime numbers used are very big). Oscar does not have the numbers a = 8 and b = 3. He only knows that 5a = 16 5b = 10 Solving for a or b is known as the Discrete Logarithm problem. In analogy with the group of positive real numbers R∗ + (hence not hats) where the logarithm log function satisfies (for example) log 5(125) = 3 =⇒ 53 = 125 we have that in the finite group Z∗ 7 the discrete logarithm dlog satisfies (for example) dlog 5(16) = 8 =⇒ 58 = 16 4 Attacks on Diffie-Hellman The brute-force attack that Oscar can try would be to test all the numbers 1 ≤ i ≤ 22 (in general, from 1 to p−1), compute 5i (that is, he computes 5i and takes the remainder modulo p = 23). This is actually why the element g should be a generator. Observe that the previous scheme would still work if g is not a generator, but it is unsafe: a non-generator g generates a subgroup, not the whole group, which might be much smaller than the original Z∗ p. As we can see, it is very important to have very big prime numbers. 
The same holds true for the RSA algorithm, which I only presented intuitively. Like Diffie-Hellman, RSA also requires a group that involves prime numbers. As computational power increases, both for Alice and Bob and for Oscar (the hacker), there is a need for bigger and bigger prime numbers every day. The second part of the table consists of prime numbers that need to be double checked (computation in progress).

| Discovery date | Number of digits | Location | Processor |
|---|---|---|---|
| September 4, 2006 | 9,808,358 | University of Central Missouri | Pentium 4 (3 GHz) |
| September 6, 2008 | 11,185,272 | UCLA | Intel Core 2 Duo E6600 CPU (2.4 GHz) |
| August 23, 2008 | 12,978,189 | UCLA | Intel Core 2 Duo E6600 CPU (2.4 GHz) |
| April 12, 2009 | 12,837,064 | Melhus, Norway | Intel Core 2 Duo (3 GHz) |
| January 25, 2013 | 17,425,170 | University of Central Missouri | Intel Core 2 Duo E8400 (3.00 GHz) |
| January 7, 2016 | 22,338,618 | University of Central Missouri | Intel Core i7-4790 |

This doesn't mean that we know all the prime numbers up to the last one found. There are many other primes, much smaller, that haven't been discovered yet. Understanding the density of prime numbers is one of mathematics'
biggest problems. For instance, UCLA Professor Terence Tao was awarded the Fields Medal in 2006, in part for a theorem that sheds light on the distribution of primes. But his proof uses a completely different type of mathematics than Number Theory: Fourier/Harmonic Analysis. Having a profound, rigorous mathematical understanding of the distribution of prime numbers is crucial in order to keep up with the ever increasing power of cloud computing, which can even break the algorithm for big primes $p$.

It was revealed in 2015 that the NSA can break the Diffie-Hellman algorithm used for VPN traffic and for some HTTPS and SSH connections. This happened in part because of the poor application of mathematics. For example, as shown earlier, it is important to choose a generator $g$ of $\mathbb{Z}_p^*$, and not just any element, because an arbitrary element may generate a very small subgroup, in which the brute-force attack can solve the Discrete Logarithm extremely fast. But a lack of insight, and perhaps an unwillingness to change the way things are done, leads to the protocol being implemented incorrectly. Implementing it correctly requires searching for a generator, and this is no easy task. A good understanding of Wedderburn's Theorem is needed to be able to obtain a generator in a decent amount of time. This, combined with the choice of some "bad and small"³ prime numbers, led to the NSA's ability to crack the algorithm.

Immediately after this was revealed, Internet Explorer, Chrome and Firefox switched from 512-bit to 1024-bit prime numbers, but this is not enough. They (and not only they) were caught with their guard down, and were not prepared with correct prime numbers and generators, which had to be big enough in size and in quantity so that the same primes and generators (and sometimes even non-generators) are not reused over and over again. Making this fundamental mistake gives Oscar a big advantage.
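To make the small-subgroup risk concrete, here is a minimal Python sketch that measures the size of the subgroup generated by an element of $\mathbb{Z}_{23}^*$. The helpers `element_order` and `is_generator` are hypothetical names introduced for this illustration, and the direct iteration they use is only feasible for toy primes; it is not how a generator would be searched for at realistic sizes.

```python
def element_order(g: int, p: int) -> int:
    """Smallest k >= 1 with g^k = 1 (mod p), found by direct iteration.

    Assumes p is prime and g is not a multiple of p.
    """
    if g % p == 0:
        raise ValueError("g must not be a multiple of p")
    value, k = g % p, 1
    while value != 1:
        value = (value * g) % p
        k += 1
    return k


def is_generator(g: int, p: int) -> bool:
    """g generates the whole group Z_p^* exactly when its order is p - 1."""
    return element_order(g, p) == p - 1


p = 23
for g in (5, 2, 22):
    # The order of g is the size of the subgroup it generates, and hence
    # the number of exponents a brute-force attacker has to try.
    print(g, element_order(g, p), is_generator(g, p))

# Count how many elements of Z_23^* are generators.
generator_count = sum(is_generator(g, p) for g in range(1, p))
print(generator_count)
```

With $g = 22$ the subgroup has only two elements, so the "shared secret" could take just two values; with $g = 5$ all 22 elements are reachable.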
More advanced key-exchange and public-key algorithms exist, for example by adapting the Cyclic Groups in DH and RSA to Groups of Elliptic Curves. But they are also harder to understand and implement. If we don't update our algorithms by using what Modern Mathematics has to offer, we will return to the 1960s, to the old private-key methods, when one could see traveling agents with briefcases (containing the keys for encryption between two institutions) handcuffed to their wrists. This would turn ATMs, PayPal and online payments into a distant memory.

It is no surprise that the Riemann Hypothesis, 'the Holy Grail of Mathematics', is one of the Millennium Prize Problems, worth $1,000,000, as one of its consequences is a better characterization of the distribution of prime numbers. Although almost everyone believes the hypothesis is true, and some encryption algorithms start under the assumption that it is true... what if it's not? What if someone knows this, but keeps it a secret? It would not be the first time someone has kept things secret: we just saw that the NSA did this, and so did the British Government, when they hid their key-exchange algorithm back in the 1950s. Because of this, we should always adapt and modernize our algorithms, and not rely on the belief that they can't be broken just because the numbers look big enough and look unbreakable to us.

As we can see, mathematics has always given new tools to a new generation of algorithms. Mathematical Groups, such as Cyclic Groups, Groups of Elliptic Curves and Braid Groups, which are seemingly more complicated than the (apparently) simple ones presented earlier, were always the answer. S.P. Novikov constructed a special class of infinite groups, for which a property is NP-hard to compute. Intuitively speaking, a problem is NP-hard if it is APPARENTLY very hard to compute, even with today's computers. As of today, we don't know if NP problems are P problems (that is, if they are ACTUALLY easy to compute).
The P=NP problem is another $1,000,000 Millennium Prize Problem.

³ Some primes are better than others.