Overview
We have spent the last several weeks learning about encryption in my computer security class so I thought I’d share what I’ve learned on public key cryptography.
There is a very good description of RSA on Wikipedia, so I don’t want to simply restate what they have. The focus here will be the generation of public and private keys as I feel many of the RSA tutorials on the web are lacking a bit in that department. Computing the multiplicative inverse to get d from e is a little tricky, but we will walk through it step-by-step.
First, a brief overview of RSA, for those not familiar with it already. A message M is encrypted by raising it to the power of e and then taking the result modulo some number N. To decrypt the message, you simply raise the value of the encrypted message C to the power of d and again mod by N. The beauty of RSA is that e and N can be published publicly. Together they, in fact, comprise the public key. The private key, which is not be published, is comprised of d and N.
C = Me mod N
M = Cd mod N
If you’re like me, then you are astonished at 1) how simple this system is, and 2) that you can exponentiate messages twice (modulo some number) and leave the original message unaltered. The main question that my skeptical mind came up with when presented with this powerful encryption tool was, “wouldn’t it be easy to compute d if you have the values of e and N?” The answer is, of course, no. It turns out that it is very hard to do so. We shall see later that it is easy to compute d only when we have the factors of N. If we choose N to be arbitrarily large, factoring N can take an arbitrarily long period of time. Currently, there are no known polynomial-time algorithms which can perform this task. Factorization has, in fact, been shown to be in the set of problems known as NP. So the security of RSA is essentially provided by the hardness of the factorization problem. If someone figures out a way to factor large numbers fast, then RSA is out of business.
Key Generation
As was mentioned above, RSA’s security is rooted in the fact that N is hard to factor. Therefore, we should choose N to be the product of two large primes, p and q. For clarity in this example, we will choose relatively small values for p and q, but later we will discuss the proper choices for these coefficients given a desired level of security.
- For this example, let P = 647 and Q = 1871. This means that the modulus, N = 1210537. (Incidentally, factoring this value of N took 0.056 seconds on UCR’s mainframe).
- Compute the totient of N, φ(N) = (P – 1)(Q – 1) = 1208020.
- Now we choose a number e which should be coprime to φ(N). The easiest way to do this is to simply choose a prime number. For this example, let e = 1127.
- The next step is to compute d such that (d * e) mod φ(N) = 1. If this is confusing, that is okay. This property is important because it ensures that (Me)d (mod n) = M. It may help to have a look at Euler’s Theorem if you are still confused.
The best way to compute the multiplicative inverse, d from e and φ(N) is to use the Extended Euclidean Algorithm. Here is Euclid’s algorithm for our example:
1127 |
1208020 |
(1, 0) |
(0, 1) |
We start with unit vectors (1, 0) and (0, 1) which correspond to the values of e and φ(N), respectively.
For each operation we perform on the left two columns, we perform the same operation on the right two columns.
For example, in the first step, 1127 divides 1208020 1071 times and leaves a remainder of 1003. The corresponding operation in columns 3 and 4 is to subtract (1, 0) from (0, 1) 1071 times yielding (-1071, 1).
The algorithm terminates when we have 1 and 0, not necessarilly in that order, in the first two columns. The value for d is in the column that corresponds to the 1 in the first two columns.
*Note: it is worth mentioning that it is possible for the extended Euclidean algorithm to yield a negative result for d. Obviously, this is not a suitable decryption exponent because raising an integer to a negative number results in a fraction. The simple fix here is to mod the negative value of d by φ(N), giving us a positive value of d between 0 and φ(N). |
1127 |
1003 |
(1, 0) |
(-1071, 1) |
124 |
1003 |
(1072, -1) |
(-1071, 1) |
124 |
11 |
(1072, -1) |
(-9647, 9) |
3 |
11 |
(107189, -100) |
(-9647, 9) |
3 |
2 |
(107189, -100) |
(-331214, 309) |
1 |
2 |
(438403, -409) |
(-331214, 309) |
1 |
0 |
(438403, -409) |
(-1208020, 1127) |
From the above calculations we know that d = 438403. So we have both the public and private keys for this user:
public key = (1127, 1210537)
private key = (438403, 1210537)
To prove that this system works, observe the following computations. Let our message M = 247. The first step is to compute C = 2471127 mod 1210537.
A brief aside:
This exponentiation can be computed easily because we are using relatively small values for e and d. However, real world implementations of RSA often use 1024 bit encryption, meaning the exponent is 1024 bits long. That is roughly equivalent to a 300 decimal digit number. To compute an exponent of that order of magnitude in the conventional way, multiplying the base by itself e times would be prohibitively expensive. Even if we could compute 1 billion multiplications per second, the computation would take longer than the current age of the universe. So it is useful to use an alternative method like exponentiation by squaring. Here is a script that computes large exponents fast. Another consideration is the storage of a very large number such as Cd. Rather than keeping the value in main memory as we exponentiate, we can simply keep the value modulo N. And now back to our example…
2471127 mod 1210537 = 611545. This number was easily obtained with the Python interpreter in a fraction of a second. Raising this number, however to the value of d, 438403, should not be done the conventional way. On the school’s mainframe this calculation took 11 minutes, 23.65 seconds. This is a situation where we can see the power of divide-and-conquer algorithms. Using our recursive exponentiation function we show that 611545438403 mod 1210537 = 247. Voilá, out pops our original message. Additionally, the exponentiation took only 31.16 seconds on the same machine with the repeated squaring method. This can be vastly improved, too, once we develop a non-recursive function. That will be critical when we want to provide real security via RSA and we don’t want to wait 10 minutes to decrypt the message.
PyRSA now available.