Category: Programming

PyRSA – RSA in Python June 18th, 2005

PyRSA is a command line utility that allows users to digitally encrypt and sign messages using the public key encryption scheme, RSA. There are three basic functions that PyRSA performs: encryption, decryption, and key generation.



Sample Use:

1. Generate a public and private key. In this example, we will specify a key of length 1024 bits. Allow several seconds of CPU time for the generation of the keys.

-g 1024
Enter file identifier (i.e. first name): brandon

2. Now the files brandon_publicKey.txt and brandon_privateKey.txt are in the current directory. Next, place the text we want to encrypt in a text file.

echo "The sky above the port was the color of television, tuned
to a dead channel." > message.txt

3. Encrypt the message using the public key and redirect the output to a text file.

-e message.txt -k brandon_publicKey.txt > ciphertext.txt

4. At this point the file ciphertext.txt contains the encrypted message. The file can safely be sent to a recipient, e.g. as an email attachment, since its contents are unreadable to anyone without the private key.

cat ciphertext.txt
32464047998704731086703458860763720628883125201

5. Next we will assume the message has been sent to the individual who possesses the corresponding private key and who wants to decrypt the message.

-d ciphertext.txt -k brandon_privateKey.txt
Decrypted text:
The sky above the port was the color of television, tuned to a dead channel.

RSA Algorithm May 5th, 2005


We have spent the last several weeks learning about encryption in my computer security class so I thought I’d share what I’ve learned on public key cryptography.

There is a very good description of RSA on Wikipedia, so I don’t want to simply restate what they have. The focus here will be the generation of public and private keys as I feel many of the RSA tutorials on the web are lacking a bit in that department. Computing the multiplicative inverse to get d from e is a little tricky, but we will walk through it step-by-step.

First, a brief overview of RSA, for those not familiar with it already. A message M is encrypted by raising it to the power of e and then taking the result modulo some number N. To decrypt the message, you simply raise the value of the encrypted message C to the power of d and again reduce modulo N. The beauty of RSA is that e and N can be published openly. Together they, in fact, make up the public key. The private key, which is not published, consists of d and N.

C = M^e mod N
M = C^d mod N
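
As a minimal sketch of these two formulas, Python's built-in three-argument pow performs the modular exponentiation directly. The names encrypt and decrypt are illustrative only (they are not PyRSA's interface), and the toy key is the one derived in the key generation example of this post.

```python
def encrypt(m, e, n):
    """Return C = M^e mod N."""
    return pow(m, e, n)


def decrypt(c, d, n):
    """Return M = C^d mod N."""
    return pow(c, d, n)


# Toy key pair from the worked example (far too small for real security).
N, E, D = 1210537, 1127, 438403

ciphertext = encrypt(247, E, N)
recovered = decrypt(ciphertext, D, N)
print(recovered)
```

The message must be an integer smaller than N; turning text into such integers is a separate step that this sketch leaves out.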

If you’re like me, then you are astonished at 1) how simple this system is, and 2) that you can exponentiate messages twice (modulo some number) and leave the original message unaltered. The main question that my skeptical mind came up with when presented with this powerful encryption tool was, “wouldn’t it be easy to compute d if you have the values of e and N?” The answer is, of course, no. It turns out that it is very hard to do so. We shall see later that it is easy to compute d only when we have the factors of N. The larger we choose N, the longer factoring it takes. Currently, there are no known polynomial-time algorithms for classical computers that perform this task. Factorization is in NP, but it is not known to be NP-complete, so its hardness is a well-tested assumption rather than a proven fact. So the security of RSA essentially rests on the hardness of the factorization problem. If someone figures out a way to factor large numbers fast, then RSA is out of business.

Key Generation

As was mentioned above, RSA’s security is rooted in the fact that N is hard to factor. Therefore, we should choose N to be the product of two large primes, p and q. For clarity in this example, we will choose relatively small values for p and q, but later we will discuss the proper choices for these coefficients given a desired level of security.

  1. For this example, let P = 647 and Q = 1871. This means that the modulus, N = 1210537. (Incidentally, factoring this value of N took 0.056 seconds on UCR’s mainframe).
  2. Compute the totient of N, φ(N) = (P – 1)(Q – 1) = 1208020.
  3. Now we choose a number e which must be coprime to φ(N). An easy way to do this is to choose a prime number that does not divide φ(N), but any e with gcd(e, φ(N)) = 1 will do. For this example, let e = 1127 (which is 7² × 23, so not itself prime, but it shares no factor with φ(N)).
  4. The next step is to compute d such that (d * e) mod φ(N) = 1. If this is confusing, that is okay. This property is important because it ensures that (M^e)^d mod N = M. It may help to have a look at Euler’s Theorem if you are still confused.
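
The arithmetic in steps 1 through 3 can be checked with a few lines of Python. This is only a sketch of the toy example above; the helper name is_valid_public_exponent is made up for illustration.

```python
from math import gcd


def is_valid_public_exponent(e, phi):
    """e is usable only if it shares no common factor with phi."""
    return 1 < e < phi and gcd(e, phi) == 1


p, q = 647, 1871
n = p * q                  # modulus N
phi = (p - 1) * (q - 1)    # totient of N
e = 1127

print(n, phi)              # 1210537 1208020
print(is_valid_public_exponent(e, phi))
```

An even exponent such as 2 fails the check here because φ(N) is always even when P and Q are odd primes.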

A standard way to compute d, the multiplicative inverse of e modulo φ(N), is to use the Extended Euclidean Algorithm. Here is Euclid’s algorithm for our example:

1127   1208020   (1, 0)   (0, 1)

We start with unit vectors (1, 0) and (0, 1) which correspond to the values of e and φ(N), respectively.

For each operation we perform on the left two columns, we perform the same operation on the right two columns.

For example, in the first step, 1127 divides 1208020 1071 times and leaves a remainder of 1003. The corresponding operation in columns 3 and 4 is to subtract (1, 0) from (0, 1) 1071 times yielding (-1071, 1).

The algorithm terminates when we have 1 and 0, not necessarily in that order, in the first two columns. The value for d is the first component of the vector in the column that corresponds to the 1 in the first two columns.

*Note: it is worth mentioning that it is possible for the extended Euclidean algorithm to yield a negative result for d. Obviously, this is not a suitable decryption exponent because raising an integer to a negative number results in a fraction. The simple fix here is to mod the negative value of d by φ(N), giving us a positive value of d between 0 and φ(N).

1127   1003   (1, 0)            (-1071, 1)
124    1003   (1072, -1)        (-1071, 1)
124    11     (1072, -1)        (-9647, 9)
3      11     (107189, -100)    (-9647, 9)
3      2      (107189, -100)    (-331214, 309)
1      2      (438403, -409)    (-331214, 309)
1      0      (438403, -409)    (-1208020, 1127)

From the above calculations we know that d = 438403. So we have both the public and private keys for this user:

public key = (1127, 1210537)
private key = (438403, 1210537)
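
The table above can be reproduced with a short loop. The sketch below tracks only the first component of each vector (the coefficient of e), since that is the part that becomes d; the function name extended_euclid_inverse is an illustrative choice, not taken from PyRSA.

```python
def extended_euclid_inverse(e, phi):
    """Return d such that (d * e) % phi == 1, or raise if none exists."""
    # Invariant: each remainder r satisfies r == x * e (mod phi),
    # where x is the first component of the vectors in the table.
    r_a, x_a = e, 1
    r_b, x_b = phi, 0
    while r_b != 0:
        quotient = r_a // r_b
        r_a, r_b = r_b, r_a - quotient * r_b
        x_a, x_b = x_b, x_a - quotient * x_b
    if r_a != 1:
        raise ValueError("e and phi are not coprime, so no inverse exists")
    # Fold a possibly negative coefficient into the range [0, phi).
    return x_a % phi


d = extended_euclid_inverse(1127, 1208020)
print(d)  # 438403
```

On Python 3.8 and later, the built-in call pow(1127, -1, 1208020) should give the same value and makes a handy cross-check.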

To show that this system works, observe the following computations. Let our message M = 247. The first step is to compute C = 247^1127 mod 1210537.

A brief aside:
This exponentiation can be computed easily because we are using relatively small values for e and d. However, real-world implementations of RSA often use 1024-bit keys, meaning the modulus N, and typically the private exponent d as well, is about 1024 bits long. That is roughly equivalent to a 300 decimal digit number. Computing an exponent of that magnitude in the conventional way, multiplying the base by itself once per unit of the exponent, would be prohibitively expensive. Even if we could compute 1 billion multiplications per second, the computation would take longer than the current age of the universe. So it is useful to use an alternative method like exponentiation by squaring. Here is a script that computes large exponents fast. Another consideration is the storage of a very large number such as C^d. Rather than keeping the full value in main memory as we exponentiate, we can reduce the intermediate result modulo N at every step. And now back to our example…

247^1127 mod 1210537 = 611545. This number was easily obtained with the Python interpreter in a fraction of a second. Raising this number to the value of d, 438403, however, should not be done the conventional way. On the school’s mainframe this calculation took 11 minutes, 23.65 seconds. This is a situation where we can see the power of divide-and-conquer algorithms. Using our recursive exponentiation function we show that 611545^438403 mod 1210537 = 247. Voilà, out pops our original message. Additionally, the exponentiation took only 31.16 seconds on the same machine with the repeated squaring method. This can be vastly improved, too, once we develop a non-recursive function. That will be critical when we want to provide real security via RSA and we don’t want to wait 10 minutes to decrypt the message.
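
One possible shape for a non-recursive version is the square-and-multiply loop sketched below. It is not the script linked above; mod_pow is an illustrative name, and the key detail is that every intermediate value is reduced modulo N so the numbers never grow beyond N squared.

```python
def mod_pow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus by repeated squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            # The lowest remaining bit of the exponent is set: multiply it in.
            result = (result * base) % modulus
        # Square the base for the next bit and keep it reduced.
        base = (base * base) % modulus
        exponent >>= 1
    return result


N = 1210537
ciphertext = mod_pow(247, 1127, N)
print(mod_pow(ciphertext, 438403, N))  # 247
```

The loop runs once per bit of the exponent, so a 1024-bit exponent needs on the order of a thousand iterations rather than an astronomical number of multiplications. Python's built-in pow(base, exponent, modulus) does the same job and is the practical choice.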

PyRSA now available.

Nearest Neighbor Classifier March 23rd, 2005


The Nearest Neighbor Classifier, while robust and capable of handling streaming data, is sensitive to outlying data points and to irrelevant features. One critical part of designing a good nearest neighbor classifier is deciding which feature set to use in classifying new data points. One great way to choose the correct subset is through search. Exhaustive search isn’t realistic, though, for data sets with large numbers of features as the number of possible subsets of features is exponential: n = 2^F where F is the number of features in the data.

The purpose of this program is to search through the space of possible subsets of features in a faster way, that is, polynomial or better, without sacrificing too much accuracy in the classification. The first two methods we use are fairly straightforward: Forward Selection, and Backward Elimination. The former method begins with the empty set of features and adds one feature at a time while the latter method begins with all the features and removes one feature at a time.
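
A rough sketch of Forward Selection is shown below. It assumes that each candidate subset is scored by leave-one-out nearest-neighbor accuracy and that distances are squared Euclidean distances; all function names are illustrative and the original program may be organized differently.

```python
def squared_distance(a, b, features):
    """Squared Euclidean distance restricted to the chosen features."""
    return sum((a[f] - b[f]) ** 2 for f in features)


def leave_one_out_accuracy(points, labels, features):
    """Fraction of points whose nearest other point has the same label."""
    correct = 0
    for i, p in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: squared_distance(p, points[j], features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)


def forward_selection(points, labels):
    """Greedily add one feature at a time; return the best subset seen."""
    remaining = set(range(len(points[0])))
    current, best_set, best_acc = [], [], 0.0
    while remaining:
        # Score every unused feature when added to the current subset.
        scored = [
            (leave_one_out_accuracy(points, labels, current + [f]), f)
            for f in sorted(remaining)
        ]
        acc, feature = max(scored, key=lambda pair: pair[0])
        current.append(feature)
        remaining.remove(feature)
        if acc > best_acc:
            best_acc, best_set = acc, list(current)
    return best_set, best_acc


# Feature 0 separates the two classes; feature 1 is noise.
points = [(0.0, 0.9), (0.1, 0.1), (0.2, 0.5), (0.8, 0.85), (0.9, 0.15), (1.0, 0.5)]
labels = [0, 0, 0, 1, 1, 1]
print(forward_selection(points, labels))
```

Backward Elimination has the same structure, except that it starts from the full feature list and removes the feature whose removal scores best at each step.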

The third, original, method to search for a good subset of features requires some explanation. By relaxing our criteria for what constitutes the “nearest neighbor”, we are able to avoid some of the calculations that make searching this space expensive. In other words, we sacrifice some accuracy of the classifier in order to gain a great deal of speed in computing the subset. The algorithm works as follows:

  1. All of the data is normalized so that every feature’s value falls between 0 and 1.
  2. The user is prompted to enter a value I call “delta” to be used during the nearest-neighbor computation. To be accurate, it is no longer the “nearest” neighbor in the set that we are interested in, only a “pretty good” one. So the modified nearest neighbor selector returns the first data point that falls within delta units distance from the given point.
  3. I run the Forward Selection Search using the modified nearest neighbor algorithm to return a “pretty good” subset of features.
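
Steps 1 and 2 might look roughly like the sketch below. The names normalize and pretty_good_neighbor are illustrative, and falling back to the true nearest neighbor when no point lies within delta is an assumption, since the description above does not say what happens in that case.

```python
def normalize(points):
    """Rescale every feature to the range [0, 1] (min-max scaling)."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    # A constant feature has zero span; use 1.0 to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [
        tuple((v - lo) / span for v, lo, span in zip(p, lows, spans))
        for p in points
    ]


def pretty_good_neighbor(index, points, features, delta):
    """Return the index of the first other point within delta.

    Distances are squared Euclidean distances. If no point qualifies,
    the true nearest neighbor is returned (assumed fallback).
    """
    best, best_dist = None, float("inf")
    for j, other in enumerate(points):
        if j == index:
            continue
        dist = sum((points[index][f] - other[f]) ** 2 for f in features)
        if dist <= delta:
            return j  # early exit: a "pretty good" neighbor is enough
        if dist < best_dist:
            best, best_dist = j, dist
    return best


pts = normalize([(0, 0), (10, 0), (1, 0), (5, 0)])
print(pretty_good_neighbor(0, pts, [0], delta=0))
print(pretty_good_neighbor(0, pts, [0], delta=2))
```

The speedup comes from the early return: with a generous delta most lookups stop after examining only a few points.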

Here I’ve omitted a sample trace of the program because it is lengthy, but you could download the source and run it in the Python interpreter just as easily.

Running these three algorithms on various data sets yielded the following statistics:

Large Data Set – 1000 points, 30 Features

Method         Acc. %   Time (s)   Best Set
Forward        87.40    14255.89   {7, 9, 12}
Backward       93.00    20507.24   {7, 9, 17, 25}
Special(1)     71.60     4909.35   {0, 1, 4, 7, 9, 10, 14, 15, 16, 17, 18, 19, 20, 27, 28}
Special(2)     69.60      647.04   {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28}

Small Data Set – 600 points, 16 Features

Method         Acc. %   Time (s)   Best Set
Forward        89.66     1120.73   {2, 4, 9}
Backward       88.17     1519.89   {2, 4}
Special(.25)   81.66      449.00   {2, 4, 5, 11, 12, 13, 14}
Special(.5)    78.83      221.63   {1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14}
Special(1)     73.66       47.09   {0, 1, 2, 4, 5, 6, 7, 9, 10, 13, 14}
Special(2)     75.33        8.27   {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

Small Special Set – 1000 points, 19 Features

Method         Acc. %   Time (s)   Best Set
Forward        86.60     2205.22   {0, 1, 2, 3}
Backward       86.60     3806.77   {0, 1, 2, 3}
Special(.25)   67.20     1025.89   {0, 1, 2, 3, 5, 7, 13, 14, 15}
Special(.5)    61.10      590.12   {0, 1, 4, 8, 12, 14, 16}
Special(1)     61.10      128.71   {0, 1, 2, 3, 4, 7, 8, 10, 11, 12, 14, 15, 16, 17}
Special(2)     60.00       14.13   {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 14, 15, 16, 17, 18}


It is worth noting that running Special Search with delta = 0 is equivalent to running Forward Selection. As delta increases beyond a certain threshold (data set dependent), Special Search degenerates rapidly. This is because for every data point in the set, we will simply choose the next point we look at as the nearest neighbor. With two classes of points, this method will essentially yield 50% accuracy. Hence, it is possible to wind up with a feature set that is less accurate than “leave-one-out” evaluation with all the features.

Therefore, the selection of delta is important to achieve good performance in Special Search. There are some things we can say, quantitatively, about the selection of delta. First, it must by definition fall between 0 and F where F is the number of features in the data set. The lower bound is set because two points cannot be closer than 0 units in Euclidean space. The upper bound comes from the fact that when the data are normalized to values between 0 and 1, each point can differ by no more than 1 unit per feature (the bound is F rather than √F because the distance function I am using skips the square root to save time).

It is apparent that the “proper” choice for delta will vary greatly according to the data set. If the data had great separation a large delta value would suffice and lead to faster computing time. Conversely, a data set with tightly grouped instances of opposite classes will require a smaller delta value to preserve a high degree of accuracy.

It is possible to do some simple preprocessing of the data before we begin the search in order to give us some idea of the data points’ separation. In fact, I used the following strategy to give me a ballpark figure for a reasonable delta value: First, I selected an arbitrary number of points at random from the set. For each point, I then recorded the class of that point, the distance to each other point in the set, and the class of each other point. I then sorted these data by class and ascending distance.
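
A condensed variant of this preprocessing step is sketched below. Instead of recording and sorting every distance, it keeps only the squared distance from each sampled point to its closest point of a different class, which is the figure that bounds a safe delta. The function name, the fixed random seed, and the assumption that at least two classes are present are all illustrative choices.

```python
import random


def nearest_opposite_distances(points, labels, sample_size, seed=0):
    """For a random sample of points, return the squared distance from
    each sampled point to the closest point of a different class."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(points)), sample_size)
    result = []
    for i in sample:
        distances = [
            sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            for j in range(len(points))
            if labels[j] != labels[i]
        ]
        result.append(min(distances))
    return result


points = [(0.0,), (0.5,), (1.0,)]
labels = [0, 0, 1]
gaps = nearest_opposite_distances(points, labels, sample_size=3)
print(min(gaps))
```

A delta just below the smallest value returned would not have misclassified any of the sampled points, though, as with any sample, that says nothing certain about the points that were not drawn.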

From this data, I got an idea for what value, on average, would allow us to correctly classify points without searching every point for the nearest neighbor. For the above example, a delta value of 1.75 would not sacrifice any accuracy because the closest point of opposite class was at a distance of 1.757 units. It must be mentioned though, that randomly sampling the data in the above way will not give us perfect data to use for the delta choice, only a subjective “feel” for the data’s separation.

It is up to the user to weigh the accuracy of the classifier versus the speed at which it classifies new points. Some data sets will be very conducive to Special Search while others will perform very poorly.

8-Puzzle Solver February 19th, 2005

Taking the main driver for the “4 Knights” problem, I wrote a program that solves the “8 Puzzle” game in three ways: Uniform Cost Search, A* using the misplaced tiles heuristic, and A* using the Manhattan Distance heuristic. Here’s the source code if you’re interested.
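
For readers unfamiliar with the two heuristics, here is a minimal sketch of each. It assumes the board is stored as a tuple of nine numbers read row by row, with 0 for the blank, and it assumes a particular goal layout; the linked source may represent the puzzle differently.

```python
# Assumed goal layout; 0 marks the blank square.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def misplaced_tiles(state, goal=GOAL):
    """Count tiles (ignoring the blank) that are not on their goal square."""
    return sum(
        1 for tile, target in zip(state, goal) if tile != 0 and tile != target
    )


def manhattan_distance(state, goal=GOAL):
    """Sum of row and column offsets of every tile from its goal square."""
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue
        goal_index = goal.index(tile)
        total += abs(index // 3 - goal_index // 3)
        total += abs(index % 3 - goal_index % 3)
    return total


state = (3, 2, 1, 4, 5, 6, 7, 8, 0)
print(misplaced_tiles(state), manhattan_distance(state))  # 2 4
```

Manhattan distance is never smaller than the misplaced-tile count, so it tends to guide A* more tightly while still never overestimating the number of moves left.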

A-Star (A*) Algorithm in Python February 2nd, 2005

Update: 25-Jan-2010

Since there have been many requests over the years for the source code referenced in this post, I decided to share it. A cautionary note to undergrad CS students (who I can only assume are the requestors): CS professors are pretty good at catching cheaters, so learn from others’ code, but write your own.


4 Knights Problem

We start with two white knights and two black knights in the following configuration.

The goal is to move the knights so that the white knights and black knights effectively swap places.

Assuming we know nothing about the solution to this problem, the A-Star Algorithm is a good choice to search for the solution.

With no heuristic function or check for previously visited states, A* degenerates to uniform cost search. This is not an efficient method, especially in this particular domain. Consider: for every turn, one legal move will be the reverse of the last move made. The branching factor for this particular problem is greatly increased by this feature of the domain, and can thus be reduced by the same factor if we account for previously visited nodes.
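
To make the role of the repeated-state check concrete, here is a generic sketch of A* with a visited set. It is not the source code referenced in this post: the a_star signature, the successors callback, and the number-line example are all illustrative, states are assumed to be hashable, and the heuristic is assumed to be consistent so that a state can be closed the first time it is expanded.

```python
import heapq
from itertools import count


def a_star(start, is_goal, successors, heuristic=lambda state: 0):
    """Return (path, nodes_expanded); path is None if no goal is reachable.

    With the default zero heuristic this behaves as uniform cost search.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    visited = set()
    expanded = 0
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if state in visited:
            continue  # already expanded via a path at least as cheap
        visited.add(state)
        expanded += 1
        if is_goal(state):
            return path, expanded
        for nxt, step_cost in successors(state):
            if nxt not in visited:
                new_cost = cost + step_cost
                entry = (new_cost + heuristic(nxt), next(tie),
                         new_cost, nxt, path + [nxt])
                heapq.heappush(frontier, entry)
    return None, expanded


# Toy domain: walk along the integers from 0 to 5, one step at a time.
def steps(n):
    return [(n + 1, 1), (n - 1, 1)]


path, with_h = a_star(0, lambda n: n == 5, steps, lambda n: abs(5 - n))
_, without_h = a_star(0, lambda n: n == 5, steps)
print(path, with_h, without_h)
```

Even in this tiny domain, every move can be undone by the next one, so the visited set is what keeps the search from bouncing back and forth between neighboring states.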

It wouldn’t be A*, though, if we didn’t have a heuristic function to help guide our search. At first glance, misplaced knights may seem to be a good choice for a heuristic. For each knight that is not “in place”, we add one unit to the estimated distance (cost) to the goal. The function looks like this:

def distance(node):
  distance = 0
  if (node.contents[1] == "black"): distance += 1
  if (node.contents[6] == "black"): distance += 1
  if (node.contents[5] == "white"): distance += 1
  if (node.contents[7] == "white"): distance += 1
  return distance

Adding the above function to the cost of each node does give us some improvement in time and space complexity, as we’ll see below. But the heuristic can still be improved.

In the previous heuristic, a knight “out of place” added at most one unit to the estimated cost. It is the case that a knight out of place is at least one move away from its goal square.

The graph to the left is another way of representing our 10 Square chessboard. Each node represents the corresponding square, and each node’s neighbors are the squares that are one move away. By examining the graph we can determine the minimum number of moves from each square to the appropriate goal square. The new heuristic will look like this:

def distance(node):
  distance = 0
  for square, piece in node.contents.items():
    if piece is not None:
      # no "switch" statement in python
      if square == 1:
        if piece == "black":
          distance += 0
        else:  # white knight on this square
          distance += 1
      elif square == 2:
        if piece == "black":
          distance += 3
        else:
          distance += 4
      # (entries for squares 3 through 9 are not shown)
      elif square == 10:
        if piece == "black":
          distance += 2
        else:
          distance += 3
  return distance
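
The per-square move counts do not have to be read off the graph by hand. A breadth-first search outward from the goal squares yields them directly, as in the sketch below. The adjacency dictionary here is a made-up placeholder, not the 10-square board from this post, and moves_to_goal is an illustrative name.

```python
from collections import deque


def moves_to_goal(neighbors, goal_squares):
    """Return {square: minimum number of moves to the nearest goal square}."""
    distance = {square: 0 for square in goal_squares}
    queue = deque(goal_squares)
    while queue:
        square = queue.popleft()
        for nxt in neighbors[square]:
            if nxt not in distance:
                distance[nxt] = distance[square] + 1
                queue.append(nxt)
    return distance


# Hypothetical graph: squares 1-2-3-4 in a line, with square 1 as the goal.
placeholder_board = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
table = moves_to_goal(placeholder_board, [1])
print(table)  # {1: 0, 2: 1, 3: 2, 4: 3}
```

With one such table per knight color, the heuristic reduces to summing table lookups over the occupied squares, which avoids the long if/elif chain.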

Time and Space

Schema                                       Nodes Searched   Execution Time
Uniform Cost, no check for repeated states   46945…           18.5333 hours (when I terminated the program)
Uniform Cost, check repeated states          1204             14.42s
A-Star, misplaced knights                    1112             12.77s
A-Star, minimum distance to goal             868              11.25s

It seems that the major savings achieved with the second heuristic are in space rather than time. While we reduce the number of nodes searched by 22% (868 versus 1112), the cost of computing the new distance is substantially higher, so the time savings are small by comparison (11.25s versus 12.77s).

Check out the full source for the A* Algorithm in Python or let me know if you have any ideas for better heuristics.