Notes on assignment 3

Computing modular powers

This only shows how the ^ operator for Modular numbers works. You do not need to write this.

modpow(x,n,m) produces xⁿ mod m.

Define case modpow(?,0,?) = 1 case modpow(?x,2*(?n),?m) = (p*p) _mod_ m | Let p = modpow(x,n,m). case modpow(?x,?n+1,?m) = x*modpow(x,n,m) %Define

Factoring and the security of RSA

The RSA system is a public key system. What that means is that the key required to encipher can be made public, keeping the decipher key private.

If the public key is (n,e), then the private key (n,d) can be found by anyone who has the prime factors of n. Since you created n by finding p and q and then making n = p*q, you know the prime factors of n. But n is part of the public information, so anyone else could get that information by running a factoring algorithm to factor n. The question then is, how hard is it to factor a number?

The answer depends on the size and nature of n, and on the algorithm that is used. Algorithms can be classified by the rough size of numbers that can be factored in a reasonable amount of time. Here, a reasonable time might be considered to be anything from a few seconds to a year.

Increases in the size of numbers that can be factored have been made over time, partly because computers have become faster, but much more significantly due to improvements in algorithms. Here are a few algorithms.

The naive algorithm tries all factors up to the square root of n. It can factor numbers up to about 18 digits in a reasonable amount of time. That would require testing a billion potential factors.
Pollard's algorithm runs much more efficiently than the naive algorithm. Its time is roughly proportional to the fourth root of n, instead of the square root of n. You might hope to factor 35 digit numbers with it. Function factors, from package math/prime.ast, uses Pollard's algorithm.
Variations of the Quadratic Sieve algorithm have been used to factor numbers in the 90-100 digit range, taking months of time on hundreds of computers running in parallel.
The Number Field algorithm has factored 125 digit numbers on a large network of computers.

At present, factoring 200 digit numbers is out of practical reach.

Other security issues

There are two serious security problems with this program.

Short pieces and frequently occurring text

After you break up a file into pieces for encryption, the last segment of a file can be shorter than the rest. That is a security loophole, since it might be possible to figure out what that last segment is. An extreme example is where the file just contains one of the words "yes" or "no". Anyone who has the pub.key file can encipher both of those words, and see what the enciphered version of each looks like.

In fact, any time you have phrases that are frequently repeated in documents, someone might be able to get some information about a document by searching for enciphered versions of those phrases.

There is a solution to this problem called salting. When you encipher a string, you increase the length of each section to encipher by adding some random characters. If the last section is shorter than the rest, then add more random characters to it, to make all of the sections have the same size. If you know the length of the original file and the number of salt characters that are normally added (part of the key files) then it is easy to remove the salt when you decipher a document.

Since the sections have become longer, you need to ensure that your n in the RSA system is large enough for the new, larger sections.

Sequencing and mixing documents

Even if someone cannot decipher documents, he or she might reorder the parts, or even mix the pieces of different documents together. For example, some unscrupulous person might get hold of two enciphered documents. He or she might take the first block of one followed by the second block of another, etc., and put together a new document. Since each individual block is correctly enciphered, the newly manufactured document will appear to be a correctly enciphered document. Or, this unscrupulous person might simple rearrange the pieces of a single document, producing a new document. That kind of thing should be disallowed.

This problem can be avoided by adding additional information to each piece before encryption. Select a short random string that identifies this document, to prevent mixing of documents. You can use this as your salt. Also add a small number of sequence bytes. Use sequence number 0 on the first part, 1 on the second part, etc. That prevents rearranging.

When you decipher a document, check that the random tags are the same (so they can from the same document) and that the blocks are in the correct sequence. You can also add information (underneath the encryption so that it cannot be altered) indicating how many blocks there are, so that all must be present for correct decryption.

Checking primality

One way to discover whether a number is prime is to find its prime factors. That is clearly unacceptable, however. We need to be able to check whether a very large number is prime in a reasonable amount of time, in spite of the fact that factoring large numbers if very difficult.

There is a very clever way to check whether a number is prime. A result due to Pierre de Fermat, called Fermat's Little Theorem, states that

If p is prime and x is an integer that is not divisible by p, then x^p-1 mod p = 1.

If you select a number x, where 0 < x < n, and you discover that x^n-1 mod n is not 1, then n must clearly not be prime, since otherwise Fermat's Little Theorem would be violated. For example, 2⁸ mod 9 = 4, so 9 cannot be prime.

Say that x is a witness to the fact that n is not prime if 0 < x < n and x^n-1 mod n is not 1. For example, 2 is a witness to the fact that 9 is not prime.

Remarkably, for all but rarely occurring numbers called Carmichael numbers, if n is not prime, then at least half of the numbers from 1 to n-1 are witnesses that n is not prime. To find out whether n is prime, guess a few numbers x at random. If any of them are witnesses that n is not prime, then you know for sure that n is not prime. If none of the guessed numbers are witnesses, then, with high probability, n is prime. (If it were not, you would probably have found a witness.) This approach is much faster than factoring for large numbers.

This does not work for the Carmichael numbers, but there is a separate test for them. (In fact, Carmichael numbers are always easy to factor.) Since Carmichael numbers are rare, the extra test for them is rarely needed in practice.

Function prime? in package math/prime.ast implements this algorithm, with the added test for Carmichael numbers, and so can be used to test whether very large numbers are prime.

The algorithm just described has the disadvantage that it can make errors. With low probability, you can fail to find a witness, even though one exists. In 2002 a new algorithm was discovered that efficiently tests whether a number is prime, without the possibility of ever making an error. Researchers had thought that an algorithm like this must exist, but had not found one earlier that they could prove was correct. Probably the most surprising thing about the new algorithm is how simple and short it turned out to be. Few people were expecting that. That is to the credit of the algorithm's discoverers.

In practice, the new algorithm is still considerably slower than the randomized one based on Fermat's Little Theorem, and the probability that the randomized algorithm makes an error can be made smaller than the probability that the computer malfunctions because it was hit by a cosmic ray. So the randomized algorithm is still used. There remains the possibility of a truly fast practical algorithm that never errs.

Practical encryption

RSA is a public key cipher. That means that you have two keys, one to encipher (the public key) and one to decipher (the private key). Knowing how to encipher does not help you determine how to decipher, within a reasonable amount of time.

The RSA system is versatile; it allows you to exchange information over a channel with an eaves dropper, even with someone whom you have never communicated with before. (You choose a random key, and send the public key to the other person. That person can send encrypted replies back. The eaves dropper will know the public key, but only you can read the encrypted documents because only you have the private key.)

Unfortunately, RSA is very slow. Practical systems like PGP use RSA during an initial "getting to know you" phase, during which keys are exchanged for a simpler and more efficient private-key system, such as the Advanced Encryption Standard. It is also possible to publish public keys for anyone to read. The remaining information is encrypted using the private-key system. Private key systems have just one key that is used for both enciphering and deciphering. If you tell someone how to encipher a document, you have also given away information on how to decipher.