Project: Gaining Familiarity with Python

Section 1.2 Project: Gaining Familiarity with Python

Objectives

As a gentle introduction to working with Python, we will implement an easily-teachable algorithm called Kid RSA. The Kid RSA algorithm serves as an introduction to the idea of public key cryptography, which will be investigated further in Chapter 3. It is important that you complete each Checkpoint project in order; it is a safe assumption that as you work through the checkpoints you are adding to the same file or files.

If you are not already familiar with programming and with Python, now would be a very good time to read the Python crash course appendix.

Create properly named and properly formatted files in Activity 1.1
Calculate your Kid RSA keypair in Activity 1.2
Learn how to encode and decode between typed strings and lists of integers in Activity 1.3
Combine all of the work to create a KidRSA encryption program in Activity 1.4
Write a simple function to calculate the greatest common divisor of two integers in Activity 1.5
Engage in mathematically hacking a KidRSA key in Activity 1.6

A first task for any of the projects will be to set up a good environment to work in, choosing an appropriate working directory and filenames for the project.

Activity 1.1. KidRSA project files.

(a)

Using IDLE, create a blank Python module file named aam_proj1.py in an easy-to-find folder on your computer. This will eventually be submitted via Canvas or some other digital method for grading.

(b)

As header information, every submitted project file should begin with a block of comments. Your header should look like the following example, with the information changed to reflect your information and situation.

# Math 3380 Project 1
# Filename: aam_proj1.py
#
# Name: Frodo Baggins
# Date: 25 March, T.A. 3019
#

Listing 1.2.1. A sample header for Math 3380 Project Files

All files submitted for these projects should have a similar header.

Public key cryptography first requires the generation of a keypair, consisting of a public key and a private key.

Algorithm 1.2.2. Kid RSA Key Generation.

Let \(a, b, a', b'\in\Zp\) be arbitrary positive integers. Calculate the following:

\begin{align*} M \amp = a b-1\\ e \amp = a'M + a\\ d \amp = b'M + b\\ n \amp = (ed-1)/M \\ \amp = a'b'M+ab'+a'b+1 \end{align*}

Then the public key is the ordered pair \((n,e)\) and the private key is the ordered pair \((n,d)\text{.}\)

Now use the algorithm to generate your own Kid RSA keypair.

Activity 1.2. Using Python as a simple calculator.

(a)

In the file created in Activity 1.1, declare values for a, b, a_prime, and b_prime. The variable names in Python cannot include the prime symbol, hence the change of name.

Make sure you include comments in your code explaining what you’re doing. Comments are lines (or ends of lines) which begin with the symbol called either “pound” or “hash”, specifically #.

(b)

Calculate the values of M, e, d, and n from a, b, a_prime, and b_prime using Python, as explained in Kid RSA Key Generation. Make sure your program displays the public and private key using statements such as print(f"Public key: {(n,e)}").

(c)

Prove mathematically that there is a value \(k\in\Ints\) such that \(ed = nk+1\text{,}\) and verify that your code works by printing the result of (e * d) % n.

(d)

Write a function which takes a, b, a_prime, and b_prime as input and returns the key values (n,e) and (n,d).

def kidrsa_keygen(a, b, a_prime, b_prime):
    """A function generating KidRSA keypairs."""
    #
    # Fill in the code to calculate n, e, d
    #
    return (n,e,d)

The fundamental idea behind public key cryptography is that given a public key it should be easy to mathematically compute the encrypted version of a message. However, without the private key it should ideally be impossible to compute the original message from the encrypted version. Unfortunately this is precisely why the Kid RSA algorithm is suitable only for teaching purposes: it is very easy to produce the private key \(d\) with a little arithmetic from the numbers \(n\) and \(e\) in the public key.

Axiom 1.2.3. The Division Algorithm.

Given \(a,b\in\Ints\) with \(b>0\text{,}\) there is a unique pair \(q,r\in\Ints\) with \(0\leq r \lt b\) such that \(a = bq+r\text{.}\) The number \(q\) is the integer quotient of \(a\) divided by \(b\text{,}\) while \(r\) is the remainder.

We can denote the quotient by

\begin{equation*} q = \floor{a/b} = \max\set{k\in\Ints:bk \leq a}\text{,} \end{equation*}

but there is no notation for the remainder which is universally agreed upon by mathematicians.

Technology 1.2.4. Python `divmod` built-in function.

To compute the quotient and remainder of integer division separately, use // and %; to compute the tuple of the quotient and remainder simultaneously, use divmod.

a, b = 8675309, 512
q, r = divmod(a, b)
print(a == b*q+r)

Listing 1.2.5. An example demonstrating the use of divmod(...) in Python

Definition 1.2.6. Division.

If \(a,b\in\Ints\text{,}\) we say that \(b\) divides \(a\) and write \(b\divides a\) if and only if there is \(q\in\Ints\) such that \(a=bq\text{.}\) That is to say, the remainder when \(a\) is divided by \(b\) is zero.

Definition 1.2.7. Kid RSA encryption/decryption function.

Let \((n,e)\) and \((n,d)\) respectively be the public and private keys generated by the Kid RSA key generation algorithm. Suppose \(x\in\Nats\text{,}\) then let \(q_e,r_e,q_d,r_d\in\Ints\) be the unique results of the Division Algorithm such that

\begin{align*} n \amp = (x\cdot e)\cdot q_e + r_e\\ n \amp = (x\cdot d)\cdot q_d + r_d\text{.} \end{align*}

Then the functions \(f_{n,e}(x) = r_e\) and \(g_{n,d}(x) = r_d\) are respectively the encryption and decryption functions for Kid RSA.

Activity 1.3. Encoding strings into lists of integers.

Encryption was used historically by military commanders seeking to send orders to remote troops, so permutation-based encryption on the letters of the alphabet was the ideal method of encryption. The invention of computers changed that by making it much more feasible to apply brute-force solutions to code breaking. Now that we use digital encryption it is necessary to convert the messages we want to send from strings of characters to lists of integers in order to perform mathematical functions on those integers.

In Python, the paired commands ord and chr convert characters to integers and back.

out = [ ]
for c in "Hello world!":
    out.append( ord(c) )
print(out)
for k in out:
    print( f"chr({k:>3}) -> {chr(k)}")

Listing 1.2.8. An example using the Python ord command

Every printable (or type-able) character has an ordinal, which is what is returned by ord; the valid ordinals each map to a character, but not every int is valid input for chr. The valid input range of chr is all values in range(0x110000); in turn, 0x110000 is the hexadecimal (base-16) number with base-10 value 1114112.

(a)

Find the ordinals of the letters A and a

(b)

Use ordinals along with a range(26) to print the capital and miniscule letters, in the format Aa through Zz.

To make a notational distinction, we will separate the ideas of encoding and encrypting. Encoding is the process by which a natural written language is transferred into a different symbol set, usually for the purpose of transmission. Encryption, on the other hand, is the process by which information is hidden from unintended recipients. Sometimes a simple encoding is used as encryption, but there are relatively easy mathematical tools to decode messages which have been encoded but not encrypted. Some of the same tools are useful to break low-security encryption, as well!

(c)

Fill in the remainder of the following code to produce a working program which takes a string as input and returns the list of ordinals of the characters in the string.

def encode(input_string):
    out = [ ]
    for c in input_string:
        #
        # Fill in your code here
        #
    return out

(d)

Fill in the remainder of the following code to produce a working program which takes a list of integers output by the ord command and returns a string of the characters with those ordinals.

def decode(input_list):
    out = ""
    for x in input_list:
        #
        # Fill in your code here
        #
    return out

You now have most of the pieces put together to encrypt messages using KidRSA.

Activity 1.4. Applying KidRSA.

Now it is time to combine all this work and write our first encryption program! KidRSA is a symmetric encryption algorithm where the same function is used with different inputs to encrypt and decrypt.

(a)

Write the function which applies KidRSA to a single integer.

def kidrsa_single(key, msg_int):
    "Apply KidRSA using key = (n,x) to the integer msg_int."
    n, x = key
    return (x*msg_int) % n

(b)

Now, combine this with encode and decode to encrypt and decrypt a msg_str! Correct the following function and include it in your file.

def kidrsa(key, msg):
    """Apply KidRSA using key = (n,x) to encrypt or decrypt
        the message msg, which is either a string to be encrypted
        or a list of integers to be decrypted."""
    n,x = key
    if type(msg) == str:
        # We were given a string, so we must be trying to encrypt,
        # so first we have to encode the string.
        msg_list = [ 65 ] ########### FIX THIS ###########
    elif type(msg) == list:
        # We were given a list, so we must be trying to decrypt,
        # so let's just make msg_list a copy of msg.
        msg_list = msg[:]
    # Now we can apply the kidRSA_single function to everything
    # in msg_list.
    out_list = [ ]
    for number in msg_list:
        out_list.append( 65 ) ########### FIX THIS ###########
    if type(msg) == list:
        # We must need to decode whatever we've decrypted.
        return "A"  ########### FIX THIS ###########
    else:
        # Otherwise we want to return the list of encrypted
        # integers without trying to decode it.
        return [65] ########### FIX THIS ###########

Listing 1.2.9. Applying KidRSA to messages

(c)

Test your work! Use a, b, a_prime, b_prime = 4, 81, 123, 19963 to generate a keypair and then use that to decrypt the following:

[2900509, 4609028, 1549587, 4569295, 1271456, 3854101, 1271456,
3973300, 3854101, 4370630, 4092499, 4013033, 4529562, 4410363,
4648761, 4569295, 1271456, 3893834, 4648761, 4569295, 4171965,
4370630, 4013033, 4569295, 4569295, 1748252, 1271456, 2781310,
4529562, 4410363, 3973300, 4410363, 1748252, 1271456, 4092499,
4410363, 4171965, 4370630, 4092499, 1271456, 4410363, 4648761,
4609028, 1271456, 4807693, 4410363, 4648761, 4529562, 1271456,
3973300, 4410363, 4410363, 4529562, 1827718]

The next part of the project will be to undo all this hard work by devising a method to find the private part of a KidRSA key pair when given the public part.

Definition 1.2.10. Greatest Common Divisor.

Suppose \(a,b\in\Ints\text{.}\) Then \(k\) is a common divisor of \(a\) and \(b\) if and only if \(k\divides a\) and \(k\divides b\text{.}\) The integer \(\gcd(a,b)\) is the greatest common divisor of \(a\) and \(b\text{,}\) given by

\begin{equation*} \gcd(a,b) = \max\set{k\in\Ints:k\divides a\text{ and }k\divides b}\text{.} \end{equation*}

Equipped with these definitions, we can prove an important early result from number theory.

Theorem 1.2.11. Greatest common divisors with remainders.

Let \(a,b\in\Ints\) with \(b > 0\text{,}\) and let \(q,r\in\Ints\) with \(0\leq r\lt b\) be the result of the Division Algorithm; namely, suppose \(a = bq+r\text{.}\) Then \(\gcd(a,b) = \gcd(b,r)\text{.}\)

Proof.

Let \(a\text{,}\) \(b\text{,}\) \(q\text{,}\) and \(r\) be as hypothesized, and let \(g=\gcd(a,b)\text{.}\) Then \(g\divides a\) and \(g\divides b\text{,}\) so there are \(k,\ell\in\Ints\) such that \(a=gk\) and \(b=g\ell\text{.}\) But then since

\begin{equation*} r=a-bq = gk-(g\ell)q = g(k-\ell q)\text{,} \end{equation*}

we have that \(g\) divides \(r\) as well. So \(g\) is a common divisor of \(b\) and \(r\text{.}\)

Now assume that there is a common divisor \(h\) of \(b\) and \(r\) satisfying \(g\lt h\text{.}\) Then there are other integers \(k',\ell'\) such that \(hk' = b\) and \(h\ell' = r\text{.}\) But then

\begin{equation*} a = bq+r = (hk')q+h\ell' = h(k'q + \ell')\text{,} \end{equation*}

making \(h\) a divisor of \(a\) as well; This contradicts our choice that \(g\) was the greatest common divisor of \(a\) and \(b\text{,}\) and hence we see \(g = \gcd(b,r)\text{.}\)

If we repeatedly apply this theorem to a pair of integers and the successive results of the division algorithm, the final nonzero remainder must actually be the greatest common divisor of the two initial integers!

Algorithm 1.2.12. Euclid’s Algorithm.

Let \(a,b\in\Zp\) be given. By repeatedly applying The Division Algorithm to the pair \(a,b\text{,}\) produce the sequences \((q_1,q_2,\dotsc,q_k)\) and \((r_1,r_2,\dotsc,r_k)\) where \(r_k=0\text{,}\) such that

\begin{align*} a \amp= b \cdot q_1 + r_1\\ b \amp= r_1 \cdot q_2 + r_2\\ r_1 \amp= r_2 \cdot q_3 + r_3\\ \end{align*}

and so on until

\begin{align*} r_{k-3} \amp= r_{k-2} \cdot q_{k-1} + r_{k-1}\\ r_{k-2} \amp= r_{k-1} \cdot q_k + r_k \\ &= r_{k-1}\cdot q_k\text{.} \end{align*}

Then \(\gcd(a,b) = r_{k-1}\text{.}\)

Proof.

By repeated application of Greatest common divisors with remainders we know that \(\gcd(a,b) = \gcd(r_{k-2},r_{k-1})\text{,}\) but we also see that \(r_{k-1}\divides{} r_{k-2}\text{,}\) so the greatest common divisor of \(r_{k-1}\) and \(r_{k-2}\) is \(r_{k-1}\) itself.

Activity 1.5. Implementing Euclid’s algorithm.

In order to find the greatest common divisor, we apply the division algorithm repeatedly in the form of Python’s divmod(...) command, stopping when the remainder it produces is zero. Once the remainder is 0, we know that the previous remainder was the \(\gcd{}\text{.}\) This is a great use case for a while loop. First define a function in the same file created in Activity 1.1

(a)

Create the following function:

def gcd(left, right):
    """An iterative greatest common divisor function."""
    # Set the initial value of a and b to the input
    a, b = left, right
    # Calculate the quotient and remainder
    q, r = divmod(a, b)
    while r != 0:
        #
        # Leave this blank for now
        #
    # Return the correct final value by changing None
    return None

As written, if you execute gcd(123,457) the program will crash with an IndentationError. This is a very specific type of syntax error which occurs when a colon (:) has been left dangling at the opening of a block structure, but the next line was not indented to start the block!

(b)

Inspecting Euclid’s Algorithm shows you what needs to change inside the loop: we need to update the values of a and b and then recompute the values of q and r. Change the contents of the loop so that the correct new values are assigned to a and b, and include another q, r = divmod(a, b) line at the end of the loop.

(c)

Once you reach this point, your loop will correctly end. Now change the return None line so that the correct number is returned, instead of the do-nothing value None.

(d)

Verify your work! Add the line print( gcd(123321, 345345543) ) to the end of your file at the leftmost indendation level and then run it. You should obtain an answer of 3.

We need another theorem if we want to develop a tool to break a keypair.

Theorem 1.2.13. Greatest common divisor is a linear combination.

Let \(a,b\in\Zp\text{.}\) Then there are \(s,t\in\Ints\) such that \(sa+tb = \gcd(a,b)\text{.}\)

Proof.

The proof is actually an extension of Euclid’s Algorithm. We will again apply the Division Algorithm, but at each step we will rewrite the equation to solve for the remainder. In the first iteration, we see the following:

\begin{equation*} a = b\cdot q_1 + r_1 \implies r_1 = a - q_1 b\text{.} \end{equation*}

Hence we define \(s_1 = 1\) and \(t_1 = -q_1\text{,}\) so that

\begin{equation*} r_1 = s_1\cdot a + t_1\cdot b\text{.} \end{equation*}

In the second iteration, we find

\begin{equation*} b = r_1\cdot q_2 + r_2 \implies r_2 = b-q_2\cdot r_1\text{,} \end{equation*}

but we can replace \(r_1\) with the result of the previous iteration, giving

\begin{align*} r_2 \amp = b-q_2 \cdot r_1\\ \amp = b-q_2\left(s_1\cdot a + t_1\cdot b\right)\\ \amp = \left(0-q_2s_1\right)\cdot a + \left(1-q_2 t_1\right)\cdot b\\ \amp = s_2\cdot a + t_2\cdot b \end{align*}

In order to see the pattern fully emerge, the third iteration consists of

\begin{equation*} r_1 = r_2\cdot q_3 + r_3 \implies r_3 = r_1 - q_3\cdot r_2\text{,} \end{equation*}

and we can again replace the \(r_1\) and \(r_2\) with the results of the previous two computations. Thus we obtain

\begin{align*} r_3 \amp = r_1 - q_3\cdot r_2\\ \amp = (s_1\cdot a + t_1\cdot b) - q_3(s_2\cdot a + t_2\cdot b)\\ \amp = (s_1-q_3s_2)\cdot a + (t_1-q_3 t_2)\cdot b\\ \amp = s_3\cdot a + t_3\cdot b\text{.} \end{align*}

We have established enough iterations to suggest recursive formulas

\begin{equation*} s_n = s_{n-2} + q_n s_{n-1}\text{ and } t_n = t_{n-2} + q_n t_{n-1} \end{equation*}

for \(n\geq 3\text{,}\) where

\begin{align*} s_1 \amp= 1, \amp s_2\amp = -q_2s_1,\amp t_1\amp= -q_1\amp t_2\amp = 1-q_2t_1\text{.} \end{align*}

But then there is \(k\in\Zp\) such that \(\gcd(a,b) = r_{k-1} = s_{k-1}a + t_{k-1}b\text{.}\)

To apply this theorem to breaking the Kid RSA algorithm, we write an extended version of Euclid’s Algorithm and keep track of the list of computed values of \(t_k\) on each iteration. The values of \(s_k\) are unnecessary in our particular application of computing a reciprocal modulo n

Algorithm 1.2.14. Euclid’s Algorithm Extended.

Let \(\alpha, \beta \in \Zp\) be given with \(\alpha\gt \beta\text{.}\) Define the following initial conditions:

\begin{align*} \amp\amp\amp\amp s_0 \amp= 1 \amp t_0 \amp= 0\\ a_1 \amp= \alpha \amp b_1 \amp= \beta \amp s_1 \amp= 0 \amp t_1 \amp= 1 \end{align*}

For each \(k \geq 1\text{,}\) do the following:

Compute \(q_k, r_k\) by The Division Algorithm such that \(a_k = b_k\cdot q_k + r_k\text{.}\)
If \(r_k\neq 0\text{,}\) then:
1. Compute \(t_k = t_{k-2} - q_k\cdot t_{k-1}\text{.}\)
2. Let \(a_{k+1} = b_k\) and \(b_{k+1} = r_k\text{.}\)
On the other hand, when \(r_k = 0\text{,}\) then do the following:
1. The value \(r_{k-1}\) is the last nonzero remainder, so \(\gcd(\alpha,\beta) = r_{k-1}\text{.}\)
2. If \(r_{k-1} = 1\text{,}\) then \(\beta\cdot t_{k-1} = 1\pmod{\alpha}\text{.}\) Thus \(t_{k-1}\) is the reciprocal of \(\beta\) modulo \(\alpha\text{.}\)

Activity 1.6. Hacking KidRSA.

(a)

We can use Euclid’s Algorithm Extended to reverse-engineer the number \(d\) of a KidRSA triple \((n,e,d)\) if we know \((n,e)\text{,}\) because \(d\) is simply the reciprocal of \(e\) modulo \(n\text{.}\) Write a new function called big_gcd which modifies your previous gcd function by adding exactly two lines: t = [0,1] and t.append( t[-2] - q * t[-1] ). You’ll need to determine carefully where these lines must be added.

Finally, change the return statement so that it returns the greatest common divisor as well as t[-1] in an ordered pair.

(b)

Find d if my public KidRSA key is (n,e) where

n = 168702191480379745583323186370491914430362515412305925474315373302579709429
e = 169514405816166316269158771599355601156883726762227835237

Prev Top Next

Section 1.2 Project: Gaining Familiarity with Python

Objectives

Activity 1.1. KidRSA project files.

(a)

(b)

Algorithm 1.2.2. Kid RSA Key Generation.

Activity 1.2. Using Python as a simple calculator.

(a)

(b)

(c)

(d)

Axiom 1.2.3. The Division Algorithm.

Technology 1.2.4. Python divmod built-in function.

Definition 1.2.6. Division.

Definition 1.2.7. Kid RSA encryption/decryption function.

Activity 1.3. Encoding strings into lists of integers.

(a)

(b)

(c)

(d)

Activity 1.4. Applying KidRSA.

(a)

(b)

(c)

Definition 1.2.10. Greatest Common Divisor.

Theorem 1.2.11. Greatest common divisors with remainders.

Proof.

Algorithm 1.2.12. Euclid’s Algorithm.

Proof.

Activity 1.5. Implementing Euclid’s algorithm.

(a)

(b)

(c)

(d)

Theorem 1.2.13. Greatest common divisor is a linear combination.

Proof.

Algorithm 1.2.14. Euclid’s Algorithm Extended.

Activity 1.6. Hacking KidRSA.

(a)

(b)

Technology 1.2.4. Python `divmod` built-in function.