Section1.2Project: Gaining Familiarity with Python
Objectives
As a gentle introduction to working with Python, we will implement an easily-teachable algorithm called Kid RSA. The Kid RSA algorithm serves as an introduction to the idea of public key cryptography, which will be investigated further in Chapter 3. It is important that you complete each Checkpoint project in order; it is a safe assumption that as you work through the checkpoints you are adding to the same file or files.
If you are not already familiar with programming and with Python, now would be a very good time to read the Python crash course appendix.
Create properly named and properly formatted files in Activity 1.1
Learn how to encode and decode between typed strings and lists of integers in Activity 1.3
Combine all of the work to create a KidRSA encryption program in Activity 1.4
Write a simple function to calculate the greatest common divisor of two integers in Activity 1.5
Engage in mathematically hacking a KidRSA key in Activity 1.6
A first task for any of the projects will be to set up a good environment to work in, choosing an appropriate working directory and filenames for the project.
Activity1.1.KidRSA project files.
(a)
Using IDLE, create a blank Python module file named aam_proj1.py in an easy-to-find folder on your computer. This will eventually be submitted via Canvas or some other digital method for grading.
(b)
As header information, every submitted project file should begin with a block of comments. Your header should look like the following example, with the information changed to reflect your information and situation.
Listing1.2.1.A sample header for Math 3380 Project Files
All files submitted for these projects should have a similar header.
Public key cryptography first requires the generation of a keypair, consisting of a public key and a private key.
Algorithm1.2.2.Kid RSA Key Generation.
Let \(a, b, a', b'\in\Zp\) be arbitrary positive integers. Calculate the following:
\begin{align*}
M \amp = a b-1\\
e \amp = a'M + a\\
d \amp = b'M + b\\
n \amp = (ed-1)/M \\
\amp = a'b'M+ab'+a'b+1
\end{align*}
Then the public key is the ordered pair \((n,e)\) and the private key is the ordered pair \((n,d)\text{.}\)
Now use the algorithm to generate your own Kid RSA keypair.
Activity1.2.Using Python as a simple calculator.
(a)
In the file created in Activity 1.1, declare values for a, b, a_prime, and b_prime. The variable names in Python cannot include the prime symbol, hence the change of name.
Make sure you include comments in your code explaining what you’re doing. Comments are lines (or ends of lines) which begin with the symbol called either “pound” or “hash”, specifically #.
(b)
Calculate the values of M, e, d, and n from a, b, a_prime, and b_prime using Python, as explained in Kid RSA Key Generation. Make sure your program displays the public and private key using statements such as print(f"Public key: {(n,e)}").
(c)
Prove mathematically that there is a value \(k\in\Ints\) such that \(ed = nk+1\text{,}\) and verify that your code works by printing the result of (e * d) % n.
(d)
Write a function which takes a, b, a_prime, and b_prime as input and returns the key values (n,e) and (n,d).
def kidrsa_keygen(a, b, a_prime, b_prime):
"""A function generating KidRSA keypairs."""
#
# Fill in the code to calculate n, e, d
#
return (n,e,d)
The fundamental idea behind public key cryptography is that given a public key it should be easy to mathematically compute the encrypted version of a message. However, without the private key it should ideally be impossible to compute the original message from the encrypted version. Unfortunately this is precisely why the Kid RSA algorithm is suitable only for teaching purposes: it is very easy to produce the private key \(d\) with a little arithmetic from the numbers \(n\) and \(e\) in the public key.
Axiom1.2.3.The Division Algorithm.
Given \(a,b\in\Ints\) with \(b>0\text{,}\) there is a unique pair \(q,r\in\Ints\) with \(0\leq r \lt b\) such that \(a = bq+r\text{.}\) The number \(q\) is the integer quotient of \(a\) divided by \(b\text{,}\) while \(r\) is the remainder.
but there is no notation for the remainder which is universally agreed upon by mathematicians.
Technology1.2.4.Python divmod built-in function.
To compute the quotient and remainder of integer division separately, use // and %; to compute the tuple of the quotient and remainder simultaneously, use divmod.
a, b = 8675309, 512
q, r = divmod(a, b)
print(a == b*q+r)
Listing1.2.5.An example demonstrating the use of divmod(...) in Python
Definition1.2.6.Division.
If \(a,b\in\Ints\text{,}\) we say that \(b\) divides \(a\) and write \(b\divides a\) if and only if there is \(q\in\Ints\) such that \(a=bq\text{.}\) That is to say, the remainder when \(a\) is divided by \(b\) is zero.
Let \((n,e)\) and \((n,d)\) respectively be the public and private keys generated by the Kid RSA key generation algorithm. Suppose \(x\in\Nats\text{,}\) then let \(q_e,r_e,q_d,r_d\in\Ints\) be the unique results of the Division Algorithm such that
\begin{align*}
n \amp = (x\cdot e)\cdot q_e + r_e\\
n \amp = (x\cdot d)\cdot q_d + r_d\text{.}
\end{align*}
Then the functions \(f_{n,e}(x) = r_e\) and \(g_{n,d}(x) = r_d\) are respectively the encryption and decryption functions for Kid RSA.
Activity1.3.Encoding strings into lists of integers.
Encryption was used historically by military commanders seeking to send orders to remote troops, so permutation-based encryption on the letters of the alphabet was the ideal method of encryption. The invention of computers changed that by making it much more feasible to apply brute-force solutions to code breaking. Now that we use digital encryption it is necessary to convert the messages we want to send from strings of characters to lists of integers in order to perform mathematical functions on those integers.
In Python, the paired commands ord and chr convert characters to integers and back.
out = [ ]
for c in "Hello world!":
out.append( ord(c) )
print(out)
for k in out:
print( f"chr({k:>3}) -> {chr(k)}")
Listing1.2.8.An example using the Python ord command
Every printable (or type-able) character has an ordinal, which is what is returned by ord; the valid ordinals each map to a character, but not every int is valid input for chr. The valid input range of chr is all values in range(0x110000); in turn, 0x110000 is the hexadecimal (base-16) number with base-10 value 1114112.
(a)
Find the ordinals of the letters A and a
(b)
Use ordinals along with a range(26) to print the capital and miniscule letters, in the format Aa through Zz.
To make a notational distinction, we will separate the ideas of encoding and encrypting. Encoding is the process by which a natural written language is transferred into a different symbol set, usually for the purpose of transmission. Encryption, on the other hand, is the process by which information is hidden from unintended recipients. Sometimes a simple encoding is used as encryption, but there are relatively easy mathematical tools to decode messages which have been encoded but not encrypted. Some of the same tools are useful to break low-security encryption, as well!
(c)
Fill in the remainder of the following code to produce a working program which takes a string as input and returns the list of ordinals of the characters in the string.
def encode(input_string):
out = [ ]
for c in input_string:
#
# Fill in your code here
#
return out
(d)
Fill in the remainder of the following code to produce a working program which takes a list of integers output by the ord command and returns a string of the characters with those ordinals.
def decode(input_list):
out = ""
for x in input_list:
#
# Fill in your code here
#
return out
You now have most of the pieces put together to encrypt messages using KidRSA.
Activity1.4.Applying KidRSA.
Now it is time to combine all this work and write our first encryption program! KidRSA is a symmetric encryption algorithm where the same function is used with different inputs to encrypt and decrypt.
(a)
Write the function which applies KidRSA to a single integer.
def kidrsa_single(key, msg_int):
"Apply KidRSA using key = (n,x) to the integer msg_int."
n, x = key
return (x*msg_int) % n
(b)
Now, combine this with encode and decode to encrypt and decrypt a msg_str! Correct the following function and include it in your file.
def kidrsa(key, msg):
"""Apply KidRSA using key = (n,x) to encrypt or decrypt
the message msg, which is either a string to be encrypted
or a list of integers to be decrypted."""
n,x = key
if type(msg) == str:
# We were given a string, so we must be trying to encrypt,
# so first we have to encode the string.
msg_list = [ 65 ] ########### FIX THIS ###########
elif type(msg) == list:
# We were given a list, so we must be trying to decrypt,
# so let's just make msg_list a copy of msg.
msg_list = msg[:]
# Now we can apply the kidRSA_single function to everything
# in msg_list.
out_list = [ ]
for number in msg_list:
out_list.append( 65 ) ########### FIX THIS ###########
if type(msg) == list:
# We must need to decode whatever we've decrypted.
return "A" ########### FIX THIS ###########
else:
# Otherwise we want to return the list of encrypted
# integers without trying to decode it.
return [65] ########### FIX THIS ###########
Listing1.2.9.Applying KidRSA to messages
(c)
Test your work! Use a, b, a_prime, b_prime = 4, 81, 123, 19963 to generate a keypair and then use that to decrypt the following:
The next part of the project will be to undo all this hard work by devising a method to find the private part of a KidRSA key pair when given the public part.
Definition1.2.10.Greatest Common Divisor.
Suppose \(a,b\in\Ints\text{.}\) Then \(k\) is a common divisor of \(a\) and \(b\) if and only if \(k\divides a\) and \(k\divides b\text{.}\) The integer \(\gcd(a,b)\) is the greatest common divisor of \(a\) and \(b\text{,}\) given by
\begin{equation*}
\gcd(a,b) = \max\set{k\in\Ints:k\divides a\text{ and }k\divides b}\text{.}
\end{equation*}
Equipped with these definitions, we can prove an important early result from number theory.
Theorem1.2.11.Greatest common divisors with remainders.
Let \(a,b\in\Ints\) with \(b > 0\text{,}\) and let \(q,r\in\Ints\) with \(0\leq r\lt b\) be the result of the Division Algorithm; namely, suppose \(a = bq+r\text{.}\) Then \(\gcd(a,b) = \gcd(b,r)\text{.}\)
Proof.
Let \(a\text{,}\)\(b\text{,}\)\(q\text{,}\) and \(r\) be as hypothesized, and let \(g=\gcd(a,b)\text{.}\) Then \(g\divides a\) and \(g\divides b\text{,}\) so there are \(k,\ell\in\Ints\) such that \(a=gk\) and \(b=g\ell\text{.}\) But then since
we have that \(g\) divides \(r\) as well. So \(g\) is a common divisor of \(b\) and \(r\text{.}\)
Now assume that there is a common divisor \(h\) of \(b\) and \(r\) satisfying \(g\lt h\text{.}\) Then there are other integers \(k',\ell'\) such that \(hk' = b\) and \(h\ell' = r\text{.}\) But then
making \(h\) a divisor of \(a\) as well; This contradicts our choice that \(g\) was the greatest common divisor of \(a\) and \(b\text{,}\) and hence we see \(g = \gcd(b,r)\text{.}\)
If we repeatedly apply this theorem to a pair of integers and the successive results of the division algorithm, the final nonzero remainder must actually be the greatest common divisor of the two initial integers!
Algorithm1.2.12.Euclid’s Algorithm.
Let \(a,b\in\Zp\) be given. By repeatedly applying The Division Algorithm to the pair \(a,b\text{,}\) produce the sequences \((q_1,q_2,\dotsc,q_k)\) and \((r_1,r_2,\dotsc,r_k)\) where \(r_k=0\text{,}\) such that
\begin{align*}
a \amp= b \cdot q_1 + r_1\\
b \amp= r_1 \cdot q_2 + r_2\\
r_1 \amp= r_2 \cdot q_3 + r_3\\
\end{align*}
By repeated application of Greatest common divisors with remainders we know that \(\gcd(a,b) = \gcd(r_{k-2},r_{k-1})\text{,}\) but we also see that \(r_{k-1}\divides{} r_{k-2}\text{,}\) so the greatest common divisor of \(r_{k-1}\) and \(r_{k-2}\) is \(r_{k-1}\) itself.
Activity1.5.Implementing Euclid’s algorithm.
In order to find the greatest common divisor, we apply the division algorithm repeatedly in the form of Python’s divmod(...) command, stopping when the remainder it produces is zero. Once the remainder is 0, we know that the previous remainder was the \(\gcd{}\text{.}\) This is a great use case for a while loop. First define a function in the same file created in Activity 1.1
(a)
Create the following function:
def gcd(left, right):
"""An iterative greatest common divisor function."""
# Set the initial value of a and b to the input
a, b = left, right
# Calculate the quotient and remainder
q, r = divmod(a, b)
while r != 0:
#
# Leave this blank for now
#
# Return the correct final value by changing None
return None
As written, if you execute gcd(123,457) the program will crash with an IndentationError. This is a very specific type of syntax error which occurs when a colon (:) has been left dangling at the opening of a block structure, but the next line was not indented to start the block!
(b)
Inspecting Euclid’s Algorithm shows you what needs to change inside the loop: we need to update the values of a and b and then recompute the values of q and r. Change the contents of the loop so that the correct new values are assigned to a and b, and include another q, r = divmod(a, b) line at the end of the loop.
(c)
Once you reach this point, your loop will correctly end. Now change the return None line so that the correct number is returned, instead of the do-nothing value None.
(d)
Verify your work! Add the line print( gcd(123321, 345345543) ) to the end of your file at the leftmost indendation level and then run it. You should obtain an answer of 3.
We need another theorem if we want to develop a tool to break a keypair.
Theorem1.2.13.Greatest common divisor is a linear combination.
Let \(a,b\in\Zp\text{.}\) Then there are \(s,t\in\Ints\) such that \(sa+tb = \gcd(a,b)\text{.}\)
Proof.
The proof is actually an extension of Euclid’s Algorithm. We will again apply the Division Algorithm, but at each step we will rewrite the equation to solve for the remainder. In the first iteration, we see the following:
\begin{equation*}
a = b\cdot q_1 + r_1 \implies r_1 = a - q_1 b\text{.}
\end{equation*}
Hence we define \(s_1 = 1\) and \(t_1 = -q_1\text{,}\) so that
\begin{equation*}
r_1 = s_1\cdot a + t_1\cdot b\text{.}
\end{equation*}
But then there is \(k\in\Zp\) such that \(\gcd(a,b) = r_{k-1} = s_{k-1}a + t_{k-1}b\text{.}\)
To apply this theorem to breaking the Kid RSA algorithm, we write an extended version of Euclid’s Algorithm and keep track of the list of computed values of \(t_k\) on each iteration. The values of \(s_k\) are unnecessary in our particular application of computing a reciprocal modulo n
Algorithm1.2.14.Euclid’s Algorithm Extended.
Let \(\alpha, \beta \in \Zp\) be given with \(\alpha\gt \beta\text{.}\) Define the following initial conditions:
Let \(a_{k+1} = b_k\) and \(b_{k+1} = r_k\text{.}\)
On the other hand, when \(r_k = 0\text{,}\) then do the following:
The value \(r_{k-1}\) is the last nonzero remainder, so \(\gcd(\alpha,\beta) = r_{k-1}\text{.}\)
If \(r_{k-1} = 1\text{,}\) then \(\beta\cdot t_{k-1} = 1\pmod{\alpha}\text{.}\) Thus \(t_{k-1}\) is the reciprocal of \(\beta\) modulo \(\alpha\text{.}\)
Activity1.6.Hacking KidRSA.
(a)
We can use Euclid’s Algorithm Extended to reverse-engineer the number \(d\) of a KidRSA triple \((n,e,d)\) if we know \((n,e)\text{,}\) because \(d\) is simply the reciprocal of \(e\) modulo \(n\text{.}\) Write a new function called big_gcd which modifies your previous gcd function by adding exactly two lines: t = [0,1] and t.append( t[-2] - q * t[-1] ). You’ll need to determine carefully where these lines must be added.
Finally, change the return statement so that it returns the greatest common divisor as well as t[-1] in an ordered pair.
(b)
Find d if my public KidRSA key is (n,e) where
n = 168702191480379745583323186370491914430362515412305925474315373302579709429
e = 169514405816166316269158771599355601156883726762227835237