One amusement among crypto hackers is to try to write the shortest possible encryption program. Recently numerous people have started exporting cryptographic systems in their .signatures (and on mailing labels, T-shirts, ...). This is aimed at making the North American export regulations look silly. The most common one seen in .sig files on Usenet is RSA in 3 lines of Perl:
#!/usr/local/bin/perl -s-- -export-a-crypto-system-sig -RSA-in-3-lines-PERL
($k,$n)=@ARGV;$m=unpack(H.$w,$m."\0"x$w),$_=`echo "16do$w 2+4Oi0$d*-^1[d2%
Sa2/d0<X+d*La1=z\U$n%0]SX$k"[$m*]\EszlXx++p|dc`,s/^.|\W//g,print pack('H*'
,$_)while read(STDIN,$m,($w=2*$d-1+length($n||die"$0 [-d] k n\n")&~1)/2)
Usually we think of Python as a very clearly and spaciously formatted language, because the program structure is indicated by whitespace. However, it is possible to produce very compact code competitive with the above program, as the following programs demonstrate:
- rsa.py (4 lines): Performs RSA public key
encryption/decryption. It requires two arguments, and can accept a
single option: '-d' for decryption (the default action is
encryption). The first argument must be the required exponent,
expressed in hexadecimal, and the second is the modulus, also in
hex. You still have to choose the correct exponent, whether the
'-d' option is present or not; '-d' simply changes the number of
bytes read at a single time.
As an example: Let us assume the modulus is 6819722537 (in hex, 0x1967cb529), the encryption exponent is 65537 (hex 0x10001), and the decryption exponent is 2889233921 (hex 0xac363601). Then, after converting the numbers to hex, we can encrypt and then decrypt by the following commands:
echo 'Top secret message.' | rsa.py 10001 1967cb529 >ciphertext cat ciphertext | rsa.py -d ac363601 1967cb529Here's the program. It's a line longer than the Perl version, but doesn't require that the dc desk calculator program be installed. (Perl doesn't have built-in large integers as Python does, so it has to fork a dc subprocess to do the computations.)
#!/usr/local/bin/python from sys import*;from string import*;a=argv;[s,p,q]=filter(lambda x:x[:1]!= '-',a);d='-d'in a;e,n=atol(p,16),atol(q,16);l=(len(q)+1)/2;o,inb=l-d,l-1+d while s:s=stdin.read(inb);s and map(stdout.write,map(lambda i,b=pow(reduce( lambda x,y:(x<<8L)+y,map(ord,s)),e,n):chr(b>>8*i&255),range(o-1,-1,-1)))
Let's look at some of the tricks used to save space. A simple one: instead of using
import stringand then accessing functions such asstring.atol, all the symbols in the string module are added to the current namespace by usingfrom string import *.A common Python idiom to read lines from a file looks like this:
while(1): line = sys.stdin.readline() if line == "": break # readline() returns an empty string at EOF ... do something with the line ...This is very bad for our line count. Python only allows at most two blocks per line, so you can't write code like
; that will cause a syntax error. However, this would mean our I/O loop would take at least 3 lines, which is unacceptable. A solution is to exploit the fact that Python offers short-circuit evaluation. When asked to evaluate an expression likefor i in [1,2,3]: if i==2: print 'Found 2!'A and B, if A is false, there's no point in evaluating B, since the whole expression will be false no matter what B's value is. This is part of the Python language specification and not subject to change, because it's a useful behaviour. You can write an expression likebooleanValue and timeConsumingFunction(), and know thattimeConsumingFunction()will only be called if it can't be avoided, which is whenbooleanValueis true.Strings evaluate to true if they're not empty, and the empty string is considered to be false, so the above loop can then be rewritten as:
while(line): line = sys.stdin.readline() line and someFunction(line)someFunction(line)is then only called iflineis not the empty string, and the while loop ends whenlinebecomes empty. This trick isn't recommended for use in Python programs that you want to be maintainable, but short-circuit evaluation is a useful feature that you should be aware of; conditions should be written with simple variables first, and function evaluations later, for faster execution by avoiding unnecessary function calls. - arc4.py (5 lines): ARC4 is short for `Alleged RC4'. The real
RC4 algorithm is proprietary to RSA Data Security Inc. In September
of 1994, someone posted C code to both the Cypherpunks mailing list
and to the Usenet newsgroup sci.crypt,
claiming that it implemented the RC4 algorithm. This posted code is
what I'm calling Alleged RC4, or ARC4 for short.
ARC4 is a private-key cipher; the same key is used to both encrypt and decrypt. This is the script most likely to be of practical use. It takes one argument, which is the key expressed in hex bytes. The key can have any non-zero length.
An example: To encrypt and then decrypt a message with the key 'foo', or, in hex, 0x66 0x6f 0x6f:
cat 'Message.' | arc4.py 666f6f >ciphertext cat ciphertext | arc4.py 666f6fThe program is a line longer than RSA.#!/usr/local/bin/python from sys import*;from string import *;t,x,y,j,s,a=range(256),0,0,0,1,argv[1] k=(map(lambda b:atoi(a[b:b+2],16), range(0,len(a),2))*256)[:256] for i in t[:]:j=(k[i]+t[i]+j)%256;t[i],t[j]=t[j],t[i] while(s):s=stdin.read(1);l,x=len(s),(x+1)%256;y,c=(y+t[x])%256,l and ord(s);( t[x],t[y])=t[y],t[x];stdout.write(chr(c^t[(t[x]+t[y])%256])[:l])
- otp.py (2 lines): The only truly unbreakable encryption method
is the one-time pad, or OTP. In the OTP, the plaintext is XORed
with a stream of random data (the pad) to produce the ciphertext.
The recipient then XORs the ciphertext and gets the plaintext
again. This is secure because there is no way for an eavesdropper
to determine what the intended message is; the only thing that can
be learned is the length of the message.
This sounds trivial; just get some random data, XOR it with the message, and there you go! But the problem is with that word "random"; how do you get truly random data? The
rand()function in C or your favorite programming language is almost certainly a pseudo-random generator; a small amount of state is used to generate a sequence of numbers that looks random to various statistical tests. However, given a few numbers of the sequence, it's usually simple to derive the rest of the sequence in either the backward or forward direction. This is obviously unsafe for a one-time pad; if you could get just a few characters of the plaintext, you can XOR the plaintext and cipher to compute the value of the pad at that point, and then derive the rest of the pad.Generating truly random numbers is difficult; usually some physical phenomenon like keystroke timings, radioactive decay, or the noise in a transistor is used. Then there's the problem of key distribution; how does your correspondent get the random numbers? Governments can afford to send a courier who carries the data in a briefcase chained to his/her wrist; can you? Several of the open-source operating systems--Linux and FreeBSD, most notably--offer a random device; reading from
/dev/randomwill produce a stream of random data generated from the timing of your keystrokes, disk interrupts, and other hopefully random phenomena in your computer.In any event, let us assume you've magically obtained some random numbers and placed them in a file called "pad". otp.py simply XORs two files together, so to encrypt a file called "message":
otp.py pad message >ciphertextDecryption is similar:
otp.py pad ciphertext >outputThis program is the shortest of all.
#!/usr/local/bin/python from sys import*;t=p=1;s=stdout;[i,j]=map(lambda f: open(f, 'r'), argv[1:3]) while(t and p):t,p=i.read(1),j.read(1);t and p and s.write(chr(ord(t)^ord(p)))