A Haskell API for Cryptographic Algorithms

August 23, 2010

Haskell has a moderate history and collection of cryptographically related libraries. For simple hashes and short-message encryption the Crypto library filled many needs. Higher-performing needs for SHA2 and MD5 were supported by pureMD5 and SHA. Gradually the AES, SimpleAES, TwoFish, RSA, ECC, and cryptohash packages appeared, most providing FFI to C implementations, which seemed to solve most users needs for individual low-level algorithms. Unfortunately, none of these gives developers a uniform interface with which to access any of a class of algorithms. To fill this gap I’ve been discussing / developing the crypto-api package.

Crypto-API is an interface to four classes of algorithms plus related helper functions. The four classes include hashes, block ciphers, stream cipher, and asymmetric cipher while related modules includes testing, benchmarking, platform-independent rng, cipher modes, and hash based message authentication codes (hmac).

NOTE: Crypto-API isn’t on Hackage yet, but will be soon. This post is intended to facilitate discussion and motivate package maintainers to write instances.

Hashes

The BlockCipher and Hash classes are the most stable. The interface for Hash is:

class (Binary d, Serialize d, Eq d, Ord d) => Hash ctx d | d -> ctx, ctx -> d where
    outputLength  :: Tagged d BitLength         -- ^ The size of the digest when encoded
    blockLength   :: Tagged d BitLength         -- ^ The size of data operated on in each round of the digest computation
    initialCtx    :: ctx                        -- ^ An initial context, provided with the first call to 'updateCtx'
    updateCtx     :: ctx -> B.ByteString -> ctx -- ^ Used to update a context, repeatedly called until all data is exhausted
                                                                         --   must operate correctly for imputs of n*blockLength bytes for n `elem` [0..]
    finalize      :: ctx -> B.ByteString -> d   -- ^ Finializing a context, plus any message data less than the block size, into a digest

That is, the hash algorithm developer only needs to build the most basic definition of a hash including initial context, update routine, and finalize. It is the responsibility of the higher level routine to obey certain semantics, such as only providing bytestrings that are a multiple of the block length to the update function. Users don’t need to know any of this – all they should care about is:

hash :: (Hash ctx d) => L.ByteString -> d
hash' :: (Hash ctx d) => B.ByteString -> d

… hashing strict or lazy bytestrings.

hashFunc :: Hash c d => d -> (L.ByteString -> d)
hashFunc' :: Hash c d => d -> (B.ByteString -> d)

… obtaining the function that produced a digest.

hmac :: Hash c d => B.ByteString -> L.ByteString -> d
hmac' :: (Hash c d) => B.ByteString -> B.ByteString -> d

… or computing an HMAC of a key + message.

I’d call this a simple interface and one that satisfies the majority of users. There was a comment about including ‘hash’ and associates in the class interface so FFI implementations could override the default for performance reasons. A few optimizations closed the gap significantly which is why these functions remain separate so far. The gap could probably be closed further if ByteString.Lazy would read in chunks of a size modulo 1024 bits (instead of 32KB – 8 bytes, which is a piddly multiple of 64).

Hash instances were made for cryptohash and pureMD5. So far consumers include DRBG and the algorithm specific tests.

Block Ciphers

The BlockCipher class is:

class (Binary k, Serialize k) => BlockCipher k where
    blockSize     :: Tagged k BitLength
    encryptBlock  :: k -> B.ByteString -> B.ByteString
    decryptBlock  :: k -> B.ByteString -> B.ByteString
    buildKey      :: B.ByteString -> Maybe k
    keyLength     :: k -> BitLength       -- ^ keyLength may inspect its argument to return the length

Again, this is intended to capture the essence of block ciphers. Also, a smart constructor ‘buildKey’ is provided so the implementation can weed out weak keys. A non-ideal instance for SimpleAES (see appendix to this blog) was made so I could run benchmarks and mode tests. Crypto-API includes an extensive test framework for AES + modes which is built around parsing NIST KAT files. Note the modes are not finished, not optimized, and only ECB CBC and OFB are tested (I’ve been programming during cocktail hour…).

I’ve yet to include modes as overridable routines of BlockCipher (see above cited comment). This is partly due to a lack of evidence showing a (very likely) performance gain that generalized routines can’t match. Once I see that evidence then I’ll be more likely to make the change.

As with hashes, most users won’t use the class interface but rather the higher level functions provided by Modes.hs (getIV, cbc, unCbc, etc).

RNG

The platform independent RNG is backed by urandom on *nix and the WinCrypt API on windows. My thinking here is any user of /dev/random (on *nix) must be so concerned about security they are carefully controlling most aspects of the platform, thus the non-portability of directly reading /dev/random is inconsequential; e.g. there’s no need to bother with a library to access /dev/random.

The interface: (untested on Windows! If you care about windows please test and debug!)

getEntropy :: ByteLength -> IO B.ByteString
openHandle :: IO CryptHandle
hGetEntropy :: CryptHandle -> Int -> IO B.ByteString
closeHandle :: CryptHandle -> IO ()

If you rarely need quality entropy (ex: just for a quality seed to a PRNG) then use ‘getEntropy’. Frequent users can amortize some handle opening costs by explictly managing their resources and calling the other three functions.

Stream Ciphers

Stream ciphers are assumed to be much like a block cipher in 1-bit CFB mode:

class (Binary k, Serialize k) => StreamCipher k iv | k -> iv where
    buildStreamKey        :: B.ByteString -> Maybe k
    encryptStream         :: k -> iv -> B.ByteString -> (B.ByteString, iv)
    decryptStream         :: k -> iv -> B.ByteString -> (B.ByteString, iv)
    streamKeyLength       :: k -> BitLength

A simple instance would be:

data Xor = Xor B.ByteString

instance Bin.Binary Xor where
    get = undefined
    put = undefined

instance Ser.Serialize Xor where
    get = undefined
    put = undefined

instance StreamCipher Xor Int where
    buildStreamKey = Just . Xor
    encryptStream (Xor k) iv msg = (ct, (B.length msg + iv) `rem` B.length k)
      where
      ct = B.pack $ zipWith xor (B.unpack msg) (drop iv $ cycle $ B.unpack k)
    decryptStream = encryptStream
    streamKeyLength (Xor k) = 8 * (B.length k)

Asymmetric Ciphers

The asymmetric cipher instance currently doesn’t fit any of the available algorithms as it is generalized over random generators. It also is the most likely to change – there are
things I’d change about it right now, but its best to leave the more irk-some aspects to motivate some of you readers to contribute / comment ;-)

class (Binary p, Serialize p) => AsymCipher p where
    generateKeypair :: RandomGen g => g -> BitLength -> Maybe ((p,p),g)
    encryptAsym     :: p -> B.ByteString -> B.ByteString
    decryptAsym     :: p -> B.ByteString -> B.ByteString
    asymKeyLength       :: p -> BitLength

In Closing

1) If you use or develop cryptographic algorithms then join the discussion. I might not use your input but I will carefully consider all comments. Discussion has lead to substantial changes already (thanks guys!). I’m particularly keen on input from stream or asymmetric cipher users.

2) If you maintain any crypto packages then please update to include the correct crypto-api instances. If your package is a block cipher then make sure you’re exporting a pure interface in addition to particular modes.

3) If you use Windows then please help shore up the System.Crypto.Random module – I know it needs work!

4) If you use crypto packages please don’t make an instance or only do so to submit them upstream! Instance belong with the algorithm implementation!

5) Everyone else who wants to help feel free to write modes (XTS, GCM, CTR, etc), make fixes & optimizations, add tests (cipher properties, known answer tests), fix ByteString.Lazy.Internal.defaultChunkSize or export hGetContentsN, and add Data.Crypto.Padding (ex: pkcs5). If none of that interests you but the general topic of cryptography in Haskell does then consider working to improve hecc, add TLS or digest-auth to HappStack, write an IPSec implementation, make a pfkey2 package, improve GHC optimization of the algorithms, or make more fitting primitives!

Appendix on SimpleAES:

SimpleAES exported sufficient constructs with which to build an instance but it isn’t very clean. The main issues are:
1) Building a key can throw exceptions (when it should use Maybe or Either) and the result of key expansion (a costly operation in AES) isn’t stored but recomputed each time.
2) A properly sized IV is required even for ECB mode – which doesn’t actually use an IV. Worse, the “encryptMsg'” function will actually expand the size of data even when using ECB mode.
3) The key isn’t it’s own type, which is a good practice in addition to being needed to make an instance. This ties back to the smart constructor concept of #1.

10 Responses to “A Haskell API for Cryptographic Algorithms”


  1. It’s a bit odd that the type signature for hash depends on the ctx parameter. That internal detail shouldn’t be visible to users without good reason.

    Also, don’t use RandomGen for your asymmetric PRNG. The default implementation in System.Random gives absolutely disastrous performance, and the typeclass is just misdesigned (the split function shouldn’t be present).

  2. Jonas Says:

    With all that crypto stuff, I wonder why there’s still no working HTTP client library that supports SSL. :)

  3. Nikolay Orlyuk Says:

    Sometimes “initialCtx” may be too expensive (like allocating additional memory with FFI) so you may want to leave “resetCtx” for re-using. As well using initialCtx as your only way to allocate ctx is somewhat too restrictive (i.e. you can’t do local memory allocation or something like that). I’d add something “withCtx :: forall a . (ctx -> a) -> a” but that will result in potential leak of ctx outside (there was an article in some blog about that).

    What I’ve thinked not so long time ago is about using Monad abstraction. So you will not see ctx at all, but at the low level you’ll manipulate with updateCtx and others which transform some ctx passed to them from “runHashing :: HashMonad m => m a -> a” or something like that.


  4. […] these reasons I have been convinced that building the new crypto-api package on RandomGen would be a mistake. I’ve thus expanded the scope of crypto-api to […]

  5. tommd Says:

    Yeah, I actually have RFC5246 in my README directory, but frankly we don’t have the low-level API (such as crypto-api) to make a TLS library as clean, maintainable, and generic as I’d like.

    There are lots of protocols we’re missing, many of which I actually listed under item 5. Hopefully interested parties will start coding and all will be well.

  6. tommd Says:

    initialCtx is just a data structure – no FFI calls involved and no allocation. This isn’t “initilizeCtx”. Also, I’m adverse to making the API specifically to match the needs of FFI implementations.

    The withCtx concept is somewhat similar to what I’ve seen in lower level code from a recent library (cryptohash or SimpleAES, I think), but now that I’ve relaxed the semantics from taking a single step per call (single block encrypted or consumed by the hash algorithm) to any number of complete blocks, any performance issues are greatly reduced.

  7. tommd Says:

    It’s a bit odd that the type signature for hash depends on the ctx parameter.

    Odd but not really a big deal. Its not like we can instantiate:

    instance (Hash ctx dgst) => Digest dgst where ...
    

    At least, not without UndecidableInstances, which I avoid on principle.

    Also, don’t use RandomGen for your asymmetric PRNG.

    Please see my recent post on RandomGenerator – more heavy weight but appropriate, imo. People seem put-off by the explicit errors, but they can always bury their heads in the sand and use some sort of “fromRight”, pretending no exceptions will ever happen.

  8. Nikolay Orlyuk Says:

    Yeah, initialCtx is just a lazy value, which will be calculated (with allocating data) once someone will try to get inside of it (and re-calculated if RTS will decide to forget this value). In case if your context is something that have representation in Haskell data and updateCtx forms the new Ctx (which is mostly not true for non-pure implementation) than it will be fine (if that will be FFI that update structure, you’ll just do a clone). But that wouldn’t be so simple once you’ll have to work with something that you can’t bring into Haskell (pointers to unknown structures, session id for some crypt hardware, crypt service etc). I think most of C libraries will require work with IO monad and pointers/ids.


  9. […] blogs on crypto-api have discussed its design and the RNG interface. These were to aid design discussion, so note the code there won’t work […]


Comments are closed.