A Better Foundation for Random Values in Haskell

September 2, 2010

RandomGen – The Old Solution

Mathematicians talk about random bits and many programmers talk about streams of random bytes (ex: /dev/urandom, block cipher counter RNGs), so its a bit odd that Haskell adopted the RandomGen class, which only generates random Ints. Several aspects of RandomGen that are non-ideal include:

  • Only generates Ints (Ints need to be coerced to obtain other types)
  • By virtue of packaging it is often paired with StdGen, a sub-par generator
  • Mandates a ‘split’ operation, which is non-sense or unsafe for some generators (as BOS pointed out in a comment on my last post)
  • Doesn’t allow for generator failure (too much output without a reseed) – this is important for cryptographically secure RNGs
  • Doesn’t allow any method for additional entropy to be included upon request for new data (used at least in NIST SP 800-90 and there are obvious default implementations for all other generators)

Building Something Better

For these reasons I have been convinced that building the new crypto-api package on RandomGen would be a mistake. I’ve thus expanded the scope of crypto-api to include a decent RandomGenerator class. The proposal below is slightly more complex than the old RandomGen, but I consider it more honest (doesn’t hide error conditions / necessitate exceptions).

class RandomGenerator g where
        -- |Instantiate a new random bit generator
        newGen :: B.ByteString -> Either GenError g

        -- |Length of input entropy necessary to instantiate or reseed a generator
        genSeedLen :: Tagged g Int

        -- |Obtain random data using a generator
        genBytes        :: g -> Int -> Either GenError (B.ByteString, g)

        -- |'genBytesAI g i entropy' generates 'i' random bytes and use the
        -- additional input 'entropy' in the generation of the requested data.
        genBytesAI      :: g -> Int -> B.ByteString -> Either GenError (B.ByteString, g)
        genBytesAI g len entropy =
                ... default implementation ...

        -- |reseed a random number generator
        reseed          :: g -> B.ByteString -> Either GenError g

Compared to the old RandomGen class we have:

  1. Random data comes in Bytestrings. RandomGen only gave Ints (what is that? 29 bits? 32 bits? 64? argh!), and depended on another class (Random) to build other values. We can still have a ‘Random’ class built for RandomGenerator – should we have that in this module?
  2. Constructing and reseeding generators is now part of the class.
  3. Splitting the PRNG is now a separate class (not shown)
  4. Generators can accept additional input (genBytesAI). Most generators probably won’t use this, so there is a reasonable default implementation (fmap (xor additionalInput) genBytes).
  5. The possibility to fail – this is not new! Even in the old RandomGen class the underlying PRNGs can fail (the PRNG has hit its period and needs a reseed to avoid repeating the sequence), but RandomGen gave no failure mechanism. I feel justified in forcing all PRNGs to use the same set of error messages because many errors are common to all generators (ex: ReseedRequred) and the action necessary to fix such errors is generalized too.

    In Closing

    The full Data.Crypto.Random module is online and I welcome comments, complaints and patches. This is the class I intend to force users of the Crypto API block cipher modes and Asymmetric Cipher instances to use, so it’s important to get right!

9 Responses to “A Better Foundation for Random Values in Haskell”

  1. I would rename genBytesAI to something else, AI sounds like Artificial Intelligence to me :) How about genBytesFromEntropy?

    Adding failure to the interface makes it very painful to use. It might be needed for cryptographic random generators, but for less demanding general purpose randomness it is just a hassle.

    A possible solution to this problem would be to use a monad like MonadRandom. Although that comes with other problems, most importantly how to combine the code with other monads.

  2. illissius Says:

    Couldn’t the APIs-with-failure-are-painful problem be alleviated with convenience functions to some degree?

    Also, using ‘Tagged g Int’ instead of ‘g -> Int’ is very nice. Should be obvious, but I never thought of it.

  3. Something that always bothered me about the current random number interface and the monad contained in Control.Monad.Random is that if I do:

    foo = do
             x <- getRandom
             y <- getRandom
             return $ x+y
    evalRand foo (mkStdGen 0) 

    I don't get the sum of two independent pseudo-random variables, I get the sum of random variables drawn from the same seed!!!

    This is very weird, and I don't know a clean way to this.

  4. formatting came wrong… Obviously there’s an identation lacking on two lines on the definition of foo.

  5. tommd Says:

    How about genBytesFromEntropy?

    I’ve renamed ‘genBytesAI’ to ‘genBytesWithEntropy’. I didn’t like the term ‘fromEntropy’ as it implies the provided entropy is the only source of randomness.

    Adding failure to the interface makes it very painful to use.

    If you don’t like thinking about failure and rather it bite you then feel free to make/use a shim:

        genBytes = fromRight . ORIG.genBytes
        newGen   = fromRight . ORIG.newGen

    Or if you looked at the module you’d see the instance I already provided:

        instance RandomGenerator g => RandomGen (OriginalRandomClass g) where ...

    A possible solution to this problem would be to use a monad like MonadRandom.

    This came up on the ML too. MonadRandom depends on MTL and prevents us from using a data type for failure (fail :: Monad m => String -> m a).

  6. tommd Says:

    Fixed the indentation.

    I’m not sure what you’re hoping for – each call to getRandom coming from a different PRNG? Thus MonadRandom would actually accept a PRNG Generator and generate generators to generate each individual request? If your PRNG is any good (and the StdGen you posted is not) then you shouldn’t be able to tell if the x and y came from the same or different PRNGs.

  7. This is more about design philosophy, rather than API details but:

    There are some very different uses of (P)RNGs, and I’d be extremely wary of trying to put them behind the same interface.

    1. There are cases where you want a nicely repeatable PRNGs – Monte Carlo simulations and the like, where reproducability is important, and often performance is too. Splitting the state becomes useful for distributed MCs.

    2. Then there’s random numbers for cryptographic purposes – you’ll want to seed from a real random source, mix in more real randomness, etc. Security is far more important than performance. Presumably you’ll want an implementation that’s reproducable for testing purposes, but that’s more a special case. Splitting the state, as has been noted, could be horribly insecure.

    Anyway, I assume you know all this, so sorry for the repetition, but my point is this:

    I believe they should have separate APIs. You shouldn’t be able to screw security up by using a non-cryptographic PRNG for crypto purposes. I think the API should mix in random data for you, if you’re doing that, rather than let the user get it wrong.

    I assume you’re not really interested in the non-crypto PRNG case, and the name ‘Data.Crypto.Random’ is a bit of a giveaway, but my suggestion is that you put a big warning at the top of the docs as to what the module’s intended for, and write it in such a way as to make it virtually impossible for a user to generate a non-cryptographically-secure random stream (unless they use a debug instance).

  8. tommd Says:

    I believe they should have separate APIs.

    You aren’t alone and a proposal to move this interface into the ‘random’ package while cleaning up the System.Random module and leaving two classes “CryptoRandomGen” and “RandomGen” is in the works.

    write it in such a way as to make it virtually impossible for a user to generate a non-cryptographically-secure random stream

    Sadly, that isn’t possible. There will always be a use case for people to provide their own entropy. What I can do, and tried to do, is make it much easier to use correctly than incorrectly. Hence:

        -- newGenIO :: CryptoRandomGen => IO g
        g <- newGenIO
        runWithGen g

    vs people doing something manual:

        entropy <- getEntropy 500
        case newGen entropy of
            Left x -> fail (show x)
            Right g -> runWithGen g

  9. […] blogs on crypto-api have discussed its design and the RNG interface. These were to aid design discussion, so note the code there won’t work without minor […]

Comments are closed.

%d bloggers like this: