Haskell has a moderate history of cryptographically related libraries. For simple hashes and short-message encryption the Crypto library filled many needs. Higher-performance needs for SHA2 and MD5 were met by pureMD5 and SHA. Gradually the AES, SimpleAES, TwoFish, RSA, ECC, and cryptohash packages appeared, most providing FFI bindings to C implementations, which seemed to satisfy most users’ needs for individual low-level algorithms. Unfortunately, none of these gives developers a uniform interface with which to access any of a class of algorithms. To fill this gap I’ve been discussing / developing the crypto-api package.

Crypto-API is an interface to four classes of algorithms plus related helper functions. The four classes are hashes, block ciphers, stream ciphers, and asymmetric ciphers, while the related modules include testing, benchmarking, a platform-independent RNG, cipher modes, and hash-based message authentication codes (HMAC).

NOTE: Crypto-API isn’t on Hackage yet, but will be soon. This post is intended to facilitate discussion and motivate package maintainers to write instances.

Hashes

The BlockCipher and Hash classes are the most stable. The interface for Hash is:

class (Binary d, Serialize d, Eq d, Ord d) => Hash ctx d | d -> ctx, ctx -> d where
    outputLength  :: Tagged d BitLength         -- ^ The size of the digest when encoded
    blockLength   :: Tagged d BitLength         -- ^ The size of data operated on in each round of the digest computation
    initialCtx    :: ctx                        -- ^ An initial context, provided with the first call to 'updateCtx'
    updateCtx     :: ctx -> B.ByteString -> ctx -- ^ Used to update a context, repeatedly called until all data is exhausted;
                                                --   must operate correctly for inputs of n*blockLength bytes for n `elem` [0..]
    finalize      :: ctx -> B.ByteString -> d   -- ^ Finalizing a context, plus any message data less than the block size, into a digest

That is, the hash algorithm developer only needs to build the most basic definition of a hash: initial context, update routine, and finalize. It is the responsibility of the higher-level routines to obey certain semantics, such as only providing bytestrings whose length is a multiple of the block length to the update function. Users don’t need to know any of this – all they should care about is:

hash :: (Hash ctx d) => L.ByteString -> d
hash' :: (Hash ctx d) => B.ByteString -> d

… hashing strict or lazy bytestrings.

hashFunc :: Hash c d => d -> (L.ByteString -> d)
hashFunc' :: Hash c d => d -> (B.ByteString -> d)

… obtaining the hash function that produced a given digest (the digest argument serves only to select the algorithm’s type).

hmac :: Hash c d => B.ByteString -> L.ByteString -> d
hmac' :: (Hash c d) => B.ByteString -> B.ByteString -> d

… or computing an HMAC of a key + message.
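Under the hood, hmac is just the standard two-pass construction – H((k ⊕ opad) ++ H((k ⊕ ipad) ++ msg)) with the key padded out to the block length. A self-contained sketch (the fold-xor “hash” and the name hmacWith are mine, purely illustrative; the real hmac is generic over any Hash instance):

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)

type HashFn = B.ByteString -> B.ByteString

-- Placeholder hash: fold the bytes with xor. NOT secure; stands in for SHA-2 etc.
toyHash :: HashFn
toyHash = B.pack . (: []) . B.foldl' xor 0

-- The HMAC construction over any block-based hash function.
hmacWith :: HashFn -> Int -> B.ByteString -> B.ByteString -> B.ByteString
hmacWith h blockBytes key msg =
    h (opadKey `B.append` h (ipadKey `B.append` msg))
  where
    k0      = if B.length key > blockBytes then h key else key  -- shrink long keys
    k       = k0 `B.append` B.replicate (blockBytes - B.length k0) 0
    ipadKey = B.map (xor 0x36) k
    opadKey = B.map (xor 0x5c) k
```

Swap toyHash for a real hash and blockBytes for its block size and you have the hmac above.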

I’d call this a simple interface, and one that satisfies the majority of users. There was a comment about including ‘hash’ and friends in the class interface so FFI implementations could override the defaults for performance reasons. A few optimizations closed the gap significantly, which is why these functions remain separate so far. The gap could probably be closed further if ByteString.Lazy would read in chunks whose size is a multiple of 1024 bits (instead of 32KB minus 8 bytes, which is a piddly multiple of 64).
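Until then, a caller can re-chunk the lazy input so that each strict chunk fed to updateCtx is already a multiple of the block length; a sketch (rechunk is my own name, not part of crypto-api):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

-- Re-split a lazy ByteString so every emitted chunk except possibly the
-- last has a length that is a multiple of blockBytes.
rechunk :: Int -> L.ByteString -> [B.ByteString]
rechunk blockBytes = go
  where
    chunk = blockBytes * 64   -- gather several blocks per strict chunk
    go lbs
      | L.null lbs = []
      | otherwise  = let (h, t) = L.splitAt (fromIntegral chunk) lbs
                     in L.toStrict h : go t
```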

Hash instances have been made for cryptohash and pureMD5. So far consumers include DRBG and the algorithm-specific tests.
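To make the division of labor concrete, here is a self-contained miniature of the scheme (a simplified stand-in for the real class – no Tagged, no Binary/Serialize, and a checksum that is in no way a cryptographic hash): the generic driver owns the chunking, and the “algorithm” only supplies the three primitives.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, ScopedTypeVariables #-}
import qualified Data.ByteString as B
import Data.Word (Word32)

-- Simplified stand-in for crypto-api's Hash class.
class MiniHash ctx d | d -> ctx, ctx -> d where
  blockBytes :: d -> Int                     -- block size in bytes
  initCtx    :: ctx
  updCtx     :: ctx -> B.ByteString -> ctx   -- fed only whole blocks
  final      :: ctx -> B.ByteString -> d     -- leftover (< one block) plus finalization

-- Toy "digest": the sum of all bytes. Illustrative only!
newtype SumCtx    = SumCtx Word32
newtype SumDigest = SumDigest Word32 deriving (Eq, Show)

instance MiniHash SumCtx SumDigest where
  blockBytes _ = 4
  initCtx = SumCtx 0
  updCtx (SumCtx s) bs = SumCtx (s + B.foldl' (\a w -> a + fromIntegral w) 0 bs)
  final c rest = let SumCtx s = updCtx c rest in SumDigest s

-- The generic driver, written once for every instance: it guarantees that
-- updCtx only ever sees a whole number of blocks.
miniHash :: forall ctx d. MiniHash ctx d => B.ByteString -> d
miniHash = go initCtx
  where
    bl = blockBytes (undefined :: d)
    go :: ctx -> B.ByteString -> d
    go c bs
      | B.length bs >= bl = let (blk, rest) = B.splitAt bl bs
                            in go (updCtx c blk) rest
      | otherwise         = final c bs
```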

Block Ciphers

The BlockCipher class is:

class (Binary k, Serialize k) => BlockCipher k where
    blockSize     :: Tagged k BitLength
    encryptBlock  :: k -> B.ByteString -> B.ByteString
    decryptBlock  :: k -> B.ByteString -> B.ByteString
    buildKey      :: B.ByteString -> Maybe k
    keyLength     :: k -> BitLength       -- ^ keyLength may inspect its argument to return the length

Again, this is intended to capture the essence of block ciphers. A smart constructor, ‘buildKey’, is provided so the implementation can weed out weak keys. A non-ideal instance for SimpleAES (see the appendix to this post) was made so I could run benchmarks and mode tests. Crypto-API includes an extensive test framework for AES + modes, built around parsing NIST KAT files. Note the modes are not finished, not optimized, and only ECB, CBC, and OFB are tested (I’ve been programming during cocktail hour…).

I’ve yet to include modes as overridable routines of BlockCipher (see the comment cited above). This is partly due to a lack of evidence showing a (very likely) performance gain that generalized routines can’t match. Once I see that evidence I’ll be more likely to make the change.

As with hashes, most users won’t use the class interface but rather the higher level functions provided by Modes.hs (getIV, cbc, unCbc, etc).
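For intuition, here is what cbc/unCbc amount to, written against a toy xor “block cipher” rather than the BlockCipher class (a self-contained sketch; Modes.hs generalizes exactly this over any instance): each plaintext block is xored with the previous ciphertext block (the IV for the first block) before being encrypted.

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)

-- Toy block cipher: xor against the key. Stands in for AES etc.; 4-byte blocks.
blockBytes :: Int
blockBytes = 4

encryptBlk, decryptBlk :: B.ByteString -> B.ByteString -> B.ByteString
encryptBlk k = B.pack . B.zipWith xor k
decryptBlk = encryptBlk  -- xor is its own inverse

-- Split a (block-aligned) message into blocks.
blocks :: B.ByteString -> [B.ByteString]
blocks bs | B.null bs = []
          | otherwise = let (h, t) = B.splitAt blockBytes bs in h : blocks t

cbcEncrypt :: B.ByteString -> B.ByteString -> B.ByteString -> B.ByteString
cbcEncrypt k iv = B.concat . go iv . blocks
  where go _    []     = []
        go prev (p:ps) = let c = encryptBlk k (B.pack (B.zipWith xor prev p))
                         in c : go c ps

cbcDecrypt :: B.ByteString -> B.ByteString -> B.ByteString -> B.ByteString
cbcDecrypt k iv = B.concat . go iv . blocks
  where go _    []     = []
        go prev (c:cs) = let p = B.pack (B.zipWith xor prev (decryptBlk k c))
                         in p : go c cs
```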

RNG

The platform-independent RNG is backed by /dev/urandom on *nix and the WinCrypt API on Windows. My thinking here is that any user of /dev/random (on *nix) must be so concerned about security that they are carefully controlling most aspects of the platform, so the non-portability of directly reading /dev/random is inconsequential; i.e., there’s no need to bother with a library to access /dev/random.

The interface (untested on Windows! If you care about Windows, please test and debug!):

getEntropy :: ByteLength -> IO B.ByteString
openHandle :: IO CryptHandle
hGetEntropy :: CryptHandle -> Int -> IO B.ByteString
closeHandle :: CryptHandle -> IO ()

If you rarely need quality entropy (e.g. just for a quality seed to a PRNG) then use ‘getEntropy’. Frequent users can amortize some of the handle-opening cost by explicitly managing their resources and calling the other three functions.
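On *nix the implementation amounts to reading the character device; a minimal sketch of the idea (plain Haskell, not the crypto-api module itself – it assumes a /dev/urandom, so Linux/BSD only):

```haskell
import qualified Data.ByteString as B
import System.IO

-- Read n bytes of entropy from /dev/urandom; the crypto-api module wraps
-- essentially this (plus WinCrypt on Windows) behind getEntropy/hGetEntropy.
getEntropySketch :: Int -> IO B.ByteString
getEntropySketch n = withBinaryFile "/dev/urandom" ReadMode (\h -> B.hGet h n)
```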

Stream Ciphers

Stream ciphers are assumed to be much like a block cipher in 1-bit CFB mode:

class (Binary k, Serialize k) => StreamCipher k iv | k -> iv where
    buildStreamKey        :: B.ByteString -> Maybe k
    encryptStream         :: k -> iv -> B.ByteString -> (B.ByteString, iv)
    decryptStream         :: k -> iv -> B.ByteString -> (B.ByteString, iv)
    streamKeyLength       :: k -> BitLength

A simple instance would be:

data Xor = Xor B.ByteString

instance Bin.Binary Xor where
    get = undefined
    put = undefined

instance Ser.Serialize Xor where
    get = undefined
    put = undefined

instance StreamCipher Xor Int where
    buildStreamKey k = if B.null k then Nothing else Just (Xor k)
    encryptStream (Xor k) iv msg = (ct, (B.length msg + iv) `rem` B.length k)
      where
      ct = B.pack $ zipWith xor (B.unpack msg) (drop iv $ cycle $ B.unpack k)
    decryptStream = encryptStream
    streamKeyLength (Xor k) = 8 * (B.length k)
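The `decryptStream = encryptStream` line is no accident: xoring with the same keystream twice is the identity, which holds for any keystream cipher, not just this toy. A self-contained check of that property, reusing the toy’s keystream logic:

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)

-- The toy Xor cipher's core: xor the message against the cycled key,
-- starting at keystream offset iv.
xorStream :: B.ByteString -> Int -> B.ByteString -> B.ByteString
xorStream k iv msg =
  B.pack $ zipWith xor (B.unpack msg) (drop iv (cycle (B.unpack k)))
```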

Asymmetric Ciphers

The asymmetric cipher class currently doesn’t fit any of the available algorithms, as it is generalized over random generators. It is also the most likely to change – there are things I’d change about it right now, but it’s best to leave the more irksome aspects to motivate some of you readers to contribute / comment ;-)

class (Binary p, Serialize p) => AsymCipher p where
    generateKeypair :: RandomGen g => g -> BitLength -> Maybe ((p,p),g)
    encryptAsym     :: p -> B.ByteString -> B.ByteString
    decryptAsym     :: p -> B.ByteString -> B.ByteString
    asymKeyLength   :: p -> BitLength

In Closing

1) If you use or develop cryptographic algorithms then join the discussion. I might not use your input, but I will carefully consider all comments. Discussion has led to substantial changes already (thanks guys!). I’m particularly keen on input from stream or asymmetric cipher users.

2) If you maintain any crypto packages then please update to include the correct crypto-api instances. If your package is a block cipher then make sure you’re exporting a pure interface in addition to particular modes.

3) If you use Windows then please help shore up the System.Crypto.Random module – I know it needs work!

4) If you use crypto packages, please don’t make an instance yourself – or only do so to submit it upstream! Instances belong with the algorithm implementation!

5) Everyone else who wants to help feel free to write modes (XTS, GCM, CTR, etc), make fixes & optimizations, add tests (cipher properties, known answer tests), fix ByteString.Lazy.Internal.defaultChunkSize or export hGetContentsN, and add Data.Crypto.Padding (ex: pkcs5). If none of that interests you but the general topic of cryptography in Haskell does then consider working to improve hecc, add TLS or digest-auth to HappStack, write an IPSec implementation, make a pfkey2 package, improve GHC optimization of the algorithms, or make more fitting primitives!

Appendix on SimpleAES:

SimpleAES exports sufficient constructs with which to build an instance, but it isn’t very clean. The main issues are:
1) Building a key can throw exceptions (when it should use Maybe or Either), and the result of key expansion (a costly operation in AES) isn’t stored but recomputed each time.
2) A properly sized IV is required even for ECB mode – which doesn’t actually use an IV. Worse, the “encryptMsg'” function will actually expand the size of the data even when using ECB mode.
3) The key isn’t its own type, which is good practice in addition to being needed to make an instance. This ties back to the smart-constructor concept of #1.
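Point 3 is cheap to fix on the implementation side; a sketch of the shape an AES-like package could export (names hypothetical – this is not SimpleAES’s actual API), with the expensive key expansion done once inside the smart constructor:

```haskell
import qualified Data.ByteString as B

-- Hypothetical: an opaque key type holding the already-expanded schedule,
-- so expansion is paid once in the constructor, not once per block.
data AESKey = AESKey { rawKey   :: B.ByteString
                     , schedule :: [B.ByteString]  -- placeholder for round keys
                     }

buildAESKey :: B.ByteString -> Maybe AESKey
buildAESKey bs
  | B.length bs `elem` [16, 24, 32] = Just (AESKey bs (expand bs))
  | otherwise                       = Nothing   -- no exceptions, just Maybe
  where
    expand k = replicate 11 k  -- stand-in for real AES key expansion
```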

I always assumed my run was about 5 km, but being out of shape it felt more like 7 km.  Eventually this bugged me enough that I spent a whole ten minutes at a coffee shop to learn the GPX library and make a program that converts my GPX traces (I carry a GPS logger on jogs) to a distance.  Thank you Hackage, thank you Tony.

module Main where

import Data.Geo.GPX
import Data.GPS
import Control.Monad
import System.Environment (getArgs)


main :: IO ()
main = do
  file <- liftM head getArgs
  run  <- readGpxFile file
  let cs    = map (degreePairToDMS . latlon) . trkpts . head . trksegs . head . trks . head $ run
      pairs = zip cs (drop 1 cs)
      dist  = sum (map (uncurry distance) pairs)
  print dist
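For anyone without the GPS library handy, the pairwise distance is the classic haversine formula; a self-contained version (my own sketch, not Data.GPS’s `distance`):

```haskell
-- Great-circle distance in metres between two (lat, lon) points in degrees,
-- using the haversine formula on a spherical Earth.
haversine :: (Double, Double) -> (Double, Double) -> Double
haversine (lat1, lon1) (lat2, lon2) = 2 * r * asin (min 1 (sqrt a))
  where
    r    = 6371000                      -- mean Earth radius in metres
    toR  = (* (pi / 180))               -- degrees to radians
    dLat = toR (lat2 - lat1)
    dLon = toR (lon2 - lon1)
    a    = sin (dLat/2)^2 + cos (toR lat1) * cos (toR lat2) * sin (dLon/2)^2

-- Sum the leg distances over a track of points, as the program above does.
trackLength :: [(Double, Double)] -> Double
trackLength cs = sum (zipWith haversine cs (drop 1 cs))
```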

Introductions

January 26, 2010

Some introductions are in order.  Megan, meet the world.  World, meet Megan.

GHC on ARM

January 19, 2010

During my “Linux Kernel Modules with Haskell” tech talk I mentioned my next personal project (on my already over-full plate) would be to play with Haskell on ARM.  I’m finally getting around to a little bit of playing!  Step zero was to get hardware, so I acquired a Touch Book – feel free to ignore all the marketing on that site (though I am quite happy with it) and just understand it is the equivalent of a BeagleBoard with keyboard, touch-pad, touch screen, speakers, wifi, bluetooth, two batteries, more USB ports, and a custom Linux distribution.

Step 1: Get an Unregisterised Build

To start, it’s best to bypass the porting GHC instructions and steal someone else’s porting effort in the form of a Debian package (actually, three Debian packages).  Convert them to a .tar.gz (unless you have a Debian OS on your ARM system) using a handy deb2targz script.  Now untar them onto your ARM system via “sudo tar xzf oneOfThePackages.tar.gz -C /”.  Be sure to copy the package.conf, as it seems to be missing from the .debs: “sudo cp /usr/lib/ghc-6.10.4/package.conf.shipped /var/lib/ghc-6.10.4/package.conf”.  After all this you should have a working copy of GHC 6.10.4 – confirm that assertion by compiling some simple test programs.

I now have my copy of 6.10.4 building GHC 6.12.1.  The only hitch thus far was needing to add -mlong-calls to the C options when running ./configure.  With luck I will soon have an unregisterised GHC 6.12.1 on my ARM netbook.  I’ll edit this post tomorrow with results (yes, I’m actually compiling on the ARM and not in an x86 + QEMU environment).

Step 2: Get a registerised build – upstream patches

This is where things become more black-box to me.  I want to make a native code generator (NCG) for GHC/ARM.  There are some decent notes about the RTS at the end of the previously mentioned porting guide and there is also a (up-to-date?) page on the NCG.  Hopefully this, combined with the GHC source, will be enough but I’ll probably be poking my head into #ghc more often.

Step 3: Write more Haskell

The purpose of all of this was to use Haskell on my ARM system.  Hopefully I’ll find time to tackle some problems that non-developers will care about!

Lately I’ve been killing off course requirements by taking project-based courses.  Happily, Bart Massey offered a Linux device driver course that gave me an opportunity to get my hands dirty in the embedded world again.  The project was to make an AVR-based USB gadget do something… anything.  Most of my prior experience with µCs was with Rabbits – a fitting name, seeing as the useful, higher-level libraries seemed to multiply like rabbits.  Unfortunately, the AVRs didn’t have any apparent high-level library for USB; many of the examples used commented byte arrays as USB descriptors.  While understandable, this feels harder to debug, ugly, and just generally wrong.  Enter LUFA.

LUFA

LUFA is a high-level USB library for AVRs generously released under the MIT license.  It has considerable documentation, but no real guide stating what the bare-minimum functions are for any new/custom gadget – this is my attempt at providing such a guide.  After you verify your AVR is supported by LUFA, go ahead and download LUFA into an includes directory for your project.  All code here is made possible by LUFA and its included examples – I’ll only claim to being the original author of any and all bugs.

My Device

Before going any further you should have an understanding of what your device will do, how many and what type of USB endpoints it will use (BULK, INTERRUPT, CONTROL, ISOCHRONOUS), and what driver will support it.  If you are making something that will use a generic driver, such as a HID, then you can probably stop here as there is plenty of code for you to use – this post is aimed at people making custom devices.

I’m using a Teensy 2.0 with three bulk endpoints (two OUT, one IN).  One OUT provides commands to the gadget, which will change its state.  Another OUT will send jobs to the gadget, and the final IN will read the results of those jobs.  This requires a custom driver, which I and several other folks developed for course work.

The Makefile

It’s probably safe for you to rip off one of the makefiles from an example in LUFA/Demo/Devices/LowLevel/.  Notice that several of the defines change the API; defines I used for this post/project include:

 -DUSB_DEVICE_ONLY             # Saves space if you only need gadget-side code
 -DNO_STREAM_CALLBACKS         # Changes the API to exclude any callbacks on streams
 -DINTERRUPT_CONTROL_ENDPOINT  # So you don't have to call USB_USBTask() manually on a regular basis
 -DUSE_FLASH_DESCRIPTORS       # Save RAM, leave these in the Flash
 -DUSE_STATIC_OPTIONS="(USB_DEVICE_OPT_FULLSPEED | USB_OPT_AUTO_PLL)"

EDIT: All the makefiles I’ve seen are broken – they don’t properly track dependencies – so either fix that or be sure to ‘make clean’ prior to each build.

Enumeration

When your gadget is plugged in, it’s going to have to enumerate its interfaces by sending descriptors to the host.  Start by declaring a global USB_Descriptor_Device_t along with Language, Manufacturer, and Product structures:

USB_Descriptor_Device_t PROGMEM DeviceDescriptor =
{
 .Header                 = {.Size = sizeof(USB_Descriptor_Device_t), .Type = DTYPE_Device},

 .USBSpecification       = VERSION_BCD(01.10),
 .Class                  = 0xff,
 .SubClass               = 0xaa,
 .Protocol               = 0x00,

 .Endpoint0Size          = FIXED_CONTROL_ENDPOINT_SIZE,

 .VendorID               = 0x16c0,
 .ProductID              = 0x0478,
 .ReleaseNumber          = 0x0000,

 .ManufacturerStrIndex   = 0x01,
 .ProductStrIndex        = 0x02,
 .SerialNumStrIndex      = USE_INTERNAL_SERIAL,

 .NumberOfConfigurations = FIXED_NUM_CONFIGURATIONS
};

USB_Descriptor_String_t PROGMEM LanguageString =
{
 .Header                 = {.Size = USB_STRING_LEN(1), .Type = DTYPE_String},
 .UnicodeString          = {LANGUAGE_ID_ENG}
};

USB_Descriptor_String_t PROGMEM ManufacturerString =
{
 .Header                 = {.Size = USB_STRING_LEN(3), .Type = DTYPE_String},
 .UnicodeString          = L"JRT"
};

USB_Descriptor_String_t PROGMEM ProductString =
{
 .Header                 = {.Size = USB_STRING_LEN(13), .Type = DTYPE_String},
 .UnicodeString          = L"Micro Storage"
};

You’ll also need a descriptor that enumerates your interfaces and endpoints.  Seeing as this is specific to your project, and C is a significantly limited language, you’ll need to make a quick structure declaration and then fill that structure:

typedef struct {
 USB_Descriptor_Configuration_Header_t Config;
 USB_Descriptor_Interface_t            Interface;
 USB_Descriptor_Endpoint_t             DataInEndpoint;
 USB_Descriptor_Endpoint_t             DataOutEndpoint;
 USB_Descriptor_Endpoint_t             CommandEndpoint;
} USB_Descriptor_Configuration_t;

USB_Descriptor_Configuration_t PROGMEM ConfigurationDescriptor =
{
 .Config =
 {
 .Header                 = {.Size = sizeof(USB_Descriptor_Configuration_Header_t), .Type = DTYPE_Configuration},

 .TotalConfigurationSize = sizeof(USB_Descriptor_Configuration_t),
 .TotalInterfaces        = 1,

 .ConfigurationNumber    = 1,
 .ConfigurationStrIndex  = NO_DESCRIPTOR,

 .ConfigAttributes       = USB_CONFIG_ATTR_BUSPOWERED,

 .MaxPowerConsumption    = USB_CONFIG_POWER_MA(100)
 },

 .Interface =
 {
 .Header                 = {.Size = sizeof(USB_Descriptor_Interface_t), .Type = DTYPE_Interface},

 .InterfaceNumber        = 0,
 .AlternateSetting       = 0,
 .TotalEndpoints         = 3,

 .Class                  = 0xff,
 .SubClass               = 0xaa,
 .Protocol               = 0x0,

 .InterfaceStrIndex      = NO_DESCRIPTOR
 },

 .DataInEndpoint =
 {
 .Header                 = {.Size = sizeof(USB_Descriptor_Endpoint_t), .Type = DTYPE_Endpoint},

 .EndpointAddress        = (ENDPOINT_DESCRIPTOR_DIR_IN | BULK_IN_EPNUM),
 .Attributes             = (EP_TYPE_BULK | ENDPOINT_ATTR_NO_SYNC | ENDPOINT_USAGE_DATA),
 .EndpointSize           = BULK_IN_EPSIZE,
 .PollingIntervalMS      = 0x00
 },

 .DataOutEndpoint =
 {
 .Header                 = {.Size = sizeof(USB_Descriptor_Endpoint_t), .Type = DTYPE_Endpoint},

 .EndpointAddress        = (ENDPOINT_DESCRIPTOR_DIR_OUT | BULK_OUT_EPNUM),
 .Attributes             = (EP_TYPE_BULK | ENDPOINT_ATTR_NO_SYNC | ENDPOINT_USAGE_DATA),
 .EndpointSize           = BULK_OUT_EPSIZE,
 .PollingIntervalMS      = 0x00
 },

 .CommandEndpoint =
 {
 .Header                 = {.Size = sizeof(USB_Descriptor_Endpoint_t), .Type = DTYPE_Endpoint},

 .EndpointAddress        = (ENDPOINT_DESCRIPTOR_DIR_OUT | COMMAND_EPNUM),
 .Attributes             = (EP_TYPE_BULK | ENDPOINT_ATTR_NO_SYNC | ENDPOINT_USAGE_DATA),
 .EndpointSize           = COMMAND_EPSIZE,
 .PollingIntervalMS      = 0x00
 }

};

So now you see I have a device with a single interface and three bulk endpoints (two OUT and one IN).  Remember: in USB terminology, OUT means from the HOST to the GADGET, while IN means from the GADGET to the HOST.  While there is a direction to these channels, USB 1.0 and 2.0 BULK communications start with the host sending a message to the device (i.e. IN endpoints are polled), and the device should always respond.  Now that we have descriptors, let’s be sure to send them when the host tries to enumerate the device.

uint16_t CALLBACK_USB_GetDescriptor
   (const uint16_t wValue,
    const uint8_t  wIndex,
    void **const   DescriptorAddress)
{
    const uint8_t  DescriptorType   = (wValue >> 8);
    const uint8_t  DescriptorNumber = (wValue & 0xFF);

    void*    Address = NULL;
    uint16_t Size    = NO_DESCRIPTOR;

    switch (DescriptorType) {
        case DTYPE_Device:
            Address = (void*)&DeviceDescriptor;
            Size    = sizeof(USB_Descriptor_Device_t);
            break;
        case DTYPE_Configuration:
            Address = (void*)&ConfigurationDescriptor;
            Size    = sizeof(USB_Descriptor_Configuration_t);
            break;
        case DTYPE_String:
            switch (DescriptorNumber) {
                case 0x00:
                    Address = (void*)&LanguageString;
                    Size    = pgm_read_byte(&LanguageString.Header.Size);
                    break;
                case 0x01:
                    Address = (void*)&ManufacturerString;
                    Size    = pgm_read_byte(&ManufacturerString.Header.Size);
                    break;
                case 0x02:
                    Address = (void*)&ProductString;
                    Size    = pgm_read_byte(&ProductString.Header.Size);
                    break;
            }
            break;
    }

    *DescriptorAddress = Address;
    return Size;
}

With this, and calling USB_Init() when your device starts up, you should have a gadget that is properly enumerated by the matching driver.  Once a configuration is selected we need to configure the endpoints, but after that we can finally send and receive data!

void EVENT_USB_Device_ConfigurationChanged(void)
{
    int i,succ=1;

    succ &= Endpoint_ConfigureEndpoint(BULK_IN_EPNUM, EP_TYPE_BULK, ENDPOINT_DIR_IN,
                                       BULK_IN_EPSIZE, ENDPOINT_BANK_SINGLE);

    succ &= Endpoint_ConfigureEndpoint(BULK_OUT_EPNUM, EP_TYPE_BULK, ENDPOINT_DIR_OUT,
                                       BULK_OUT_EPSIZE, ENDPOINT_BANK_SINGLE);

    succ &= Endpoint_ConfigureEndpoint(COMMAND_EPNUM, EP_TYPE_BULK,
                                       ENDPOINT_DIR_OUT, COMMAND_EPSIZE,
                                       ENDPOINT_BANK_SINGLE);
    LED_CONFIG;
    while(!succ) {
        LED_ON;
        _delay_ms(1000);
        LED_OFF;
        _delay_ms(1000);
    }
    for(i = 0; i < 5; i ++) {
        LED_ON;
        _delay_ms(50);
        LED_OFF;
        _delay_ms(50);
    }
}

Now, to handle any data on the endpoints, you’ll probably want a tight loop around code that checks each endpoint, reads data, and performs some action:

    if (USB_DeviceState != DEVICE_STATE_Configured)
        return;

    Endpoint_SelectEndpoint(BULK_IN_EPNUM);
    if (Endpoint_IsConfigured() && Endpoint_IsINReady() && Endpoint_IsReadWriteAllowed()) {
        do_something(state, data, &len);
        err = Endpoint_Write_Stream_LE((void *)data, len);
        // FIXME handle err
        Endpoint_ClearIN();
    }

    Endpoint_SelectEndpoint(BULK_OUT_EPNUM);
    if (Endpoint_IsConfigured() && Endpoint_IsOUTReceived() && Endpoint_IsReadWriteAllowed()) {
       err = Endpoint_Read_Stream_LE(data, len);
       do_other_thing(data, len);
       Endpoint_ClearOUT();
    }

    Endpoint_SelectEndpoint(COMMAND_EPNUM);
    if (Endpoint_IsConfigured() && Endpoint_IsOUTReceived() && Endpoint_IsReadWriteAllowed()) {
        usb_cmd_t cmd;
        err = Endpoint_Read_Stream_LE(&cmd, sizeof(usb_cmd_t));
        update_state(cmd);
        Endpoint_ResetFIFO(BULK_IN_EPNUM);
        Endpoint_ResetFIFO(BULK_OUT_EPNUM);
        Endpoint_ClearOUT();
    }

And that’s the whole deal. Some things to take note of:

1) I don’t know why I needed Endpoint_ResetFIFO() – my results lagged by one message without those calls, but I feel certain this is not the correct way to fix the issue. Perhaps I need to reset the endpoints some time after configuration and before use, but if that’s the case I’m not sure which callback to use.

2) C needs phantom types.  Many of these functions aren’t intended for use with CONTROL endpoints.  See the LUFA documentation for which functions to use if you have non-BULK endpoints.

3) Be sure to number your ENDPOINTS starting at 1 (the *_EPNUM values).  I’ve seen unusual behavior when numbering started higher – not sure if that was against standards or not.

Conclusion

At a minimum, you must declare a “USB_Descriptor_Device_t” variable, three “USB_Descriptor_String_t” variables (Language, Manufacturer, Product), a custom “USB_Descriptor_Configuration_t” structure and variable.  Also declare a function named “CALLBACK_USB_GetDescriptor” and one called “EVENT_USB_Device_ConfigurationChanged(void)” which calls
“Endpoint_ConfigureEndpoint(*_EPNUM, EP_TYPE_{BULK, INT, CONTROL, ISOC}, ENDPOINT_DIR_{IN,OUT}, *_EPSIZE, ENDPOINT_BANK_{DOUBLE,SINGLE});” for each endpoint in the configuration. Finally use “Endpoint_SelectEndpoint(*_EPNUM)” followed by “Endpoint_Is{INReady,OutReceived}() && Endpoint_IsReadWriteAllowed()” to select an endpoint and check for data. Finish all this by reading your stream “Endpoint_Read_Stream_LE(buf, len)” and issue a call to “Endpoint_Clear{IN,OUT}()” when done.

Haskell Bindings to C – c2hs

September 22, 2009

When making the hsXenCtrl bindings I was thrilled to learn and use hsc2hs – I considered it all that was needed for sugar on top of the FFI. However, the recent kernel modules work has opened my eyes to various limitations of hsc2hs and forced me to learn c2hs – which is superior, so I’ll go through a simple-stupid tutorial here.

This is intended as a starter companion to the official documentation – not a substitute!  Read the docs, they are quite understandable!

The C Code

Let’s consider some basic C code:

struct test {
 int a;
};

struct test *get_the_struct();

struct test t;

struct test *get_the_struct()
{
 return &t;
}

#define add_one(x) (x + 1)

While there are the normal issues of calling functions and accessing structure fields, it’s no coincidence that this code embodies the two issues I came across when using c2hs.  First, it has a macro, for which there is no automatic FFI generation; second, it has structures that aren’t typedef’ed to a single keyword.  We’ll see how to handle both of those along with the rest of our simple chs code.

The .chs File

We now write a test.chs file that will interface with the code defined in our c header.

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types
import Foreign.Ptr
import Foreign.Storable

#include "test.h"

The boilerplate shouldn’t be surprising – the only new aspect is the inclusion of the header file that defines the types and functions to which we must bind.

Now let’s define a pointer type.  Unfortunately this code doesn’t use any typedefs, and c2hs can’t parse “struct foo” as a single C type identifier.  To fix this we use a section of C code, which will appear in the file “test.chs.h” once we run c2hs:

#c
typedef struct test test_t;
#endc

We may now define the pointer type using “test_t” instead of “struct test”:

{#pointer *test_t as Test#}

This is amazingly simple – it translates into “type Test = Ptr ()”.  You can add the keyword “newtype” at the end to obtain type enforcement if you so desire.

To access fields in this structure you could define a whole Storable instance, using the c2hs keywords ‘sizeof’, ‘get’, and ‘put’ the whole way – but we didn’t create a ‘newtype’ on which a typeclass instance can be defined.  All we need is a pair of get and set functions – and you can see how to use these to create any Storable instance you want.

getA :: Test -> IO Int
getA t = {#get test_t->a#} t >>= return . fromIntegral

setA :: Test -> Int -> IO ()
setA t i = {#set test_t->a#} t (fromIntegral i)

As you can see, use the {#get …} and {#set …} hooks along with the C type (“test_t”) and C field name(s) (“a”) to create functions that access the fields appropriately.  Now we need to handle the damned macro.  There exists no {#macro …} hook, and function hooks translate directly to “foreign import …”, so the first step is using another C section to wrap the macro in a function:

#c
int add_one_(int x)
{
 return (add_one(x));
}
#endc

This doesn’t warrant any more discussion, so let’s move on to the function calls!  The function hook is hands-down ugly, but powerful.  First you tell it the C function you’re calling (add_one_).  Then specify the name of the Haskell function; hat (“^”) means convert the C name to camel case (addOne), or you can specify any name you desire (ex: “hs_add_one”).  After the naming is handled, you provide a list of arguments in curly braces – {arg1, arg2, …, argn} – then an arrow and a result type.  Each argument and return value is a Haskell type enclosed in a back tick and single quote, such as `Int’.  Before and after each argument you can provide in and out marshalling functions – this is where all the power of c2hs lies, and it is well stated in the documentation so I won’t repeat it here – just notice that I marshal CInt to and from Int via “fromIntegral”, and that no marshalling is needed for the pointer (“Test”), so we use identity (“id”).

{#fun add_one_ as ^
 {fromIntegral `Int' } -> `Int' fromIntegral#}
{#fun get_the_struct as ^
 {} -> `Test' id#}

The remaining task is to define a simple ‘main’ function for testing purposes.

main = do
 s <- getTheStruct
 setA s 5
 a <- getA s
 b <- addOne a
 print a
 print b

Compile and Test

The files test.hs and test.chs.h are generated by c2hs:

c2hs test.chs

For the purposes of running this test we can finish easily:

mv test.chs.h test.chs.c
ghc --make test.hs test.chs.c
./test
5
6

C Bindings and Kernel Code

This is only the beginning – I’m using this with kernel headers, and as such test.chs.h will be built using the kernel build system.  For this reason I need to manually modify the generated Haskell FFI to use the “regparm3” calling convention, or not use the c2hs function hooks ({#fun ...}).  Other than that, and a small non-standard gcc “flexibility” that c2hs can’t parse without minor correction, it seems c2hs is a great tool for kernel bindings, and I hope to have a real (though simple) driver done soon!

HacPDX is Coming

September 19, 2009

That’s right – HacPDX is less than a week away, so REGISTER if you haven’t. Failure to register means you might not have network access or even a chair!

I’ve planned to work on networking but I’ll be happy to work on anything that interests people and makes progress including networking, crypto, kernel modules, hackage-server, even ARM (assuming an expert attends).

I’ve also heard a number of people mention working on various C Bindings – which is awesome because I suck at marshalling so maybe they can help. Can’t wait to see everyone there and hacking away!
