Lately I’ve been killing off course requirements by taking project based courses.  Happily, Bart Massey offered a Linux device driver course that gave me an opportunity to get my hands dirty in the embedded world again.  The project was to make an AVR based USB gadget do something… anything.  Most prior experience with µCs was with Rabbits – a fitting name seeing as the useful and higher level libraries seemed to multiply like rabbits.  Unfortunately, the AVRs didn’t have any apparent high level library for USB; many of the examples used commented byte arrays as USB descriptors.  While understandable, this feels harder to debug, ugly, and just generally wrong.  Enter LUFA.


LUFA is a high-level USB library for AVRs generously released under MIT.  It has considerable documentation, but no real guide stating what the bare-minimum functions are for any new/custom gadget – this is my attempt at providing such a guide.  After you verify your AVR is supported by LUFA go ahead and download LUFA to an includes directory for your project.  All code here is made possible by LUFA and its included examples – I’ll only claim to being the original author of any and all bugs.

My Device

Before going any further you should have an understanding of what your device will do, how many and what type of  USB endpoints it will use (BULK, INTERRUPT, CONTROL, ISOCRONOUS), and what driver will support it.  If you are making something that will use a generic driver, such as a HID, then you probably can stop here as there is plenty of code for you to use – this post is aimed at people making custom devices.

I’m using a Teensy 2.0 with three bulk interfaces (two OUT one IN).  One OUT provides commands to the gadget, which will change its state.  Another OUT will send jobs to the gadget and the final IN will read the results of those jobs.  This requires a custom driver which I and several other folks developed for course work.

The Makefile

Its probably safe for you to rip-off one of the makefiles from an example in LUFA/Demo/Devices/LowLevel/.  Noitice several of the defines change the API, defines I used for this post/project include:

 -DUSB_DEVICE_ONLY             # Saves space if you only need gadget-side code
 -DNO_STREAM_CALLBACKS         # Changes the API to exclude any callbacks on streams
 -DINTERRUPT_CONTROL_ENDPOINT  # So you don't have to call USB_USBTask() manually on a regular basis
 -DUSE_FLASH_DESCRIPTORS       # Save RAM, leave these in the Flash

EDIT: All the makefiles I’ve seen are broken, they don’t properly track dependencies, so either fix that or be sure to ‘make clean’ prior to each build.


When your gadget is plugged in its going to have to enumerate its interfaces by sending descriptors to the host.  Start by declaring a global USB_Descriptor_Device_t along with Language, Manufacturer and Product structures:

USB_Descriptor_Device_t PROGMEM DeviceDescriptor =
 .Header                 = {.Size = sizeof(USB_Descriptor_Device_t), .Type = DTYPE_Device},

 .USBSpecification       = VERSION_BCD(01.10),
 .Class                  = 0xff,
 .SubClass               = 0xaa,
 .Protocol               = 0x00,

 .Endpoint0Size          = FIXED_CONTROL_ENDPOINT_SIZE,

 .VendorID               = 0x16c0,
 .ProductID              = 0x0478,
 .ReleaseNumber          = 0x0000,

 .ManufacturerStrIndex   = 0x01,
 .ProductStrIndex        = 0x02,
 .SerialNumStrIndex      = USE_INTERNAL_SERIAL,

 .NumberOfConfigurations = FIXED_NUM_CONFIGURATIONS

USB_Descriptor_String_t PROGMEM LanguageString =
 .Header                 = {.Size = USB_STRING_LEN(1), .Type = DTYPE_String},
 .UnicodeString          = {LANGUAGE_ID_ENG}

USB_Descriptor_String_t PROGMEM ManufacturerString =
 .Header                 = {.Size = USB_STRING_LEN(3), .Type = DTYPE_String},
 .UnicodeString          = L"JRT"

USB_Descriptor_String_t PROGMEM ProductString =
 .Header                 = {.Size = USB_STRING_LEN(13), .Type = DTYPE_String},
 .UnicodeString          = L"Micro Storage"

You’ll also need a descriptor that enumerates your interfaces and endpoints.  Seeing as this is specific to your project, and C is a significantly limited language, you’ll need to make a quick structure declaration then fill that structure:

typedef struct {
 USB_Descriptor_Configuration_Header_t Config;
 USB_Descriptor_Interface_t            Interface;
 USB_Descriptor_Endpoint_t             DataInEndpoint;
 USB_Descriptor_Endpoint_t             DataOutEndpoint;
 USB_Descriptor_Endpoint_t             CommandEndpoint;
} USB_Descriptor_Configuration_t;

USB_Descriptor_Configuration_t PROGMEM ConfigurationDescriptor =
 .Config =
 .Header                 = {.Size = sizeof(USB_Descriptor_Configuration_Header_t), .Type = DTYPE_Configuration},

 .TotalConfigurationSize = sizeof(USB_Descriptor_Configuration_t),
 .TotalInterfaces        = 1,

 .ConfigurationNumber    = 1,
 .ConfigurationStrIndex  = NO_DESCRIPTOR,

 .ConfigAttributes       = USB_CONFIG_ATTR_BUSPOWERED,

 .MaxPowerConsumption    = USB_CONFIG_POWER_MA(100)

 .Interface =
 .Header                 = {.Size = sizeof(USB_Descriptor_Interface_t), .Type = DTYPE_Interface},

 .InterfaceNumber        = 0,
 .AlternateSetting       = 0,
 .TotalEndpoints         = 3,

 .Class                  = 0xff,
 .SubClass               = 0xaa,
 .Protocol               = 0x0,

 .InterfaceStrIndex      = NO_DESCRIPTOR

 .DataInEndpoint =
 .Header                 = {.Size = sizeof(USB_Descriptor_Endpoint_t), .Type = DTYPE_Endpoint},

 .EndpointAddress        = (ENDPOINT_DESCRIPTOR_DIR_IN | BULK_IN_EPNUM),
 .EndpointSize           = BULK_IN_EPSIZE,
 .PollingIntervalMS      = 0x00

 .DataOutEndpoint =
 .Header                 = {.Size = sizeof(USB_Descriptor_Endpoint_t), .Type = DTYPE_Endpoint},

 .EndpointSize           = BULK_OUT_EPSIZE,
 .PollingIntervalMS      = 0x00

 .CommandEndpoint =
 .Header                 = {.Size = sizeof(USB_Descriptor_Endpoint_t), .Type = DTYPE_Endpoint},

 .EndpointSize           = COMMAND_EPSIZE,
 .PollingIntervalMS      = 0x00


So now you see I have a device with a single interface and three bulk endpoints (two OUT and one IN).  Remember, in USB terminology OUT means from the HOST to the GADGET while IN means from the GADGET to the HOST.  While there is a direction to these channels, USB 1.0 and 2.0 BULK communications start with the host sending a message to the device (i.e. IN endpoints are polled) and the device should always respond.  Now that we have descriptors lets be sure to send them when the host tries to enumerate the device.

uint16_t CALLBACK_USB_GetDescriptor
   (const uint16_t wValue,
    const uint8_t  wIndex,
    void **const   DescriptorAddress)
    const uint8_t  DescriptorType   = (wValue >> 8);
    const uint8_t  DescriptorNumber = (wValue & 0xFF);

    void*    Address = NULL;
    uint16_t Size    = NO_DESCRIPTOR;

    switch (DescriptorType) {
        case DTYPE_Device:
            Address = (void*)&DeviceDescriptor;
            Size    = sizeof(USB_Descriptor_Device_t);
        case DTYPE_Configuration:
            Address = (void*)&ConfigurationDescriptor;
            Size    = sizeof(USB_Descriptor_Configuration_t);
        case DTYPE_String:
            switch (DescriptorNumber) {
                case 0x00:
                    Address = (void*)&LanguageString;
                    Size    = pgm_read_byte(&LanguageString.Header.Size);
                case 0x01:
                    Address = (void*)&ManufacturerString;
                    Size    = pgm_read_byte(&ManufacturerString.Header.Size);
                case 0x02:
                    Address = (void*)&ProductString;
                    Size    = pgm_read_byte(&ProductString.Header.Size);

    *DescriptorAddress = Address;
    return Size;

With this, and calling USB_Init() when your device starts up, you should have a gadget that is properly enumerated by the matching driver.  Once a configuration is selected we need to configure the endpoints, but after that we can finally send and receive data!

void EVENT_USB_Device_ConfigurationChanged(void)
    int i,succ=1;

    succ &= Endpoint_ConfigureEndpoint(BULK_IN_EPNUM, EP_TYPE_BULK, ENDPOINT_DIR_IN,
                                       BULK_IN_EPSIZE, ENDPOINT_BANK_SINGLE);

    succ &= Endpoint_ConfigureEndpoint(BULK_OUT_EPNUM, EP_TYPE_BULK, ENDPOINT_DIR_OUT,
                                       BULK_OUT_EPSIZE, ENDPOINT_BANK_SINGLE);

    succ &= Endpoint_ConfigureEndpoint(COMMAND_EPNUM, EP_TYPE_BULK,
                                       ENDPOINT_DIR_OUT, COMMAND_EPSIZE,
    while(!succ) {
    for(i = 0; i < 5; i ++) {

Now to handle any data on the endpoints, you’ll probably want to tight loop around code that checks each endpoint, reads data, and performs some action:

    if (USB_DeviceState != DEVICE_STATE_Configured)

    if (Endpoint_IsConfigured() && Endpoint_IsINReady() && Endpoint_IsReadWriteAllowed()) {
        do_something(state, data, &len);
        err = Endpoint_Write_Stream_LE((void *)data, len);
        // FIXME handle err

    if (Endpoint_IsConfigured() && Endpoint_IsOUTReceived() && Endpoint_IsReadWriteAllowed()) {
       err = Endpoint_Read_Stream_LE(data, len);
       do_other_thing(data, len);

    if (Endpoint_IsConfigured() && Endpoint_IsOUTReceived() && Endpoint_IsReadWriteAllowed()) {
        usb_cmd_t cmd;
        err = Endpoint_Read_Stream_LE(&cmd, sizeof(usb_cmd_t));

And that’s the whole deal. Some things to take note of:

1) I don’t know why I needed Endpoint_ResetFIFO() – my results lagged by one message without them but I feel certain this is not the correct way to fix the issue. Perhaps I need to reset the endpoints some time after configuration and before use, but I’m not sure which callback to use for that, if that’s the case.

2) C needs phantom types.  Many of these functions aren’t intended for use with CONTROL endpoints.  See the LUFA documentation for which functions to use if you have non-BULK endpoints.

3) Be sure to number your ENDPOINTS starting at 1 (the *_EPNUM values).  I’ve seen unusual behavior when numbering started higher – not sure if that was against standards or not.


At a minimum, you must declare a “USB_Descriptor_Device_t” variable, three “USB_Descriptor_String_t” variables (Language, Manufacturer, Product), a custom “USB_Descriptor_Configuration_t” structure and variable.  Also declare a function named “CALLBACK_USB_GetDescriptor” and one called “EVENT_USB_Device_ConfigurationChanged(void)” which calls
“Endpoint_ConfigureEndpoint(*_EPNUM, EP_TYPE_{BULK, INT, CONTROL, ISOC}, ENDPOINT_DIR_{IN,OUT}, *_EPSIZE, ENDPOINT_BANK_{DOUBLE,SINGLE});” for each endpoint in the configuration. Finally use “Endpoint_SelectEndpoint(*_EPNUM)” followed by “Endpoint_Is{INReady,OutReceived}() && Endpoint_IsReadWriteAllowed()” to select an endpoint and check for data. Finish all this by reading your stream “Endpoint_Read_Stream_LE(buf, len)” and issue a call to “Endpoint_Clear{IN,OUT}()” when done.


Kernel Modules in Haskell

September 13, 2009

If you love Haskell and Linux then today is your day – today we reconcile the two and allow you to write Linux Kernel modules in Haskell. By making GHC and the Linux build system meet in the middle we can have modules that are type safe and garbage collected. Using the copy of GHC modified for the House operating system as a base, it turns out to be relatively simple to make the modifications necessary to generate object files for the Kernel environment. Additionally, a new calling convention (regparm3) was added to make it easier to import (and export) functions from the Kernel.

EDIT: I’m getting tired of updating this page with extremely minor changes and having it jump to the top of, so for the latest take a look at the haskell wiki entry. It includes discussion of the starting environment – a pretty large omission from the original post.

Starting Environment

You need a Linux x86 (not AMD64) based distribution with GCC 4.4 or higher and recent versions of gnu make and patch along with many other common developer tools usually found in a binutils package (e.g. ar, ld). You also should have the necessary tools to build GHC 6.8.2 – this includes a copy of ghc-6.8.x, alex, and happy.

Building GHC to Make Object files for Linux Modules

Start by downloading the House 0.8.93 [1]. Use the build system to acquire ghc-6.8.2 and apply the House patches. The House patches allow GHC to compile Haskell binaries that will run on bare metal (x86) without an underlying operating system, so this makes a good starting point.

	> wget
	> tar xjf House-0.8.93.tar.bz2
	> cd House-0.8.93
	> make boot
	> make stamp-patch

Now acquire the extra patch which changes the RTS to use the proper Kernel calls, instead of allocating its own memory, and to respect the current interrupt level. This patch also changes the build options to avoid common area blocks for uninitilized data (-fno-common) and use frame pointers (-fno-omit-frame-pointers).

	> wget
	> patch -d ghc-6.8.2 -p1 < hghc.patch
        > make stamp-configure
	> make stamp-ghc							# makes ghc stage 1

Next, build a custom libgmp with the -fno-common flag set. This library is needed for the Integer support in Haskell.

	> wget
	> tar xjf gmp-4.3.1.tar.bz2
	> cd gmp-4.3.1
	> ./configure
	# edit 'Makefile' and add '-fno-common' to the end of the 'CFLAGS = ' line.
	> make
	> cp .libs/libgmp.a $HOUSE_DIR/support

Apply ‘support.patch’ to alter the build systems of the libtiny_{c,gcc,gmp}.a and build the libraries.

	> wget
	> patch -p0 -d $HOUSE_DIR < support.patch
	> make -C $HOUSE_DIR/support

Build the cbits object files:

	> make -C $HOUSE_DIR/kernel cobjs

In preparation for the final linking, which is done manually, pick a working directory ($WDIR) that will serve to hold the needed libraries. Make some last minute modifications to the archives and copy libHSrts.a, libcbits.a, libtiny_gmp.a, libtiny_c.a, and libtiny_gcc.a

	> mkdir $WDIR
	> ar d support/libtiny_c.a dlmalloc.o		# dlmalloc assumes it manages all memory
	> ar q $WDIR/libcbits.a kernel/cbits/*.o
	> cp ghc-6.8.2/rts/libHSrts.a support/libtiny_{c,gcc,gmp}.a $WDIR

Build a Kernel Module

First, write the C shim that will be read by the Linux build system. While it might be possible to avoid C entirely its easier to use the build system, and its plethora of macros, than fight it. The basic components of the shim are a license declaration, function prototypes for the imported (Haskell) functions, initialization, and exit functions. All these can be seen in the example hello.c [2].

Notice that many of the standard C functions in the GHC RTS were not changed by our patches. To allow the RTS to perform key actions, such as malloc and free, the hello.c file includes shim functions such as ‘malloc’ which simply calls ‘kmalloc’. Any derivative module you make should include these functions either in the main C file or a supporting one.

Second, write a Haskell module and export the initialization and exit function so the C module may call them. Feel free to import kernel functions, just be sure to use the ‘regparm3’ key word in place of ‘ccall’ or ‘stdcall’. For example:

	foreign import regparm3 unsafe foo :: CString -> IO CInt
	foreign export regparm3 hello :: IO CInt

Continuing the example started by hello.c, ‘hsHello.hs’ is online [3].

Now start building the object files. Starting with building hsHello.o, you must execute:

        > $HOUSE_DIR/ghc-6.8.2/compiler/stage1/ghc-inplace -B$HOUSE_DIR/ghc-6.8.2  hsHello.hs -c

* note that this step will generate or overwrite any hsHello_stub.c file.

When GHC generates C code for exported functions there is an implicit assumption that the program will be compiled by GHC. As a result the nursery and most the RTS systems are not initialized so the proper function calls must be added to hsHello_stub.c.

Add the funcion call “startupHaskell(0, NULL, NULL);” before rts_lock() in the initializing Haskell function. Similarly, add a call to “hs_exit_nowait()” after rts_unlock().

The stub may now be compiled, producing hsHello_stub.o. This is done below via hghc, which is an alias for our version of ghc with many flags [4].

	> hghc hsHello_stub.c -c

The remaining object files, hello.o and module_name.mod.o, can be created by the Linux build system. The necessary make file should contain the following:

	obj-m := module.o		# Obviously you should name the module as you see fit
	module-objs := hello.o

And the make command (assuming the kernel source is in /usr/src/kernels/):

	> make -C /usr/src/kernels/ M=`pwd` modules

This should make “hello.o” and “module.mod.o”. Everything can now be linked together with a single ld command.

	> ld -r -m elf_i386 -o module.ko hsHello_stub.o hsHello.o module.mod.o hello.o *.a libcbits.a

A successful build should not have any common block variables and the only undefined symbols should be provided by the kernel, meaning you should recognize said functions. As a result, the first command below should not result in output while the second should be minimal.

	> nm module.ko | egrep “^ +C ”
	> nm module.ko | egrep “^ +U ”
 	        U __kmalloc
 	        U kfree
 	        U krealloc
 	        U mcount
 	        U printk

Known Issue

The House-GHC 6.8.2 run-time system (RTS) does not clean up the allocated memory on shutdown, so adding and removing kernel modules results in large memory leaks which can eventually crash the system. This should be relatively easy to fix, but little investigation was done.

A good estimate of the memory leak is the number of megabytes in the heap (probably 1MB, unless your module needs lots of memory) plus 14 bytes of randomly leaked memory from two unidentified (6 and 8 byte) allocations.

[4] $HOUSE_DIR/ghc-6.8.2/compiler/stage1/ghc-inplace -B$HOUSE_DIR/ghc-6.8.2 -optc-fno-common -optc-Wa,symbolic -optc-static-libgcc -optc-nostdlib -optc-I/usr/src/kernels/ -optc-MD -optc-mno-sse -optc-mno-mmx -optc-mno-sse2 -optc-mno-3dnow -optc-Wframe-larger-than=1024 -optc-fno-stack-protector -optc-fno-optimize-sibling-calls -optc-g -optc-fno-dwarf2-cfi-asm -optc-Wno-pointer-sign -optc-fwrapv -optc-fno-strict-aliasing -I/usr/src/kernels/ -optc-mpreferred-stack-boundary=2 -optc-march=i586 -optc-Wa,-mtune=generic32 -optc-ffreestanding -optc-mtune=generic -optc-fno-asynchronous-unwind-tables -optc-pg -optc-fno-omit-frame-pointer -fvia-c