Global Announcement, Haskell

Setdown: the best tool for fast and repeatable line based set operations

Introducing setdown

Have you ever been on the command line and tried to perform set operations? Have you ever followed crazy CLI guides on the internet that suggest complicated commands to try and perform set operations on files? I have. And I did not like it; I think that we can do better.

Over the weekend I wrote a pretty nifty program that I am calling Setdown. Setdown requires you to specify the set operations that you wish to perform as definitions in a set definitions file (often suffixed with ‘.setdown’). The definitions are written in a format very similar to Makefiles, except that they describe set operations instead of build steps.

If you want to install setdown right now or check out the code then you can follow these links:

If you want to learn how to use setdown and write set operations in its language then you should read the README file provided in the source code. However, to show you how easy the language is to read, I have provided an example .setdown file right here:

-- All of the letters of the alphabet
alphabet: "alphabet.txt.unsorted"

-- Calculating the consonants with a set difference
consonants: alphabet - "vowels.txt.unsorted"

-- Getting any letter that is e-sounding or a vowel
e-or-vowels: "e-letters.txt.unsorted" \/ "vowels.txt.unsorted"

-- Get any letter that is e-sounding and a vowel
e-and-vowel: "e-letters.txt.unsorted" /\ "vowels.txt.unsorted"

-- Get all of the e-sounding letters, the vowels and the consonants
e-or-vowels-or-consonants: ("e-letters.txt.unsorted" \/ "vowels.txt.unsorted") \/ consonants

You should install setdown and then check out the setdown-examples project to give it a try right now!

Can you show me an example?

By this point in time you are probably wondering “I love the look of it but show me an example”. So I will. Here is the output of a full running example by checking out the first example in the setdown-examples repository and running it:

$ setdown ex1.setdown 
==> Creating the environment...
Base Directory: ./
Output Directory: ./output

==> Parsed original definitions...
e-or-vowels-or-consonants: ("e-letters.txt.unsorted" \/ "vowels.txt.unsorted") \/ consonants

e-and-vowel: "e-letters.txt.unsorted" /\ "vowels.txt.unsorted"

e-or-vowels: "e-letters.txt.unsorted" \/ "vowels.txt.unsorted"

consonants: alphabet - "vowels.txt.unsorted"

alphabet: "alphabet.txt.unsorted"

==> Verification (Ensuring correctness in the set definitions file)
OK: No duplicate definitions found.
OK: No unknown identifiers found.
OK: All files in the definitions could be found.

==> Simplifying and eliminating duplicates from set definitions...DONE:
alphabet: "alphabet.txt.unsorted"

consonants: alphabet - "vowels.txt.unsorted"

e-and-vowel: "e-letters.txt.unsorted" /\ "vowels.txt.unsorted"

e-or-vowels: "e-letters.txt.unsorted" \/ "vowels.txt.unsorted"

e-or-vowels-or-consonants: e-or-vowels \/ consonants

==> Checking for cycles in the simplified definitions...DONE:
OK: No cycles were found in the definitions.

==> Copying and Sorting all input files from the definitions...
"alphabet.txt.unsorted" (unsorted) => "./output/alphabet.txt.unsorted.1.split.sorted" (sorted)
"e-letters.txt.unsorted" (unsorted) => "./output/e-letters.txt.unsorted.1.split.sorted" (sorted)
"vowels.txt.unsorted" (unsorted) => "./output/vowels.txt.unsorted.1.split.sorted" (sorted)

==> Computing set operations between the files...
Required results:
alphabet: ./output/alphabet.txt.unsorted.1.split.sorted

consonants: ./output/c989d1cf-b860-41cc-a52c-e2afc1e6a235

e-and-vowel: ./output/a8bd5974-22d5-4fdb-b269-0c09a1eeeb18

e-or-vowels: ./output/c3a8cc7c-f246-4eb4-b321-57f900964960

e-or-vowels-or-consonants: ./output/493ca813-7e3c-4259-9435-e2d5ddb4d6a5
$

As you can see we have ended up with a number of output files. Just to pick one example, let's see the contents of the consonants file:

$ cat ./output/c989d1cf-b860-41cc-a52c-e2afc1e6a235
b
c
d
f
g
h
j
k
l
m
n
p
q
r
s
t
v
w
x
y
z
$

And look at that, we have computed the consonants when we were given the vowels and the rest of the alphabet. Hopefully you can see that this is very powerful and will let you write set operations from the command line that are easier to get right.
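To make the three operations concrete, here is a minimal Haskell sketch of what difference, union and intersection mean on lists of lines. It only illustrates the semantics and is not how setdown is implemented. The names (`LineSet`, `fromLines` and friends) are made up for this example, and the contents of `eLetters` are an assumption; the real e-letters.txt.unsorted file may differ.

```haskell
import Data.List (intersect, nub, sort, union, (\\))

-- A "set" here is just the sorted, de-duplicated lines of a file.
type LineSet = [String]

fromLines :: [String] -> LineSet
fromLines = sort . nub

alphabet, vowels, eLetters :: LineSet
alphabet = fromLines (map (: []) ['a' .. 'z'])
vowels   = fromLines ["a", "e", "i", "o", "u"]
-- Assumed contents, for illustration only.
eLetters = fromLines ["b", "c", "d", "e", "g", "p", "t", "v", "z"]

-- Set difference: every line of the alphabet that is not a vowel.
consonants :: LineSet
consonants = alphabet \\ vowels

-- Set union: lines that appear in either input.
eOrVowels :: LineSet
eOrVowels = fromLines (eLetters `union` vowels)

-- Set intersection: lines that appear in both inputs.
eAndVowel :: LineSet
eAndVowel = fromLines (eLetters `intersect` vowels)
```

Loading this into GHCi and evaluating `consonants` gives the same 21 letters as the output file above.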

The benefits of setdown

Depending on your command line bent you may have used other tools in the past to perform set operations on files, like comm or fgrep, but these tools are quite lacking. Instead let me show you the full range of features that setdown gives you:

  • Maintainability
    If you get more set data to add to your collection (as often happens) then it is trivial to edit the setdown definitions to include it.
  • Repeatability
    Even if the data changes you run one single command and all of your set operations are performed again.
  • Sorted input is not required!
    Programs like comm require that you have sorted input if you want to do efficient set operations on files. This makes sense because sorted files make set operations very efficient. However, we don’t put the onus on you to provide us with sorted input. Setdown will sort any files that you give it. We even use External Sort so that you can give us truly massive files and expect that we will still be able to perform your set operations.
  • Simplification of definitions
    If you write the same definition twice then setdown will factor that out and only perform the set operation once. This makes setdown run as efficiently as possible:

    ==> Parsed original definitions...
    C: "b-1.out" - ("a-1.out" / "a-2.out")
    B: "a-1.out" / "a-2.out"
    A: ("a-1.out" / "a-2.out") / "b-1.out"
    
    ==> Verification (Ensuring correctness in the set definitions file)
    OK: No duplicate definitions found.
    OK: No unknown identifiers found.
    OK: All files in the definitions could be found.
    
    ==> Simplifying and eliminating duplicates from set definitions...DONE:
    A: "b-1.out" / B
    B: "a-1.out" / "a-2.out"
    C: "b-1.out" - B
  • Dependencies and cyclic dependency detection
    Since you can write set definitions that depend on other set definitions it is possible to write a cyclic dependency. We will spot this for you and also tell you exactly where the cycle is in your file, meaning that you don’t have to search for it yourself!

    ==> Simplifying and eliminating duplicates from set definitions...DONE:
    A: C
    B: D
    C: B
    D: A
    
    ==> Checking for cycles in the simplified definitions...DONE:
    [Error 20] found cyclic dependencies in the definitions!
    We found the following cycles:
       A -> C -> B -> D -> A
  • Validation
    We verify that your set definitions only reference files that exist. If you reference a file or another definition that does not exist then you will get an error.
  • Works nicely with version control
    You check your .setdown file and your input files into the repository and share them with your co-workers. Everybody can use setdown to get the same results! To prove it, I have written three examples in a setdown-examples repository. Check it out and give it a try!
  • Written in Haskell
    This makes the program very fast and efficient while, in my opinion, reducing the chances of having bugs.
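To show why sorted input makes set operations cheap, here is a minimal sketch of the idea in Haskell. It is not setdown's actual code, and the function names are made up for this example. Each function assumes that both inputs are already sorted and free of duplicates, and walks the two lists once.

```haskell
-- Difference: keep the elements of the first list that are absent from the second.
sortedDifference :: Ord a => [a] -> [a] -> [a]
sortedDifference [] _ = []
sortedDifference xs [] = xs
sortedDifference (x : xs) (y : ys) = case compare x y of
  LT -> x : sortedDifference xs (y : ys)
  EQ -> sortedDifference xs ys
  GT -> sortedDifference (x : xs) ys

-- Union: merge both lists, emitting shared elements only once.
sortedUnion :: Ord a => [a] -> [a] -> [a]
sortedUnion [] ys = ys
sortedUnion xs [] = xs
sortedUnion (x : xs) (y : ys) = case compare x y of
  LT -> x : sortedUnion xs (y : ys)
  EQ -> x : sortedUnion xs ys
  GT -> y : sortedUnion (x : xs) ys

-- Intersection: keep only the elements present in both lists.
sortedIntersection :: Ord a => [a] -> [a] -> [a]
sortedIntersection [] _ = []
sortedIntersection _ [] = []
sortedIntersection (x : xs) (y : ys) = case compare x y of
  LT -> sortedIntersection xs (y : ys)
  EQ -> x : sortedIntersection xs ys
  GT -> sortedIntersection (x : xs) ys
```

For example, `sortedDifference ["a", "b", "c", "d", "e"] ["a", "e"]` evaluates to `["b", "c", "d"]`. Because every step only looks at the head of each list, the work is linear in the combined length of the inputs and the whole input never needs to be held in memory at once.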

I think that this is a much more compelling set operations tool for the command line than anything else that exists out there and I am really happy to share it with you today for free. I also really hope that you get some great usage out of this tool and that it makes your life easier.

Concluding words

From experience I can say, without this tool, dealing with complicated set operations on the command line and sharing your results with your co-workers is much more difficult than it should be.

At any rate I hope that you get a great deal of value from this tool. If you have any comments or suggestions then please post them here on this blog or raise them as issues. If you have any questions then ask them here or on Stack Overflow.

Thanks for reading and I hope this makes somebody's life a little bit easier.

Haskell

Parsing WAVE files correctly in Haskell.

Wavy over WAVE

Playing with audio data is a ton of fun and something that I believe that Haskell could do very well. Processing audio data safely and efficiently seems to fit very well into Haskell’s model so, over a year ago, I started working on and off on a WAVE file format parser. I have been working on it very infrequently (bus trips and other spare time) and I rewrote it once, but today I am pleased to announce the release of the very first version of my ‘wavy’ package that lets you extract data from WAVE files in Haskell. The features of this release include:

  • Methods to Parse and Assemble Wave Files
  • Support for different orderings of RIFF chunks (via the riff library I wrote previously)
  • A split between the parsers for the container format and the data allowing efficient metadata parsing.
  • The ability to parse the data into Int64 or Float formats so that you can handle the data in whichever way that you please.
  • Example programs that make use of the library for your perusal and use.

Things which the library is currently missing include:

  • RIFX support
  • Direct support for maintaining RIFF chunks that are not mentioned in the WAVE specification.

Getting the Code

The code is on hackage as the wavy library so you can install it by:

cabal update
cabal install wavy

Please feel free to give it a try. Probably the best way to quickly see it working is by finding a WAVE file on your machine:

locate '*.wav'

And then passing that wave file as the first argument into wave-info. For example:

$ wave-info /Applications/Steam.app/Contents/MacOS/Friends/friend_join.wav
File: /Applications/Steam.app/Contents/MacOS/Friends/friend_join.wav - 2s
   Format
     Audio Format:  Microsoft PCM
     Channels:      2
     Sample Rate:   44100
     Bits Per Sample:  16
     Byte Rate:  176400
     Block Alignment:  4

   INFO Metadata
     Creation Date: 2010-03-11
     Engineers:       - Kelly Thornton
     Creation Software: Sony Sound Forge 9.0
$

As you can see it can parse an audio file very quickly. The wave-info program is very efficient because the library is lazy and does not parse the data chunk unless you specifically require it.
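For uncompressed PCM audio, the Byte Rate and Block Alignment values in that output follow from the other three format fields. Here is a small worked example in Haskell using the numbers reported above. The names are made up for this sketch and are not part of the wavy API.

```haskell
-- Format fields as reported by wave-info for the file above.
channels, sampleRate, bitsPerSample :: Int
channels      = 2
sampleRate    = 44100
bitsPerSample = 16

-- Bytes needed to store one sample for every channel.
blockAlignment :: Int
blockAlignment = channels * bitsPerSample `div` 8

-- Bytes of audio data consumed per second of playback.
byteRate :: Int
byteRate = sampleRate * blockAlignment
```

Here `blockAlignment` evaluates to 4 and `byteRate` to 176400, matching the output of wave-info.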

Using the Library

I would recommend that you start by looking at the executables in the library's source code for examples of how this library can be used in your applications. The wave-info source code can be found on BitBucket.

Once you have finished doing that you can have a read through the documentation on Hackage to get a full understanding of what methods the library provides.

If you have any questions then please do not hesitate to contact me or comment on the blog.

(This blog post was produced using pandoc)

Haskell

Tutorial: The basics of cipher-aes

Cipher AES Tutorial

In this guide we are going to have a quick run through the cipher-aes library to show how it can be used in your own projects. We will just stick to simple ECB encryption and leave other encodings for a future guide.

Note: In order to follow along with this guide you will need a working knowledge of Haskell (Understanding the first few chapters of Real World Haskell should do the trick)

Import all the Things

But first let's get the imports out of the way:

module Main where

import qualified Crypto.Cipher.AES as CCA
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Base16 as B16
import Data.Char (chr)

You will notice that I like to qualify everything to give you a better idea of which libraries these functions are coming from.

How much seed data do we need?

Whenever you wish to begin the encryption you need to provide a seed. You can see that in the type of the initAES method:

initAES :: Byteable b => b -> AES

For the sake of this guide we are going to use a 32 byte key in order to initialise the AES cipher. This library supports starting with 16, 24 and 32 byte keys. In most programs that you write this initial seed is going to be called a "secret key" or "shared key"; it will often be stuck in a configuration file in a secure location and it should certainly be kept out of the source code. So developers often like specifying this secret key as a big hexadecimal String. However, even though you could read this string straight into a ByteString that would not be correct because each hexadecimal character only has 16 unique points of data (4 bits).

So the question is: how many hexadecimal characters do we need in order to get 32 bytes of random seed data?

Let's just use Haskell to do it for us. We know that there are four bits of data in each hexadecimal character:

bitsPerHexChar :: Integer
bitsPerHexChar = 4

There are 8 bits per byte:

bitsPerByte :: Integer
bitsPerByte = 8

We want 32 bytes of seed data:

requiredSeedBytes :: Integer
requiredSeedBytes = 32

With that information we can work out how many hexadecimal characters are required:

requiredHexChars :: Integer
requiredHexChars = (requiredSeedBytes * bitsPerByte) `div` bitsPerHexChar

Now this only works because either requiredSeedBytes or bitsPerByte is divisible by 4. In our case they both are.

Loading the seed from Hexadecimal input

Now we know that we need to load requiredHexChars many characters in order to initialise the AES encryption. However, these characters are going to be given to us in this form:

type HexStream = B.ByteString

Where every byte in the stream has 4 bits of unique data. However, the initAES function expects the data to be a little more compressed than that. It expects 8 bits of unique data per byte to make a true seed for the AES algorithm:

type InputSeed = B.ByteString

And before we even try to convert from one type to the other we should validate that our inputs are correctly formatted hexadecimal strings (validate your inputs):

validHex :: HexStream -> Either String HexStream
validHex input = if inputLength == requiredHexChars
   then Right input
   else Left $ "Expected " ++ show requiredHexChars ++ " hex characters but instead received " ++ show inputLength
   where
      inputLength :: Integer
      inputLength = fromIntegral . B.length $ input

Then we will want a function that converts from one to the other and squishes two 4 bit chunks together to form an 8 bit number. We can do that like so by using the base16-bytestring library:

toSeed :: HexStream -> Either String InputSeed
toSeed input = if B.null errors 
   then Right seedData 
   else Left "The input data was not made up of hexadecimal characters."
   where
      (seedData, errors) = B16.decode input

We can quickly join the two to have a useful function for loading seed data from hex streams:

toValidSeed :: HexStream -> Either String InputSeed
toValidSeed input = toSeed =<< validHex input
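If you are curious what "squishing two 4 bit chunks together" looks like by hand, here is a minimal sketch that uses only the base library. It is not how base16-bytestring is implemented, and the names `hexPairToByte` and `decodeHex` are made up for this example.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (digitToInt, isHexDigit)
import Data.Word (Word8)

-- Combine two hex characters into one byte: the first character supplies
-- the high four bits and the second supplies the low four bits.
hexPairToByte :: Char -> Char -> Maybe Word8
hexPairToByte hi lo
  | isHexDigit hi && isHexDigit lo =
      Just (fromIntegral ((digitToInt hi `shiftL` 4) .|. digitToInt lo))
  | otherwise = Nothing

-- Decode a whole hex string. The result is Nothing if the string has an
-- odd length or contains a character that is not a hex digit.
decodeHex :: String -> Maybe [Word8]
decodeHex [] = Just []
decodeHex (hi : lo : rest) = (:) <$> hexPairToByte hi lo <*> decodeHex rest
decodeHex [_] = Nothing
```

For example, `decodeHex "e5d6"` evaluates to `Just [229,214]`: four hex characters become two bytes, which is why 64 hex characters are needed for a 32 byte seed.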

In order to actually make use of these methods we need to have a hexadecimal string to start off with as our input. Here is one that I generated for this guide (you will need to generate another truly random one for your production applications):

chosenHexStream :: HexStream
chosenHexStream = BC.pack "e5d6834e0e52a78a47fc1c8887ca0e0ecd0863df89e6a3eebf7085bd131bb854"

We can then use this seed to create an InputSeed but it might not parse so we want to encode that in the types:

potentialInputSeed :: Either String InputSeed
potentialInputSeed = toValidSeed chosenHexStream

Using the AES library

Now that we have our input seed we can initialise an AES encryption context:

aesEnc :: Either String CCA.AES
aesEnc = fmap CCA.initAES potentialInputSeed

And with it we can start to do some encryption! Woo! However, we need some data to try and encrypt…hmmm. Let's make some random test strings:

testData0 :: B.ByteString
testData0 = BC.pack $ "It might seem crazy what I’m about to say" 
   ++ "Sunshine she’s here, you can take a break"
   ++ "I’m a hot air balloon that could go to space"
   ++ "With the air, like I don’t care baby by the way"

testData1 :: B.ByteString
testData1 = BC.pack "B-b-b-baby, you just ain't seen n-n-nothin' yet Here's something that you never gonna forget"

testData2 :: B.ByteString
testData2 = BC.pack $ "There's a calm surrender to the rush of day" 
   ++ "When the heat of a rolling wind can be turned away"

So, with this test data we can now try and run some encryption. Let's look at the most basic encryption method:

encryptECB :: AES -> ByteString -> ByteString

Okay, that type seems pretty self-explanatory: give me an AES context and the thing that you want to encrypt and I’ll run some ECB encryption over it and give the result to you in a strict ByteString. So you would think that we could just do something like this:

broken :: Either String B.ByteString
broken = fmap (flip CCA.encryptECB testData0) aesEnc

After all, it even compiles! But, surprisingly, that does not work. Instead you get the following error message:

Encryption error: input length must be a multiple of block size (16). Its length is: 173

As you can see the cipher-aes library apparently requires that all data be aligned to a block size of 16 bytes. I know that the correct units are bytes based on the fact that testData0 is 173 units long. To solve this problem we need to make sure that we are always giving the encryptECB function a bytestring that meets that boundary. The most sensible way I can think to do that is to use a zero padded bytestring. So let's try and build a function that will do that for us:

type PaddedByteString = B.ByteString

zeroPadData :: B.ByteString -> PaddedByteString
zeroPadData input = input `B.append` padding
   where
      padding = BC.replicate requiredPadding (chr 0)
      requiredPadding = case inputLength `mod` 16 of
         0 -> 0
         x -> 16 - x
      inputLength = B.length input

Now with this new zeroPadData function we can build an encryption function that will always work:

safeEncryptECB :: CCA.AES -> B.ByteString -> B.ByteString
safeEncryptECB enc input = CCA.encryptECB enc (zeroPadData input)

With this new safe encryption function we can encrypt all of our test data. However, before we do that let's first look at what happens when you go in the other direction.

Decrypting your AES data

Now that we have a function that can encrypt our data we really want a function that can go in the other direction and decrypt it. Let's take a look at the decryptECB function from the cipher-aes library:

decryptECB :: AES -> ByteString -> ByteString

Once again this one is pretty simple: given an AES context and an encoded string of data it will decode it back into the original format. So let's try and write a function that will do the reverse of the operation that we did in the safeEncryptECB function. It is important to note that the data also needs to come back in 16 byte blocks otherwise it cannot be decoded. We can ensure that and use types to handle the errors instead of the error function:

safeDecryptECB :: CCA.AES -> B.ByteString -> Either String B.ByteString
safeDecryptECB enc encodedData = if alignment /= 0
   then Left $ "Error: encoded data should have been 16 byte aligned but was off by: " ++ show alignment
   else Right . fst . BC.spanEnd (== (chr 0)) $ CCA.decryptECB enc encodedData
   where
      alignment = B.length encodedData `mod` 16

Together, these functions handle encrypting and decrypting the data safely, which is especially important because the encrypted data often has the potential to be modified by other systems before coming back to us.

However, it is really important to note that the safeEncryptECB and safeDecryptECB functions are not inverses of each other. Specifically, if you have an input string that legitimately has trailing nul characters and you encrypt it and then decrypt it then those characters will be stripped in the final output. That is something to be wary of. However, strings that do not end in nul characters will round-trip correctly through these functions.
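The caveat about trailing nul characters is easy to see if we take the encryption out of the picture and look only at the padding. This is a minimal sketch that needs nothing beyond the bytestring package. The names `zeroPad`, `stripZeros` and `roundTrip` are made up for this example; they mirror the padding and stripping steps of the functions above without calling cipher-aes at all.

```haskell
import qualified Data.ByteString as B

blockSize :: Int
blockSize = 16

-- Append zero bytes until the length is a multiple of the block size.
zeroPad :: B.ByteString -> B.ByteString
zeroPad input = input `B.append` B.replicate padding 0
  where
    padding = (blockSize - B.length input `mod` blockSize) `mod` blockSize

-- Strip every trailing zero byte, which is what happens after decryption.
stripZeros :: B.ByteString -> B.ByteString
stripZeros = fst . B.spanEnd (== 0)

-- Padding followed by stripping, with the encryption step left out.
roundTrip :: B.ByteString -> B.ByteString
roundTrip = stripZeros . zeroPad
```

A 173 byte input is padded with 3 zero bytes to reach 176 bytes, which is a multiple of 16. `roundTrip (B.pack [104, 105])` gives back the original two bytes, but `roundTrip (B.pack [104, 105, 0])` also gives back only two bytes: the trailing zero that was part of the input is lost.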

Now that we have all of this we can really bring it all together.

Bringing it all together

Now that we have put in all of that effort we can really bring it all together with a function that will encrypt the data, show it to us encrypted, and then decrypt it again. Let's give that a try:

-- A hobbit's tale...
thereAndBackAgain :: CCA.AES -> B.ByteString -> IO ()
thereAndBackAgain enc input = do
   putStrLn $ "Data is: " ++ show input
   let encData = safeEncryptECB enc input
   putStrLn $ "Encrypted data: " ++ show encData
   case safeDecryptECB enc encData of
      Left error -> putStrLn error
      Right originalData -> putStrLn $ "Original data was: " ++ show originalData
   newline

newline = putStrLn ""

We can run that little snippet of code on our test data and watch as it encrypts and decrypts our data. But we also want to make sure that invalid data is handled correctly too, so why don’t we also run something that actually modifies the data before trying to decrypt it again and see what happens:

showErrorsHappening :: CCA.AES -> IO ()
showErrorsHappening enc = do
   let testData0Enc = safeEncryptECB enc testData0
   print $ safeDecryptECB enc (testData0Enc `B.append` (BC.pack "modified"))

And now that we have done that all that is left is to actually run the code:

main = do
   putStrLn "Welcome to the cipher-aes guide by Robert Massaioli."
   putStrLn $ "We calculated that we need " ++ show requiredHexChars ++ " hexadecimal characters in the seed in order to initialise the AES cipher."
   case aesEnc of
      Left error -> putStrLn $ "Error parsing chosenHexStream: " ++ error
      Right aes -> do
         putStrLn "Parsed successfully."
         newline
         thereAndBackAgain aes testData0
         thereAndBackAgain aes testData1
         thereAndBackAgain aes testData2
         putStrLn "Showing an error happening by data that is too long:"
         showErrorsHappening aes
         putStrLn "All tests ran successfully"

This guide is actually a literate Haskell source file so you can run it and watch the results printed on the screen. Please check out the repository on BitBucket for this guide to give it a try. If you have any questions then please post them below, and thank you for reading!

(Note: This guide was designed to be converted into HTML for use in WordPress via pandoc.)

Haskell, Interesting

Read and write a RIFF (or RIFX)

What is RIFF anyway?

The RIFF file format is an old file format that is used as a container format for WAVE files among other things. Recently I decided that I wanted to write some pure Haskell code that could parse this file format so that I can start working my way towards building audio libraries in pure Haskell.

So you wrote a Haskell RIFF package did you?

Yes. You can view the results of my efforts on hackage: The riff package. You can even view the code on BitBucket if you like.

That package contains:

  • The riff library with the following features:
    • The ability to parse both RIFF and RIFX files. (Only perfectly formatted RIFF files are currently supported, we currently have no best effort support)
    • Convenience methods to make parsing / assembling RIFF files easier.
    • Written in pure Haskell so that you can run your code everywhere and be assured by all of the nice type safety that Haskell gives you.
  • A riff-structure executable that will print out the structure of all of the riff files that you provide it with.
  • A riff-convert executable that will let you convert RIFX files into RIFF files and vice versa.
  • A riff-identity executable that is pretty useless for practical purposes (it just makes a clone of the RIFF file you give it) but great for testing the library and it serves as a good code example.
  • Complete documentation coverage so that you know how to use each and every method in the library and what the limitations are.
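To give a feel for the difference between the two container types, here is a minimal sketch that reads the outer header by hand: a four character tag followed by a 4 byte chunk size, stored little-endian in RIFF files and big-endian in RIFX files. This is not the API of the riff library; every name in it is made up for this example, and it only needs the bytestring package.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

data Endianness = LittleEndian | BigEndian deriving (Eq, Show)

-- RIFF containers store sizes little-endian; RIFX containers use big-endian.
containerEndianness :: B.ByteString -> Maybe Endianness
containerEndianness bytes = case BC.unpack (B.take 4 bytes) of
  "RIFF" -> Just LittleEndian
  "RIFX" -> Just BigEndian
  _      -> Nothing

-- Interpret four bytes as an unsigned 32 bit integer.
word32 :: Endianness -> B.ByteString -> Word32
word32 endianness bytes = foldl step 0 ordered
  where
    -- Put the most significant byte first before folding.
    ordered = case endianness of
      BigEndian    -> B.unpack bytes
      LittleEndian -> reverse (B.unpack bytes)
    step acc byte = (acc `shiftL` 8) .|. fromIntegral byte

-- Read the container tag and the 4 byte chunk size that follows it.
outerChunkSize :: B.ByteString -> Maybe Word32
outerChunkSize bytes
  | B.length bytes < 8 = Nothing
  | otherwise = do
      endianness <- containerEndianness bytes
      Just (word32 endianness (B.take 4 (B.drop 4 bytes)))
```

A header of "RIFF" followed by the bytes 0x24 0x00 0x00 0x00 and a header of "RIFX" followed by 0x00 0x00 0x00 0x24 both describe a chunk size of 36.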

You can give it a try today to read some RIFF files and it is all pretty self-explanatory. I hope somebody gets some good use out of this. I am going to try and keep this library small and focused; please feel free to contribute and let me know what you think. And if you use it for something then especially let me know. It would make me very happy.

Haskell

Connecting Haskell HUnit tests to Cabal TestSuite

This is a quick guide that should explain how to connect Haskell HUnit tests to the Cabal TestSuite. I’m going to explain the simplest possible method in which you could do so and then leave it up to you to take the method further. For this post I am going to be using the following versions of Haskell software:

We are going to go through this guide in the following order:

  • Understand the data structures in Cabal TestSuite and their expected use cases.
  • Understand the data structures in Test.HUnit
  • Walk through a simple way to combine the two.

After that is done you should be able to mesh the two together easily. If you already feel that you understand the first two sections then feel free to skip straight to the final section to see how the two are joined together.

Understanding the structure of Cabal TestSuite

I am going to attempt to connect them together for the sake of running all of my test cases easily using Cabal. It also means that other people reading my project for the first time will be able to follow conventions and run my test cases with ease. The first thing that you have to understand is the structure of the Cabal TestSuite. Here is the source code for Distribution.TestSuite:

data Test
    = Test TestInstance
    | Group
        { groupName     :: String
        , concurrently  :: Bool
            -- ^ If true, then children of this group may be run in parallel.
            -- Note that this setting is not inherited by children. In
            -- particular, consider a group F with "concurrently = False" that
            -- has some children, including a group T with "concurrently =
            -- True". The children of group T may be run concurrently with each
            -- other, as long as none are run at the same time as any of the
            -- direct children of group F.
        , groupTests    :: [Test]
        }
    | ExtraOptions [OptionDescr] Test

You will notice that this data structure has three constructors. The first constructor, Test, expects a single test instance. The Group constructor gathers a number of TestSuite test cases under the same name and also, interestingly, gives us the option to run the test cases concurrently. The final constructor, ExtraOptions, lets us pass extra options to test cases. For the sake of making the integration very simple so that you can get up and running in minutes we will be ignoring everything except the Test constructor. Now you may have noticed that the Test constructor expects a TestInstance, but what does that even look like? Here is the code that makes up a TestInstance:

data TestInstance = TestInstance
    { run       :: IO Progress      -- ^ Perform the test.
    , name      :: String           -- ^ A name for the test, unique within a
                                    -- test suite.
    , tags      :: [String]         -- ^ Users can select groups of tests by
                                    -- their tags.
    , options   :: [OptionDescr]    -- ^ Descriptions of the options recognized
                                    -- by this test.
    , setOption :: String -> String -> Either String TestInstance
        -- ^ Try to set the named option to the given value. Returns an error
        -- message if the option is not supported or the value could not be
        -- correctly parsed; otherwise, a 'TestInstance' with the option set to
        -- the given value is returned.
    }

So a TestInstance expects to be able to run something that will return a Progress. It expects to have a name, a list of zero or more tags, a list of zero or more options and a method that allows you to set new options and get back a modified test. The name is pretty self-explanatory: it lets you name the test. The list of tags to selectively run different test cases is extremely useful; large companies that choose to write a ton of test cases and have them run in automated builds (like in Bamboo) will use this feature heavily to be very selective in the test cases that they run. The options are partially useful too but we will be ignoring them for the sake of speed of development. Out of all of these fields, run is perhaps the most interesting. Let's take a look at the Progress data structure that is supposed to be the result of the IO action:

data Progress = Finished Result
              | Progress String (IO Progress)

data Result = Pass
            | Fail String
            | Error String
  deriving (Eq, Read, Show)

Now the Progress data structure is actually quite interesting. It has one constructor that tells us that a test case has Finished and that the Result of the test was a Pass, Fail or Error. That is the easy part to understand, the next constructor seems to say that the progress of this test case is that it is still in Progress; it also provides a message and gives an IO action to continue the progress. This is a way of reporting progress of the test cases before the test case finishes; this would be really useful if you have some very long running test cases and you wanted to get them to give you messages sooner rather than later. For the sake of our simple test cases we will only be using the Finished constructor and the Result data structure to show success.
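Even though we will not use the Progress constructor in this guide, a small sketch may help to show how it chains steps together. The `slowTest` and `runToCompletion` names are made up for this example, and the driver is only an illustration of how a Progress value can be followed to its end; it is not how Cabal itself runs test suites.

```haskell
import Distribution.TestSuite (Progress (..), Result (..))

-- A hypothetical long-running test that reports two steps before passing.
slowTest :: IO Progress
slowTest =
  return $ Progress "step 1 of 2: preparing" $
    return $ Progress "step 2 of 2: checking" $
      return (Finished Pass)

-- Follow a Progress value to its end, printing each message on the way.
runToCompletion :: IO Progress -> IO Result
runToCompletion action = do
  progress <- action
  case progress of
    Finished result -> return result
    Progress message next -> do
      putStrLn message
      runToCompletion next
```

Running `runToCompletion slowTest` prints the two step messages and then returns `Pass`.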

Now you understand how the Cabal Distribution expects Test cases to be structured. You have Test cases that have TestInstances. The TestInstances are capable of being run and reporting their Progress. Eventually the Progress will be Finished and you will be able to get a result which will be reported back to you on the screen. Now we have to learn the structure of the HUnit module so that we can figure out how to put the two together.

Understanding the structure of the HUnit Module

The Test.HUnit module provides some similar structures to write HUnit test cases. Specifically it has a Test data structure that looks like this:

-- | The basic structure used to create an annotated tree of test cases.
data Test
    -- | A single, independent test case composed.
    = TestCase Assertion
    -- | A set of @Test@s sharing the same level in the hierarchy.
    | TestList [Test]
    -- | A name or description for a subtree of the @Test@s.
    | TestLabel String Test

As you can see we have a tree based data structure with a TestCase being the smallest node that can be defined. Each test case asserts something. You can also have a list of tests that make up one larger test and you can use labels to give names to other tests or groups of tests. I’m going to just assume that you know how to write HUnit test cases (as that is not the point of this document) but how do you run a test case?

You could use the performTest function but it is quite low level and would take too long to wire up. So instead we can use the runTestTT function to do the job for us with a much simpler return type:

-- | Provides the "standard" text-based test controller. Reporting is made to
--   standard error, and progress reports are included. For possible
--   programmatic use, the final counts are returned.
--
--   The "TT" in the name suggests "Text-based reporting to the Terminal".

runTestTT :: Test -> IO Counts
runTestTT t = do (counts', 0) <- runText (putTextToHandle stderr True) t
                 return counts'

As you can see we just give this function our test cases and it gives us the results in the form of a Counts object that looks like this:

-- | A data structure that hold the results of tests that have been performed
-- up until this point.
data Counts = Counts { cases, tried, errors, failures :: Int }
  deriving (Eq, Show, Read)

From this Counts object we can derive the number of test cases that we tried to run as well as any failures or errors that occurred. We will use that in the next section to join the two together.

Combining HUnit and Cabal TestSuite together

Now that we know how it all works we can, quite easily, join the two together. Because both Cabal TestSuite and HUnit define a Test data structure, we need qualified imports. That makes the code a little harder to read, but I am sure that you will see straight through it. I am just going to show you the final result and then highlight the important sections:

module Test where

import qualified Distribution.TestSuite as TS
import qualified Test.HUnit as HU

test1 = HU.TestCase (HU.assertEqual "one equals three" 1 3)

hunitTests = HU.TestList [HU.TestLabel "Test 1" test1]

runHUnitTests :: HU.Test -> IO TS.Progress
runHUnitTests tests = do
   (HU.Counts cases tried errors failures) <- HU.runTestTT tests
   return $ if errors > 0
      then TS.Finished $ TS.Error "There were errors in the HUnit tests"
      else if failures > 0
         then TS.Finished $ TS.Fail "There were failures in the HUnit tests"
         else TS.Finished TS.Pass

tests :: IO [TS.Test]
tests = return [ TS.Test hunit ]
  where
    hunit = TS.TestInstance
        { TS.run = runHUnitTests hunitTests
        , TS.name = "HUnit Test Cases"
        , TS.tags = ["hunit"]
        , TS.options = []
        , TS.setOption = \_ _ -> Right hunit
        }

Breaking down the code segment above:

  • The first two highlights are lines 3 and 4 which show us importing the TestSuite as TS and the HUnit test classes as HU. You’ll have to remember that for the remainder of the tests.
  • The next highlight is the definition of the hunitTests which are HUnit Test classes. You can provide this wherever you want and, to make your code nicer to read, you should probably split this off into a separate test module and then import it back in.
  • The next highlight shows the function that does the majority of the work. It runs the test cases using runTestTT and then inspects the result counts to figure out what Result type to return back to the Cabal TestSuite.
  • In the next highlight you can see that we are using the runHUnitTests function to provide a runner for the TestInstance. At this point we have fully connected our HUnit test cases and our Cabal TestSuite, and you should be able to run “cabal install --enable-tests” and watch your HUnit test cases run.
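
One small refinement is to separate the decision about the result from the IO that runs the tests, so that the mapping itself can be checked on its own. The sketch below assumes the same qualified imports as the module above; countsToResult and runHUnit are hypothetical helper names chosen for illustration and are not part of HUnit or Cabal.

```haskell
import qualified Distribution.TestSuite as TS
import qualified Test.HUnit as HU

-- Pure mapping from HUnit's Counts to a Cabal TestSuite Result.
-- Errors take priority over failures, as in runHUnitTests above.
-- (countsToResult is a hypothetical name, not a library function.)
countsToResult :: HU.Counts -> TS.Result
countsToResult counts
  | HU.errors counts > 0 =
      TS.Error (show (HU.errors counts) ++ " HUnit test(s) raised errors")
  | HU.failures counts > 0 =
      TS.Fail (show (HU.failures counts) ++ " HUnit test(s) failed")
  | otherwise = TS.Pass

-- Runs the tests with runTestTT and wraps the mapped result in Finished.
runHUnit :: HU.Test -> IO TS.Progress
runHUnit tests = TS.Finished . countsToResult <$> HU.runTestTT tests

main :: IO ()
main = do
  -- Counts takes its fields in the order: cases, tried, errors, failures.
  print (countsToResult (HU.Counts 2 2 0 0))
  print (countsToResult (HU.Counts 2 2 0 1))
```

Because countsToResult is pure, you could cover it with ordinary HUnit assertions without running a whole test suite inside another one.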

That is all for this guide and I hope that it helps you get your test cases up and running quickly.

Further reading: This is obviously a very simple integration of the two and you have plenty of room to improve that integration dramatically. However, I have spotted a library that seems to be in development (and not on Hackage) called cabal-test-hunit that you might want to look at for more information.

Haskell

A line based file splitter for the command line.

Have you ever wanted to extract only a certain set of lines from a file? Maybe you wanted to get everything from line 400 onwards, or just lines 25 to 50? Well I did. I call the end result ‘splitter’.

Splitter is a program designed to be used on the command line and it has been written entirely in Haskell. I have uploaded Splitter so that it is available on Hackage. You can find the source code for splitter on BitBucket, along with the source code for ‘range’, the library that I wrote in order to make the splitter program easier to deal with. The repositories are here:

But words are just words and I really need to show you some examples.

Show me an Example!

For this demo let’s make a file that has twenty lines in it, where each line contains the next number from one to twenty written out as a word, like this one: Twenty Numbers.

If you were to get that file (calling it ‘twenty.txt’) then the following commands would have the following results. You could get single lines from files:

$ cat twenty.txt | splitter 3
three
$

You could get an entire range of lines from a file:

$ cat twenty.txt | splitter 5-9
five
six
seven
eight
nine
$

You could get multiple ranges from the file:

$ cat twenty.txt | splitter 10-14,2-4
two
three
four
ten
eleven
twelve
thirteen
fourteen
$

You can get ranges that are only bounded on one side:

$ cat twenty.txt | splitter -5,15-
one
two
three
four
five
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
$

You can invert the selection if you choose to:

$ cat twenty.txt | splitter -i -5,15-
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
$

And you can specify an infinite range if you really want to (even though it would be the same as ‘cat’):

$ cat twenty.txt | splitter *
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
$

And there are a few more options to choose from, which you can see by running ‘splitter --help’. I would recommend that you have a play around with it yourself. It can be installed on any platform that has cabal-install, which is part of the Haskell Platform.
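
To make the range syntax above concrete, here is a minimal sketch of how such a selection could be modelled in Haskell. It is not splitter’s actual implementation and it does not use the API of the range library; the LineRange type and the inRange and selectLines functions are hypothetical names invented purely for illustration.

```haskell
-- Hypothetical model of the ranges used in the examples above.
-- None of these names come from splitter or the range library.
data LineRange
  = Single Int    -- like "3"
  | Span Int Int  -- like "5-9"
  | UpTo Int      -- like "-5"
  | From Int      -- like "15-"
  | Everything    -- like "*"

-- Does a 1-based line number fall inside one range?
inRange :: Int -> LineRange -> Bool
inRange n (Single k) = n == k
inRange n (Span a b) = a <= n && n <= b
inRange n (UpTo b)   = n <= b
inRange n (From a)   = a <= n
inRange _ Everything = True

-- Keep a line when it matches any of the ranges, or when it matches
-- none of them if the selection is inverted (like "-i" above).
-- Lines always come out in file order, whatever order the ranges are in.
selectLines :: Bool -> [LineRange] -> [String] -> [String]
selectLines invert ranges ls =
  [ l | (n, l) <- zip [1 ..] ls, any (inRange n) ranges /= invert ]

main :: IO ()
main = do
  let numbers = words "one two three four five six seven eight nine ten"
  -- Comparable to "7-8,-2": prints one, two, seven, eight
  mapM_ putStrLn (selectLines False [Span 7 8, UpTo 2] numbers)
  -- Comparable to "-i -2,9-": prints three through eight
  mapM_ putStrLn (selectLines True [UpTo 2, From 9] numbers)
```

Treating inversion as a single Boolean comparison keeps the selection logic in one place, which is one reasonable design among several.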

Concluding Words

The bottom line is that splitter makes it really easy to extract only certain lines from your files. It has the following features:

  • Select any range that you like, whether fixed or infinite.
  • Invert your selection so that you get all of the lines that you did NOT specify.
  • Print the line numbers alongside the lines from the file.
  • Print lines as soon as they are ready, meaning that you can use splitter on a logfile in the same way that you can use ‘tail -f’.
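
The last two features lean on Haskell’s laziness. The sketch below is not taken from splitter’s source; it only illustrates the general idea, assuming that lazy input via interact plus line-buffered output is enough to emit each line as it arrives. The numberLines helper and its “N: line” output format are hypothetical.

```haskell
import System.IO (BufferMode (LineBuffering), hSetBuffering, stdout)

-- Prefix every line with its 1-based line number.
-- zipWith is lazy, so this also works on unbounded input.
-- (numberLines and its output format are assumptions for illustration.)
numberLines :: [String] -> [String]
numberLines = zipWith (\n l -> show n ++ ": " ++ l) [1 :: Int ..]

main :: IO ()
main = do
  -- Flush after every line so that output is not held back in a
  -- block buffer when stdout is a pipe.
  hSetBuffering stdout LineBuffering
  -- interact reads stdin lazily, so lines are processed as they arrive.
  interact (unlines . numberLines . lines)
```

Something like ‘tail -f some.log’ piped into this program should then print numbered lines as they are appended.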

I have tried to make it a highly useful and focussed tool for getting certain lines from files, with an easy to understand format for specifying which lines you want. For more detailed information you should check out the README file on BitBucket. It is perhaps the most comprehensive and up to date resource on how to use the splitter tool.

Extra: Range code huh? That sounds useful.

While I was writing this I did indeed look around for Range libraries that would meet my criteria. I discovered the following:

  • ranges
    A nice looking package that has been marked as Obsolete by the Author. I did not want to have to be stuck on an obsolete version of code that would not be updated. Also, this library cannot handle infinite ranges.
  • Ranged-set
    This is a nice library and it makes good use of Haskell classes, but it does not support infinite ranges either and thus was not suitable for this project.

So, getting excited and wanting to start from scratch, I wrote my own library called range, which I have now placed on Hackage. Please feel free to use it for your own purposes and I will happily accept pull requests on that work.

Haskell

How to install HDBC-sqlite3 on windows.

You have a problem it seems. You want to install the Haskell HDBC-sqlite3 library on Windows but you seem to be getting these error messages telling you that it cannot find the required sqlite3 library. The way that you solve this kind of problem is by simply getting your hands on the sqlite3.dll and sqlite3.h files, from here, and putting them in locations that you can then add to the cabal install command later. The ultimate install command should look really simple, something like this:

cabal install HDBC-sqlite3 --extra-lib-dirs='C:\some\path\to\sqlite\lib' --extra-include-dirs='C:\some\path\to\sqlite\include'

You obviously need to replace those two paths with the actual locations of your sqlite3.dll and sqlite3.h files.

The only other gotcha that you will need to look out for is to make sure that sqlite3.dll is always on the path of any program that you build that depends on HDBC-sqlite3, because otherwise it will complain about not being able to find the required DLL. Just so that you know, the current directory of any program is always on its library search path.
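
Once the install succeeds, a quick way to confirm that both the Haskell package and the DLL are being picked up is a tiny smoke test. This is only a sketch: it assumes that the HDBC and HDBC-sqlite3 packages are installed and that an in-memory database (“:memory:”) is good enough for the check. The sqliteVersion helper is a hypothetical name.

```haskell
import Database.HDBC (disconnect, fromSql, quickQuery')
import Database.HDBC.Sqlite3 (connectSqlite3)

-- Ask the linked SQLite library for its version string.
-- (sqliteVersion is a hypothetical helper, not part of HDBC.)
sqliteVersion :: IO String
sqliteVersion = do
  -- ":memory:" opens a throwaway in-memory database.
  conn <- connectSqlite3 ":memory:"
  rows <- quickQuery' conn "SELECT sqlite_version()" []
  disconnect conn
  case rows of
    [[v]] -> return (fromSql v)
    _     -> fail "unexpected result from sqlite_version()"

main :: IO ()
main = sqliteVersion >>= putStrLn . ("SQLite version: " ++)
```

If the program compiles but complains about a missing DLL when you run it, sqlite3.dll is most likely not on the path as described above.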