Thursday, February 17, 2011

Microbenchmark of Haskell MD5 implementations

We're just cleaning up our code including our long list of library dependencies. We have a dependency to libcrypto.so from openssl introduced by the usage of nano-md5. There are (at least) two other packages that provide md5 implementations: pureMD5 and cryptohash. The following microbenchmark compares them using criterion:

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BSL

import qualified Data.Digest.OpenSSL.MD5 as NanoMD5
import qualified Data.Digest.Pure.MD5 as PureMD5
import qualified Crypto.Hash.MD5 as ChMD5

import Numeric (showHex)
import Criterion.Main (defaultMain, bench, nf)
import System.Environment (getArgs)

main =
do x <- BSL.readFile "/lib/libc.so.6"
BSL.length x `seq`
defaultMain [ bench "cryptohash" $ nf ch x
, bench "nano" $ nf nano x
, bench "pure" $ nf pure x
]

go :: BS.ByteString -> Int -> [String] -> String
go bs n acc
| n `seq` bs `seq` False = undefined
| n >= 16 = concat (reverse acc)
| otherwise = go bs (n+1) (draw (BS.index bs n) : acc)

draw w =
case showHex w [] of
[x] -> ['0', x]
x -> x

nano :: BSL.ByteString -> String
nano lazy =
let strict = BS.concat $ BSL.toChunks lazy
in NanoMD5.md5sum strict

pure :: BSL.ByteString -> String
pure = show . PureMD5.md5

ch :: BSL.ByteString -> String
ch bs = go (ChMD5.hashlazy bs) 0 []
On my Intel i7 860 the results are:

benchmarking cryptohash
mean: 2.886409 ms, lb 2.885261 ms, ub 2.887946 ms, ci 0.950
std dev: 6.718781 us, lb 5.388123 us, ub 8.687483 us, ci 0.950

benchmarking nano
mean: 2.704862 ms, lb 2.704301 ms, ub 2.706016 ms, ci 0.950
std dev: 3.970617 us, lb 2.260051 us, ub 7.420531 us, ci 0.950

benchmarking pure
mean: 10.27061 ms, lb 10.26878 ms, ub 10.27485 ms, ci 0.950
std dev: 13.54268 us, lb 7.179537 us, ub 27.38285 us, ci 0.950

Author: David Leuschner