Writing a redis clone in Haskell for some goddamn reason

I’ve played around a little bit with Haskell, but the closest I’ve got to doing something serious with it is writing a relational algebra interpreter (which I did use to do some homework for a databases class, so I guess that’s genuine practical success). I thought it would be interesting to try and write something practical, and for some reason I decided to write a redis clone. On the one hand, redis provides data structures and various operations on them, we seems right in Haskell’s wheelhouse. On the other, redis exposes these data structures over a network connection, which is precisely the sort of messy IO Haskell sweeps under a carpet (or monad). So it seems like an interesting option to learn how Haskell might fare in the real world.

A key-value store

Starting with the more obviously Haskelly part of redis, the key-value store. redis uses plain ASCII strings as keys:

type Key = ByteString

Redis stores various different data structures, so values need to be a sum type:

data Value = StringValue ByteString | IntValue Int

newtype Store = Store (Map Key Value)

That’s enough for an initial implementation that supports basic get/put and increment/decrement operations. In redis, get and put operations always work in terms of strings, so we need to be able to put strings, and retrieve strings (converted from ints, if necessary):

lookupString :: Store -> Key -> Maybe ByteString
lookupString (Store m) k = case lookup k m of
  Nothing -> Nothing
  Just (StringValue s) -> Just s
  Just (IntValue i) -> Just $ intToByteString i

insertString :: Store -> Key -> ByteString -> Store
insertString (Store m) k s = Store $ insert k (StringValue s) m

Increment and decrement, on the other hand, work on integers, converting a string to an integer if necessary. As not all strings represent integers, the increment and decrement operations are fallible; they need to be able to return an error if the string can’t be converted to an integer.

data Error = BadType
  deriving (Eq, Show)

type Result a = Either Error a

lookupInt :: Store -> Key -> Result (Maybe Int)
lookupInt (Store m) k = case lookup k m of
  Nothing -> Right Nothing
  Just (IntValue i) -> Right i
  Just (StringValue s) -> case byteStringToInt s of
    Nothing -> Left BadType
    Just i -> Right i

insertInt :: Store -> Key -> Int -> Store
insertInt (Store m) k i = Store $ insert k (IntValue i) m

incr :: Store -> Key -> (Store, Result Int)
incr store@(Store m) k = case lookupInt k m of
  Left e -> (store, Left e)
  Right value ->
    let res = (fromMaybe 0 value) + 1
      in (insertInt store k res, Right res)

decr :: Store -> Key -> (Store, Result Int)
decr store@(Store m) k = case lookupInt k m of
  Left e -> (store, Left e)
  Right value ->
    let res = (fromMaybe 0 value) - 1
      in (insertInt store k res, Right res)

Another feature of redis, however, calls for a rethinking of this model. Redis allows setting an expiration time for a key; if the expiration time has passed, looking up the key acts as if it has never been set. This makes lookup more complicated in two respects: lookup needs to have access to the current time, and lookup, which up to know has not modified the store, may now need to remove a key if the expiry time has passed. We can modify the Store easily enough to add the expiry time as some metadata:

data Entry = {
  value :: Value,
  expiry :: Maybe UTCTime
}

newtype Store = Map Key Entry

More complicated is modifying the operations to handle expiring entries. This could be implemented by always passing down the current time to every operation, and always returning a (potentially) modified store from each operation. To my mind, though, this is not an optimal solution, because the implementation of the operation doesn’t need to know about the current time (it just needs to pass it down to the lookup); and a (conceptually) non-modifying operation like get shouldn’t have to worry about whether or not the lookup will modify the store.

Luckily, Haskell has a way to handle these kinds of irrelevant details: monads. When implementing the get operation, we want to think in terms of function with the signature Key -> Value, but we need to add something else to the signature to handle the details which, from the point of view of the function itself, are irrelevant (the current time and the potential state modification). Rather than exposing these details directly in the function signature (UTCTime -> Key -> (Value, Store)), we can encapsulate them in a monad. Actually, we need two monads, Reader (which supplies a value), and State (which encapsulates state modification), which we can combine like:

type StoreM = ReaderT UTCTime (State Store)

Then we can write get in a way that is mostly indifferent to the monad:

get :: Key -> StoreM (Maybe ByteString)
get k = do
  maybeValue <- lookupValue k
  return $ fmap maybeValue (\case
    IntValue i -> intToByteString i
    StringValue s -> s
  )

lookupValue, on the other hand, has to be aware of the details within the monad in order to handle expiry:

lookupValue :: Key -> StoreM (Maybe Value)
lookupValue k = do
  (Store m) <- get -- access the value in the `State` monad
  now <- ask -- access the value in the `Reader` monad
  case lookup k m of
    Nothing -> Nothing
    Just entry -> case expiry entry of
      Nothing -> return $ Just (value entry)
      Just expiryTime 
        | now < expiryTime -> return $ Just (value e)
        | otherwise -> do
            -- update the value in the `State` monad 
            -- to remove the expired entry
            put $ Store (M.delete k m)  
            return Nothing

The other end of the process, where the operations on the store are run, also needs to be aware of the monadic details, passing in the current time and retrieving the updated store, which can be done with:

runStoreM :: Store -> UTCTime -> StoreM a -> (a, Store)
runStoreM store now m = runState (runReaderT m now) store

There are some additional complications for the updating functions, which have to deal with either clearing the expiry time (for the normal set operation), setting a new expiry time (for the set with expiration operation), or retaining the current expiry time (for incr and decr), but I’ll gloss over these.

A network protocol

Redis exposes this key-value store over the network; the first part of this, the network protocol, plays to another of Haskell’s strengths: it’s a parser. Even better, it’s two parsers, one which parses a stream of bytes into general data structures (strings, ints, arrays, etc), and one which parses the resulting arrays into redis commands. Redis uses a text-based protocol called RESP, and a good choice for parsing this kind of thing is the attoparsec library, which is intended to be an efficient parser for byte-oriented formats. The RESP protocol is fairly straightforward, consisting of a small number of types:

data Resp
  = SimpleString ByteString
  | Error ByteString
  | Integer Int
  | BulkString ByteString
  | Array [Resp]
  | NullString

To distinguish between types, each value starts with a single byte representing the type (a sigil), followed by the data for the value. A parser for this general format can be represented as:

valueParser :: (a -> Resp) -> Word8 -> Parser a -> Parser Resp
valueParser constructor sigil parser = constructor <$> (word8 sigil *> p)

That is, it matches a single byte, which must be the sigil value, ignores it with *>, matches the supplied parser, and then maps the supplied constructor over the result.

The SimpleString type is represented by a + followed by a CRLF terminated string:

crlf = string "\r\n"

crlfTerminated = takeTill (== _cr)

simpleStringParser = valueParser SimpleString _plus crlfTerminated <* crlf

The Integer type is similar, but starts with a $ and is followed by a crlf terminated string of digits.

magnitudeParser :: Parser Int
magnitudeParser = fromJust . word8ArrayToInt <$> many1 (satisfy isDigit)

numberParser :: Parser Int
numberParser = do
  sign <- optional (word8 _hyphen)
  magnitude <- magnitudeParser
  return $ if isJust sign then -magnitude else magnitude

integerParser :: Parser Resp
integerParser = valueParser Integer _colon numberParser <* crlf

A BulkString is signified by a $ and, unlike a SimpleString, includes the length of the string before the actual string value (this means it can be read more efficiently, all at once, rather than having to check for a delimiter character by character). This makes the parser slightly more complicated, but it can re-use the magnitudeParser for reading the length:

bulkStringParser :: Parser Resp
bulkStringParser =
  valueParser
    BulkString
    _dollar
    ( do
        len <- magnitudeParser <* crlf
        P.take len <* crlf
    )

The RESP protocol also has the concept of a NullString, used, like NULL, to indicate that there is not available value. This is represented using the same format as a BulkString, but with a length of -1; as such, it can be parsed with:

nullStringParser :: Parser Resp
nullStringParser =
  valueParser
    (const NullString)
    _dollar
    (word8 _hyphen >> word8 _1 >> crlf)

(This means there are two parsers which attempt to consume a string starting with $; however, the attoparsec library will backtrack if it encounters, or doesn’t encounter the minus sign in -1, so will choose the correct parser on that basis).

An Array works similarly, prefixed with a count of elements; but, whereas each element of a BulkString is a character, each element of an Array is another RESP value. Luckily, Haskell’s lazy nature makes recursive parsers easy to write:

arrayParser :: Parser Resp
arrayParser =
  valueParser
    Array
    _asterisk
    ( do
        len <- magnitudeParser <* crlf
        count len respParser
    )

respParser here is the parser for any RESP value, which we haven’t implemented yet, but can now do so, simply as a choice between the different parsers we have implemented:

respParser :: Parser Resp
respParser =
  simpleStringParser
    <|> errorParser
    <|> integerParser
    <|> bulkStringParser
    <|> nullStringParser
    <|> arrayParser

All the redis commands are represented as RESP arrays of strings, which, for ease of processing can be converted to a list of ByteStrings:

extract :: Resp -> Maybe [ByteString] 
extract (Array parts) = traverse (\case BulkString s -> Just s; _ -> Nothing) parts
extract _ = Nothing

(I’m once again reminded of how convenient it is to have traverse convert a list of things to a thing of a list, and sad it’s not in available in more languages.)

All redis commands start with the command name, and in most cases this is followed by a fixed number of arguments with the meaning determined by their position, as in GET key, SET key value, etc. These are easy to parse with pattern matching. The SET command also has some additional possible arguments, like NX to specify no overwrite, or EX n to set an expiry time, and these can appear anywhere in the argument list, so can’t be matched with a simple pattern. I have cleverly dealt with these parameters by not implementing them, which is not as much as a limitation as it might be, as redis has separate (although deprecated) commands which are easier to pass (SETNX and SETEX respectively). So my limited parser can be implemented as:

parse :: Resp -> Either CommandError Command
parse resp = case extract resp of
  Just ["SET", k, v] -> Right . Set $ SetArgs k v
  Just ["GET", k] -> Right . Get $ GetArgs k
  Just ["SETNX", k, v] -> Right . SetNx $ SetArgs k v
  Just ["SETEX", k, d, v] -> case byteStringToInt d of
    Nothing -> Left $ ArgParseError d "number"
    Just i ->
      let duration = secondsToNominalDiffTime $ fromIntegral i
      in Right . SetEx $ SetExArgs k duration v
  Just ["INCR", k] -> Right . Incr $ IncrArgs k
  Just ["DECR", k] -> Right . Decr $ DecrArgs k
  _ -> Left CommandParseError

A server

So, now we have a data store and a parser to read commands to execute against this store. All we need for a complete redis is to expose this protocol over the network. This seems like the part that would be trickiest for Haskell, as it’s inherently about I/O and side effects, rather than being pure. Even here, however, we can largely describe the problem in pure terms. A redis server reads a stream of bytes, parses them to produce a stream of commands, applies these commands to the data store to produce a stream of results, serializes these results to a stream of bytes, and writes them out. This kind of looks like composing functions and mapping them over a stream, except for the reading and writing and modifying the store in response to commands.1

Haskell has a number of libraries for expressing side-effecting programs by composing operations on streams; I chose to use Conduit, in part because it has two convenient facilites: one for setting up a TCP server and then treating each connection as a series of stream operations, and another for applying attoparsec parsers as stream operations. Using this functionality, the server looks like:

runTCPServer
    (serverSettings 6379 "*")
    ( \app ->
        runConduit $
          appSource app
            .| respParse
            .| C.mapM (processCommand appState)
            .| respWrite
            .| appSink app
    )

runTCPServer does the main work, setting up a server that listens for connections. When a connection is made, it calls the supplied function with a parameter containing two streams, one for input and another for output. All my program needs to do is join these streams together with a pipeline that parses input (respParse), executes commands (mapping the processCommand function over the stream), serialising the result (respWrite).

Conclusion

While this only implements a tiny fraction of redis functionality, it is enough to run the redis-benchmark tool for those commands it supports:

Command Requests/second
GET 30432
SET 29481
INCR 16770

Running against a real redis server on the same machine gives the following benchmarks:

Command Requests/second
GET 104712
SET 103626
INCR 104493

So my tiny haskell redis clone runs at about a third of the speed of real redis (for GET and SET; about a 6th of the speed for INCR, perhaps because INCR has to both get and set). While I would be hard pressed to call this “good” performance, I’m still kind of pleased that an unoptimised haskell implementation isn’t worse. If you’re equally easily impressed, the full source code is on GitHub

  1. The real redis also includes commands to subscribe to a channel, receiving notifications when messages are published to that channel. This doesn’t fit the request/response pattern, so would need to be handled in a more complicated way, possibly similar to how this async chat server works.