binary-0.7.1.0: Binary serialisation for Haskell values using lazy ByteStrings

CopyrightLennart Kolmodin
LicenseBSD3-style (see LICENSE)
MaintainerLennart Kolmodin <kolmodin@gmail.com>
Stabilityexperimental
Portabilityportable to Hugs and GHC.
Safe HaskellTrustworthy
LanguageHaskell98

Data.Binary.Get

Contents

Description

The Get monad. A monad for efficiently building structures from encoded lazy ByteStrings.

Primitives are available to decode words of various sizes, both big and little endian.

Let's decode binary data representing illustrated here. In this example the values are in little endian.

+------------------+--------------+-----------------+
| 32 bit timestamp | 32 bit price | 16 bit quantity |
+------------------+--------------+-----------------+

A corresponding Haskell value looks like this:

 data Trade = Trade
   { timestamp :: !Word32
   , price     :: !Word32
   , qty       :: !Word16
   } deriving (Show)
 

The fields in Trade are marked as strict (using !) since we don't need laziness here. In practise, you would probably consider using the UNPACK pragma as well. http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#unpack-pragma

Now, let's have a look at a decoder for this format.

 getTrade :: Get Trade
 getTrade = do
   timestamp <- getWord32le
   price     <- getWord32le
   quantity  <- getWord16le
   return $! Trade timestamp price quantity
 

Or even simpler using applicative style:

 getTrade' :: Get Trade
 getTrade' = Trade <$> getWord32le <*> getWord32le <*> getWord16le
 

The applicative style can sometimes result in faster code, as binary will try to optimize the code by grouping the reads together.

There are two kinds of ways to execute this decoder, the lazy input method and the incremental input method. Here we will use the lazy input method.

Let's first define a function that decodes many Trades.

 getTrades :: Get [Trade]
 getTrades = do
   empty <- isEmpty
   if empty
     then return []
     else do trade <- getTrade
             trades <- getTrades
             return (trade:trades)
 

Finally, we run the decoder:

 example :: IO ()
 example = do
  input <- BL.readFile "trades.bin"
  let trades = runGet getTrades input 
  print trades
 

This decoder has the downside that it will need to read all the input before it can return. On the other hand, it will not return anything until it knows it could decode without any decoder errors.

You could also refactor to a left-fold, to decode in a more streaming fashion, and get the following decoder. It will start to return data without knowning that it can decode all input.

 example2 :: BL.ByteString -> [Trade]
 example2 input
   | BL.null input = []
   | otherwise =
      let (trade, rest, _) = runGetState getTrade input 0
      in trade : example2 rest
 

Both these examples use lazy I/O to read the file from the disk, which is not suitable in all applications, and certainly not if you need to read from a socket which has higher likelihood to fail. To address these needs, use the incremental input method. For an example of this, see the implementation of decodeFileOrFail in Data.Binary.

Synopsis

The Get monad

The lazy input interface

The lazy interface consumes a single lazy ByteString. It's the easiest interface to get started with, but it doesn't support interleaving I/O and parsing, unless lazy I/O is used.

There is no way to provide more input other than the initial data. To be able to incrementally give more data, see the incremental input interface.

runGet :: Get a -> ByteString -> a Source

The simplest interface to run a Get decoder. If the decoder runs into an error, calls fail, or runs out of input, it will call error.

runGetOrFail :: Get a -> ByteString -> Either (ByteString, ByteOffset, String) (ByteString, ByteOffset, a) Source

Run a Get monad and return Left on failure and Right on success. In both cases any unconsumed input and the number of bytes consumed is returned. In the case of failure, a human-readable error message is included as well.

type ByteOffset = Int64 Source

An offset, counted in bytes.

The incremental input interface

The incremental interface gives you more control over how input is provided during parsing. This lets you e.g. interleave parsing and I/O.

The incremental interface consumes a strict ByteString at a time, each being part of the total amount of input. If your decoder needs more input to finish it will return a Partial with a continuation. If there is no more input, provide it Nothing.

Fail will be returned if it runs into an error, together with a message, the position and the remaining input. If it succeeds it will return Done with the resulting value, the position and the remaining input.

data Decoder a Source

A decoder procuced by running a Get monad.

Constructors

Fail !ByteString !ByteOffset String

The decoder ran into an error. The decoder either used fail or was not provided enough input. Contains any unconsumed input and the number of bytes consumed.

Partial (Maybe ByteString -> Decoder a)

The decoder has consumed the available input and needs more to continue. Provide Just if more input is available and Nothing otherwise, and you will get a new Decoder.

Done !ByteString !ByteOffset a

The decoder has successfully finished. Except for the output value you also get any unused input as well as the number of bytes consumed.

runGetIncremental :: Get a -> Decoder a Source

Run a Get monad. See Decoder for what to do next, like providing input, handling decoder errors and to get the output value. Hint: Use the helper functions pushChunk, pushChunks and pushEndOfInput.

Providing input

pushChunk :: Decoder a -> ByteString -> Decoder a Source

Feed a Decoder with more input. If the Decoder is Done or Fail it will add the input to ByteString of unconsumed input.

    runGetIncremental myParser `pushChunk` myInput1 `pushChunk` myInput2
 

pushChunks :: Decoder a -> ByteString -> Decoder a Source

Feed a Decoder with more input. If the Decoder is Done or Fail it will add the input to ByteString of unconsumed input.

    runGetIncremental myParser `pushChunks` myLazyByteString
 

pushEndOfInput :: Decoder a -> Decoder a Source

Tell a Decoder that there is no more input. This passes Nothing to a Partial decoder, otherwise returns the decoder unchanged.

Decoding

skip :: Int -> Get () Source

Skip ahead n bytes. Fails if fewer than n bytes are available.

isEmpty :: Get Bool Source

Test whether all input has been consumed, i.e. there are no remaining undecoded bytes.

bytesRead :: Get Int64 Source

Get the total number of bytes read to this point.

lookAhead :: Get a -> Get a Source

Run the given decoder, but without consuming its input. If the given decoder fails, then so will this function.

lookAheadM :: Get (Maybe a) -> Get (Maybe a) Source

Run the given decoder, and only consume its input if it returns Just. If Nothing is returned, the input will be unconsumed. If the given decoder fails, then so will this function.

lookAheadE :: Get (Either a b) -> Get (Either a b) Source

Run the given decoder, and only consume its input if it returns Right. If Left is returned, the input will be unconsumed. If the given decoder fails, then so will this function.

ByteStrings

getByteString :: Int -> Get ByteString Source

An efficient get method for strict ByteStrings. Fails if fewer than n bytes are left in the input. If n <= 0 then the empty string is returned.

getLazyByteString :: Int64 -> Get ByteString Source

An efficient get method for lazy ByteStrings. Fails if fewer than n bytes are left in the input.

getLazyByteStringNul :: Get ByteString Source

Get a lazy ByteString that is terminated with a NUL byte. The returned string does not contain the NUL byte. Fails if it reaches the end of input without finding a NUL.

getRemainingLazyByteString :: Get ByteString Source

Get the remaining bytes as a lazy ByteString. Note that this can be an expensive function to use as it forces reading all input and keeping the string in-memory.

Decoding words

getWord8 :: Get Word8 Source

Read a Word8 from the monad state

Big-endian decoding

getWord16be :: Get Word16 Source

Read a Word16 in big endian format

getWord32be :: Get Word32 Source

Read a Word32 in big endian format

getWord64be :: Get Word64 Source

Read a Word64 in big endian format

Little-endian decoding

getWord16le :: Get Word16 Source

Read a Word16 in little endian format

getWord32le :: Get Word32 Source

Read a Word32 in little endian format

getWord64le :: Get Word64 Source

Read a Word64 in little endian format

Host-endian, unaligned decoding

getWordhost :: Get Word Source

O(1). Read a single native machine word. The word is read in host order, host endian form, for the machine you're on. On a 64 bit machine the Word is an 8 byte value, on a 32 bit machine, 4 bytes.

getWord16host :: Get Word16 Source

O(1). Read a 2 byte Word16 in native host order and host endianness.

getWord32host :: Get Word32 Source

O(1). Read a Word32 in native host order and host endianness.

getWord64host :: Get Word64 Source

O(1). Read a Word64 in native host order and host endianess.

Deprecated functions

runGetState :: Get a -> ByteString -> ByteOffset -> (a, ByteString, ByteOffset) Source

Deprecated: Use runGetIncremental instead. This function will be removed.

DEPRECATED. Provides compatibility with previous versions of this library. Run a Get monad and return a tuple with thee values. The first value is the result of the decoder. The second and third are the unused input, and the number of consumed bytes.

remaining :: Get Int64 Source

Deprecated: This will force all remaining input, don't use it.

DEPRECATED. Get the number of bytes of remaining input. Note that this is an expensive function to use as in order to calculate how much input remains, all input has to be read and kept in-memory. The decoder keeps the input as a strict bytestring, so you are likely better off by calculating the remaining input in another way.

getBytes :: Int -> Get ByteString Source

Deprecated: Use getByteString instead of getBytes.

DEPRECATED. Same as getByteString.