Copyright	(c) 2009 2010 2011 Bryan O'Sullivan (c) 2009 Duncan Coutts (c) 2008 2009 Tom Harper (c) 2021 Andrew Lelechenko
License	BSD-style
Maintainer	bos@serpentine.com
Portability	portable
Safe Haskell	Trustworthy
Language	Haskell2010

Data.Text.Encoding

Contents

Decoding ByteStrings to Text
- Total Functions
- Partial Functions
  - Stream oriented decoding
Encoding Text to ByteStrings
Encoding Text using ByteString Builders

Description

Functions for converting Text values to and from ByteString, using several standard encodings.

To gain access to a much larger family of encodings, use the text-icu package.

Synopsis

decodeLatin1 :: ByteString -> Text
decodeUtf8Lenient :: ByteString -> Text
decodeUtf8' :: ByteString -> Either UnicodeException Text
decodeUtf8With :: OnDecodeError -> ByteString -> Text
decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text
decodeUtf16BEWith :: OnDecodeError -> ByteString -> Text
decodeUtf32LEWith :: OnDecodeError -> ByteString -> Text
decodeUtf32BEWith :: OnDecodeError -> ByteString -> Text
streamDecodeUtf8With :: OnDecodeError -> ByteString -> Decoding
data Decoding = Some !Text !ByteString (ByteString -> Decoding)
decodeASCII :: ByteString -> Text
decodeUtf8 :: ByteString -> Text
decodeUtf16LE :: ByteString -> Text
decodeUtf16BE :: ByteString -> Text
decodeUtf32LE :: ByteString -> Text
decodeUtf32BE :: ByteString -> Text
streamDecodeUtf8 :: ByteString -> Decoding
encodeUtf8 :: Text -> ByteString
encodeUtf16LE :: Text -> ByteString
encodeUtf16BE :: Text -> ByteString
encodeUtf32LE :: Text -> ByteString
encodeUtf32BE :: Text -> ByteString
encodeUtf8Builder :: Text -> Builder
encodeUtf8BuilderEscaped :: BoundedPrim Word8 -> Text -> Builder

Decoding ByteStrings to Text

Apart from decodeUtf8Lenient and decodeUtf8', the single-parameter functions for decoding bytestrings encoded in one of the Unicode Transformation Formats (UTF) operate in a strict mode: each will throw an exception if given invalid input.

Each function has a variant, whose name is suffixed with -With, that gives greater control over the handling of decoding errors. For instance, decodeUtf8 will throw an exception, but decodeUtf8With allows the programmer to determine what to do on a decoding error.
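As a minimal sketch of what the -With variants allow, the handlers below decide what happens to each invalid byte. The names questionMark, dropInvalid and sample are hypothetical and chosen for illustration; OnDecodeError is assumed to be imported from Data.Text.Encoding.Error, where it is a synonym for String -> Maybe Word8 -> Maybe Char.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (OnDecodeError)

-- Hypothetical handler: substitute '?' for every invalid byte.
questionMark :: OnDecodeError
questionMark _description _badByte = Just '?'

-- Hypothetical handler: returning Nothing drops the invalid byte.
dropInvalid :: OnDecodeError
dropInvalid _description _badByte = Nothing

-- The bytes of "hi" followed by 0xff, which is never valid in UTF-8.
sample :: B.ByteString
sample = B.pack [0x68, 0x69, 0xff]

decodedWithMark :: T.Text
decodedWithMark = decodeUtf8With questionMark sample

decodedDropping :: T.Text
decodedDropping = decodeUtf8With dropInvalid sample
```

Neither call throws, because the handler rather than the decoder decides how to recover.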

Total Functions

These functions facilitate total decoding and should be preferred over their partial counterparts.

decodeLatin1 :: ByteString -> Text

Decode a ByteString containing Latin-1 (aka ISO-8859-1) encoded text.

decodeLatin1 is semantically equivalent to Data.Text.pack . Data.ByteString.Char8.unpack

This is a total function. However, bear in mind that decoding Latin-1 (non-ASCII) characters to UTF-8 requires actual work and is not just buffer copying.
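The stated equivalence can be checked on a small input. This is a sketch only; cafeLatin1, viaDecode, viaString and utf8Length are hypothetical names. The last definition illustrates why decoding is more than a buffer copy: a non-ASCII Latin-1 byte occupies one byte on input but two bytes once re-encoded as UTF-8.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, encodeUtf8)

-- "café" in Latin-1: the final byte 0xe9 is U+00E9.
cafeLatin1 :: B.ByteString
cafeLatin1 = B.pack [0x63, 0x61, 0x66, 0xe9]

-- Decode directly.
viaDecode :: T.Text
viaDecode = decodeLatin1 cafeLatin1

-- Decode through String, as in the equivalence stated above.
viaString :: T.Text
viaString = T.pack (BC.unpack cafeLatin1)

-- Four Latin-1 bytes in, five UTF-8 bytes out.
utf8Length :: Int
utf8Length = B.length (encodeUtf8 viaDecode)
```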

decodeUtf8Lenient :: ByteString -> Text

Decode a ByteString containing UTF-8 encoded text.

Any invalid input bytes will be replaced with the Unicode replacement character U+FFFD.

Catchable failure

decodeUtf8' :: ByteString -> Either UnicodeException Text

Decode a ByteString containing UTF-8 encoded text.

If the input contains any invalid UTF-8 data, the relevant exception is returned as a Left value; otherwise the decoded text is returned as a Right value.
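One way to use the Either result is to fall back to another decoder when UTF-8 validation fails. The function decodeWithFallback below is a hypothetical helper, not part of this module, and falling back to Latin-1 is only an assumption about what the caller wants.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8')

-- Hypothetical helper: try UTF-8 first and fall back to Latin-1,
-- which is total, when the input is not valid UTF-8.
decodeWithFallback :: B.ByteString -> T.Text
decodeWithFallback bs =
  case decodeUtf8' bs of
    Right t -> t
    Left _  -> decodeLatin1 bs
```

Because the failure is an ordinary value, this stays in pure code and needs no exception handling.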

Controllable error handling

decodeUtf8With :: OnDecodeError -> ByteString -> Text

Decode a ByteString containing UTF-8 encoded text.

If the OnDecodeError handler returns a surrogate code point as the replacement character, it is automatically remapped to the replacement character U+FFFD.

decodeUtf16LEWith :: OnDecodeError -> ByteString -> Text

Decode text from little endian UTF-16 encoding.

decodeUtf16BEWith :: OnDecodeError -> ByteString -> Text

Decode text from big endian UTF-16 encoding.

decodeUtf32LEWith :: OnDecodeError -> ByteString -> Text

Decode text from little endian UTF-32 encoding.

decodeUtf32BEWith :: OnDecodeError -> ByteString -> Text

Decode text from big endian UTF-32 encoding.
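The four UTF-16 and UTF-32 decoders differ only in code unit width and byte order. As an illustrative sketch, the letter 'A' (U+0041) is laid out below in each encoding and decoded with the matching function. The names aUtf16LE, aUtf16BE, aUtf32LE, aUtf32BE and decodedAll are hypothetical, and strictDecode is assumed to come from Data.Text.Encoding.Error.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding
  ( decodeUtf16BEWith, decodeUtf16LEWith
  , decodeUtf32BEWith, decodeUtf32LEWith )
import Data.Text.Encoding.Error (strictDecode)

-- U+0041 as one 16-bit code unit, least significant byte first.
aUtf16LE :: B.ByteString
aUtf16LE = B.pack [0x41, 0x00]

-- The same code unit, most significant byte first.
aUtf16BE :: B.ByteString
aUtf16BE = B.pack [0x00, 0x41]

-- U+0041 as one 32-bit code unit, least significant byte first.
aUtf32LE :: B.ByteString
aUtf32LE = B.pack [0x41, 0x00, 0x00, 0x00]

-- The same code unit, most significant byte first.
aUtf32BE :: B.ByteString
aUtf32BE = B.pack [0x00, 0x00, 0x00, 0x41]

-- Each input decodes to the same one-character Text.
decodedAll :: [T.Text]
decodedAll =
  [ decodeUtf16LEWith strictDecode aUtf16LE
  , decodeUtf16BEWith strictDecode aUtf16BE
  , decodeUtf32LEWith strictDecode aUtf32LE
  , decodeUtf32BEWith strictDecode aUtf32BE
  ]
```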

Stream oriented decoding

The streamDecodeUtf8 and streamDecodeUtf8With functions accept a ByteString that represents a possibly incomplete input (e.g. a packet from a network stream) that may not end on a UTF-8 boundary. Each returns a Decoding value with three components:

- The maximal prefix of Text that could be decoded from the given input.
- The suffix of the ByteString that could not be decoded due to insufficient input.
- A function that accepts another ByteString. That string will be assumed to directly follow the string that was passed as input to the original function, and it will in turn be decoded.

To help understand the use of these functions, consider the Unicode string "hi ☃". If encoded as UTF-8, this becomes "hi \xe2\x98\x83"; the final '☃' is encoded as 3 bytes.

Now suppose that we receive this encoded string as 3 packets that are split up on untidy boundaries: ["hi \xe2", "\x98", "\x83"]. We cannot decode the entire Unicode string until we have received all three packets, but we would like to make progress as we receive each one.

ghci> let s0@(Some _ _ f0) = streamDecodeUtf8 "hi \xe2"
ghci> s0
Some "hi " "\xe2" _

We use the continuation f0 to decode our second packet.

ghci> let s1@(Some _ _ f1) = f0 "\x98"
ghci> s1
Some "" "\xe2\x98" _

We could not give f0 enough input to decode anything, so it returned an empty string. Once we feed our second continuation f1 the last byte of input, it will make progress.

ghci> let s2@(Some _ _ f2) = f1 "\x83"
ghci> s2
Some "\x2603" "" _

If given invalid input, an exception will be thrown by the function or continuation where it is encountered.
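The manual steps above can be wrapped in a small loop. The following is a sketch, not part of the library: decodeChunks is a hypothetical helper that threads the continuation through a list of chunks and returns the decoded text together with any trailing bytes that are still incomplete. It uses streamDecodeUtf8With with lenientDecode (assumed to come from Data.Text.Encoding.Error) so that invalid input does not throw.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (Decoding (..), streamDecodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical helper: decode a sequence of chunks, carrying the
-- continuation from one chunk to the next.
decodeChunks :: [B.ByteString] -> (T.Text, B.ByteString)
decodeChunks = go (streamDecodeUtf8With lenientDecode) [] B.empty
  where
    -- No chunks left: join the decoded pieces in their original order.
    go _ acc leftover [] = (T.concat (reverse acc), leftover)
    -- Feed the next chunk to the current continuation.
    go k acc _ (c : cs) =
      case k c of
        Some t leftover k' -> go k' (t : acc) leftover cs

-- The three packets from the walkthrough above.
packets :: [B.ByteString]
packets = [B.pack [0x68, 0x69, 0x20, 0xe2], B.pack [0x98], B.pack [0x83]]
```

Applied to packets, the helper reassembles the full string even though no single packet contains the complete snowman character.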

streamDecodeUtf8With :: OnDecodeError -> ByteString -> Decoding

Decode, in a stream oriented way, a ByteString containing UTF-8 encoded text.

Since: text-1.0.0.0

data Decoding

A stream oriented decoding result.

Since: text-1.0.0.0

Constructors

Some !Text !ByteString (ByteString -> Decoding)

Instances

Show Decoding

Partial Functions

These functions are partial and should only be used with great caution (preferably not at all). See the Total Functions section above for better solutions.

decodeASCII :: ByteString -> Text

Decode a ByteString containing 7-bit ASCII encoded text.

This is a partial function: it checks that the input contains only ASCII and copies the buffer, or throws an error otherwise.
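If decodeASCII must be used, one way to avoid the error is to validate the input first. The wrapper decodeASCIIMaybe below is a hypothetical helper, not part of this module. It scans the input once before decoding, so it is a sketch of the idea rather than an efficient implementation.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeASCII)

-- Hypothetical total wrapper: only call decodeASCII once every
-- byte is known to be in the 7-bit ASCII range.
decodeASCIIMaybe :: B.ByteString -> Maybe T.Text
decodeASCIIMaybe bs
  | B.all (< 0x80) bs = Just (decodeASCII bs)
  | otherwise         = Nothing
```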

decodeUtf8 :: ByteString -> Text

Decode a ByteString containing UTF-8 encoded text that is known to be valid.

If the input contains any invalid UTF-8 data, an exception will be thrown that cannot be caught in pure code. For more control over the handling of invalid data, use decodeUtf8' or decodeUtf8With.

This is a partial function: it checks that the input is a well-formed UTF-8 sequence and copies the buffer, or throws an error otherwise.
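Since the exception cannot be caught in pure code, catching it requires IO. The sketch below shows one way to do so with try and evaluate from Control.Exception; safeDecodeIO is a hypothetical name, and UnicodeException is assumed to be imported from Data.Text.Encoding.Error. In most code decodeUtf8' is the simpler choice, as it yields the same Either result without IO.

```haskell
import Control.Exception (evaluate, try)
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8)
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical helper: force the decode inside IO so that a
-- UnicodeException raised by decodeUtf8 is caught by try.
safeDecodeIO :: B.ByteString -> IO (Either UnicodeException T.Text)
safeDecodeIO bs = try (evaluate (decodeUtf8 bs))
```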

decodeUtf16LE :: ByteString -> Text

Decode text from little endian UTF-16 encoding.

If the input contains any invalid little endian UTF-16 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf16LEWith.

decodeUtf16BE :: ByteString -> Text

Decode text from big endian UTF-16 encoding.

If the input contains any invalid big endian UTF-16 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf16BEWith.

decodeUtf32LE :: ByteString -> Text

Decode text from little endian UTF-32 encoding.

If the input contains any invalid little endian UTF-32 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf32LEWith.

decodeUtf32BE :: ByteString -> Text

Decode text from big endian UTF-32 encoding.

If the input contains any invalid big endian UTF-32 data, an exception will be thrown. For more control over the handling of invalid data, use decodeUtf32BEWith.

Stream oriented decoding

streamDecodeUtf8 :: ByteString -> Decoding

Decode, in a stream oriented way, a ByteString containing UTF-8 encoded text that is known to be valid.

If the input contains any invalid UTF-8 data, an exception will be thrown (either by this function or a continuation) that cannot be caught in pure code. For more control over the handling of invalid data, use streamDecodeUtf8With.

Since: text-1.0.0.0