Portability | non-portable |
---|---|
Stability | internal |
Maintainer | libraries@haskell.org |
Text codecs for I/O
- data BufferCodec from to state = BufferCodec {}
- data TextEncoding = forall dstate estate . TextEncoding {
- textEncodingName :: String
- mkTextDecoder :: IO (TextDecoder dstate)
- mkTextEncoder :: IO (TextEncoder estate)
- type TextEncoder state = BufferCodec CharBufElem Word8 state
- type TextDecoder state = BufferCodec Word8 CharBufElem state
- latin1 :: TextEncoding
- latin1_encode :: CharBuffer -> Buffer Word8 -> IO (CharBuffer, Buffer Word8)
- latin1_decode :: Buffer Word8 -> CharBuffer -> IO (Buffer Word8, CharBuffer)
- utf8 :: TextEncoding
- utf8_bom :: TextEncoding
- utf16 :: TextEncoding
- utf16le :: TextEncoding
- utf16be :: TextEncoding
- utf32 :: TextEncoding
- utf32le :: TextEncoding
- utf32be :: TextEncoding
- localeEncoding :: TextEncoding
- mkTextEncoding :: String -> IO TextEncoding
Documentation
data BufferCodec from to state Source
BufferCodec | |
|
data TextEncoding Source
A TextEncoding
is a specification of a conversion scheme
between sequences of bytes and sequences of Unicode characters.
For example, UTF-8 is an encoding of Unicode characters into a sequence
of bytes. The TextEncoding
for UTF-8 is utf8
.
forall dstate estate . TextEncoding | |
|
type TextEncoder state = BufferCodec CharBufElem Word8 stateSource
type TextDecoder state = BufferCodec Word8 CharBufElem stateSource
The Latin1 (ISO8859-1) encoding. This encoding maps bytes
directly to the first 256 Unicode code points, and is thus not a
complete Unicode encoding. An attempt to write a character greater than
'\255' to a Handle
using the latin1
encoding will result in an error.
latin1_encode :: CharBuffer -> Buffer Word8 -> IO (CharBuffer, Buffer Word8)Source
latin1_decode :: Buffer Word8 -> CharBuffer -> IO (Buffer Word8, CharBuffer)Source
The UTF-8 Unicode encoding
utf8_bom :: TextEncodingSource
The UTF-8 Unicode encoding, with a byte-order-mark (BOM; the byte
sequence 0xEF 0xBB 0xBF). This encoding behaves like utf8
,
except that on input, the BOM sequence is ignored at the beginning
of the stream, and on output, the BOM sequence is prepended.
The byte-order-mark is strictly unnecessary in UTF-8, but is sometimes used to identify the encoding of a file.
The UTF-16 Unicode encoding (a byte-order-mark should be used to indicate endianness).
The UTF-16 Unicode encoding (litte-endian)
The UTF-16 Unicode encoding (big-endian)
The UTF-32 Unicode encoding (a byte-order-mark should be used to indicate endianness).
The UTF-32 Unicode encoding (litte-endian)
The UTF-32 Unicode encoding (big-endian)
localeEncoding :: TextEncodingSource
The Unicode encoding of the current locale
mkTextEncoding :: String -> IO TextEncodingSource
Look up the named Unicode encoding. May fail with
-
isDoesNotExistError
if the encoding is unknown
The set of known encodings is system-dependent, but includes at least:
UTF-8
-
UTF-16
,UTF-16BE
,UTF-16LE
-
UTF-32
,UTF-32BE
,UTF-32LE
On systems using GNU iconv (e.g. Linux), there is additional notation for specifying how illegal characters are handled:
- a suffix of
//IGNORE
, e.g.UTF-8//IGNORE
, will cause all illegal sequences on input to be ignored, and on output will drop all code points that have no representation in the target encoding. - a suffix of
//TRANSLIT
will choose a replacement character for illegal sequences or code points.
On Windows, you can access supported code pages with the prefix
CP
; for example, "CP1250"
.