base-4.5.0.0: Basic libraries

Portabilityportable
Stabilitystable
Maintainerlibraries@haskell.org
Safe HaskellTrustworthy

Data.Char

Contents

Description

The Char type and associated operations.

Synopsis

Documentation

data Char Source

The character type Char is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) characters (see http://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type Char.

To convert a Char to or from the corresponding Int value defined by Unicode, use toEnum and fromEnum from the Enum class respectively (or equivalently ord and chr).

Character classification

Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).

isControl :: Char -> BoolSource

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isSpace :: Char -> BoolSource

Returns True for any Unicode space character, and the control characters \t, \n, \r, \f, \v.

isLower :: Char -> BoolSource

Selects lower-case alphabetic Unicode characters (letters).

isUpper :: Char -> BoolSource

Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.

isAlpha :: Char -> BoolSource

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters). This function is equivalent to isLetter.

isAlphaNum :: Char -> BoolSource

Selects alphabetic or numeric digit Unicode characters.

Note that numeric digits outside the ASCII range are selected by this function but not by isDigit. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.

isPrint :: Char -> BoolSource

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

isDigit :: Char -> BoolSource

Selects ASCII digits, i.e. '0'..'9'.

isOctDigit :: Char -> BoolSource

Selects ASCII octal digits, i.e. '0'..'7'.

isHexDigit :: Char -> BoolSource

Selects ASCII hexadecimal digits, i.e. '0'..'9', 'a'..'f', 'A'..'F'.

isLetter :: Char -> BoolSource

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters). This function is equivalent to isAlpha.

isMark :: Char -> BoolSource

Selects Unicode mark characters, e.g. accents and the like, which combine with preceding letters.

isNumber :: Char -> BoolSource

Selects Unicode numeric characters, including digits from various scripts, Roman numerals, etc.

isPunctuation :: Char -> BoolSource

Selects Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

isSymbol :: Char -> BoolSource

Selects Unicode symbol characters, including mathematical and currency symbols.

isSeparator :: Char -> BoolSource

Selects Unicode space and separator characters.

Subranges

isAscii :: Char -> BoolSource

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> BoolSource

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

isAsciiUpper :: Char -> BoolSource

Selects ASCII upper-case letters, i.e. characters satisfying both isAscii and isUpper.

isAsciiLower :: Char -> BoolSource

Selects ASCII lower-case letters, i.e. characters satisfying both isAscii and isLower.

Unicode general categories

data GeneralCategory Source

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard.

Constructors

UppercaseLetter

Lu: Letter, Uppercase

LowercaseLetter

Ll: Letter, Lowercase

TitlecaseLetter

Lt: Letter, Titlecase

ModifierLetter

Lm: Letter, Modifier

OtherLetter

Lo: Letter, Other

NonSpacingMark

Mn: Mark, Non-Spacing

SpacingCombiningMark

Mc: Mark, Spacing Combining

EnclosingMark

Me: Mark, Enclosing

DecimalNumber

Nd: Number, Decimal

LetterNumber

Nl: Number, Letter

OtherNumber

No: Number, Other

ConnectorPunctuation

Pc: Punctuation, Connector

DashPunctuation

Pd: Punctuation, Dash

OpenPunctuation

Ps: Punctuation, Open

ClosePunctuation

Pe: Punctuation, Close

InitialQuote

Pi: Punctuation, Initial quote

FinalQuote

Pf: Punctuation, Final quote

OtherPunctuation

Po: Punctuation, Other

MathSymbol

Sm: Symbol, Math

CurrencySymbol

Sc: Symbol, Currency

ModifierSymbol

Sk: Symbol, Modifier

OtherSymbol

So: Symbol, Other

Space

Zs: Separator, Space

LineSeparator

Zl: Separator, Line

ParagraphSeparator

Zp: Separator, Paragraph

Control

Cc: Other, Control

Format

Cf: Other, Format

Surrogate

Cs: Other, Surrogate

PrivateUse

Co: Other, Private Use

NotAssigned

Cn: Other, Not Assigned

generalCategory :: Char -> GeneralCategorySource

The Unicode general category of the character.

Case conversion

toUpper :: Char -> CharSource

Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> CharSource

Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> CharSource

Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.

Single digit characters

digitToInt :: Char -> IntSource

Convert a single digit Char to the corresponding Int. This function fails unless its argument satisfies isHexDigit, but recognises both upper and lower-case hexadecimal digits (i.e. '0'..'9', 'a'..'f', 'A'..'F').

intToDigit :: Int -> CharSource

Convert an Int in the range 0..15 to the corresponding single digit Char. This function fails on other inputs, and generates lower-case hexadecimal digits.

Numeric representations

ord :: Char -> IntSource

The fromEnum method restricted to the type Char.

chr :: Int -> CharSource

The toEnum method restricted to the type Char.

String representations

showLitChar :: Char -> ShowSSource

Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:

 showLitChar '\n' s  =  "\\n" ++ s

lexLitChar :: ReadS StringSource

Read a string representation of a character, using Haskell source-language escape conventions. For example:

 lexLitChar  "\\nHello"  =  [("\\n", "Hello")]

readLitChar :: ReadS CharSource

Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:

 readLitChar "\\nHello"  =  [('\n', "Hello")]