ghc-9.12: The GHC API
Safe Haskell: None
Language: GHC2021

GHC.Data.StringBuffer

Documentation

data StringBuffer

A StringBuffer is an internal pointer to a sized chunk of bytes. The bytes are intended to be *immutable*. There are pure operations to read the contents of a StringBuffer.

A StringBuffer may have a finalizer, depending on how it was obtained.

Constructors

StringBuffer 

Fields

Instances

Show StringBuffer

Defined in GHC.Data.StringBuffer

Creation/destruction

hGetStringBuffer :: FilePath -> IO StringBuffer

Read a file into a StringBuffer. The resulting buffer is automatically managed by the garbage collector.
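
As a rough illustration, the sketch below writes a throwaway file and reads it back with hGetStringBuffer. It assumes the ghc and directory packages are exposed to the compiler (for example with `ghc -package ghc`); the helper name readFirstChar and the file name template are hypothetical.

```haskell
import GHC.Data.StringBuffer (atEnd, currentChar, hGetStringBuffer)
import System.Directory (removeFile)
import System.IO (hClose, hPutStr, openTempFile)

-- | Hypothetical helper: write the given contents to a temporary file in the
-- current directory, load it with 'hGetStringBuffer', and return its first
-- character ('Nothing' if the buffer is empty).
readFirstChar :: String -> IO (Maybe Char)
readFirstChar contents = do
  (path, h) <- openTempFile "." "Demo.hs"
  hPutStr h contents
  hClose h
  sb <- hGetStringBuffer path
  -- The whole file is already in memory, so it can be deleted now.
  removeFile path
  -- Check 'atEnd' first: 'currentChar' is undefined on an empty buffer.
  pure (if atEnd sb then Nothing else Just (currentChar sb))

main :: IO ()
main = readFirstChar "main = pure ()\n" >>= print
```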

stringToStringBuffer :: String -> StringBuffer

Encode a String into a StringBuffer as UTF-8. The resulting buffer is automatically managed by the garbage collector.

stringBufferFromByteString :: ByteString -> StringBuffer

Convert a UTF-8 encoded ByteString into a StringBuffer. This really relies on the internals of both ByteString and StringBuffer.

O(n) (but optimized into a memcpy by bytestring under the hood)
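
A minimal sketch of both pure constructors, assuming the ghc and bytestring packages are available. The helper names viaString and viaByteString are hypothetical, and the input is restricted to ASCII so that one character occupies exactly one byte and `length` can stand in for the byte count.

```haskell
import qualified Data.ByteString.Char8 as BC
import GHC.Data.StringBuffer
  (lexemeToString, stringBufferFromByteString, stringToStringBuffer)

-- | Hypothetical helper: encode an ASCII string with 'stringToStringBuffer'
-- and decode it again.
viaString :: String -> String
viaString s = lexemeToString (stringToStringBuffer s) (length s)

-- | Hypothetical helper: same round trip, but starting from a ByteString.
-- 'BC.pack' is only a faithful encoding for ASCII input.
viaByteString :: String -> String
viaByteString s =
  lexemeToString (stringBufferFromByteString (BC.pack s)) (length s)

main :: IO ()
main = do
  putStrLn (viaString "let x = 1")
  putStrLn (viaByteString "let x = 1")
```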

Inspection

nextChar :: StringBuffer -> (Char, StringBuffer)

Return the first UTF-8 character of a nonempty StringBuffer together with the remaining portion (analogous to uncons). Warning: The behavior is undefined if the StringBuffer is empty. The result shares the same buffer as the original. Similar to utf8DecodeChar, if the character cannot be decoded as UTF-8, '\0' is returned.

currentChar :: StringBuffer -> Char

Return the first UTF-8 character of a nonempty StringBuffer (analogous to head). Warning: The behavior is undefined if the StringBuffer is empty. Similar to utf8DecodeChar, if the character cannot be decoded as UTF-8, '\0' is returned.

atEnd :: StringBuffer -> Bool

Check whether a StringBuffer is empty (analogous to null).
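
Because nextChar and currentChar are undefined on an empty buffer, a traversal would normally guard each step with atEnd. The sketch below shows one way to do that, assuming the ghc package is exposed; bufferToString is a hypothetical helper name, not part of this module.

```haskell
import GHC.Data.StringBuffer
  (StringBuffer, atEnd, nextChar, stringToStringBuffer)

-- | Hypothetical helper: decode every remaining character, checking 'atEnd'
-- before each call to 'nextChar'.
bufferToString :: StringBuffer -> String
bufferToString sb
  | atEnd sb  = []
  | otherwise =
      let (c, rest) = nextChar sb
      in c : bufferToString rest

main :: IO ()
main = putStrLn (bufferToString (stringToStringBuffer "x = 1"))
```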

fingerprintStringBuffer :: StringBuffer -> Fingerprint

Computes a hash of the contents of a StringBuffer.
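
One plausible use of the fingerprint is a cheap equality check on buffer contents. The sketch assumes the ghc package is exposed and relies on the Eq instance of Fingerprint from base; sameContents is a hypothetical helper name.

```haskell
import GHC.Data.StringBuffer
  (StringBuffer, fingerprintStringBuffer, stringToStringBuffer)

-- | Hypothetical helper: treat two buffers as equal when their fingerprints
-- match. Distinct contents could collide in principle, so this is a
-- heuristic comparison rather than a proof of equality.
sameContents :: StringBuffer -> StringBuffer -> Bool
sameContents a b = fingerprintStringBuffer a == fingerprintStringBuffer b

main :: IO ()
main = do
  let a = stringToStringBuffer "module A where"
      b = stringToStringBuffer "module A where"
      c = stringToStringBuffer "module B where"
  print (sameContents a b)
  print (sameContents a c)
```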

Moving and comparison

stepOn :: StringBuffer -> StringBuffer

Return a StringBuffer with the first UTF-8 character removed (analogous to tail). Warning: The behavior is undefined if the StringBuffer is empty. The result shares the same buffer as the original.

offsetBytes
  :: Int           -- n, the number of bytes
  -> StringBuffer
  -> StringBuffer

Return a StringBuffer with the first n bytes removed. Warning: If there aren't enough bytes, the returned StringBuffer will be invalid and any use of it may lead to undefined behavior. The result shares the same buffer as the original.

byteDiff :: StringBuffer -> StringBuffer -> Int

Compute the difference in offset between two StringBuffers that share the same buffer. Warning: The behavior is undefined if the StringBuffers use separate buffers.
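
The movement functions work in bytes, not characters, which matters for non-ASCII input. The sketch below assumes the ghc package is exposed and that byteDiff is called with the earlier position first; firstCharWidth and charAfterBytes are hypothetical helper names.

```haskell
import GHC.Data.StringBuffer
  (StringBuffer, byteDiff, currentChar, offsetBytes, stepOn,
   stringToStringBuffer)

-- | Hypothetical helper: width in bytes of the first character of a nonempty
-- buffer, measured between the buffer and its 'stepOn' successor. Both
-- positions share one underlying buffer, as 'byteDiff' requires.
firstCharWidth :: StringBuffer -> Int
firstCharWidth sb = byteDiff sb (stepOn sb)

-- | Hypothetical helper: the character found after skipping n bytes. The
-- caller must ensure n lands on a character boundary inside the buffer.
charAfterBytes :: Int -> StringBuffer -> Char
charAfterBytes n sb = currentChar (offsetBytes n sb)

main :: IO ()
main = do
  -- "\955" is the Greek letter lambda, which takes two bytes in UTF-8.
  let sb = stringToStringBuffer "\955x"
  print (firstCharWidth sb)
  print (charAfterBytes (firstCharWidth sb) sb)
```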

atLine :: Int -> StringBuffer -> Maybe StringBuffer

Computes a StringBuffer which points to the first character of the wanted line. Lines begin at 1.
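
A small sketch of jumping to a line by its 1-based number, assuming the ghc package is exposed. The helper lineStart is hypothetical, and the sketch assumes that a request far beyond the last line yields Nothing.

```haskell
import GHC.Data.StringBuffer (atLine, currentChar, stringToStringBuffer)

-- | Hypothetical helper: first character of the given 1-based line, or
-- 'Nothing' when 'atLine' finds no such line.
lineStart :: Int -> String -> Maybe Char
lineStart n src = currentChar <$> atLine n (stringToStringBuffer src)

main :: IO ()
main = do
  let src = "alpha\nbeta\ngamma"
  print (lineStart 1 src)
  print (lineStart 2 src)
  print (lineStart 9 src)
```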

Conversion

lexemeToString
  :: StringBuffer
  -> Int           -- n, the number of bytes
  -> String

Decode the first n bytes of a StringBuffer as UTF-8 into a String. Similar to utf8DecodeChar, any characters that cannot be decoded as UTF-8 are replaced with '\0'.
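
Since the length argument is a byte count, it is usually derived from two buffer positions, not from a character count. The sketch below scans a leading alphanumeric run and extracts it; it assumes the ghc package is exposed, and takeWord is a hypothetical helper, not part of this module.

```haskell
import Data.Char (isAlphaNum)
import GHC.Data.StringBuffer
  (StringBuffer, atEnd, byteDiff, currentChar, lexemeToString, stepOn,
   stringToStringBuffer)

-- | Hypothetical helper: split off the leading run of alphanumeric
-- characters. The byte length handed to 'lexemeToString' is the offset
-- difference between the start and end positions.
takeWord :: StringBuffer -> (String, StringBuffer)
takeWord start = (lexemeToString start (byteDiff start end), end)
  where
    end = go start
    go sb
      | atEnd sb                    = sb
      | isAlphaNum (currentChar sb) = go (stepOn sb)
      | otherwise                   = sb

main :: IO ()
main = do
  -- "\955" is the Greek letter lambda (two bytes in UTF-8).
  let (word, rest) = takeWord (stringToStringBuffer "\955x1 = 2")
  putStrLn word
  print (currentChar rest)
```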

lexemeToFastString
  :: StringBuffer
  -> Int           -- n, the number of bytes
  -> FastString

decodePrevNChars :: Int -> StringBuffer -> String

Return the previous n characters (or fewer if we are less than n characters into the buffer).

Parsing integers

findHashOffset :: StringBuffer -> Int

Find the offset of the # character in the StringBuffer.

Make sure that it contains one before calling this function!
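
A hedged sketch of how the offset might be used to separate the digits of a literal from its '#' suffix. It assumes the ghc package is exposed and uses a freshly created buffer, so the offset is measured from the start of the text; textBeforeHash is a hypothetical helper, and the caller is responsible for ensuring that a '#' is present.

```haskell
import GHC.Data.StringBuffer
  (findHashOffset, lexemeToString, stringToStringBuffer)

-- | Hypothetical helper: the text that precedes the first '#'.
-- Precondition: the input contains a '#'; otherwise 'findHashOffset' must
-- not be called.
textBeforeHash :: String -> String
textBeforeHash s = lexemeToString sb (findHashOffset sb)
  where
    sb = stringToStringBuffer s

main :: IO ()
main = putStrLn (textBeforeHash "42#Word8")
```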

Checking for bi-directional format characters

containsBidirectionalFormatChar :: StringBuffer -> Bool

Returns true if the buffer contains Unicode bi-directional formatting characters.

https://www.unicode.org/reports/tr9/#Bidirectional_Character_Types

Bidirectional format characters are one of:

'\x202a' : "U+202A LEFT-TO-RIGHT EMBEDDING (LRE)"
'\x202b' : "U+202B RIGHT-TO-LEFT EMBEDDING (RLE)"
'\x202c' : "U+202C POP DIRECTIONAL FORMATTING (PDF)"
'\x202d' : "U+202D LEFT-TO-RIGHT OVERRIDE (LRO)"
'\x202e' : "U+202E RIGHT-TO-LEFT OVERRIDE (RLO)"
'\x2066' : "U+2066 LEFT-TO-RIGHT ISOLATE (LRI)"
'\x2067' : "U+2067 RIGHT-TO-LEFT ISOLATE (RLI)"
'\x2068' : "U+2068 FIRST STRONG ISOLATE (FSI)"
'\x2069' : "U+2069 POP DIRECTIONAL ISOLATE (PDI)"

This list is encoded in bidirectionalFormatChars.
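
To illustrate the check, the sketch below scans two small sources, one of which hides a RIGHT-TO-LEFT OVERRIDE inside a string literal. It assumes the ghc package is exposed; hasBidi is a hypothetical wrapper for String input.

```haskell
import GHC.Data.StringBuffer
  (containsBidirectionalFormatChar, stringToStringBuffer)

-- | Hypothetical wrapper: run the bidirectional-format check on a String by
-- encoding it into a fresh buffer first.
hasBidi :: String -> Bool
hasBidi = containsBidirectionalFormatChar . stringToStringBuffer

main :: IO ()
main = do
  print (hasBidi "greeting = \"hello\"")
  -- \x202e is U+202E RIGHT-TO-LEFT OVERRIDE (RLO).
  print (hasBidi "greeting = \"\x202e hello\"")
```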