(Sigbjorn Finne supplied the regular-expressions interface.)
The `Regex' library provides a fairly direct interface to the GNU regular-expression library, for manipulating `_PackedString's. You will probably need to consult the GNU documentation if you are operating at this level.
The datatypes and functions that `Regex' provides are:
    data PatBuffer  -- just a bunch of bytes (mutable)

    data REmatch
      = REmatch (Array Int GroupBounds)  -- for $1, ... $n
                GroupBounds              -- for $` (everything before match)
                GroupBounds              -- for $& (entire matched string)
                GroupBounds              -- for $' (everything after)
                GroupBounds              -- for $+ (matched by last bracket)

    -- GroupBounds hold the interval where a group
    -- matched inside a string, e.g.
    --
    -- matching "reg(exp)" "a regexp" returns the pair (5,7) for the
    -- (exp) group. (_PackedString indices start from 0)

    type GroupBounds = (Int, Int)

    re_compile_pattern
        :: _PackedString        -- pattern to compile
        -> Bool                 -- True <=> assume single-line mode
        -> Bool                 -- True <=> case-insensitive
        -> PrimIO PatBuffer

    re_match
        :: PatBuffer            -- compiled regexp
        -> _PackedString        -- string to match
        -> Int                  -- start position
        -> Bool                 -- True <=> record results in registers
        -> PrimIO (Maybe REmatch)

    -- Matching on 2 strings is useful when you're dealing with multiple
    -- buffers, which is something that could prove useful for
    -- PackedStrings, as we don't want to stuff the contents of a file
    -- into one massive heap chunk, but load (smaller chunks) on demand.

    re_match2
        :: PatBuffer            -- 2-string version
        -> _PackedString
        -> _PackedString
        -> Int
        -> Int
        -> Bool
        -> PrimIO (Maybe REmatch)

    re_search
        :: PatBuffer            -- compiled regexp
        -> _PackedString        -- string to search
        -> Int                  -- start index
        -> Int                  -- stop index
        -> Bool                 -- True <=> record results in registers
        -> PrimIO (Maybe REmatch)

    re_search2
        :: PatBuffer            -- Double buffer search
        -> _PackedString
        -> _PackedString
        -> Int                  -- start index
        -> Int                  -- range (?)
        -> Int                  -- stop index
        -> Bool                 -- True <=> results in registers
        -> PrimIO (Maybe REmatch)
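To make the `GroupBounds' convention concrete, here is a small sketch that uses an ordinary String in place of a `_PackedString'. The helper `sliceBounds' is hypothetical and not part of `Regex'. It assumes the bounds are 0-based and inclusive at both ends, which is what the (5,7) example in the comments above implies.

```haskell
-- Illustration only: a plain String stands in for _PackedString.
type GroupBounds = (Int, Int)

-- Hypothetical helper (not part of Regex): extract the text covered by
-- a 0-based (start, end) interval, assuming both ends are inclusive.
sliceBounds :: GroupBounds -> String -> String
sliceBounds (start, end) str = take (end - start + 1) (drop start str)

-- The documented example: matching "reg(exp)" against "a regexp"
-- reports (5,7) for the (exp) group.
example :: String
example = sliceBounds (5, 7) "a regexp"
```

Under that assumption, `example' evaluates to "exp", the text matched by the bracketed group.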
The `MatchPS' module provides Perl-like "higher-level" facilities to operate on `_PackedString's. The regular expressions in question are in Perl syntax. The "flags" on various functions can include: `i' for case-insensitive, `s' for single-line mode, and `g' for global. (It's probably worth your time to peruse the source code...)
    matchPS :: _PackedString    -- regexp
            -> _PackedString    -- string to match
            -> [Char]           -- flags
            -> Maybe REmatch    -- info about what matched and where

    searchPS :: _PackedString   -- regexp
             -> _PackedString   -- string to match
             -> [Char]          -- flags
             -> Maybe REmatch

    -- Perl-like match-and-substitute:
    substPS :: _PackedString    -- regexp
            -> _PackedString    -- replacement
            -> [Char]           -- flags
            -> _PackedString    -- string
            -> _PackedString

    -- same as substPS, but no prefix and suffix:
    replacePS :: _PackedString  -- regexp
              -> _PackedString  -- replacement
              -> [Char]         -- flags
              -> _PackedString  -- string
              -> _PackedString

    match2PS :: _PackedString   -- regexp
             -> _PackedString   -- string1 to match
             -> _PackedString   -- string2 to match
             -> [Char]          -- flags
             -> Maybe REmatch

    search2PS :: _PackedString  -- regexp
              -> _PackedString  -- string to match
              -> _PackedString  -- string to match
              -> [Char]         -- flags
              -> Maybe REmatch

    -- functions to pull the matched pieces out of an REmatch:

    getMatchesNo    :: REmatch -> Int
    getMatchedGroup :: REmatch -> Int -> _PackedString -> _PackedString
    getWholeMatch   :: REmatch -> _PackedString -> _PackedString
    getLastMatch    :: REmatch -> _PackedString -> _PackedString
    getAfterMatch   :: REmatch -> _PackedString -> _PackedString

    -- (reverse) brute-force string matching;
    -- Perl equivalent is index/rindex:
    findPS, rfindPS :: _PackedString -> _PackedString -> Maybe Int

    -- Equivalent to Perl "chop" (off the last character, if any):
    chopPS :: _PackedString -> _PackedString

    -- matchPrefixPS: tries to match as much as possible of strA starting
    -- from the beginning of strB (handy when matching fancy literals in
    -- parsers):
    matchPrefixPS :: _PackedString -> _PackedString -> Int
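The comments on the last three utilities describe their behaviour only briefly, so the following sketch shows one plausible reading of them on ordinary Strings. The names `chopStr', `matchPrefixStr' and `findStr' are hypothetical analogues, not the `MatchPS' functions themselves. In particular, the argument order of `findStr' (string to search first, then the substring) and the reading of `matchPrefixPS' as "length of the longest common prefix" are assumptions; check the `MatchPS' source for the real behaviour.

```haskell
import Data.List (findIndex, isPrefixOf, tails)

-- Hypothetical analogue of chopPS: drop the last character, if any.
chopStr :: String -> String
chopStr [] = []
chopStr s  = init s

-- Hypothetical analogue of matchPrefixPS, assuming the Int result is
-- the number of leading characters the two strings have in common.
matchPrefixStr :: String -> String -> Int
matchPrefixStr a b = length (takeWhile id (zipWith (==) a b))

-- Hypothetical analogue of findPS, modelled on Perl's index: the
-- 0-based offset of the first occurrence of the substring, if any.
-- Assumed argument order: string to search, then substring.
findStr :: String -> String -> Maybe Int
findStr haystack needle = findIndex (needle `isPrefixOf`) (tails haystack)
```

With these definitions, `findStr "a regexp" "exp"' gives `Just 5', `matchPrefixStr "while" "whilst"' gives 4, and `chopStr ""' returns the empty string unchanged.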