(Sigbjorn Finne supplied the regular-expressions interface.)
The `Regex' library provides a fairly direct interface to the GNU regular-expression library, for manipulating `_PackedString's. You will probably need to consult the GNU documentation if you are operating at this level.
The datatypes and functions that `Regex' provides are:
data PatBuffer      -- just a bunch of bytes (mutable)
data REmatch
  = REmatch (Array Int GroupBounds)  -- for $1, ... $n
            GroupBounds              -- for $` (everything before match)
            GroupBounds              -- for $& (entire matched string)
            GroupBounds              -- for $' (everything after)
            GroupBounds              -- for $+ (matched by last bracket)
-- GroupBounds hold the interval where a group
-- matched inside a string, e.g.
--
-- matching "reg(exp)" "a regexp" returns the pair (5,7) for the
-- (exp) group. (_PackedString indices start from 0)
type GroupBounds = (Int, Int)
re_compile_pattern
        :: _PackedString        -- pattern to compile
        -> Bool                 -- True <=> assume single-line mode
        -> Bool                 -- True <=> case-insensitive
        -> PrimIO PatBuffer
re_match :: PatBuffer           -- compiled regexp
         -> _PackedString       -- string to match
         -> Int                 -- start position
         -> Bool                -- True <=> record results in registers
         -> PrimIO (Maybe REmatch)
-- Matching on 2 strings is useful when you're dealing with multiple
-- buffers, which is something that could prove useful for
-- PackedStrings, as we don't want to stuff the contents of a file
-- into one massive heap chunk, but load (smaller chunks) on demand.
re_match2 :: PatBuffer          -- 2-string version
          -> _PackedString
          -> _PackedString
          -> Int
          -> Int
          -> Bool
          -> PrimIO (Maybe REmatch)
re_search :: PatBuffer          -- compiled regexp
          -> _PackedString      -- string to search
          -> Int                -- start index
          -> Int                -- stop index
          -> Bool               -- True <=> record results in registers
          -> PrimIO (Maybe REmatch)
re_search2 :: PatBuffer         -- Double buffer search
           -> _PackedString
           -> _PackedString
           -> Int               -- start index
           -> Int               -- range (cf. GNU re_search_2)
           -> Int               -- stop index
           -> Bool              -- True <=> results in registers
           -> PrimIO (Maybe REmatch)
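Because `GroupBounds' are just index pairs, it can help to see how they carve up a string. The following is a minimal pure-Haskell sketch, not the `Regex' library: ordinary `String's stand in for `_PackedString's, and the bounds are assumed to be zero-based and inclusive at both ends, which is what the "reg(exp)" example in the listing above implies. The names `sliceBounds', `splitAround' and `charAt2' are hypothetical. `charAt2' illustrates the two-buffer idea behind `re_match2' and `re_search2': the underlying GNU calls treat the two strings as one virtual concatenation, so an index past the end of the first string falls into the second.

```haskell
-- Hypothetical model: plain String stands in for _PackedString, and
-- bounds are ASSUMED zero-based and inclusive at both ends, matching
-- the "reg(exp)" example in the Regex listing.
type GroupBounds = (Int, Int)

-- Text covered by one group's bounds.
sliceBounds :: GroupBounds -> String -> String
sliceBounds (from, to) = take (to - from + 1) . drop from

-- Split a string around the whole-match bounds into the pieces Perl
-- calls $` (before), $& (the match) and $' (after).
splitAround :: GroupBounds -> String -> (String, String, String)
splitAround (from, to) s = (before, matched, after)
  where
    (before, rest)   = splitAt from s
    (matched, after) = splitAt (to - from + 1) rest

-- Two-buffer indexing: index i addresses the virtual concatenation
-- s1 ++ s2 without building it, the way GNU re_match_2 and
-- re_search_2 treat their two strings.
charAt2 :: String -> String -> Int -> Maybe Char
charAt2 s1 s2 i
  | i < 0       = Nothing
  | i < n1      = Just (s1 !! i)
  | i - n1 < n2 = Just (s2 !! (i - n1))
  | otherwise   = Nothing
  where
    n1 = length s1
    n2 = length s2
```

For instance, `sliceBounds (5,7) "a regexp"' yields "exp", the group from the example above, and `charAt2 "ab" "cd" 2' yields `Just 'c''.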
The `MatchPS' module provides Perl-like "higher-level" facilities to operate on `_PackedStrings'. The regular expressions in question are in Perl syntax. The "flags" on various functions can include: `i' for case-insensitive, `s' for single-line mode, and `g' for global. (It's probably worth your time to peruse the source code...)
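The flag letters form a small set of independent switches. The sketch below shows one way to decode such a flag string; the `Flags' record, `noFlags' and `parseFlags' are hypothetical illustrations, not part of `MatchPS', and rejecting unknown letters is a choice made here rather than documented behaviour.

```haskell
-- Hypothetical decoding of MatchPS-style flag letters (not MatchPS code).
data Flags = Flags
  { caseInsensitive :: Bool  -- set by 'i'
  , singleLine      :: Bool  -- set by 's'
  , global          :: Bool  -- set by 'g'
  } deriving (Eq, Show)

noFlags :: Flags
noFlags = Flags False False False

-- Fold over the letters, switching on one field per recognised flag.
-- Unknown letters raise an error in this sketch.
parseFlags :: [Char] -> Flags
parseFlags = foldl step noFlags
  where
    step f 'i' = f { caseInsensitive = True }
    step f 's' = f { singleLine = True }
    step f 'g' = f { global = True }
    step _ c   = error ("parseFlags: unknown flag " ++ show c)
```

Presumably `i' and `s' end up as the two `Bool' arguments of `re_compile_pattern', while `g' is handled at the `MatchPS' level (in Perl, `g' makes a substitution replace every match rather than only the first).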
matchPS   :: _PackedString      -- regexp
          -> _PackedString      -- string to match
          -> [Char]             -- flags
          -> Maybe REmatch      -- info about what matched and where
searchPS  :: _PackedString      -- regexp
          -> _PackedString      -- string to search
          -> [Char]             -- flags
          -> Maybe REmatch
-- Perl-like match-and-substitute:
substPS   :: _PackedString      -- regexp
          -> _PackedString      -- replacement
          -> [Char]             -- flags
          -> _PackedString      -- string
          -> _PackedString
-- same as substPS, but no prefix and suffix:
replacePS :: _PackedString      -- regexp
          -> _PackedString      -- replacement
          -> [Char]             -- flags
          -> _PackedString      -- string
          -> _PackedString
match2PS  :: _PackedString      -- regexp
          -> _PackedString      -- string1 to match
          -> _PackedString      -- string2 to match
          -> [Char]             -- flags
          -> Maybe REmatch
search2PS :: _PackedString      -- regexp
          -> _PackedString      -- string1 to search
          -> _PackedString      -- string2 to search
          -> [Char]             -- flags
          -> Maybe REmatch
-- functions to pull the matched pieces out of an REmatch:
getMatchesNo :: REmatch -> Int
getMatchedGroup :: REmatch -> Int -> _PackedString -> _PackedString
getWholeMatch :: REmatch -> _PackedString -> _PackedString
getLastMatch :: REmatch -> _PackedString -> _PackedString
getAfterMatch :: REmatch -> _PackedString -> _PackedString
-- (reverse) brute-force string matching;
-- Perl equivalent is index/rindex:
findPS, rfindPS :: _PackedString -> _PackedString -> Maybe Int
-- Equivalent to Perl "chop" (off the last character, if any):
chopPS :: _PackedString -> _PackedString
-- matchPrefixPS: tries to match as much as possible of strA starting
-- from the beginning of strB (handy when matching fancy literals in
-- parsers):
matchPrefixPS :: _PackedString -> _PackedString -> Int
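The last four functions do plain string work with no regular expressions involved. The following is a pure-Haskell sketch of the behaviour their comments describe, using ordinary `String's; `findS', `rfindS', `chopS' and `matchPrefixS' are hypothetical names, the needle-first argument order is an assumption (the signatures above do not say which argument is which), and reading `matchPrefixPS' as "length of the longest common prefix" is an interpretation of its comment, not its documented definition.

```haskell
import Data.List (isPrefixOf, tails)

-- Zero-based start positions of every occurrence of needle in haystack,
-- found by brute force: test each suffix of haystack in turn.
occurrences :: String -> String -> [Int]
occurrences needle haystack =
  [i | (i, suffix) <- zip [0 ..] (tails haystack), needle `isPrefixOf` suffix]

-- First occurrence, like Perl's index (cf. findPS).
findS :: String -> String -> Maybe Int
findS needle haystack = case occurrences needle haystack of
  (i : _) -> Just i
  []      -> Nothing

-- Last occurrence, like Perl's rindex (cf. rfindPS).
rfindS :: String -> String -> Maybe Int
rfindS needle haystack = case occurrences needle haystack of
  [] -> Nothing
  ps -> Just (last ps)

-- Drop the last character, if any, like Perl's chop (cf. chopPS).
chopS :: String -> String
chopS [] = []
chopS s  = init s

-- How many leading characters of a also begin b (cf. matchPrefixPS).
matchPrefixS :: String -> String -> Int
matchPrefixS a b = length (takeWhile id (zipWith (==) a b))
```

For example, `findS "e" "a regexp"' gives `Just 3', `rfindS "e" "a regexp"' gives `Just 5', and `matchPrefixS "regexp" "regular"' gives 3. The brute-force search costs O(n*m) in the worst case, which is fine for the short literals a parser typically checks.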