(Sigbjorn Finne supplied the regular-expressions interface.)
The Regex library provides quite direct interface to the GNU regular-expression library, for doing manipulation on PackedStrings. You probably need to see the GNU documentation if you are operating at this level. Alternatively, you can use the simpler and higher-level RegexString interface.
The datatypes and functions that Regex provides are:
data PatBuffer # just a bunch of bytes (mutable) data REmatch = REmatch (Array Int GroupBounds) -- for $1, ... $n GroupBounds -- for $` (everything before match) GroupBounds -- for $& (entire matched string) GroupBounds -- for $' (everything after) GroupBounds -- for $+ (matched by last bracket) -- GroupBounds hold the interval where a group -- matched inside a string, e.g. -- -- matching "reg(exp)" "a regexp" returns the pair (5,7) for the -- (exp) group. (PackedString indices start from 0) type GroupBounds = (Int, Int) re_compile_pattern :: PackedString -- pattern to compile -> Bool -- True <=> assume single-line mode -> Bool -- True <=> case-insensitive -> PrimIO PatBuffer re_match :: PatBuffer -- compiled regexp -> PackedString -- string to match -> Int -- start position -> Bool -- True <=> record results in registers -> PrimIO (Maybe REmatch) -- Matching on 2 strings is useful when you're dealing with multiple -- buffers, which is something that could prove useful for -- PackedStrings, as we don't want to stuff the contents of a file -- into one massive heap chunk, but load (smaller chunks) on demand. re_match2 :: PatBuffer -- 2-string version -> PackedString -> PackedString -> Int -> Int -> Bool -> PrimIO (Maybe REmatch) re_search :: PatBuffer -- compiled regexp -> PackedString -- string to search -> Int -- start index -> Int -- stop index -> Bool -- True <=> record results in registers -> PrimIO (Maybe REmatch) re_search2 :: PatBuffer -- Double buffer search -> PackedString -> PackedString -> Int -- start index -> Int -- range (?) -> Int -- stop index -> Bool -- True <=> results in registers -> PrimIO (Maybe REmatch) |
The MatchPS module provides Perl-like ``higher-level'' facilities to operate on PackedStrings. The regular expressions in question are in Perl syntax. The ``flags'' on various functions can include: i for case-insensitive, s for single-line mode, and g for global. (It's probably worth your time to peruse the source code…)
matchPS :: PackedString -- regexp -> PackedString -- string to match -> [Char] -- flags -> Maybe REmatch -- info about what matched and where searchPS :: PackedString -- regexp -> PackedString -- string to match -> [Char] -- flags -> Maybe REmatch -- Perl-like match-and-substitute: substPS :: PackedString -- regexp -> PackedString -- replacement -> [Char] -- flags -> PackedString -- string -> PackedString -- same as substPS, but no prefix and suffix: replacePS :: PackedString -- regexp -> PackedString -- replacement -> [Char] -- flags -> PackedString -- string -> PackedString match2PS :: PackedString -- regexp -> PackedString -- string1 to match -> PackedString -- string2 to match -> [Char] -- flags -> Maybe REmatch search2PS :: PackedString -- regexp -> PackedString -- string to match -> PackedString -- string to match -> [Char] -- flags -> Maybe REmatch -- functions to pull the matched pieces out of an REmatch: getMatchesNo :: REmatch -> Int getMatchedGroup :: REmatch -> Int -> PackedString -> PackedString getWholeMatch :: REmatch -> PackedString -> PackedString getLastMatch :: REmatch -> PackedString -> PackedString getAfterMatch :: REmatch -> PackedString -> PackedString -- (reverse) brute-force string matching; -- Perl equivalent is index/rindex: findPS, rfindPS :: PackedString -> PackedString -> Maybe Int -- Equivalent to Perl "chop" (off the last character, if any): chopPS :: PackedString -> PackedString -- matchPrefixPS: tries to match as much as possible of strA starting -- from the beginning of strB (handy when matching fancy literals in -- parsers): matchPrefixPS :: PackedString -> PackedString -> Int |