7.2. The Regex and MatchPS interfaces

(Sigbjorn Finne supplied the regular-expressions interface.)

The Regex library provides a fairly direct interface to the GNU regular-expression library, for manipulating PackedStrings. If you are operating at this level, you will probably need to consult the GNU documentation. Alternatively, you can use the simpler, higher-level RegexString interface.

The datatypes and functions that Regex provides are:
data PatBuffer  -- just a bunch of bytes (mutable)

data REmatch
 = REmatch (Array Int GroupBounds)  -- for $1, ... $n
           GroupBounds              -- for $` (everything before match)
           GroupBounds              -- for $& (entire matched string)
           GroupBounds              -- for $' (everything after)
           GroupBounds              -- for $+ (matched by last bracket)

-- GroupBounds hold the interval where a group
-- matched inside a string, e.g.
--
-- matching "reg(exp)" "a regexp" returns the pair (5,7) for the
-- (exp) group. (PackedString indices start from 0)

type GroupBounds = (Int, Int)

re_compile_pattern
        :: PackedString         -- pattern to compile
        -> Bool                 -- True <=> assume single-line mode
        -> Bool                 -- True <=> case-insensitive
        -> PrimIO PatBuffer

re_match :: PatBuffer           -- compiled regexp
         -> PackedString        -- string to match
         -> Int                 -- start position
         -> Bool                -- True <=> record results in registers
         -> PrimIO (Maybe REmatch)

-- Matching on two strings is useful when you're dealing with multiple
-- buffers. This could prove useful for PackedStrings: rather than
-- stuffing the contents of a file into one massive heap chunk, we can
-- load smaller chunks on demand.

re_match2 :: PatBuffer          -- 2-string version
          -> PackedString
          -> PackedString
          -> Int
          -> Int
          -> Bool
          -> PrimIO (Maybe REmatch)

re_search :: PatBuffer          -- compiled regexp
          -> PackedString       -- string to search
          -> Int                -- start index
          -> Int                -- stop index
          -> Bool               -- True <=> record results in registers
          -> PrimIO (Maybe REmatch)

re_search2 :: PatBuffer         -- Double buffer search
           -> PackedString
           -> PackedString
           -> Int               -- start index
           -> Int               -- range (?)
           -> Int               -- stop index
           -> Bool              -- True <=> results in registers
           -> PrimIO (Maybe REmatch)
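Two details of the listing above can be made concrete with a small sketch. sliceBounds shows how a GroupBounds pair picks out text, assuming (as the (5,7) example suggests) that the end index is inclusive; charAt2 shows the idea behind the two-string functions, which in the GNU library treat the two strings as if they were concatenated. Both names are hypothetical, neither is part of Regex, and both work on String rather than PackedString.

```haskell
-- Hypothetical helpers illustrating two ideas from the Regex listing.
-- They use plain Strings, not PackedStrings, and are not part of Regex.

type GroupBounds = (Int, Int)

-- Text covered by a GroupBounds pair. ASSUMES an inclusive end index,
-- as the (5,7) example for "reg(exp)" against "a regexp" suggests.
sliceBounds :: GroupBounds -> String -> String
sliceBounds (start, end) = take (end - start + 1) . drop start

-- Character at position i of the virtual concatenation s1 ++ s2,
-- found without building the concatenated string.
charAt2 :: String -> String -> Int -> Maybe Char
charAt2 s1 s2 i
  | i < 0              = Nothing
  | i < n1             = Just (s1 !! i)
  | i - n1 < length s2 = Just (s2 !! (i - n1))
  | otherwise          = Nothing
  where
    n1 = length s1
```

For example, sliceBounds (5, 7) "a regexp" gives "exp", and charAt2 "reg" "exp" 3 gives Just 'e': positions simply run on from the end of the first buffer into the second.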

The MatchPS module provides Perl-like "higher-level" facilities for operating on PackedStrings. The regular expressions in question use Perl syntax. The "flags" argument of the various functions can include: i for case-insensitive, s for single-line mode, and g for global. (It's probably worth your time to peruse the source code…)
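Because the flags are just a list of characters, a caller can think of them as a set of independent switches. A minimal sketch, using a hypothetical Flags record and parseFlags function that are not part of MatchPS, and that simply ignore characters other than i, s and g (the real functions may treat them differently):

```haskell
-- Hypothetical record summarising a MatchPS-style flags string.
data Flags = Flags
  { caseInsensitive :: Bool  -- 'i'
  , singleLine      :: Bool  -- 's'
  , global          :: Bool  -- 'g'
  } deriving (Eq, Show)

-- Hypothetical parser: order does not matter, and unknown flag
-- characters are silently ignored in this sketch.
parseFlags :: [Char] -> Flags
parseFlags fs = Flags
  { caseInsensitive = 'i' `elem` fs
  , singleLine      = 's' `elem` fs
  , global          = 'g' `elem` fs
  }
```

So a flags argument of "ig" would request a case-insensitive, global operation, and "" the defaults.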

matchPS :: PackedString    -- regexp
        -> PackedString    -- string to match
        -> [Char]          -- flags
        -> Maybe REmatch   -- info about what matched and where

searchPS :: PackedString    -- regexp
         -> PackedString    -- string to match
         -> [Char]          -- flags
         -> Maybe REmatch

-- Perl-like match-and-substitute:
substPS :: PackedString     -- regexp
        -> PackedString     -- replacement
        -> [Char]           -- flags
        -> PackedString     -- string
        -> PackedString

-- same as substPS, but without the prefix and suffix:
replacePS :: PackedString  -- regexp
          -> PackedString  -- replacement
          -> [Char]        -- flags
          -> PackedString  -- string
          -> PackedString

match2PS :: PackedString   -- regexp
         -> PackedString   -- string1 to match
         -> PackedString   -- string2 to match
         -> [Char]         -- flags
         -> Maybe REmatch

search2PS :: PackedString  -- regexp
          -> PackedString  -- string1 to match
          -> PackedString  -- string2 to match
          -> [Char]        -- flags
          -> Maybe REmatch

-- functions to pull the matched pieces out of an REmatch:

getMatchesNo    :: REmatch -> Int
getMatchedGroup :: REmatch -> Int -> PackedString -> PackedString
getWholeMatch   :: REmatch -> PackedString -> PackedString
getLastMatch    :: REmatch -> PackedString -> PackedString
getAfterMatch   :: REmatch -> PackedString -> PackedString

-- (reverse) brute-force string matching;
-- Perl equivalent is index/rindex:
findPS, rfindPS :: PackedString -> PackedString -> Maybe Int

-- Equivalent to Perl "chop" (removes the last character, if any):
chopPS :: PackedString -> PackedString

-- matchPrefixPS: tries to match as much as possible of strA starting
-- from the beginning of strB (handy when matching fancy literals in
-- parsers):
matchPrefixPS :: PackedString -> PackedString -> Int
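To make the semantics of some of these MatchPS utilities more concrete, here is a minimal sketch on plain Strings. Every name in it is hypothetical; a literal substring search stands in for the regular-expression matching that substPS and replacePS really do, and flags, global substitution and any special syntax in the replacement are ignored. The argument order of findSub and the meaning given to matchPrefix are assumptions read from the comments above, not taken from the MatchPS source.

```haskell
import Data.List (isPrefixOf, tails)

-- Hypothetical String versions of some MatchPS functions; none of these
-- names exist in MatchPS. Literal substring search stands in for regexps.

-- Start indices of every occurrence of sub in str, left to right
-- (brute force: test each suffix of str in turn).
occurrences :: String -> String -> [Int]
occurrences str sub = [i | (i, t) <- zip [0 ..] (tails str), sub `isPrefixOf` t]

-- Like findPS/rfindPS (Perl index/rindex). ASSUMES the first argument is
-- the string searched and the second the substring sought.
findSub, rfindSub :: String -> String -> Maybe Int
findSub str sub = case occurrences str sub of
  (i : _) -> Just i
  []      -> Nothing
rfindSub str sub = case occurrences str sub of
  []   -> Nothing
  hits -> Just (last hits)

-- Like substPS: keep the prefix and suffix around the first match and
-- swap in the replacement; with no match the input comes back unchanged.
substLit :: String -> String -> String -> String
substLit pat rep s = case findSub s pat of
  Just i  -> take i s ++ rep ++ drop (i + length pat) s
  Nothing -> s

-- Like replacePS as described above: the replacement only, no prefix or
-- suffix. Returns Maybe since the no-match behaviour is not documented here.
replaceLit :: String -> String -> String -> Maybe String
replaceLit pat rep s = fmap (const rep) (findSub s pat)

-- Like matchPrefixPS, ASSUMING the result counts the leading characters
-- of strA that match the start of strB.
matchPrefix :: String -> String -> Int
matchPrefix strA strB = length (takeWhile id (zipWith (==) strA strB))
```

For example, findSub "abcabc" "bc" is Just 1 while rfindSub gives Just 4, and substLit "exp" "EXP" "a regexp!" yields "a regEXP!" where replaceLit yields Just "EXP". The brute-force search tries every start position, so in the worst case it takes time proportional to the product of the two lengths, which is acceptable for short strings.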