Class Regex

All Implemented Interfaces:
PointerInterface

public class Regex extends Record
The g_regex_*() functions implement regular
expression pattern matching using syntax and semantics similar to
Perl regular expressions.

Some functions accept a @start_position argument. Setting it differs
from passing a shortened string and setting %G_REGEX_MATCH_NOTBOL
when the pattern begins with any kind of lookbehind assertion.
For example, consider the pattern "\Biss\B", which finds occurrences of "iss"
in the middle of words. ("\B" matches only if the current position in the
subject is not a word boundary.) When applied to the string "Mississippi"
starting at byte offset 4, namely "issippi", it does not match, because "\B" is
always false at the start of the subject, which is deemed to be a word
boundary. However, if the entire string is passed, but with
@start_position set to 4, it finds the second occurrence of "iss" because
it is able to look behind the starting point to discover that it is
preceded by a letter.
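
The difference between a start position and a shortened string can be reproduced with java.util.regex, which is used here only as a stand-in for this binding. This is a minimal sketch: the class and method names are hypothetical, and Java indexes strings in UTF-16 code units rather than bytes (the two coincide for this ASCII subject).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StartPositionSketch {
    /** Returns the index of the first match at or after {@code start}, or -1 if none. */
    static int findFrom(String regex, String subject, int start) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        // find(int) begins searching at 'start' but can still look at
        // the characters that precede it in the subject.
        return m.find(start) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String regex = "\\Biss\\B";
        // Shortened string: "\B" fails at the very start, so nothing matches.
        System.out.println(findFrom(regex, "issippi", 0));
        // Whole string with a start position: the preceding letter is visible.
        System.out.println(findFrom(regex, "Mississippi", 4));
    }
}
```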

Note that, unless you set the %G_REGEX_RAW flag, all the strings passed
to these functions must be encoded in UTF-8. The lengths and the positions
inside the strings are in bytes and not in characters, so, for instance,
"\xc3\xa0" (i.e. "à") is two bytes long but it is treated as a
single character. If you set %G_REGEX_RAW, the strings need not be valid
UTF-8 and each byte is treated as a character, so "\xc3\xa0" is two
bytes and two characters long.
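
Because lengths and positions are expressed in bytes, a Java caller usually has to convert from character counts. The sketch below only uses the Java standard library; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ByteLengthSketch {
    /** Length of {@code s} in bytes once encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of Unicode characters (code points) in {@code s}. */
    static int characterCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String aGrave = "\u00e0"; // "à", encoded as 0xC3 0xA0 in UTF-8
        System.out.println(utf8Length(aGrave));     // two bytes
        System.out.println(characterCount(aGrave)); // one character
    }
}
```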

When matching a pattern, "\n" matches only against a "\n" character in
the string, and "\r" matches only a "\r" character. To match any newline
sequence use "\R". This particular group matches either the two-character
sequence CR + LF ("\r\n"), or one of the single characters LF (linefeed,
U+000A, "\n"), VT (vertical tab, U+000B, "\v"), FF (formfeed, U+000C, "\f"),
CR (carriage return, U+000D, "\r"), NEL (next line, U+0085), LS (line
separator, U+2028), or PS (paragraph separator, U+2029).
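
As an illustration, java.util.regex supports a "\R" escape covering the same newline sequences, so it can stand in for this binding in a small sketch. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class AnyNewlineSketch {
    /** Splits {@code text} on any newline sequence; CR + LF counts as one separator. */
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\\R"));
    }

    public static void main(String[] args) {
        // Mixes CR + LF, LF, LS (U+2028) and NEL (U+0085) as separators.
        System.out.println(splitLines("a\r\nb\nc\u2028d\u0085e"));
    }
}
```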

The behaviour of the dot, circumflex, and dollar metacharacters is
affected by newline characters. The default is to recognize any newline
character (the same characters recognized by "\R"). This can be changed
with %G_REGEX_NEWLINE_CR, %G_REGEX_NEWLINE_LF and %G_REGEX_NEWLINE_CRLF
compile options, and with %G_REGEX_MATCH_NEWLINE_ANY,
%G_REGEX_MATCH_NEWLINE_CR, %G_REGEX_MATCH_NEWLINE_LF and
%G_REGEX_MATCH_NEWLINE_CRLF match options. These settings are also
relevant when compiling a pattern if %G_REGEX_EXTENDED is set, and an
unescaped "#" outside a character class is encountered. This indicates
a comment that lasts until after the next newline.

When the %G_REGEX_JAVASCRIPT_COMPAT flag is set, pattern syntax and pattern
matching are changed to be compatible with the way that regular expressions
work in JavaScript. More precisely, a lonely ']' character in the pattern
is a syntax error; the '\x' escape only allows 0 to 2 hexadecimal digits, and
you must use the '\u' escape sequence with 4 hex digits to specify a Unicode
codepoint instead of '\x' or '\x{....}'. If '\x' or '\u' are not followed by
the specified number of hex digits, they match 'x' and 'u' literally; also
'\U' always matches 'U' instead of being an error in the pattern. Finally,
pattern matching is modified so that back references to an unset subpattern
group produce a match with the empty string instead of an error. See
pcreapi(3) for more information.

Creating and manipulating the same #GRegex structure from different
threads is not a problem, as #GRegex does not modify its internal
state between creation and destruction. #GMatchInfo, on the other hand,
is not threadsafe.
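
The same split between an immutable compiled pattern and a stateful match object exists in java.util.regex (Pattern versus Matcher), which makes it a convenient stand-in for a sketch of the usage pattern. The class and method names are hypothetical, and nothing here exercises this binding directly.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SharedPatternSketch {
    // Compiled once; the compiled pattern is immutable and may be shared.
    private static final Pattern WORD = Pattern.compile("\\w+");

    static int countWords(String text) {
        // One match object per call: it carries mutable state and is not shared.
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Counts words in every text, possibly on several threads at once. */
    static List<Integer> countAllInParallel(List<String> texts) {
        return texts.parallelStream()
                .map(SharedPatternSketch::countWords)
                .collect(Collectors.toList());
    }
}
```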

The low-level regular expression functionality is provided by
the excellent
[PCRE](http://www.pcre.org/)
library written by Philip Hazel.

https://docs.gtk.org/glib/struct.Regex.html

  • Constructor Details

    • Regex

      public Regex(PointerContainer pointer)
    • Regex

      public Regex(@Nonnull Str pattern, int compile_options, int match_options)
      Compiles the regular expression to an internal form, and does
      the initial setup of the #GRegex structure.
      Parameters:
      pattern - the regular expression
      compile_options - compile options for the regular expression, or 0
      match_options - match options for the regular expression, or 0
    • Regex

      public Regex(String pattern, int compile_options, int match_options)
      Compiles the regular expression to an internal form, and does
      the initial setup of the #GRegex structure.
      Parameters:
      pattern - the regular expression
      compile_options - compile options for the regular expression, or 0
      match_options - match options for the regular expression, or 0
  • Method Details

    • getClassHandler

      public static ClassHandler getClassHandler()
    • getCaptureCount

      public int getCaptureCount()
      Returns the number of capturing subpatterns in the pattern.
      Returns:
      the number of capturing subpatterns
    • getCompileFlags

      public int getCompileFlags()
      Returns the compile options that @regex was created with.

      Depending on the version of PCRE that is used, this may or may not
      include flags set by option expressions such as `(?i)` found at the
      top-level within the compiled pattern.
      Returns:
      flags from #GRegexCompileFlags
    • getHasCrOrLf

      public boolean getHasCrOrLf()
      Checks whether the pattern contains explicit CR or LF references.
      Returns:
      %TRUE if the pattern contains explicit CR or LF references
    • getMatchFlags

      public int getMatchFlags()
      Returns the match options that @regex was created with.
      Returns:
      flags from #GRegexMatchFlags
    • getMaxBackref

      public int getMaxBackref()
      Returns the number of the highest back reference
      in the pattern, or 0 if the pattern does not contain
      back references.
      Returns:
      the number of the highest back reference
    • getMaxLookbehind

      public int getMaxLookbehind()
      Gets the number of characters in the longest lookbehind assertion in the
      pattern. This information is useful when doing multi-segment matching using
      the partial matching facilities.
      Returns:
      the number of characters in the longest lookbehind assertion.
    • getPattern

      public Str getPattern()
      Gets the pattern string associated with @regex, i.e. a copy of
      the string passed to g_regex_new().
      Returns:
      the pattern of @regex
    • getStringNumber

      public int getStringNumber(@Nonnull Str name)
      Retrieves the number of the subexpression named @name.
      Parameters:
      name - name of the subexpression
      Returns:
      The number of the subexpression or -1 if @name does not exist
    • getStringNumber

      public int getStringNumber(String name)
      Retrieves the number of the subexpression named @name.
      Parameters:
      name - name of the subexpression
      Returns:
      The number of the subexpression or -1 if @name does not exist
    • ref

      public Regex ref()
      Increases reference count of @regex by 1.
      Returns:
      @regex
    • replace

      public Str replace(@Nonnull Str string, long string_len, int start_position, @Nonnull Str replacement, int match_options) throws AllocationError
      Replaces all occurrences of the pattern in @regex with the
      replacement text. Backreferences of the form '\number' or
      '\g<number>' in the replacement text are replaced by the
      number-th captured subexpression of the match; '\g<name>' refers
      to the captured subexpression with the given name. '\0' refers
      to the complete match, but '\0' followed by a number is the octal
      representation of a character. To include a literal '\' in the
      replacement, write '\\' (that is, "\\\\" inside a string literal).

      There are also escapes that change the case of the following text:

      - \l: Convert to lower case the next character
      - \u: Convert to upper case the next character
      - \L: Convert to lower case till \E
      - \U: Convert to upper case till \E
      - \E: End case modification

      If you do not need to use backreferences use g_regex_replace_literal().

      The @replacement string must be UTF-8 encoded even if %G_REGEX_RAW was
      passed to g_regex_new(). If you want to use strings that are not UTF-8
      encoded, you can use g_regex_replace_literal().

      Setting @start_position differs from just passing over a shortened
      string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern that
      begins with any kind of lookbehind assertion, such as "\b".
      Parameters:
      string - the string to perform matches against
      string_len - the length of @string, in bytes, or -1 if @string is nul-terminated
      start_position - starting index of the string to match, in bytes
      replacement - text to replace each match with
      match_options - options for the match
      Returns:
      a newly allocated string containing the replacements
      Throws:
      AllocationError
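
      How the backreference escapes are expanded can be sketched in plain Java on top of java.util.regex. This is not the library's implementation: the class and method names are hypothetical, and only '\g<number>', '\g<name>', a single-digit '\number' (including '\0') and '\\' are handled. Octal forms and the case-changing escapes are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BackrefSketch {
    // Subset of escapes: \g<name-or-number>, a lone digit \0..\9, or \\.
    // A digit followed by another digit is skipped, so octal forms stay as-is.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\(?:g<(\\w+)>|(\\d)(?!\\d)|(\\\\))");

    /** Expands the supported escapes in {@code replacement} for one match. */
    static String expand(String replacement, Matcher match) {
        StringBuilder out = new StringBuilder();
        Matcher esc = ESCAPE.matcher(replacement);
        int last = 0;
        while (esc.find()) {
            out.append(replacement, last, esc.start());
            if (esc.group(3) != null) {
                out.append('\\'); // "\\" yields one literal backslash
            } else {
                String ref = esc.group(1) != null ? esc.group(1) : esc.group(2);
                String value = ref.chars().allMatch(Character::isDigit)
                        ? match.group(Integer.parseInt(ref)) // numbered group
                        : match.group(ref);                  // named group
                if (value != null) {
                    out.append(value); // unset groups expand to nothing
                }
            }
            last = esc.end();
        }
        out.append(replacement, last, replacement.length());
        return out.toString();
    }

    /** Replaces every match of {@code regex} in {@code input}. */
    static String replaceAll(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            out.append(expand(replacement, m));
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }
}
```

      For example, calling replaceAll with the pattern (\w+)@(?<host>\w+) and the replacement \g<host>:\1 swaps the two parts of each match.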
    • replace

      public Str replace(String string, long string_len, int start_position, String replacement, int match_options) throws AllocationError
      Replaces all occurrences of the pattern in @regex with the
      replacement text. Backreferences of the form '\number' or
      '\g<number>' in the replacement text are replaced by the
      number-th captured subexpression of the match; '\g<name>' refers
      to the captured subexpression with the given name. '\0' refers
      to the complete match, but '\0' followed by a number is the octal
      representation of a character. To include a literal '\' in the
      replacement, write '\\' (that is, "\\\\" inside a string literal).

      There are also escapes that change the case of the following text:

      - \l: Convert to lower case the next character
      - \u: Convert to upper case the next character
      - \L: Convert to lower case till \E
      - \U: Convert to upper case till \E
      - \E: End case modification

      If you do not need to use backreferences use g_regex_replace_literal().

      The @replacement string must be UTF-8 encoded even if %G_REGEX_RAW was
      passed to g_regex_new(). If you want to use strings that are not UTF-8
      encoded, you can use g_regex_replace_literal().

      Setting @start_position differs from just passing over a shortened
      string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern that
      begins with any kind of lookbehind assertion, such as "\b".
      Parameters:
      string - the string to perform matches against
      string_len - the length of @string, in bytes, or -1 if @string is nul-terminated
      start_position - starting index of the string to match, in bytes
      replacement - text to replace each match with
      match_options - options for the match
      Returns:
      a newly allocated string containing the replacements
      Throws:
      AllocationError
    • replaceEval

      public Str replaceEval(@Nonnull Str string, long string_len, int start_position, int match_options, Regex.OnRegexEvalCallback eval, @Nullable Pointer user_data) throws AllocationError
      Replaces occurrences of the pattern in @regex with the output of
      @eval for that occurrence.

      Setting @start_position differs from just passing over a shortened
      string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern
      that begins with any kind of lookbehind assertion, such as "\b".

      The following example uses g_regex_replace_eval() to replace multiple
      strings at once:
      <!-- language="C" -->
       static gboolean
       eval_cb (const GMatchInfo *info,
                GString          *res,
                gpointer          data)
       {
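         gchar *match;
         gchar *r;

         /* Fetch the whole match and look up its replacement text. */
         match = g_match_info_fetch (info, 0);
         r = g_hash_table_lookup ((GHashTable *)data, match);
         g_string_append (res, r);
         g_free (match);

         /* Returning FALSE continues the replacement process. */
         return FALSE;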
       }
       
       ...
       
       GRegex *reg;
       GHashTable *h;
       gchar *res;
       
       h = g_hash_table_new (g_str_hash, g_str_equal);
       
       g_hash_table_insert (h, "1", "ONE");
       g_hash_table_insert (h, "2", "TWO");
       g_hash_table_insert (h, "3", "THREE");
       g_hash_table_insert (h, "4", "FOUR");
       
       reg = g_regex_new ("1|2|3|4", G_REGEX_DEFAULT, G_REGEX_MATCH_DEFAULT, NULL);
       res = g_regex_replace_eval (reg, text, -1, 0, 0, eval_cb, h, NULL);
       g_hash_table_destroy (h);
       
       ...
       
      Parameters:
      string - string to perform matches against
      string_len - the length of @string, in bytes, or -1 if @string is nul-terminated
      start_position - starting index of the string to match, in bytes
      match_options - options for the match
      eval - a function to call for each match
      user_data - user data to pass to the function
      Returns:
      a newly allocated string containing the replacements
      Throws:
      AllocationError
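
      The C sample above has a close analogue in java.util.regex, shown here as a sketch rather than as a use of this binding. The class and method names are hypothetical; Matcher.replaceAll with a function argument requires Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceEvalSketch {
    private static final Pattern DIGITS = Pattern.compile("1|2|3|4");

    // Plays the role of the hash table passed as user data in the C sample.
    private static final Map<String, String> WORDS =
            Map.of("1", "ONE", "2", "TWO", "3", "THREE", "4", "FOUR");

    /** Replaces each matched digit with the word computed for that match. */
    static String spellOut(String text) {
        return DIGITS.matcher(text).replaceAll(
                // Called once per match; quoteReplacement keeps the result literal.
                match -> Matcher.quoteReplacement(WORDS.get(match.group())));
    }

    public static void main(String[] args) {
        System.out.println(spellOut("1 + 3 = 4"));
    }
}
```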
    • replaceEval

      public Str replaceEval(String string, long string_len, int start_position, int match_options, Regex.OnRegexEvalCallback eval, @Nullable Pointer user_data) throws AllocationError
      Replaces occurrences of the pattern in @regex with the output of
      @eval for that occurrence.

      Setting @start_position differs from just passing over a shortened
      string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern
      that begins with any kind of lookbehind assertion, such as "\b".

      The following example uses g_regex_replace_eval() to replace multiple
      strings at once:
      <!-- language="C" -->
       static gboolean
       eval_cb (const GMatchInfo *info,
                GString          *res,
                gpointer          data)
       {
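         gchar *match;
         gchar *r;

         /* Fetch the whole match and look up its replacement text. */
         match = g_match_info_fetch (info, 0);
         r = g_hash_table_lookup ((GHashTable *)data, match);
         g_string_append (res, r);
         g_free (match);

         /* Returning FALSE continues the replacement process. */
         return FALSE;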
       }
       
       ...
       
       GRegex *reg;
       GHashTable *h;
       gchar *res;
       
       h = g_hash_table_new (g_str_hash, g_str_equal);
       
       g_hash_table_insert (h, "1", "ONE");
       g_hash_table_insert (h, "2", "TWO");
       g_hash_table_insert (h, "3", "THREE");
       g_hash_table_insert (h, "4", "FOUR");
       
       reg = g_regex_new ("1|2|3|4", G_REGEX_DEFAULT, G_REGEX_MATCH_DEFAULT, NULL);
       res = g_regex_replace_eval (reg, text, -1, 0, 0, eval_cb, h, NULL);
       g_hash_table_destroy (h);
       
       ...
       
      Parameters:
      string - string to perform matches against
      string_len - the length of @string, in bytes, or -1 if @string is nul-terminated
      start_position - starting index of the string to match, in bytes
      match_options - options for the match
      eval - a function to call for each match
      user_data - user data to pass to the function
      Returns:
      a newly allocated string containing the replacements
      Throws:
      AllocationError
    • replaceLiteral

      public Str replaceLiteral(@Nonnull Str string, long string_len, int start_position, @Nonnull Str replacement, int match_options) throws AllocationError
      Replaces all occurrences of the pattern in @regex with the
      replacement text. @replacement is inserted literally; to
      include backreferences, use g_regex_replace().

      Setting @start_position differs from just passing over a
      shortened string and setting %G_REGEX_MATCH_NOTBOL in the
      case of a pattern that begins with any kind of lookbehind
      assertion, such as "\b".
      Parameters:
      string - the string to perform matches against
      string_len - the length of @string, in bytes, or -1 if @string is nul-terminated
      start_position - starting index of the string to match, in bytes
      replacement - text to replace each match with
      match_options - options for the match
      Returns:
      a newly allocated string containing the replacements
      Throws:
      AllocationError
    • replaceLiteral

      public Str replaceLiteral(String string, long string_len, int start_position, String replacement, int match_options) throws AllocationError
      Replaces all occurrences of the pattern in @regex with the
      replacement text. @replacement is inserted literally; to
      include backreferences, use g_regex_replace().

      Setting @start_position differs from just passing over a
      shortened string and setting %G_REGEX_MATCH_NOTBOL in the
      case of a pattern that begins with any kind of lookbehind
      assertion, such as "\b".
      Parameters:
      string - the string to perform matches against
      string_len - the length of @string, in bytes, or -1 if @string is nul-terminated
      start_position - starting index of the string to match, in bytes
      replacement - text to replace each match with
      match_options - options for the match
      Returns:
      a newly allocated string containing the replacements
      Throws:
      AllocationError
    • unref

      public void unref()
      Decreases reference count of @regex by 1. When reference count drops
      to zero, it frees all the memory associated with the regex structure.
    • checkReplacement

      public static boolean checkReplacement(@Nonnull Str replacement, @Nullable Int has_references) throws AllocationError
      Checks whether @replacement is a valid replacement string
      (see g_regex_replace()), i.e. that all escape sequences in
      it are valid.

      If @has_references is not %NULL then @replacement is checked
      for pattern references. For instance, the replacement text 'foo\n'
      does not contain references and may be evaluated without information
      about the actual match, but '\0\1' (whole match followed by first
      subpattern) requires a valid #GMatchInfo object.
      Parameters:
      replacement - the replacement string
      has_references - location to store information about references in @replacement or %NULL
      Returns:
      whether @replacement is a valid replacement string
      Throws:
      AllocationError
    • errorQuark

      public static int errorQuark()
      Returns:
    • escapeNul

      public static Str escapeNul(@Nonnull Str string, int length)
      Escapes the nul characters in @string to "\x00". It can be used
      to compile a regex with embedded nul characters.

      For completeness, @length can be -1 for a nul-terminated string.
      In this case the output string will, of course, be equal to @string.
      Parameters:
      string - the string to escape
      length - the length of @string
      Returns:
      a newly-allocated escaped string
    • escapeString

      public static Str escapeString(@Nonnull Str string, int length)
      Escapes the special characters used for regular expressions
      in @string, for instance "a.b*c" becomes "a\.b\*c". This
      function is useful to dynamically generate regular expressions.

      @string can contain nul characters, which are replaced with "\0".
      In this case remember to specify the correct length of @string
      in @length.
      Parameters:
      string - the string to escape
      length - the length of @string, in bytes, or -1 if @string is nul-terminated
      Returns:
      a newly-allocated escaped string
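
      The escaping described above can be sketched in plain Java. The set of characters treated as special here is an assumption made for illustration, not a statement about what the library escapes, and the class and method names are hypothetical.

```java
final class EscapeSketch {
    // Assumed set of regular expression metacharacters (illustrative only).
    private static final String SPECIAL = "\\|()[]{}^$*+?.";

    /** Returns {@code s} with metacharacters backslash-escaped and nul written as "\0". */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\0') {
                out.append("\\0"); // embedded nul becomes the two characters \ and 0
            } else {
                if (SPECIAL.indexOf(c) >= 0) {
                    out.append('\\');
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a.b*c"));
    }
}
```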
    • matchSimple

      public static boolean matchSimple(@Nonnull Str pattern, @Nonnull Str string, int compile_options, int match_options)
      Scans for a match in @string for @pattern.

      This function is equivalent to g_regex_match(), but it does not
      require compiling the pattern with g_regex_new() first. This saves
      a few lines of code when you only need to test for a match without
      extracting substrings, capture counts, and so on.

      If this function is to be called on the same @pattern more than
      once, it's more efficient to compile the pattern once with
      g_regex_new() and then use g_regex_match().
      Parameters:
      pattern - the regular expression
      string - the string to scan for matches
      compile_options - compile options for the regular expression, or 0
      match_options - match options, or 0
      Returns:
      %TRUE if the string matched, %FALSE otherwise
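
      The trade-off between a one-shot match and a precompiled pattern looks the same in java.util.regex, used below as a stand-in for this binding. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class MatchSimpleSketch {
    /** One-shot scan: compiles the pattern on every call. */
    static boolean matchOnce(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    /** Repeated use: compile once, then scan each subject. */
    static long countMatching(String regex, Iterable<String> subjects) {
        Pattern compiled = Pattern.compile(regex);
        long count = 0;
        for (String subject : subjects) {
            if (compiled.matcher(subject).find()) {
                count++;
            }
        }
        return count;
    }
}
```

      The one-shot form is convenient for a single test; the second form avoids recompiling the pattern when many strings are checked.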
    • getTypeID

      public static long getTypeID()
    • getParentTypeID

      public static long getParentTypeID()
    • getTypeSize

      public static TypeSystem.TypeSize getTypeSize()
    • getParentTypeSize

      public static TypeSystem.TypeSize getParentTypeSize()
    • getInstanceSize

      public static int getInstanceSize()