Package ch.bailu.gtk.glib
Class Regex
java.lang.Object
ch.bailu.gtk.type.Type
ch.bailu.gtk.type.Pointer
ch.bailu.gtk.type.Record
ch.bailu.gtk.glib.Regex
- All Implemented Interfaces:
PointerInterface
The g_regex_*() functions implement regular
expression pattern matching using syntax and semantics similar to
Perl regular expressions.
Some functions accept a @start_position argument. Setting it differs
from just passing a shortened string and setting %G_REGEX_MATCH_NOTBOL
in the case of a pattern that begins with any kind of lookbehind assertion.
For example, consider the pattern "\Biss\B", which finds occurrences of "iss"
in the middle of words. ("\B" matches only if the current position in the
subject is not a word boundary.) When applied to the string "Mississippi"
from byte offset 4, namely "issippi", it does not match, because "\B" is
always false at the start of the subject, which is deemed to be a word
boundary. However, if the entire string is passed, but with
@start_position set to 4, it finds the second occurrence of "iss" because
it is able to look behind the starting point to discover that it is
preceded by a letter.
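The difference between a shortened subject and a start offset can be tried out without the native library. The following is a minimal sketch that uses java.util.regex as a stand-in for GRegex (an assumption: it is not this binding's API, and its offsets count UTF-16 chars rather than bytes). The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StartPositionDemo {
    private static final Pattern ISS = Pattern.compile("\\Biss\\B");

    /** Matches against a shortened copy: the text before 'start' is gone. */
    static boolean findInSubstring(String subject, int start) {
        return ISS.matcher(subject.substring(start)).find();
    }

    /** Matches against the full subject from 'start'; returns the match offset or -1. */
    static int findFromOffset(String subject, int start) {
        Matcher m = ISS.matcher(subject);
        m.region(start, subject.length());
        // Transparent bounds let \B inspect the character before 'start'.
        m.useTransparentBounds(true);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(findInSubstring("Mississippi", 4)); // false
        System.out.println(findFromOffset("Mississippi", 4));  // 4
    }
}
```

In the first call the subject starts at "issippi", so "\B" fails at its first character. In the second call the matcher can see the "s" before offset 4, so the match succeeds there.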
Note that, unless you set the %G_REGEX_RAW flag, all the strings passed
to these functions must be encoded in UTF-8. The lengths and the positions
inside the strings are in bytes and not in characters, so, for instance,
"\xc3\xa0" (i.e. "à") is two bytes long but it is treated as a
single character. If you set %G_REGEX_RAW, the strings can be invalid
UTF-8 and each byte is treated as a character, so "\xc3\xa0" is two
bytes and two characters long.
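Because lengths and positions are byte counts, a Java caller that thinks in String indices may need to convert them first. The sketch below shows one way to do that with the standard library only. The helper names are hypothetical, and it assumes the char index does not fall inside a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Offsets {
    /** Length of the string in UTF-8 bytes. */
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** UTF-8 byte offset that corresponds to a Java char index. */
    static int byteOffset(String s, int charIndex) {
        return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String aGrave = "\u00e0";                  // one character, U+00E0
        System.out.println(aGrave.length());       // 1
        System.out.println(byteLength(aGrave));    // 2
        String text = "d\u00e0 capo";
        System.out.println(byteOffset(text, 2));   // 3: one byte for 'd', two for U+00E0
    }
}
```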
When matching a pattern, "\n" matches only against a "\n" character in
the string, and "\r" matches only a "\r" character. To match any newline
sequence use "\R". This particular group matches either the two-character
sequence CR + LF ("\r\n"), or one of the single characters LF (linefeed,
U+000A, "\n"), VT (vertical tab, U+000B, "\v"), FF (formfeed, U+000C, "\f"),
CR (carriage return, U+000D, "\r"), NEL (next line, U+0085), LS (line
separator, U+2028), or PS (paragraph separator, U+2029).
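java.util.regex documents the same set of sequences for its own "\R", so the behaviour can be sketched in plain Java. This is only an illustration with the JDK's regex engine, not a call into GRegex, and the class name is made up.

```java
import java.util.Arrays;

final class AnyNewline {
    /** Splits text on any newline sequence; CR + LF counts as one separator. */
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String text = "a\r\nb\nc\u000Bd\fe\rf\u0085g\u2028h\u2029i";
        System.out.println(Arrays.toString(splitLines(text)));
        // [a, b, c, d, e, f, g, h, i]
    }
}
```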
The behaviour of the dot, circumflex, and dollar metacharacters is
affected by newline characters. The default is to recognize any newline
character (the same characters recognized by "\R"). This can be changed
with the %G_REGEX_NEWLINE_CR, %G_REGEX_NEWLINE_LF and %G_REGEX_NEWLINE_CRLF
compile options, and with the %G_REGEX_MATCH_NEWLINE_ANY,
%G_REGEX_MATCH_NEWLINE_CR, %G_REGEX_MATCH_NEWLINE_LF and
%G_REGEX_MATCH_NEWLINE_CRLF match options. These settings are also
relevant when compiling a pattern if %G_REGEX_EXTENDED is set and an
unescaped "#" outside a character class is encountered. This indicates
a comment that lasts until after the next newline.
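The effect of narrowing the newline convention can be illustrated with java.util.regex, whose UNIX_LINES flag is a rough analogue of %G_REGEX_NEWLINE_LF. This is an assumption made for illustration: the JDK's default set of line terminators does not include VT or FF, so the two engines are not equivalent. The class name is hypothetical.

```java
import java.util.regex.Pattern;

final class NewlineModes {
    /** Reports whether the dot in "a.b" matches the given separator under 'flags'. */
    static boolean dotMatches(String separator, int flags) {
        return Pattern.compile("a.b", flags).matcher("a" + separator + "b").matches();
    }

    public static void main(String[] args) {
        // By default CR is a line terminator, so the dot refuses to match it.
        System.out.println(dotMatches("\r", 0));                   // false
        // With UNIX_LINES only LF is a line terminator, so CR is ordinary.
        System.out.println(dotMatches("\r", Pattern.UNIX_LINES));  // true
        System.out.println(dotMatches("\n", Pattern.UNIX_LINES));  // false
    }
}
```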
When the %G_REGEX_JAVASCRIPT_COMPAT flag is set, pattern syntax and pattern
matching are changed to be compatible with the way that regular expressions
work in JavaScript. More precisely, a lone ']' character in the pattern
is a syntax error; the '\x' escape only allows 0 to 2 hexadecimal digits, and
you must use the '\u' escape sequence with 4 hex digits to specify a Unicode
codepoint instead of '\x' or '\x{....}'. If '\x' or '\u' are not followed by
the specified number of hex digits, they match 'x' and 'u' literally; also,
'\U' always matches 'U' instead of being an error in the pattern. Finally,
pattern matching is modified so that back references to an unset subpattern
group produce a match with the empty string instead of an error. See
pcreapi(3) for more information.
Creating and manipulating the same #GRegex structure from different
threads is not a problem, as #GRegex does not modify its internal
state between creation and destruction. #GMatchInfo, on the other hand,
is not thread-safe.
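The same split between an immutable compiled pattern and mutable per-match state exists in java.util.regex, where Pattern is safe to share and Matcher is not. The sketch below uses that as an analogy (it is not this binding's API): one shared compiled pattern, one fresh matcher per task. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SharedPatternDemo {
    // Compiled once and shared by all threads (plays the role of the GRegex).
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Counts words using a matcher created for this call only. */
    static int countWords(String text) {
        Matcher m = WORD.matcher(text); // per-call state, never shared
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Counts words in all texts on a small thread pool and sums the results. */
    static int countInParallel(List<String> texts) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String text : texts) {
                Callable<Integer> task = () -> countWords(text);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("word count task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```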
The low-level regular expression functionality is provided by
the excellent
[PCRE](http://www.pcre.org/)
library written by Philip Hazel.
-
Nested Class Summary
-
Field Summary
-
Constructor Summary
Constructor and Description
Regex(PointerContainer pointer)
    Compiles the regular expression to an internal form, and does
    the initial setup of the #GRegex structure.
-
Method Summary
Modifier and Type, Method
    Description
static boolean checkReplacement(Str replacement, Int has_references)
    Checks whether @replacement is a valid replacement string (see g_regex_replace()), i.e. that all escape sequences in it are valid.
static int errorQuark()
static Str escapeNul
    Escapes the nul characters in @string to "\x00".
static Str escapeString(Str string, int length)
    Escapes the special characters used for regular expressions in @string, for instance "a.b*c" becomes "a\.b\*c".
int getCaptureCount()
    Returns the number of capturing subpatterns in the pattern.
static ClassHandler getClassHandler
int getCompileFlags()
    Returns the compile options that @regex was created with.
boolean getHasCrOrLf()
    Checks whether the pattern contains explicit CR or LF references.
static int getInstanceSize()
int getMatchFlags()
    Returns the match options that @regex was created with.
int getMaxBackref()
    Returns the number of the highest back reference in the pattern, or 0 if the pattern does not contain back references.
int getMaxLookbehind()
    Gets the number of characters in the longest lookbehind assertion in the pattern.
static long getParentTypeID()
static TypeSystem.TypeSize getParentTypeSize
getPattern
    Gets the pattern string associated with @regex, i.e. a copy of the string passed to g_regex_new().
int getStringNumber(Str name)
    Retrieves the number of the subexpression named @name.
int getStringNumber(String name)
    Retrieves the number of the subexpression named @name.
static long getTypeID()
static TypeSystem.TypeSize getTypeSize
static boolean matchSimple(Str pattern, Str string, int compile_options, int match_options)
    Scans for a match in @string for @pattern.
ref()
    Increases reference count of @regex by 1.
Str replace(Str string, long string_len, int start_position, Str replacement, int match_options)
    Replaces all occurrences of the pattern in @regex with the replacement text.
Str replace(String string, long string_len, int start_position, String replacement, int match_options)
    Replaces all occurrences of the pattern in @regex with the replacement text.
Str replaceEval(Str string, long string_len, int start_position, int match_options, Regex.OnRegexEvalCallback eval, Pointer user_data)
    Replaces occurrences of the pattern in regex with the output of @eval for that occurrence.
Str replaceEval(String string, long string_len, int start_position, int match_options, Regex.OnRegexEvalCallback eval, Pointer user_data)
    Replaces occurrences of the pattern in regex with the output of @eval for that occurrence.
Str replaceLiteral(Str string, long string_len, int start_position, Str replacement, int match_options)
    Replaces all occurrences of the pattern in @regex with the replacement text.
Str replaceLiteral(String string, long string_len, int start_position, String replacement, int match_options)
    Replaces all occurrences of the pattern in @regex with the replacement text.
void unref()
    Decreases reference count of @regex by 1.
Methods inherited from class ch.bailu.gtk.type.Pointer
asCPointer, cast, connectSignal, disconnectSignals, disconnectSignals, equals, hashCode, throwIfNull, throwNullPointerException, toString, unregisterCallbacks, unregisterCallbacks
Methods inherited from class ch.bailu.gtk.type.Type
asCPointer, asCPointer, asCPointerNotNull, asJnaPointer, asJnaPointer, asPointer, asPointer, cast, cast, throwIfNull
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface ch.bailu.gtk.type.PointerInterface
asCPointerNotNull, asJnaPointer, asPointer, isNotNull, isNull
-
Constructor Details
-
Regex
-
Regex
Compiles the regular expression to an internal form, and does
the initial setup of the #GRegex structure.
- Parameters:
pattern
- the regular expression
compile_options
- compile options for the regular expression, or 0
match_options
- match options for the regular expression, or 0
-
Regex
Compiles the regular expression to an internal form, and does
the initial setup of the #GRegex structure.
- Parameters:
pattern
- the regular expression
compile_options
- compile options for the regular expression, or 0
match_options
- match options for the regular expression, or 0
-
-
Method Details
-
getClassHandler
-
getCaptureCount
public int getCaptureCount()
Returns the number of capturing subpatterns in the pattern.
- Returns:
- the number of capturing subpatterns
-
getCompileFlags
public int getCompileFlags()
Returns the compile options that @regex was created with.
Depending on the version of PCRE that is used, this may or may not
include flags set by option expressions such as `(?i)` found at the
top level within the compiled pattern.
- Returns:
- flags from #GRegexCompileFlags
-
getHasCrOrLf
public boolean getHasCrOrLf()
Checks whether the pattern contains explicit CR or LF references.
- Returns:
- %TRUE if the pattern contains explicit CR or LF references
-
getMatchFlags
public int getMatchFlags()
Returns the match options that @regex was created with.
- Returns:
- flags from #GRegexMatchFlags
-
getMaxBackref
public int getMaxBackref()
Returns the number of the highest back reference
in the pattern, or 0 if the pattern does not contain
back references.
- Returns:
- the number of the highest back reference
-
getMaxLookbehind
public int getMaxLookbehind()
Gets the number of characters in the longest lookbehind assertion in the
pattern. This information is useful when doing multi-segment matching using
the partial matching facilities.
- Returns:
- the number of characters in the longest lookbehind assertion.
-
getPattern
Gets the pattern string associated with @regex, i.e. a copy of
the string passed to g_regex_new().
- Returns:
- the pattern of @regex
-
getStringNumber
Retrieves the number of the subexpression named @name.
- Parameters:
name
- name of the subexpression
- Returns:
- The number of the subexpression, or -1 if @name does not exist
-
getStringNumber
Retrieves the number of the subexpression named @name.
- Parameters:
name
- name of the subexpression
- Returns:
- The number of the subexpression, or -1 if @name does not exist
-
ref
Increases the reference count of @regex by 1.
- Returns:
- @regex
-
replace
public Str replace(@Nonnull Str string, long string_len, int start_position, @Nonnull Str replacement, int match_options) throws AllocationError
Replaces all occurrences of the pattern in @regex with the
replacement text. Backreferences of the form '\number' or
'\g<number>' in the replacement text are replaced by the
number-th captured subexpression of the match; '\g<name>' refers
to the captured subexpression with the given name. '\0' refers
to the complete match, but '\0' followed by a number is the octal
representation of a character. To include a literal '\' in the
replacement, write '\\\\'.
There are also escapes that change the case of the following text:
- \l: Convert the next character to lower case
- \u: Convert the next character to upper case
- \L: Convert to lower case until \E
- \U: Convert to upper case until \E
- \E: End case modification
If you do not need to use backreferences, use g_regex_replace_literal().
The @replacement string must be UTF-8 encoded even if %G_REGEX_RAW was
passed to g_regex_new(). If you want to use strings that are not UTF-8
encoded, you can use g_regex_replace_literal().
Setting @start_position differs from just passing a shortened
string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern that
begins with any kind of lookbehind assertion, such as "\b".
- Parameters:
string
- the string to perform matches against
string_len
- the length of @string, in bytes, or -1 if @string is nul-terminated
start_position
- starting index of the string to match, in bytes
replacement
- text to replace each match with
match_options
- options for the match
- Returns:
- a newly allocated string containing the replacements
- Throws:
AllocationError
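Replacement strings use backslashes, and a Java string literal needs its own layer of escaping on top of that. The sketch below only prints what a few Java literals contain at run time, so it runs without the native library. It assumes that the '\\\\' spelling in the description is the source-literal form (two backslash characters at run time), and the group name "word" is made up for the example.

```java
final class ReplacementLiterals {
    /** Shows how many characters a literal holds and what they are. */
    static String describe(String literal) {
        return literal.length() + " chars: " + literal;
    }

    public static void main(String[] args) {
        // Backreference to the first captured subexpression.
        System.out.println(describe("\\1"));        // 2 chars: \1
        // Backreference to a named subexpression.
        System.out.println(describe("\\g<word>"));  // 8 chars: \g<word>
        // Two backslash characters, read by the replacement syntax as one literal backslash.
        System.out.println(describe("\\\\"));       // 2 chars: \\
    }
}
```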
-
replace
public Str replace(String string, long string_len, int start_position, String replacement, int match_options) throws AllocationError
Replaces all occurrences of the pattern in @regex with the
replacement text. Backreferences of the form '\number' or
'\g<number>' in the replacement text are replaced by the
number-th captured subexpression of the match; '\g<name>' refers
to the captured subexpression with the given name. '\0' refers
to the complete match, but '\0' followed by a number is the octal
representation of a character. To include a literal '\' in the
replacement, write '\\\\'.
There are also escapes that change the case of the following text:
- \l: Convert the next character to lower case
- \u: Convert the next character to upper case
- \L: Convert to lower case until \E
- \U: Convert to upper case until \E
- \E: End case modification
If you do not need to use backreferences, use g_regex_replace_literal().
The @replacement string must be UTF-8 encoded even if %G_REGEX_RAW was
passed to g_regex_new(). If you want to use strings that are not UTF-8
encoded, you can use g_regex_replace_literal().
Setting @start_position differs from just passing a shortened
string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern that
begins with any kind of lookbehind assertion, such as "\b".
- Parameters:
string
- the string to perform matches against
string_len
- the length of @string, in bytes, or -1 if @string is nul-terminated
start_position
- starting index of the string to match, in bytes
replacement
- text to replace each match with
match_options
- options for the match
- Returns:
- a newly allocated string containing the replacements
- Throws:
AllocationError
-
replaceEval
public Str replaceEval(@Nonnull Str string, long string_len, int start_position, int match_options, Regex.OnRegexEvalCallback eval, @Nullable Pointer user_data) throws AllocationError
Replaces occurrences of the pattern in regex with the output of
@eval for that occurrence.
Setting @start_position differs from just passing a shortened
string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern
that begins with any kind of lookbehind assertion, such as "\b".
The following example (in C) uses g_regex_replace_eval() to replace multiple
strings at once:

    static gboolean
    eval_cb (const GMatchInfo *info,
             GString          *res,
             gpointer          data)
    {
      gchar *match;
      gchar *r;

      match = g_match_info_fetch (info, 0);
      r = g_hash_table_lookup ((GHashTable *)data, match);
      g_string_append (res, r);
      g_free (match);

      return FALSE;
    }

    ...

    GRegex *reg;
    GHashTable *h;
    gchar *res;

    h = g_hash_table_new (g_str_hash, g_str_equal);

    g_hash_table_insert (h, "1", "ONE");
    g_hash_table_insert (h, "2", "TWO");
    g_hash_table_insert (h, "3", "THREE");
    g_hash_table_insert (h, "4", "FOUR");

    reg = g_regex_new ("1|2|3|4", G_REGEX_DEFAULT, G_REGEX_MATCH_DEFAULT, NULL);
    res = g_regex_replace_eval (reg, text, -1, 0, 0, eval_cb, h, NULL);
    g_hash_table_destroy (h);

    ...

- Parameters:
string
- string to perform matches against
string_len
- the length of @string, in bytes, or -1 if @string is nul-terminated
start_position
- starting index of the string to match, in bytes
match_options
- options for the match
eval
- a function to call for each match
user_data
- user data to pass to the function
- Returns:
- a newly allocated string containing the replacements
- Throws:
AllocationError
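For readers more at home in Java than C, the table-driven replacement from the C example can be sketched with java.util.regex. This is a stand-in for the idea only: it does not use Regex.OnRegexEvalCallback or any other part of this binding, and the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceEvalSketch {
    /** Replaces every match with the table entry for the matched text. */
    static String replaceEach(Pattern pattern, String text, Map<String, String> table) {
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group() is the whole match, like g_match_info_fetch (info, 0).
            String replacement = table.get(m.group());
            // Unlike the C example, keep the matched text if the table has no entry.
            String safe = replacement != null ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(safe));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("1", "ONE");
        table.put("2", "TWO");
        table.put("3", "THREE");
        table.put("4", "FOUR");
        Pattern digits = Pattern.compile("1|2|3|4");
        System.out.println(replaceEach(digits, "1 and 2, then 4", table));
        // ONE and TWO, then FOUR
    }
}
```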
-
replaceEval
public Str replaceEval(String string, long string_len, int start_position, int match_options, Regex.OnRegexEvalCallback eval, @Nullable Pointer user_data) throws AllocationError
Replaces occurrences of the pattern in regex with the output of
@eval for that occurrence.
Setting @start_position differs from just passing a shortened
string and setting %G_REGEX_MATCH_NOTBOL in the case of a pattern
that begins with any kind of lookbehind assertion, such as "\b".
The following example (in C) uses g_regex_replace_eval() to replace multiple
strings at once:

    static gboolean
    eval_cb (const GMatchInfo *info,
             GString          *res,
             gpointer          data)
    {
      gchar *match;
      gchar *r;

      match = g_match_info_fetch (info, 0);
      r = g_hash_table_lookup ((GHashTable *)data, match);
      g_string_append (res, r);
      g_free (match);

      return FALSE;
    }

    ...

    GRegex *reg;
    GHashTable *h;
    gchar *res;

    h = g_hash_table_new (g_str_hash, g_str_equal);

    g_hash_table_insert (h, "1", "ONE");
    g_hash_table_insert (h, "2", "TWO");
    g_hash_table_insert (h, "3", "THREE");
    g_hash_table_insert (h, "4", "FOUR");

    reg = g_regex_new ("1|2|3|4", G_REGEX_DEFAULT, G_REGEX_MATCH_DEFAULT, NULL);
    res = g_regex_replace_eval (reg, text, -1, 0, 0, eval_cb, h, NULL);
    g_hash_table_destroy (h);

    ...

- Parameters:
string
- string to perform matches against
string_len
- the length of @string, in bytes, or -1 if @string is nul-terminated
start_position
- starting index of the string to match, in bytes
match_options
- options for the match
eval
- a function to call for each match
user_data
- user data to pass to the function
- Returns:
- a newly allocated string containing the replacements
- Throws:
AllocationError
-
replaceLiteral
public Str replaceLiteral(@Nonnull Str string, long string_len, int start_position, @Nonnull Str replacement, int match_options) throws AllocationError
Replaces all occurrences of the pattern in @regex with the
replacement text. @replacement is inserted literally; to
include backreferences, use g_regex_replace().
Setting @start_position differs from just passing a
shortened string and setting %G_REGEX_MATCH_NOTBOL in the
case of a pattern that begins with any kind of lookbehind
assertion, such as "\b".
- Parameters:
string
- the string to perform matches against
string_len
- the length of @string, in bytes, or -1 if @string is nul-terminated
start_position
- starting index of the string to match, in bytes
replacement
- text to replace each match with
match_options
- options for the match
- Returns:
- a newly allocated string containing the replacements
- Throws:
AllocationError
-
replaceLiteral
public Str replaceLiteral(String string, long string_len, int start_position, String replacement, int match_options) throws AllocationError
Replaces all occurrences of the pattern in @regex with the
replacement text. @replacement is inserted literally; to
include backreferences, use g_regex_replace().
Setting @start_position differs from just passing a
shortened string and setting %G_REGEX_MATCH_NOTBOL in the
case of a pattern that begins with any kind of lookbehind
assertion, such as "\b".
- Parameters:
string
- the string to perform matches against
string_len
- the length of @string, in bytes, or -1 if @string is nul-terminated
start_position
- starting index of the string to match, in bytes
replacement
- text to replace each match with
match_options
- options for the match
- Returns:
- a newly allocated string containing the replacements
- Throws:
AllocationError
-
unref
public void unref()
Decreases the reference count of @regex by 1. When the reference count drops
to zero, it frees all the memory associated with the regex structure.
-
checkReplacement
public static boolean checkReplacement(@Nonnull Str replacement, @Nullable Int has_references) throws AllocationError
Checks whether @replacement is a valid replacement string
(see g_regex_replace()), i.e. that all escape sequences in
it are valid.
If @has_references is not %NULL then @replacement is checked
for pattern references. For instance, the replacement text 'foo\n'
does not contain references and may be evaluated without information
about the actual match, but '\0\1' (whole match followed by first
subpattern) requires a valid #GMatchInfo object.
- Parameters:
replacement
- the replacement string
has_references
- location to store information about references in @replacement or %NULL
- Returns:
- whether @replacement is a valid replacement string
- Throws:
AllocationError
-
errorQuark
public static int errorQuark()
- Returns:
-
escapeNul
Escapes the nul characters in @string to "\x00". It can be used
to compile a regex with embedded nul characters.
For completeness, @length can be -1 for a nul-terminated string.
In this case the output string will of course be equal to @string.
- Parameters:
string
- the string to escape
length
- the length of @string
- Returns:
- a newly-allocated escaped string
-
escapeString
Escapes the special characters used for regular expressions
in @string, for instance "a.b*c" becomes "a\.b\*c". This
function is useful to dynamically generate regular expressions.
@string can contain nul characters, which are replaced with "\0";
in this case remember to specify the correct length of @string
in @length.
- Parameters:
string
- the string to escape
length
- the length of @string, in bytes, or -1 if @string is nul-terminated
- Returns:
- a newly-allocated escaped string
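What this kind of escaping does can be sketched in a few lines of plain Java. This is an illustrative re-implementation, not the library's code: the list of characters treated as special is an assumption, and the class name is made up.

```java
import java.util.regex.Pattern;

final class RegexEscapeSketch {
    // Assumed set of regex metacharacters that get a backslash prefix.
    private static final String SPECIAL = "\\|()[]{}^$*+?.";

    /** Returns a copy of 's' in which metacharacters are backslash-escaped. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\0') {
                out.append("\\0");       // nul becomes backslash + '0', as described above
            } else {
                if (SPECIAL.indexOf(c) >= 0) {
                    out.append('\\');
                }
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = escape("a.b*c");
        System.out.println(escaped);                              // a\.b\*c
        // The escaped text now matches only the literal input.
        System.out.println(Pattern.matches(escaped, "a.b*c"));    // true
        System.out.println(Pattern.matches(escaped, "axbbc"));    // false
    }
}
```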
-
matchSimple
public static boolean matchSimple(@Nonnull Str pattern, @Nonnull Str string, int compile_options, int match_options)
Scans for a match in @string for @pattern.
This function is equivalent to g_regex_match(), but it does not
require compiling the pattern with g_regex_new() first, which saves some
lines of code when you just need to do a match without extracting
substrings, capture counts, and so on.
If this function is to be called on the same @pattern more than
once, it is more efficient to compile the pattern once with
g_regex_new() and then use g_regex_match().
- Parameters:
pattern
- the regular expression
string
- the string to scan for matches
compile_options
- compile options for the regular expression, or 0
match_options
- match options, or 0
- Returns:
- %TRUE if the string matched, %FALSE otherwise
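The trade-off between a one-shot match and a pattern compiled once can be sketched with java.util.regex, used here as an analogy rather than as this binding's API. The helper names are hypothetical. Note that find() is used because the function scans for a match anywhere in the string.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class MatchSimpleSketch {
    /** One-shot convenience: compiles the pattern on every call. */
    static boolean matchSimple(String pattern, String text) {
        return Pattern.compile(pattern).matcher(text).find();
    }

    /** Reuses one compiled pattern for many inputs. */
    static int countMatching(Pattern compiled, List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (compiled.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matchSimple("[0-9]+", "order 66"));   // true

        // Compile once outside the loop when the same pattern is used repeatedly.
        Pattern digits = Pattern.compile("[0-9]+");
        System.out.println(countMatching(digits, Arrays.asList("a1", "b", "c22"))); // 2
    }
}
```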
-
getTypeID
public static long getTypeID()
-
getParentTypeID
public static long getParentTypeID()
-
getTypeSize
-
getParentTypeSize
-
getInstanceSize
public static int getInstanceSize()
-