NAME
- Regexp::Wildcards - Converts wildcards to regexps.
+ Regexp::Wildcards - Converts wildcard expressions to Perl regular
+ expressions.
VERSION
- Version 0.01
+ Version 0.08
SYNOPSIS
use Regexp::Wildcards qw/wc2re/;
my $re;
- $re = wc2re 'a{b.,c}*' => 'unix';
- $re = wc2re 'a.,b*' => 'win32';
+ $re = wc2re 'a{b?,c}*' => 'unix'; # Do it Unix style.
+ $re = wc2re 'a?,b*' => 'win32'; # Do it Windows style.
+ $re = wc2re '*{x,y}?' => 'jokers'; # Process the jokers & escape the rest.
+ $re = wc2re '%a_c%' => 'sql'; # Turn SQL wildcards into regexps.
DESCRIPTION
- This module converts wildcards expressions to Perl regular expressions.
- It handles the "*" and "?" jokers, as well as Unix bracketed
- alternatives "{,}", and uses the backspace ("\") as an escape character.
- Wrappers are provided to mimic the behaviour of Windows and Unix shells.
-
-EXPORT
- Four functions are exported only on request : "wc2re", "wc2re_unix",
- "wc2re_win32" and "wc2re_jokers".
+ In many situations, users may want to specify patterns to match but
+ don't need the full power of regexps. Wildcards are one such set of
+ simplified rules. This module converts wildcard expressions to Perl
+ regular expressions, so that you can use them for matching. It handles
+ the "*" and "?" shell jokers, as well as Unix bracketed alternatives
+ "{,}", and also the "%" and "_" SQL wildcards. Backslash ("\") is used
+ as an escape character. Wrappers are provided to mimic the behaviour of
+ Windows and Unix shells.
+
+VARIABLES
+ These variables control whether the wildcard jokers and brackets
+ capture what they match. They can be set globally by writing in your program
+
+ $Regexp::Wildcards::CaptureSingle = 1;
+ # From then on, "exactly one" wildcards are capturing.
+
+ or can be locally specified via "local"
+
+ {
+ local $Regexp::Wildcards::CaptureSingle = 1;
+ # In this block, "exactly one" wildcards are capturing.
+ ...
+ }
+ # Back to the situation from before the block
+
+ This section also describes how those elements are translated by the
+ functions.
+
+ $CaptureSingle
+ When this variable is true, each occurrence of an unescaped *"exactly
+ one"* wildcard (i.e. a "?" joker, or "_" for SQL wildcards) is made
+ capturing in the resulting regexp (it is replaced by "(.)"). Otherwise,
+ it is just replaced by ".". The default is the latter.
+
+ For jokers :
+ 'a???b\\??' is translated to 'a(.)(.)(.)b\\?(.)' if $CaptureSingle is true
+ 'a...b\\?.' otherwise (default)
+
+ For SQL wildcards :
+ 'a___b\\__' is translated to 'a(.)(.)(.)b\\_(.)' if $CaptureSingle is true
+ 'a...b\\_.' otherwise (default)
+
+ $CaptureAny
+ By default this variable is false, and each run of unescaped *"any"*
+ wildcards (i.e. "*" jokers or "%" for SQL wildcards) is replaced by one
+ single ".*". When it evaluates to true, each such run of *"any"*
+ wildcards is made into one capture, which is greedy ("(.*)") for
+ "$CaptureAny > 0" and non-greedy ("(.*?)") otherwise.
+
+ For jokers :
+ 'a***b\\**' is translated to 'a.*b\\*.*' if $CaptureAny is false (default)
+ 'a(.*)b\\*(.*)' if $CaptureAny > 0
+ 'a(.*?)b\\*(.*?)' otherwise
+
+ For SQL wildcards :
+ 'a%%%b\\%%' is translated to 'a.*b\\%.*' if $CaptureAny is false (default)
+ 'a(.*)b\\%(.*)' if $CaptureAny > 0
+ 'a(.*?)b\\%(.*?)' otherwise
+
+ $CaptureBrackets
+ If this variable is set to true, valid bracket constructs are made into
+ "( | )" captures. Otherwise they are replaced by non-capturing
+ alternations ("(?: | )"), which is the default.
+
+ 'a{b\\},\\{c}' is translated to 'a(b\\}|\\{c)' if $CaptureBrackets is true
+ 'a(?:b\\}|\\{c)' otherwise (default)
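The joker translations described in this section can be approximated in a few lines of plain Perl. This is a minimal sketch and not the module's implementation: "jokers_sketch" and its "capture_single" and "capture_any" options are hypothetical names standing in for "wc2re_jokers" and the two package variables, and it assumes that an escaped character is copied through unchanged.

```perl
use strict;
use warnings;

# Hypothetical stand-in for wc2re_jokers. The named options mimic
# $CaptureSingle and $CaptureAny; they are not part of the module.
sub jokers_sketch {
    my ($wc, %opt) = @_;
    my $any = !$opt{capture_any}    ? '.*'
            : $opt{capture_any} > 0 ? '(.*)'
            :                         '(.*?)';
    my $single = $opt{capture_single} ? '(.)' : '.';
    my $re = '';
    # Walk the string one token at a time, starting where the last match ended.
    while ($wc =~ /\G(?: (\\.) | (\*+) | (\?) | (.) )/gcxs) {
        if    (defined $1) { $re .= $1 }            # escaped pair, kept as is (assumption)
        elsif (defined $2) { $re .= $any }          # a run of "*" becomes one "any" element
        elsif (defined $3) { $re .= $single }       # each "?" is exactly one character
        else               { $re .= quotemeta $4 }  # everything else is taken literally
    }
    return $re;
}

# Use the result for matching, capturing the character matched by "?".
my $re = jokers_sketch('*.t?t', capture_single => 1);
print "middle letter: $1\n" if 'notes.txt' =~ /^$re$/;
```

Passing the flags as arguments rather than reading globals keeps the sketch self-contained; the module itself is documented as reading the package variables, which is why "local" is the suggested way to scope them.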
FUNCTIONS
- "wc2re_unix"
+ "wc2re_jokers"
This function takes as its only argument the wildcard string to process,
- and returns the corresponding regular expression (or "undef" if the
- source is invalid) according to standard Unix wildcard rules. It
- successively escapes all regexp special characters that doesn't hold any
- meaning for wildcards, turns jokers into their regexp equivalents, and
- changes bracketed blocks into alternations. If brackets are unbalanced,
- it will try to substitute as many of them as possible, and then escape
- the remaining "{" and "}".
+ and returns the corresponding regular expression where the jokers "?"
+ (*"exactly one"*) and "*" (*"any"*) have been translated into their
+ regexp equivalents (see "VARIABLES" for more details). All other
+ unprotected regexp metacharacters are escaped.
+
+ # Everything is escaped.
+ print 'ok' if wc2re_jokers('{a{b,c}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
+
+ "wc2re_sql"
+ Similar to the previous function, but for the SQL wildcards "_"
+ (*"exactly one"*) and "%" (*"any"*). All other unprotected regexp
+ metacharacters are escaped.
+
+ "wc2re_unix"
+ This function conforms to standard Unix shell wildcard rules. It
+ successively escapes all unprotected regexp special characters that
+ don't hold any meaning for wildcards, turns "?" and "*" jokers into
+ their regexp equivalents (see "wc2re_jokers"), and changes bracketed
+ blocks into (possibly capturing) alternations as described in
+ "VARIABLES". If brackets are unbalanced, it tries to substitute as many
+ of them as possible, and then escapes the remaining "{" and "}". Commas
+ outside of any bracket-delimited block are also escaped.
+
+ # This is a valid bracket expression, and is completely translated.
+ print 'ok' if wc2re_unix('{a{b,c}d,e}') eq '(?:a(?:b|c)d|e)';
+
+ The function handles unbalanced bracket expressions by escaping
+ everything it can't recognize. For example :
+
+ # The first comma is replaced, and the remaining brackets and comma are escaped.
+ print 'ok' if wc2re_unix('{a\\{b,c}d,e}') eq '(?:a\\{b|c)d\\,e\\}';
+
+ # All the brackets and commas are escaped.
+ print 'ok' if wc2re_unix('{a{b,c\\}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
"wc2re_win32"
- Similar to the precedent, but for Windows wildcards. Bracketed blocks
- are no longer handled (which means that brackets will be escaped), but
- you can still provide a comma-separated list of items.
+ This one works just like the one before, but for Windows wildcards.
+ Bracketed blocks are no longer handled (which means that brackets are
+ escaped), but you can provide a comma-separated list of items.
- "wc2re_jokers"
- This one only handles the "?" and "*" jokers.
+ # All the brackets are escaped, and commas are seen as list delimiters.
+ print 'ok' if wc2re_win32('{a{b,c}d,e}') eq '(?:\\{a\\{b|c\\}d|e\\})';
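The Windows rule (brackets escaped, unescaped commas acting as list delimiters) can be sketched in plain Perl as follows. "win32_sketch" is a hypothetical helper written for illustration, not the module's code. It ignores the capture variables and assumes that an escaped character, including an escaped comma, is copied through unchanged.

```perl
use strict;
use warnings;

# Hypothetical stand-in for wc2re_win32, without capture support.
sub win32_sketch {
    my ($wc) = @_;
    my @items = ('');
    while ($wc =~ /\G(?: (\\.) | (,) | (\*+) | (\?) | (.) )/gcxs) {
        if    (defined $1) { $items[-1] .= $1 }            # escaped pair, kept as is (assumption)
        elsif (defined $2) { push @items, '' }             # unescaped comma starts a new item
        elsif (defined $3) { $items[-1] .= '.*' }          # a run of "*" becomes one ".*"
        elsif (defined $4) { $items[-1] .= '.' }           # "?" is exactly one character
        else               { $items[-1] .= quotemeta $5 }  # brackets and the rest are escaped
    }
    # A single item needs no alternation; several are joined with "|".
    return @items > 1 ? '(?:' . join('|', @items) . ')' : $items[0];
}

print win32_sketch('{a{b,c}d,e}'), "\n";
```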
"wc2re"
A generic function that wraps around all the different rules. The first
argument is the wildcard expression, and the second one is the type of
- rules to apply, currently either "unix", "win32" or "jokers". If the
- type is undefined, it defaults to "unix".
+ rules to apply, which can be :
+
+ 'unix', 'win32', 'jokers', 'sql'
+ For one of those raw rule names, "wc2re" simply maps to
+ "wc2re_unix", "wc2re_win32", "wc2re_jokers" and "wc2re_sql"
+ respectively.
+
+ $^O If you supply the Perl operating system name, the call is deferred
+ to "wc2re_win32" for $^O equal to 'dos', 'os2', 'MSWin32' or
+ 'cygwin', and to "wc2re_unix" in all the other cases.
+
+ If the type is undefined or not supported, it defaults to 'unix'.
+
+ # Wraps to wc2re_jokers ($re eq 'a\\{b\\,c\\}.*').
+ $re = wc2re 'a{b,c}*' => 'jokers';
+
+ # Wraps to wc2re_win32 ($re eq '(?:a\\{b|c\\}.*)')
+ # or wc2re_unix ($re eq 'a(?:b|c).*') depending on $^O.
+ $re = wc2re 'a{b,c}*' => $^O;
+
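The way the second argument of "wc2re" is resolved can be summarised by a small lookup. This is a sketch under assumptions: "rules_for_type" is a hypothetical helper, not a function of the module, and it assumes that names are compared exactly (case-sensitively).

```perl
use strict;
use warnings;

# Hypothetical helper: resolves a type argument to one of the raw rule names.
sub rules_for_type {
    my ($type) = @_;
    return 'unix' unless defined $type;             # undefined type defaults to 'unix'
    my %raw     = map { $_ => 1 } qw/unix win32 jokers sql/;
    my %windows = map { $_ => 1 } qw/dos os2 MSWin32 cygwin/;
    return $type   if $raw{$type};                  # raw rule names map to themselves
    return 'win32' if $windows{$type};              # OS names handled with Windows rules
    return 'unix';                                  # any other OS name or unsupported type
}

printf "%-8s => %s\n", $_, rules_for_type($_) for qw/sql MSWin32 linux/;
print "this platform ($^O) => ", rules_for_type($^O), "\n";
```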
+EXPORT
+ These five functions are exported only on request : "wc2re",
+ "wc2re_unix", "wc2re_win32", "wc2re_jokers" and "wc2re_sql". The
+ variables are not exported.
+
+DEPENDENCIES
+ Text::Balanced, which is bundled with perl since version 5.7.3
+
+CAVEATS
+ This module does not implement the strange behaviours of the Windows
+ shell that result from the special handling of the last three
+ characters (for the file extension). For example, the Windows XP shell
+ matches "*a" like ".*a", "*a?" like ".*a.?", "*a??" like ".*a.{0,2}" and
+ so on.
SEE ALSO
+ Some modules provide incomplete alternatives as helper functions :
+
Net::FTPServer has a method for that. Only jokers are translated, and
escaping won't preserve them.
- File::Find::Match::Util has a "wildcar" function that compiles a
- matcher. Only handles "*".
+ File::Find::Match::Util has a "wildcard" function that compiles a
+ matcher. It only handles "*".
Text::Buffer has the "convertWildcardToRegex" class method that handles
jokers.
AUTHOR
- Vincent Pit, "<perl at profvince.com>"
+ Vincent Pit, "<perl at profvince.com>", <http://www.profvince.com>.
+
+ You can contact me by mail or on #perl @ FreeNode (vincent or
+ Prof_Vince).
BUGS
Please report any bugs or feature requests to "bug-regexp-wildcards at
perldoc Regexp::Wildcards
COPYRIGHT & LICENSE
- Copyright 2007 Vincent Pit, all rights reserved.
+ Copyright 2007-2008 Vincent Pit, all rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.