NAME
- Regexp::Wildcards - Converts wildcards expressions to Perl regular
+ Regexp::Wildcards - Converts wildcard expressions to Perl regular
expressions.
VERSION
- Version 0.02
+ Version 1.01
SYNOPSIS
- use Regexp::Wildcards qw/wc2re/;
+ use Regexp::Wildcards;
+
+ my $rw = Regexp::Wildcards->new(type => 'unix');
my $re;
- $re = wc2re 'a{b.,c}*' => 'unix'; # Do it Unix style.
- $re = wc2re 'a.,b*' => 'win32'; # Do it Windows style.
- $re = wc2re '*{x,y}.' => 'jokers'; # Process the jokers & escape the rest.
+ $re = $rw->convert('a{b?,c}*'); # Do it Unix shell style.
+ $re = $rw->convert('a?,b*', 'win32'); # Do it Windows shell style.
+ $re = $rw->convert('*{x,y}?', 'jokers'); # Process the jokers and escape the rest.
+ $re = $rw->convert('%a_c%', 'sql'); # Turn SQL wildcards into regexps.
+
+ $rw = Regexp::Wildcards->new(
+ do => [ qw/jokers brackets/ ], # Do jokers and brackets.
+ capture => [ qw/any greedy/ ], # Capture *'s greedily.
+ );
+
+ $rw->do(add => 'groups'); # Don't escape groups.
+ $rw->capture(rem => [ qw/greedy/ ]); # Actually we want non-greedy matches.
+ $re = $rw->convert('*a{,(b)?}?c*'); # '(.*?)a(?:|(b).).c(.*?)'
+ $rw->capture(); # No more captures.
DESCRIPTION
In many situations, users may want to specify patterns to match but
don't need the full power of regexps. Wildcards make one of those sets
- of simplified rules. This module converts wildcards expressions to Perl
- regular expressions, so that you can use them for matching. It handles
- the "*" and "?" jokers, as well as Unix bracketed alternatives "{,}",
- and uses the backspace ("\") as an escape character. Wrappers are
- provided to mimic the behaviour of Windows and Unix shells.
+ of simplified rules. This module converts wildcard expressions to Perl
+ regular expressions, so that you can use them for matching.
-EXPORT
- Four functions are exported only on request : "wc2re", "wc2re_unix",
- "wc2re_win32" and "wc2re_jokers".
-
-FUNCTIONS
- "wc2re_unix"
- This function takes as its only argument the wildcard string to process,
- and returns the corresponding regular expression according to standard
- Unix wildcard rules. It successively escapes all unprotected regexp
- special characters that doesn't hold any meaning for wildcards, turns
- jokers into their regexp equivalents, and changes bracketed blocks into
- "(?:|)" alternations. If brackets are unbalanced, it will try to
- substitute as many of them as possible, and then escape the remaining
- "{" and "}". Commas outside of any bracket-delimited block will also be
- escaped.
-
- # This is a valid brackets expression which is correctly handled.
- print 'ok' if wc2re_unix('{a{b,c}d,e}') eq '(?:a(?:b|c)d|e)';
-
- Unbalanced bracket expressions can always be rescued, but it may change
- completely its meaning. For example :
-
- # The first comma is replaced, and the remaining brackets and comma are
- # escaped.
- print 'ok' if wc2re_unix('{a\\{b,c}d,e}') eq '(?:a\\{b|c)d\\,e\\}';
-
- # All the brackets and commas are escaped.
- print 'ok' if wc2re_unix('{a{b,c\\}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
-
- "wc2re_win32"
- Similar to the precedent, but for Windows wildcards. Bracketed blocks
- are no longer handled (which means that brackets will be escaped), but
- you can provide a comma-separated list of items.
-
- # All the brackets are escaped, and commas are seen as list delimiters.
- print 'ok' if wc2re_win32('{a{b,c}d,e}') eq '(?:\\{a\\{b|c\\}d|e\\})';
-
- "wc2re_jokers"
- This one only handles the "?" and "*" jokers. All other unquoted regexp
- metacharacters will be escaped.
-
- # Everything is escaped.
- print 'ok' if wc2re_jokers('{a{b,c}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
-
- "wc2re"
- A generic function that wraps around all the different rules. The first
- argument is the wildcard expression, and the second one is the type of
- rules to apply, currently either "unix", "win32" or "jokers". If the
- type is undefined, it defaults to "unix".
+ It handles the "*" and "?" jokers, Unix bracketed alternatives "{,}",
+ and the "%" and "_" SQL wildcards. It can also keep original "(...)"
+ groups. The backslash ("\") is used as an escape character.
-DEPENDENCIES
- Text::Balanced, which is bundled with perl since version 5.7.3
+ Typesets that mimic the behaviour of Windows and Unix shells are also
+ provided.
+
+METHODS
+ "new [ do => $what | type => $type ], capture => $captures"
+ Constructs a new Regexp::Wildcards object.
+
+ "do" lists all features that should be enabled when converting wildcards
+ to regexps. Refer to "do" for details on what can be passed in $what.
+
+ The "type" specifies a predefined set of "do" features to use. See
+ "type" for details on which types are valid. The "do" option overrides
+ "type".
+
+ "capture" lists which atoms should be capturing. Refer to "capture" for
+ more details.
+
+ "do [ $what | set => $c1, add => $c2, rem => $c3 ]"
+ Specifies the list of metacharacters to convert. They are classified
+ into five classes :
+
+ * 'jokers' converts "?" to "." and "*" to ".*" ;
+
+ 'a**\\*b??\\?c' ==> 'a.*\\*b..\\?c'
+
+ * 'sql' converts "_" to "." and "%" to ".*" ;
+
+ 'a%%\\%b__\\_c' ==> 'a.*\\%b..\\_c'
+
+ * 'commas' converts all "," to "|" and puts the complete resulting
+ regular expression inside "(?: ... )" ;
+
+ 'a,b{c,d},e' ==> '(?:a|b\\{c|d\\}|e)'
+
+ * 'brackets' converts all matching "{ ... , ... }" brackets to "(?:
+ ... | ... )" alternations. If some brackets are unbalanced, it tries
+ to substitute as many of them as possible, and then escape the
+ remaining unmatched "{" and "}". Commas outside of any
+ bracket-delimited block are also escaped ;
+
+ 'a,b{c,d},e' ==> 'a\\,b(?:c|d)\\,e'
+ '{a\\{b,c}d,e}' ==> '(?:a\\{b|c)d\\,e\\}'
+ '{a{b,c\\}d,e}' ==> '\\{a\\{b\\,c\\}d\\,e\\}'
+
+ * 'groups' keeps the parentheses "( ... )" of the original string
+ without escaping them. Currently, no check is done to ensure that
+ the parentheses are balanced.
+
+ 'a(b(c))d\\(\\)' ==> (no change)
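The 'jokers' rule above can be sketched in a few lines of plain Perl. This is only an illustration of the documented input/output behaviour, not the module's own code: the jokers_to_re helper is hypothetical, and it handles the "*" and "?" jokers only (no commas, brackets or groups).

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of Regexp::Wildcards.
sub jokers_to_re {
    my ($wc) = @_;
    my $re = '';
    # Tokens: an escaped pair, a run of stars, a question mark,
    # or any other single character.
    while ($wc =~ /\G(\\.|\*+|\?|.)/gs) {
        my $tok = $1;
        if    ($tok =~ /^\\./s) { $re .= quotemeta substr($tok, 1) } # escaped: literal
        elsif ($tok =~ /^\*/)   { $re .= '.*' }                      # "*", "**", ...
        elsif ($tok eq '?')     { $re .= '.' }                       # exactly one
        else                    { $re .= quotemeta $tok }            # everything else
    }
    return $re;
}

my $re = jokers_to_re('*.txt');
print "notes.txt matches\n" if 'notes.txt' =~ /^$re$/;
```

Swapping "*" for "%" and "?" for "_" in the token pattern would give the same sketch for the 'sql' class.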
+
+ Each $c can be any of :
+
+ * A hash reference, with wanted metacharacter class names (described
+ above) as keys and booleans as values ;
-SEE ALSO
- Some modules provide incomplete alternatives as helper functions :
+ * An array reference containing the list of wanted metacharacter
+ classes ;
- Net::FTPServer has a method for that. Only jokers are translated, and
- escaping won't preserve them.
+ * A plain scalar, when only one class is required.
- File::Find::Match::Util has a "wildcar" function that compiles a
- matcher. Only handles "*".
+ When "set" is present, the classes given as its value replace the
+ current object options. Then the "add" classes are added, and the "rem"
+ classes removed.
- Text::Buffer has the "convertWildcardToRegex" class method that handles
- jokers.
+ Passing a sole scalar $what is equivalent to passing "set => $what". No
+ argument means "set => [ ]".
+
+ $rw->do(set => 'jokers'); # Only translate jokers.
+ $rw->do('jokers'); # Same.
+ $rw->do(add => [ qw/sql commas/ ]); # Translate also SQL and commas.
+ $rw->do(rem => 'jokers'); # Specifying both 'sql' and 'jokers' is useless.
+ $rw->do(); # Translate nothing.
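The order in which "set", "add" and "rem" are applied can be modelled as follows. This is a sketch under assumptions: class_names and update_classes are hypothetical helpers written for this illustration, and they only mirror the semantics described above, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not the module's own code.

# Flatten one $c value (scalar, array reference or hash reference)
# into a list of class names.
sub class_names {
    my ($c) = @_;
    return ()                         unless defined $c;
    return @$c                        if ref $c eq 'ARRAY';
    return grep { $c->{$_} } keys %$c if ref $c eq 'HASH';
    return ($c);
}

# Apply "set", then "add", then "rem" to the currently enabled classes.
sub update_classes {
    my ($current, %args) = @_;
    my %on = map { ($_ => 1) }
             exists $args{set} ? class_names($args{set}) : @$current;
    $on{$_} = 1     for class_names($args{add});
    delete $on{$_}  for class_names($args{rem});
    return sort keys %on;
}

my @classes = update_classes([ 'jokers' ], add => [ qw/sql commas/ ], rem => 'jokers');
print "@classes\n"; # commas sql
```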
+
+ "type $type"
+ Selects the metacharacters to convert according to the predefined
+ type $type. $type can be any of 'jokers', 'sql', 'commas',
+ 'brackets', 'win32' or 'unix'. An unknown or undefined value defaults to
+ 'unix', except for 'dos', 'os2', 'MSWin32' and 'cygwin', which default to
+ 'win32'. This means that you can pass $^O as the $type and get the
+ corresponding shell behaviour. Returns the object.
+
+ $rw->type('win32'); # Set type to win32.
+ $rw->type(); # Set type to unix.
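The defaulting rule for $type can be restated as a small lookup. The resolve_type function below is a hypothetical helper that only encodes the rule as written in this section. How the module implements it internally is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper restating the documented rule; not the module's own code.
sub resolve_type {
    my ($type) = @_;
    my %known = map { ($_ => 1) } qw/jokers sql commas brackets win32 unix/;
    my %win32 = map { ($_ => 1) } qw/dos os2 MSWin32 cygwin/;
    return 'unix'  unless defined $type; # undefined: default
    return $type   if $known{$type};     # predefined type: keep it
    return 'win32' if $win32{$type};     # OS names with a Windows-like shell
    return 'unix';                       # anything else: default
}

print resolve_type($^O), "\n";
```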
+
+ "capture [ $captures | set => $c1, add => $c2, rem => $c3 ]"
+ Specifies the list of atoms to capture. This method works like "do",
+ except that the classes are different :
+
+ * 'single' will capture all unescaped *"exactly one"* metacharacters,
+ i.e. "?" for wildcards or "_" for SQL ;
+
+ 'a???b\\??' ==> 'a(.)(.)(.)b\\?(.)'
+ 'a___b\\__' ==> 'a(.)(.)(.)b\\_(.)'
+
+ * 'any' will capture all unescaped *"any"* metacharacters, i.e. "*"
+ for wildcards or "%" for SQL ;
+
+ 'a***b\\**' ==> 'a(.*?)b\\*(.*?)'
+ 'a%%%b\\%%' ==> 'a(.*?)b\\%(.*?)'
+
+ * 'greedy', when used in conjunction with 'any', will make the 'any'
+ captures greedy (by default they are not) ;
+
+ 'a***b\\**' ==> 'a(.*)b\\*(.*)'
+ 'a%%%b\\%%' ==> 'a(.*)b\\%(.*)'
+
+ * 'brackets' will capture matching "{ ... , ... }" alternations.
+
+ 'a{b\\},\\{c}' ==> 'a(b\\}|\\{c)'
+
+ $rw->capture(set => 'single'); # Only capture "exactly one" metacharacters.
+ $rw->capture('single'); # Same.
+ $rw->capture(add => [ qw/any greedy/ ]); # Also greedily capture "any" metacharacters.
+ $rw->capture(rem => 'greedy'); # No more greed please.
+ $rw->capture(); # Capture nothing.
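To see what the captures give you once the regexp is applied, here is a short sketch in plain Perl. The first regexp string is the output listed above for the 'single' class. The two "any" regexps are hand-written for this illustration (they are not module output) and only contrast a non-greedy "(.*?)" with a greedy "(.*)".

```perl
use strict;
use warnings;

# Output listed above for 'a???b\\??' with the 'single' capture class.
my $single = 'a(.)(.)(.)b\\?(.)';
my @chars  = 'axyzb?w' =~ /^$single$/;
print "@chars\n"; # x y z w

# Hand-written regexps in the shape of non-greedy and greedy 'any' captures.
my $lazy   = '(.*?)-(.*?)';
my $greedy = '(.*)-(.*)';
my @lazy   = 'x-y-z' =~ /^$lazy$/;   # ('x', 'y-z')
my @greedy = 'x-y-z' =~ /^$greedy$/; # ('x-y', 'z')
print "@lazy | @greedy\n"; # x y-z | x-y z
```

The choice only matters when the text between two "any" jokers can also appear inside what they match, as the hyphen does here.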
+
+ "convert $wc [ , $type ]"
+ Converts the wildcard expression $wc into a regular expression according
+ to the options stored in the Regexp::Wildcards object, or to $type if
+ it's supplied. It first escapes all unprotected regexp special
+ characters that don't hold any meaning for wildcards, then replaces
+ 'jokers' or 'sql' and 'commas' or 'brackets' (depending on the "do" or
+ "type" options), applying throughout the 'capture' rules specified
+ in the constructor or by "capture".
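The outputs listed in this document carry no "^" or "$" anchors, so a caller who wants whole-string matching presumably has to add them. The sketch below assumes the converted expression is a plain, unanchored string, and uses the output listed earlier for 'a,b{c,d},e' with the 'brackets' class rather than calling the module.

```perl
use strict;
use warnings;

# Output listed in this document for 'a,b{c,d},e' with the 'brackets' class.
my $re = 'a\\,b(?:c|d)\\,e';

my $anywhere = qr/$re/;    # may match inside a longer string
my $whole    = qr/^$re\z/; # must match the whole string

for my $name ('a,bc,e', 'a,bd,e', 'xa,bc,e.bak') {
    printf "%-12s anywhere=%d whole=%d\n",
        $name, ($name =~ $anywhere ? 1 : 0), ($name =~ $whole ? 1 : 0);
}
```

Compiling once with qr// also avoids recompiling the pattern when it is applied to many strings.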
+
+EXPORT
+ An object module shouldn't export any function, and neither does this one.
+
+DEPENDENCIES
+ Carp (core module since perl 5), Text::Balanced (since 5.7.3).
+
+CAVEATS
+ This module does not implement the strange behaviours of the Windows shell
+ that result from the special handling of the last three characters (for
+ the file extension). For example, the Windows XP shell matches "*a" like
+ ".*a", "*a?" like ".*a.?", "*a??" like ".*a.{0,2}" and so on.
AUTHOR
- Vincent Pit, "<perl at profvince.com>"
+ Vincent Pit, "<perl at profvince.com>", <http://www.profvince.com>.
+
+ You can contact me by mail or on #perl @ FreeNode (vincent or
+ Prof_Vince).
BUGS
Please report any bugs or feature requests to "bug-regexp-wildcards at
perldoc Regexp::Wildcards
+ A test code coverage report is available at
+ <http://www.profvince.com/perl/cover/Regexp-Wildcards>.
+
COPYRIGHT & LICENSE
- Copyright 2007 Vincent Pit, all rights reserved.
+ Copyright 2007-2008 Vincent Pit, all rights reserved.
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.