X-Git-Url: http://git.vpit.fr/?a=blobdiff_plain;f=README;h=78cc7d9bb2899804970eb42790848b5c1eb3aa2b;hb=d6f61a3d7918845e8ee6c9cac65af29aa3ca6cf0;hp=037d0d2df1fedc7319b5347a375e2c2db0584e3e;hpb=46111541589202352d6a6a665eb03fe24e3861a6;p=perl%2Fmodules%2FRegexp-Wildcards.git

diff --git a/README b/README
index 037d0d2..78cc7d9 100644
--- a/README
+++ b/README
@@ -1,94 +1,171 @@
 NAME
-    Regexp::Wildcards - Converts wildcards expressions to Perl regular
+    Regexp::Wildcards - Converts wildcard expressions to Perl regular
     expressions.
 
 VERSION
-    Version 0.02
+    Version 1.00
 
 SYNOPSIS
-        use Regexp::Wildcards qw/wc2re/;
+        use Regexp::Wildcards;
+
+        my $rw = Regexp::Wildcards->new(type => 'unix');
 
         my $re;
-        $re = wc2re 'a{b.,c}*' => 'unix'; # Do it Unix style.
-        $re = wc2re 'a.,b*' => 'win32'; # Do it Windows style.
-        $re = wc2re '*{x,y}.' => 'jokers'; # Process the jokers & escape the rest.
+        $re = $rw->convert('a{b?,c}*'); # Do it Unix shell style.
+        $re = $rw->convert('a?,b*', 'win32'); # Do it Windows shell style.
+        $re = $rw->convert('*{x,y}?', 'jokers'); # Process the jokers and escape the rest.
+        $re = $rw->convert('%a_c%', 'sql'); # Turn SQL wildcards into regexps.
+
+        $rw = Regexp::Wildcards->new(
+            do => [ qw/jokers brackets/ ], # Do jokers and brackets.
+            capture => [ qw/any greedy/ ], # Capture *'s greedily.
+        );
+
+        $rw->do(add => 'groups'); # Don't escape groups.
+        $rw->capture(rem => [ qw/greedy/ ]); # Actually we want non-greedy matches.
+        $re = $rw->convert('*a{,(b)?}?c*'); # '(.*?)a(?:|(b).).c(.*?)'
+        $rw->capture(); # No more captures.
 
 DESCRIPTION
     In many situations, users may want to specify patterns to match but
     don't need the full power of regexps. Wildcards make one of those sets
-    of simplified rules. This module converts wildcards expressions to Perl
-    regular expressions, so that you can use them for matching. It handles
-    the "*" and "?" jokers, as well as Unix bracketed alternatives "{,}",
-    and uses the backspace ("\") as an escape character. Wrappers are
-    provided to mimic the behaviour of Windows and Unix shells.
-
-EXPORT
-    Four functions are exported only on request : "wc2re", "wc2re_unix",
-    "wc2re_win32" and "wc2re_jokers".
-
-FUNCTIONS
-  "wc2re_unix"
-    This function takes as its only argument the wildcard string to process,
-    and returns the corresponding regular expression according to standard
-    Unix wildcard rules. It successively escapes all unprotected regexp
-    special characters that doesn't hold any meaning for wildcards, turns
-    jokers into their regexp equivalents, and changes bracketed blocks into
-    "(?:|)" alternations. If brackets are unbalanced, it will try to
-    substitute as many of them as possible, and then escape the remaining
-    "{" and "}". Commas outside of any bracket-delimited block will also be
-    escaped.
+    of simplified rules. This module converts wildcard expressions to Perl
+    regular expressions, so that you can use them for matching.
 
-        # This is a valid brackets expression which is correctly handled.
-        print 'ok' if wc2re_unix('{a{b,c}d,e}') eq '(?:a(?:b|c)d|e)';
+    It handles the "*" and "?" jokers, as well as Unix bracketed
+    alternatives "{,}", but also "%" and "_" SQL wildcards. It can also keep
+    original "(...)" groups. Backspace ("\") is used as an escape character.
 
-    Unbalanced bracket expressions can always be rescued, but it may change
-    completely its meaning. For example :
+    Typesets that mimic the behaviour of Windows and Unix shells are also
+    provided.
 
-        # The first comma is replaced, and the remaining brackets and comma are
-        # escaped.
-        print 'ok' if wc2re_unix('{a\\{b,c}d,e}') eq '(?:a\\{b|c)d\\,e\\}';
+METHODS
+  "new [ do => $what | type => $type ], capture => $captures"
+    Constructs a new Regexp::Wildcard object.
 
-        # All the brackets and commas are escaped.
-        print 'ok' if wc2re_unix('{a{b,c\\}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
+    "do" lists all features that should be enabled when converting wildcards
+    to regexps. Refer to "do" for details on what can be passed in $what.
 
-  "wc2re_win32"
-    Similar to the precedent, but for Windows wildcards. Bracketed blocks
-    are no longer handled (which means that brackets will be escaped), but
-    you can provide a comma-separated list of items.
+    The "type" specifies a predefined set of "do" features to use. See
+    "type" for details on which types are valid. The "do" option overrides
+    "type".
 
-        # All the brackets are escaped, and commas are seen as list delimiters.
-        print 'ok' if wc2re_win32('{a{b,c}d,e}') eq '(?:\\{a\\{b|c\\}d|e\\})';
+    "capture" lists which atoms should be capturing. Refer to "capture" for
+    more details.
 
-  "wc2re_jokers"
-    This one only handles the "?" and "*" jokers. All other unquoted regexp
-    metacharacters will be escaped.
+  "do [ $what | set => $c1, add => $c2, rem => $c3 ]"
+    Specifies the list of metacharacters to convert. They are classified
+    into five classes :
 
-        # Everything is escaped.
-        print 'ok' if wc2re_jokers('{a{b,c}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
+    'jokers' converts "?" to "." and "*" to ".*" ;
+        'a**\\*b??\\?c' ==> 'a.*\\*b..\\?c'
 
-  "wc2re"
-    A generic function that wraps around all the different rules. The first
-    argument is the wildcard expression, and the second one is the type of
-    rules to apply, currently either "unix", "win32" or "jokers". If the
-    type is undefined, it defaults to "unix".
+    'sql' converts "_" to "." and "%" to ".*" ;
+        'a%%\\%b__\\_c' ==> 'a.*\\%b..\\_c'
 
-DEPENDENCIES
-    Text::Balanced, which is bundled with perl since version 5.7.3
+    'commas' converts all "," to "|" and puts the complete resulting regular
+    expression inside "(?: ... )" ;
+        'a,b{c,d},e' ==> '(?:a|b\\{c|d\\}|e)'
 
-SEE ALSO
-    Some modules provide incomplete alternatives as helper functions :
+    'brackets' converts all matching "{ ... , ... }" brackets to "(?: ... |
+    ... )" alternations. If some brackets are unbalanced, it tries to
+    substitute as many of them as possible, and then escape the remaining
+    unmatched "{" and "}". Commas outside of any bracket-delimited block are
+    also escaped ;
+        'a,b{c,d},e' ==> 'a\\,b(?:c|d)\\,e'
+        '{a\\{b,c}d,e}' ==> '(?:a\\{b|c)d\\,e\\}'
+        '{a{b,c\\}d,e}' ==> '\\{a\\{b\\,c\\}d\\,e\\}'
+
+    'groups' keeps the parenthesis "( ... )" of the original string without
+    escaping them. Currently, no check is done to ensure that the
+    parenthesis are matching.
+        'a(b(c))d\\(\\)' ==> (no change)
+
+    Each $c can be any of :
+
+    A hash reference, with wanted metacharacter group names (described
+    above) as keys and booleans as values ;
+    An array reference containing the list of wanted metacharacter classes ;
+    A plain scalar, when only one group is required.
+
+    When "set" is present, the classes given as its value replace the
+    current object options. Then the "add" classes are added, and the "rem"
+    classes removed.
+
+    Passing a sole scalar $what is equivalent as passing "set => $what". No
+    argument means "set => [ ]".
+
+        $rw->do(set => 'jokers'); # Only translate jokers.
+        $rw->do('jokers'); # Same.
+        $rw->do(add => [ qw/sql commas/ ]); # Translate also SQL and commas.
+        $rw->do(rem => 'jokers'); # Specifying both 'sql' and 'jokers' is useless.
+        $rw->do(); # Translate nothing.
+
+  "type $type"
+    Notifies to convert the metacharacters that corresponds to the
+    predefined type $type. $type can be any of 'jokers', 'sql', 'commas',
+    'brackets', 'win32' or 'unix'. An unknown or undefined value defaults to
+    'unix', except for 'dos', 'os2', 'MSWin32' and 'cygwin' that default to
+    'win32'. This means that you can pass $^O as the $type and get the
+    corresponding shell behaviour. Returns the object.
+
+        $rw->type('win32'); # Set type to win32.
+        $rw->type(); # Set type to unix.
+
+  "capture [ $captures | set => $c1, add => $c2, rem => $c3 ]"
+    Specifies the list of atoms to capture. This method works like "do",
+    except that the classes are different :
+
+    'single' will capture all unescaped *"exactly one"* metacharacters, i.e.
+    "?" for wildcards or "_" for SQL ;
+        'a???b\\??' ==> 'a(.)(.)(.)b\\?(.)'
+        'a___b\\__' ==> 'a(.)(.)(.)b\\_(.)'
+
+    'any' will capture all unescaped *"any"* metacharacters, i.e. "*" for
+    wildcards or "%" for SQL ;
+        'a***b\\**' ==> 'a(.*)b\\*(.*)'
+        'a%%%b\\%%' ==> 'a(.*)b\\%(.*)'
+
+    'greedy', when used in conjunction with 'any', will make the 'any'
+    captures greedy (by default they are not) ;
+        'a***b\\**' ==> 'a(.*?)b\\*(.*?)'
+        'a%%%b\\%%' ==> 'a(.*?)b\\%(.*?)'
+
+    'brackets' will capture matching "{ ... , ... }" alternations.
+        'a{b\\},\\{c}' ==> 'a(b\\}|\\{c)'
+
+        $rw->capture(set => 'single'); # Only capture "exactly one" metacharacters.
+        $rw->capture('single'); # Same.
+        $rw->capture(add => [ qw/any greedy/ ]); # Also greedily capture "any" metacharacters.
+        $rw->capture(rem => 'greedy'); # No more greed please.
+        $rw->capture(); # Capture nothing.
+
+  "convert $wc [ , $type ]"
+    Converts the wildcard expression $wc into a regular expression according
+    to the options stored into the Regexp::Wildcards object, or to $type if
+    it's supplied. It successively escapes all unprotected regexp special
+    characters that doesn't hold any meaning for wildcards, then replace
+    'jokers' or 'sql' and 'commas' or 'brackets' (depending on the "do" or
+    "type" options), all of this by applying the 'capture' rules specified
+    in the constructor or by "capture".
 
-    Net::FTPServer has a method for that. Only jokers are translated, and
-    escaping won't preserve them.
+EXPORT
+    An object module shouldn't export any function, and so does this one.
 
-    File::Find::Match::Util has a "wildcar" function that compiles a
-    matcher. Only handles "*".
+DEPENDENCIES
+    Carp (core module since perl 5), Text::Balanced (since 5.7.3).
 
-    Text::Buffer has the "convertWildcardToRegex" class method that handles
-    jokers.
+CAVEATS
+    This module does not implement the strange behaviours of Windows shell
+    that result from the special handling of the three last characters (for
+    the file extension). For example, Windows XP shell matches *a like
+    ".*a", "*a?" like ".*a.?", "*a??" like ".*a.{0,2}" and so on.
 
 AUTHOR
-    Vincent Pit, ""
+    Vincent Pit, "", .
+
+    You can contact me by mail or on #perl @ FreeNode (vincent or
+    Prof_Vince).
 
 BUGS
     Please report any bugs or feature requests to "bug-regexp-wildcards at
@@ -102,8 +179,11 @@ SUPPORT
 
         perldoc Regexp::Wildcards
 
+    Tests code coverage report is available at
+    .
+
 COPYRIGHT & LICENSE
-    Copyright 2007 Vincent Pit, all rights reserved.
+    Copyright 2007-2008 Vincent Pit, all rights reserved.
 
     This program is free software; you can redistribute it and/or modify it
     under the same terms as Perl itself.