X-Git-Url: http://git.vpit.fr/?p=perl%2Fmodules%2FRegexp-Wildcards.git;a=blobdiff_plain;f=README;h=93a7721460026ce987d0868fc39354b043120fc2;hp=7469605447d8bc9b34cecbf28e4ce853480d9b1b;hb=a1d84fdc64b3007b8dd1560c5b6b123554a06ea5;hpb=d3841a7816c3e170f292ced4a5818ab252574300

diff --git a/README b/README
index 7469605..93a7721 100644
--- a/README
+++ b/README
@@ -1,63 +1,187 @@
 NAME
-    Regexp::Wildcards - Converts wildcards to regexps.
+    Regexp::Wildcards - Converts wildcard expressions to Perl regular
+    expressions.
 
 VERSION
-    Version 0.01
+    Version 0.08
 
 SYNOPSIS
         use Regexp::Wildcards qw/wc2re/;
 
         my $re;
-        $re = wc2re 'a{b.,c}*' => 'unix';
-        $re = wc2re 'a.,b*' => 'win32';
+        $re = wc2re 'a{b?,c}*' => 'unix'; # Do it Unix style.
+        $re = wc2re 'a?,b*' => 'win32'; # Do it Windows style.
+        $re = wc2re '*{x,y}?' => 'jokers'; # Process the jokers & escape the rest.
+        $re = wc2re '%a_c%' => 'sql'; # Turn SQL wildcards into regexps.
 
 DESCRIPTION
-    This module converts wildcards expressions to Perl regular expressions.
-    It handles the "*" and "?" jokers, as well as Unix bracketed
-    alternatives "{,}", and uses the backspace ("\") as an escape character.
-    Wrappers are provided to mimic the behaviour of Windows and Unix shells.
-
-EXPORT
-    Four functions are exported only on request : "wc2re", "wc2re_unix",
-    "wc2re_win32" and "wc2re_jokers".
+    In many situations, users may want to specify patterns to match but
+    don't need the full power of regexps. Wildcards are one such set of
+    simplified rules. This module converts wildcard expressions to Perl
+    regular expressions, so that you can use them for matching. It handles
+    the "*" and "?" shell jokers, as well as Unix bracketed alternatives
+    "{,}", but also "%" and "_" SQL wildcards. Backslash ("\") is used as an
+    escape character. Wrappers are provided to mimic the behaviour of
+    Windows and Unix shells.
+
+VARIABLES
+    These variables control whether the wildcard jokers and brackets must
+    capture their match.
+    They can be globally set by writing in your program
+
+        $Regexp::Wildcards::CaptureSingle = 1;
+        # From then, "exactly one" wildcards are capturing
+
+    or can be locally specified via "local"
+
+        {
+         local $Regexp::Wildcards::CaptureSingle = 1;
+         # In this block, "exactly one" wildcards are capturing.
+         ...
+        }
+        # Back to the situation from before the block
+
+    This section also describes how those elements are translated by the
+    functions.
+
+  $CaptureSingle
+    When this variable is true, each occurrence of unescaped *"exactly one"*
+    wildcards (i.e. "?" jokers or "_" for SQL wildcards) is made capturing
+    in the resulting regexp (they are replaced by "(.)"). Otherwise, they
+    are just replaced by ".". Default is the latter.
+
+        For jokers :
+        'a???b\\??' is translated to 'a(.)(.)(.)b\\?(.)' if $CaptureSingle is true
+                                     'a...b\\?.'         otherwise (default)
+
+        For SQL wildcards :
+        'a___b\\__' is translated to 'a(.)(.)(.)b\\_(.)' if $CaptureSingle is true
+                                     'a...b\\_.'         otherwise (default)
+
+  $CaptureAny
+    By default this variable is false, and successions of unescaped *"any"*
+    wildcards (i.e. "*" jokers or "%" for SQL wildcards) are replaced by one
+    single ".*". When it evaluates to true, those sequences of *"any"*
+    wildcards are made into one capture, which is greedy ("(.*)") for
+    "$CaptureAny > 0" and otherwise non-greedy ("(.*?)").
+
+        For jokers :
+        'a***b\\**' is translated to 'a.*b\\*.*'       if $CaptureAny is false (default)
+                                     'a(.*)b\\*(.*)'   if $CaptureAny > 0
+                                     'a(.*?)b\\*(.*?)' otherwise
+
+        For SQL wildcards :
+        'a%%%b\\%%' is translated to 'a.*b\\%.*'       if $CaptureAny is false (default)
+                                     'a(.*)b\\%(.*)'   if $CaptureAny > 0
+                                     'a(.*?)b\\%(.*?)' otherwise
+
+  $CaptureBrackets
+    If this variable is set to true, valid bracket constructs are made into
+    "( | )" captures, and otherwise they are replaced by non-capturing
+    alternations ("(?: | )"), which is the default.
+
+        'a{b\\},\\{c}' is translated to 'a(b\\}|\\{c)'   if $CaptureBrackets is true
+                                        'a(?:b\\}|\\{c)' otherwise (default)
 
 FUNCTIONS
-  "wc2re_unix"
+  "wc2re_jokers"
     This function takes as its only argument the wildcard string to process,
-    and returns the corresponding regular expression (or "undef" if the
-    source is invalid) according to standard Unix wildcard rules. It
-    successively escapes all regexp special characters that doesn't hold any
-    meaning for wildcards, turns jokers into their regexp equivalents, and
-    changes bracketed blocks into alternations. If brackets are unbalanced,
-    it will try to substitute as many of them as possible, and then escape
-    the remaining "{" and "}".
+    and returns the corresponding regular expression where the jokers "?"
+    (*"exactly one"*) and "*" (*"any"*) have been translated into their
+    regexp equivalents (see "VARIABLES" for more details). All other
+    unprotected regexp metacharacters are escaped.
+
+        # Everything is escaped.
+        print 'ok' if wc2re_jokers('{a{b,c}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
+
+  "wc2re_sql"
+    Similar to the previous one, but for the SQL wildcards "_" (*"exactly
+    one"*) and "%" (*"any"*). All other unprotected regexp metacharacters
+    are escaped.
+
+  "wc2re_unix"
+    This function conforms to standard Unix shell wildcard rules. It
+    successively escapes all unprotected regexp special characters that
+    don't hold any meaning for wildcards, turns "?" and "*" jokers into
+    their regexp equivalents (see "wc2re_jokers"), and changes bracketed
+    blocks into (possibly capturing) alternations as described in
+    "VARIABLES". If brackets are unbalanced, it tries to substitute as many
+    of them as possible, and then escapes the remaining "{" and "}". Commas
+    outside of any bracket-delimited block are also escaped.
+
+        # This is a valid bracket expression, and is completely translated.
+        print 'ok' if wc2re_unix('{a{b,c}d,e}') eq '(?:a(?:b|c)d|e)';
+
+    The function handles unbalanced bracket expressions, by escaping
+    everything it can't recognize. For example :
+
+        # The first comma is replaced, and the remaining brackets and comma are escaped.
+        print 'ok' if wc2re_unix('{a\\{b,c}d,e}') eq '(?:a\\{b|c)d\\,e\\}';
+
+        # All the brackets and commas are escaped.
+        print 'ok' if wc2re_unix('{a{b,c\\}d,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
 
   "wc2re_win32"
-    Similar to the precedent, but for Windows wildcards. Bracketed blocks
-    are no longer handled (which means that brackets will be escaped), but
-    you can still provide a comma-separated list of items.
+    This one works just like the one before, but for Windows wildcards.
+    Bracketed blocks are no longer handled (which means that brackets are
+    escaped), but you can provide a comma-separated list of items.
 
-  "wc2re_jokers"
-    This one only handles the "?" and "*" jokers.
+        # All the brackets are escaped, and commas are seen as list delimiters.
+        print 'ok' if wc2re_win32('{a{b,c}d,e}') eq '(?:\\{a\\{b|c\\}d|e\\})';
 
   "wc2re"
     A generic function that wraps around all the different rules. The first
     argument is the wildcard expression, and the second one is the type of
-    rules to apply, currently either "unix", "win32" or "jokers". If the
-    type is undefined, it defaults to "unix".
+    rules to apply which can be :
+
+    'unix', 'win32', 'jokers', 'sql'
+        For one of those raw rule names, "wc2re" simply maps to
+        "wc2re_unix", "wc2re_win32", "wc2re_jokers" and "wc2re_sql"
+        respectively.
+
+    $^O If you supply the Perl operating system name, the call is deferred
+        to "wc2re_win32" for $^O equal to 'dos', 'os2', 'MSWin32' or
+        'cygwin', and to "wc2re_unix" in all the other cases.
+
+    If the type is undefined or not supported, it defaults to 'unix'.
+
+        # Wraps to wc2re_jokers ($re eq 'a\\{b\\,c\\}.*').
+        $re = wc2re 'a{b,c}*' => 'jokers';
+
+        # Wraps to wc2re_win32 ($re eq '(?:a\\{b|c\\}.*)')
+        # or wc2re_unix ($re eq 'a(?:b|c).*') depending on $^O.
+        $re = wc2re 'a{b,c}*' => $^O;
+
+EXPORT
+    These five functions are exported only on request : "wc2re",
+    "wc2re_unix", "wc2re_win32", "wc2re_jokers" and "wc2re_sql". The
+    variables are not exported.
+
+DEPENDENCIES
+    Text::Balanced, which has been bundled with perl since version 5.7.3.
+
+CAVEATS
+    This module does not implement the strange behaviours of the Windows
+    shell that result from the special handling of the three last characters
+    (for the file extension). For example, Windows XP shell matches "*a" like
+    ".*a", "*a?" like ".*a.?", "*a??" like ".*a.{0,2}" and so on.
 
 SEE ALSO
+    Some modules provide incomplete alternatives as helper functions :
+
     Net::FTPServer has a method for that. Only jokers are translated, and
     escaping won't preserve them.
 
-    File::Find::Match::Util has a "wildcar" function that compiles a
-    matcher. Only handles "*".
+    File::Find::Match::Util has a "wildcard" function that compiles a
+    matcher. It only handles "*".
 
     Text::Buffer has the "convertWildcardToRegex" class method that handles
     jokers.
 
 AUTHOR
-    Vincent Pit, ""
+    Vincent Pit, "", .
+
+    You can contact me by mail or on #perl @ FreeNode (vincent or
+    Prof_Vince).
 
 BUGS
     Please report any bugs or feature requests to "bug-regexp-wildcards at
@@ -72,7 +196,7 @@ SUPPORT
         perldoc Regexp::Wildcards
 
 COPYRIGHT & LICENSE
-    Copyright 2007 Vincent Pit, all rights reserved.
+    Copyright 2007-2008 Vincent Pit, all rights reserved.
 
     This program is free software; you can redistribute it and/or modify it
     under the same terms as Perl itself.