2 Regexp::Wildcards - Converts wildcard expressions to Perl regular
11 my $rw = Regexp::Wildcards->new(type => 'unix');
14 $re = $rw->convert('a{b?,c}*'); # Do it Unix shell style.
15 $re = $rw->convert('a?,b*', 'win32'); # Do it Windows shell style.
16 $re = $rw->convert('*{x,y}?', 'jokers'); # Process the jokers and
18 $re = $rw->convert('%a_c%', 'sql'); # Turn SQL wildcards into
21 $rw = Regexp::Wildcards->new(
22 do => [ qw<jokers brackets> ], # Do jokers and brackets.
23 capture => [ qw<any greedy> ], # Capture *'s greedily.
26 $rw->do(add => 'groups'); # Don't escape groups.
27 $rw->capture(rem => [ qw<greedy> ]); # Actually we want non-greedy
29 $re = $rw->convert('*a{,(b)?}?c*'); # '(.*?)a(?:|(b).).c(.*?)'
30 $rw->capture(); # No more captures.
33 In many situations, users want to specify patterns to match but
34 don't need the full power of regexps. Wildcards are one such set
35 of simplified rules. This module converts wildcard expressions to Perl
36 regular expressions, so that you can use them for matching.
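As a minimal sketch of that workflow, assuming the module is installed: the snippet below converts a Unix-style glob and uses the result to filter a list. The file names are made up for illustration, and the \A and \z anchors are added by the caller because the converted string carries none of its own.

```perl
use strict;
use warnings;
use Regexp::Wildcards;

# Unix-style globbing: jokers plus bracketed alternatives.
my $rw = Regexp::Wildcards->new(type => 'unix');

# convert() returns a regexp string without anchors.
my $re = $rw->convert('*.{txt,md}');

# Hypothetical file names, for illustration only.
my @names   = qw<notes.txt README.md image.png>;

# Anchor by hand so the whole name has to match.
my @matches = grep { /\A$re\z/ } @names;

print "$_\n" for @matches;
```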
38 It handles the "*" and "?" jokers, Unix bracketed
39 alternatives "{,}", and the "%" and "_" SQL wildcards. If required, it
40 can also keep original "(...)" groups or "^" and "$" anchors. Backslash
41 ("\") is used as an escape character.
43 Typesets that mimic the behaviour of Windows and Unix shells are also
48 my $rw = Regexp::Wildcards->new(do => $what, capture => $capture);
49 my $rw = Regexp::Wildcards->new(type => $type, capture => $capture);
51 Constructs a new Regexp::Wildcards object.
53 "do" lists all features that should be enabled when converting wildcards
54 to regexps. Refer to "do" for details on what can be passed in $what.
56 The "type" specifies a predefined set of "do" features to use. See
57 "type" for details on which types are valid. The "do" option overrides
60 "capture" lists which atoms should be capturing. Refer to "capture" for
69 Specifies the list of metacharacters to convert or to keep
70 unescaped. They fall into six classes :
74 Converts "?" to "." and "*" to ".*".
76 'a**\\*b??\\?c' ==> 'a.*\\*b..\\?c'
80 Converts "_" to "." and "%" to ".*".
82 'a%%\\%b__\\_c' ==> 'a.*\\%b..\\_c'
86 Converts all "," to "|" and puts the complete resulting regular
87 expression inside "(?: ... )".
89 'a,b{c,d},e' ==> '(?:a|b\\{c|d\\}|e)'
93 Converts all matching "{ ... , ... }" brackets to "(?: ... | ... )"
94 alternations. If some brackets are unbalanced, it tries to
95 substitute as many of them as possible, and then escape the
96 remaining unmatched "{" and "}". Commas outside of any
97 bracket-delimited block are also escaped.
99 'a,b{c,d},e' ==> 'a\\,b(?:c|d)\\,e'
100 '{a\\{b,c}d,e}' ==> '(?:a\\{b|c)d\\,e\\}'
101 '{a{b,c\\}d,e}' ==> '\\{a\\{b\\,c\\}d\\,e\\}'
105 Keeps the parentheses "( ... )" of the original string without
106 escaping them. Currently, no check is done to ensure that the
107 parentheses are balanced.
109 'a(b(c))d\\(\\)' ==> (no change)
113 Prevents the *beginning-of-line* "^" and *end-of-line* "$" anchors
114 from being escaped. Since "[...]" character classes are currently escaped,
115 a "^" will always be interpreted as *beginning-of-line*.
117 'a^b$c' ==> (no change)
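To see what a converted string means in practice, the sketch below takes the 'brackets' result of 'a,b{c,d},e' quoted above and uses it as an ordinary Perl regexp. It needs only core Perl; the test strings are made up, and the \A and \z anchors are added by hand.

```perl
use strict;
use warnings;

# Result of the 'brackets' conversion of 'a,b{c,d},e', copied from above.
my $re = 'a\\,b(?:c|d)\\,e';

# Illustrative inputs: the two expansions, then the raw wildcard itself.
my @tests = ('a,bc,e', 'a,bd,e', 'a,b{c,d},e');

# Record which inputs match the whole pattern.
my %matched = map { $_ => ($_ =~ /\A$re\z/ ? 1 : 0) } @tests;

printf "%-12s %s\n", $_, $matched{$_} ? 'matches' : 'does not match'
    for @tests;
```

The raw wildcard string does not match its own conversion: the braces have become an alternation, so only the two expanded forms are accepted.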
119 Each $c can be any of :
121 * A hash reference, with wanted metacharacter group names (described
122 above) as keys and booleans as values ;
124 * An array reference containing the list of wanted metacharacter
127 * A plain scalar, when only one group is required.
129 When "set" is present, the classes given as its value replace the
130 current object options. Then the "add" classes are added, and the "rem"
133 Passing a sole scalar $what is equivalent to passing "set => $what". No
134 argument means "set => [ ]".
136 $rw->do(set => 'jokers'); # Only translate jokers.
137 $rw->do('jokers'); # Same.
138 $rw->do(add => [ qw<sql commas> ]); # Translate also SQL and commas.
139 $rw->do(rem => 'jokers'); # Specifying both 'sql' and
140 # 'jokers' is useless.
141 $rw->do(); # Translate nothing.
143 The "do" method returns the Regexp::Wildcards object.
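Because the object is returned, calls can be chained. A small sketch, assuming the module is installed; the input string is made up for illustration:

```perl
use strict;
use warnings;
use Regexp::Wildcards;

my $rw = Regexp::Wildcards->new(type => 'unix');

# do() returns the object, so convert() can be called on its result.
# With only 'jokers' enabled, "?" becomes "." and "*" becomes ".*".
my $re = $rw->do(set => 'jokers')->convert('a?c*');

# Anchors are added by hand to match the whole string.
print "abcdef matches\n" if 'abcdef' =~ /\A$re\z/;
```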
148 Tells the object to convert the metacharacters that correspond to the
149 predefined type $type. $type can be any of :
151 * 'jokers', 'sql', 'commas', 'brackets'
153 Singleton types that enable the corresponding "do" classes.
157 Covers typical Unix shell globbing features (effectively 'jokers'
160 * $^O values for common Unix systems
162 Wrap to 'unix' (see perlport for the list).
170 Covers typical Windows shell globbing features (effectively 'jokers'
173 * 'dos', 'os2', 'MSWin32', 'cygwin'
177 In particular, you can usually pass $^O as the $type and get the
178 corresponding shell behaviour.
180 $rw->type('win32'); # Set type to win32.
181 $rw->type($^O); # Set type to unix on Unices and win32 on Windows
182 $rw->type(); # Set type to unix.
184 The "type" method returns the Regexp::Wildcards object.
187 $rw->capture($captures);
188 $rw->capture(set => $c1);
189 $rw->capture(add => $c2);
190 $rw->capture(rem => $c3);
192 Specifies the list of atoms to capture. This method works like "do",
193 except that the classes are different :
197 Captures all unescaped *"exactly one"* metacharacters, i.e. "?" for
198 wildcards or "_" for SQL.
200 'a???b\\??' ==> 'a(.)(.)(.)b\\?(.)'
201 'a___b\\__' ==> 'a(.)(.)(.)b\\_(.)'
205 Captures all unescaped *"any"* metacharacters, i.e. "*" for
206 wildcards or "%" for SQL.
208 'a***b\\**' ==> 'a(.*?)b\\*(.*?)'
209 'a%%%b\\%%' ==> 'a(.*?)b\\%(.*?)'
213 When used in conjunction with 'any', it makes the 'any' captures
214 greedy (by default they are not).
216 'a***b\\**' ==> 'a(.*)b\\*(.*)'
217 'a%%%b\\%%' ==> 'a(.*)b\\%(.*)'
221 Capture matching "{ ... , ... }" alternations.
223 'a{b\\},\\{c}' ==> 'a(b\\}|\\{c)'
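The difference between these capture classes is easiest to see on concrete input. The sketch below uses only core Perl and hand-written regexps: in Perl, "(.*)" is the greedy form and "(.*?)" the non-greedy one, and the last pattern is the 'single' result quoted above. The input strings are made up for illustration.

```perl
use strict;
use warnings;

my $str = 'a1b2b';

# Greedy: ".*" takes as much as it can, up to the last "b".
my ($greedy) = $str =~ /a(.*)b/;

# Non-greedy: ".*?" stops at the first "b" that lets the match succeed.
my ($lazy) = $str =~ /a(.*?)b/;

print "greedy: $greedy\n";   # 1b2
print "lazy:   $lazy\n";     # 1

# 'single' captures: one group per unescaped "?", none for the escaped one.
my @single = 'axyzb?q' =~ /\Aa(.)(.)(.)b\?(.)\z/;
print "single: @single\n";   # x y z q
```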
225 $rw->capture(set => 'single'); # Only capture "exactly one"
227 $rw->capture('single'); # Same.
228 $rw->capture(add => [ qw<any greedy> ]); # Also greedily capture
229 # "any" metacharacters.
230 $rw->capture(rem => 'greedy'); # No more greed please.
231 $rw->capture(); # Capture nothing.
233 The "capture" method returns the Regexp::Wildcards object.
236 my $rx = $rw->convert($wc);
237 my $rx = $rw->convert($wc, $type);
239 Converts the wildcard expression $wc into a regular expression according
240 to the options stored in the Regexp::Wildcards object, or to $type if
241 it's supplied. It first escapes all unprotected regexp special
242 characters that don't hold any meaning for wildcards, then replaces
243 'jokers', 'sql' and 'commas' or 'brackets' (depending on the "do" or
244 "type" options), applying the 'capture' rules specified
245 in the constructor or by "capture".
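A short sketch of conversion combined with captures, assuming the module is installed. The file name and the $stem variable are hypothetical, and the \A and \z anchors are supplied by the caller.

```perl
use strict;
use warnings;
use Regexp::Wildcards;

# Capture whatever each "*" matches (non-greedy, since 'greedy' is not set).
my $rw = Regexp::Wildcards->new(type => 'unix', capture => [ 'any' ]);
my $re = $rw->convert('*.bak');

# Hypothetical file name, for illustration only.
my $file = 'report.bak';

# In list context the match returns the captured groups.
my ($stem) = $file =~ /\A$re\z/;

print "stem: $stem\n" if defined $stem;
```

Anchoring matters here: without \z, a non-greedy capture could stop at the first ".bak" in a longer name.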
248 An object module shouldn't export any function, and neither does this one.
251 Carp (core module since perl 5), Scalar::Util, Text::Balanced (since
255 This module does not implement the strange behaviours of the Windows shell
256 that result from the special handling of the last three characters (for
257 the file extension). For example, the Windows XP shell matches "*a" like
258 ".*a", "*a?" like ".*a.?", "*a??" like ".*a.{0,2}" and so on.
264 Vincent Pit, "<perl at profvince.com>", <http://www.profvince.com>.
266 You can contact me by mail or on "irc.perl.org" (vincent).
269 Please report any bugs or feature requests to "bug-regexp-wildcards at
270 rt.cpan.org", or through the web interface at
271 <http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Regexp-Wildcards>. I
272 will be notified, and then you'll automatically be notified of progress
273 on your bug as I make changes.
276 You can find documentation for this module with the perldoc command.
278 perldoc Regexp::Wildcards
280 A test code coverage report is available at
281 <http://www.profvince.com/perl/cover/Regexp-Wildcards>.
284 Copyright 2007,2008,2009,2013 Vincent Pit, all rights reserved.
286 This program is free software; you can redistribute it and/or modify it
287 under the same terms as Perl itself.