1 package Regexp::Wildcards;
6 use Text::Balanced qw/extract_bracketed/;
10 Regexp::Wildcards - Converts wildcards expressions to Perl regular expressions.
18 our $VERSION = '0.01';
22 use Regexp::Wildcards qw/wc2re/;
25 $re = wc2re 'a{b.,c}*' => 'unix'; # Do it Unix style.
26 $re = wc2re 'a.,b*' => 'win32'; # Do it Windows style.
27 $re = wc2re '*{x,y}.' => 'jokers'; # Process the jokers & escape the rest.
31 In many situations, users may want to specify patterns to match but don't need the full power of regexps. Wildcards make one of those sets of simplified rules. This module converts wildcards expressions to Perl regular expressions, so that you can use them for matching. It handles the C<*> and C<?> jokers, as well as Unix bracketed alternatives C<{,}>, and uses the backspace (C<\>) as an escape character. Wrappers are provided to mimic the behaviour of Windows and Unix shells.
35 Four functions are exported only on request : C<wc2re>, C<wc2re_unix>, C<wc2re_win32> and C<wc2re_jokers>.
39 use base qw/Exporter/;
42 'jokers' => \&wc2re_jokers,
43 'unix' => \&wc2re_unix,
44 'win32' => \&wc2re_win32
48 our @EXPORT_OK = ('wc2re', map { 'wc2re_' . $_ } keys %types);
49 our @EXPORT_FAIL = qw/extract do_jokers do_commas do_brackets do_bracketed/;
50 our %EXPORT_TAGS = ( all => [ @EXPORT_OK ] );
56 This function takes as its only argument the wildcard string to process, and returns the corresponding regular expression (or C<undef> if the source is invalid) according to standard Unix wildcard rules. It successively escapes all unprotected regexp special characters that doesn't hold any meaning for wildcards, turns jokers into their regexp equivalents, and changes bracketed blocks into C<(?:|)> alternations. If brackets are unbalanced, it will try to substitute as many of them as possible, and then escape the remaining C<{> and C<}>.
58 Unbalanced bracket expressions can always be rescued, but it may change completely its meaning. As a side effect, commas that first appear to be between brackets can be taken at the uppermost level, which invalidates the pattern. For example :
60 # The last orphaned } gets escaped, and the first comma is replaced.
61 # We also need to escape the comma because unix doesn't allow them out
63 print 'ok' if wc2re_unix('{a\\{b,c}d\\,e}') eq '(?:a\\{b|c)d\\,e\\}';
65 # All of the unprotected brackets are escaped, which means that we must
66 # escape all the commas.
67 print 'ok' if wc2re_unix('{a{b\\,c\\}d\\,e}') eq '\\{a\\{b\\,c\\}d\\,e\\}';
73 return unless defined $re;
74 $re =~ s/(?<!\\)((?:\\\\)*[^\w\s?*\\\{\},])/\\$1/g;
75 return do_bracketed(do_jokers($re));
80 Similar to the precedent, but for Windows wildcards. Bracketed blocks are no longer handled (which means that brackets will be escaped), but you can still provide a comma-separated list of items.
86 return unless defined $wc;
87 $wc =~ s/(?<!\\)((?:\\\\)*[^\w\s?*\\,])/\\$1/g;
88 my $re = do_jokers($wc);
89 if ($re =~ /(?<!\\)(?:\\\\)*,/) { # win32 allows comma-separated lists
90 $re = '(?:' . do_commas($re) . ')';
95 =head2 C<wc2re_jokers>
97 This one only handles the C<?> and C<*> jokers. All other unquoted regexp metacharacters will be quoted.
103 $wc =~ s/(?<!\\)((?:\\\\)*[^\w\s?*\\])/\\$1/g;
104 return do_jokers($wc);
109 A generic function that wraps around all the different rules. The first argument is the wildcard expression, and the second one is the type of rules to apply, currently either C<unix>, C<win32> or C<jokers>. If the type is undefined, it defaults to C<unix>.
114 my ($wc, $type) = @_;
115 return unless defined $wc;
117 return $types{lc $type}($wc);
122 L<Text::Balanced>, which is bundled with perl since version 5.7.3
126 Some modules provide incomplete alternatives as helper functions :
128 L<Net::FTPServer> has a method for that. Only jokers are translated, and escaping won't preserve them.
130 L<File::Find::Match::Util> has a C<wildcar> function that compiles a matcher. Only handles C<*>.
132 L<Text::Buffer> has the C<convertWildcardToRegex> class method that handles jokers.
136 Vincent Pit, C<< <perl at profvince.com> >>
140 Please report any bugs or feature requests to
141 C<bug-regexp-wildcards at rt.cpan.org>, or through the web interface at
142 L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Regexp-Wildcards>.
143 I will be notified, and then you'll automatically be notified of progress on
144 your bug as I make changes.
148 You can find documentation for this module with the perldoc command.
150 perldoc Regexp::Wildcards
152 =head1 COPYRIGHT & LICENSE
154 Copyright 2007 Vincent Pit, all rights reserved.
156 This program is free software; you can redistribute it and/or modify it
157 under the same terms as Perl itself.
161 sub extract { extract_bracketed shift, '{', qr/.*?(?:(?<!\\)(?:\\\\)*)(?={)/; }
165 # escape an odd number of \ that doesn't protect a regexp/wildcard special char
166 s/(?<!\\)((?:\\\\)*\\(?:[\w\s]|$))/\\$1/g;
167 # substitute ? preceded by an even number of \
168 s/(?<!\\)((?:\\\\)*)\?/$1./g;
169 # substitute * preceded by an even number of \
170 s/(?<!\\)((?:\\\\)*)\*+/$1.*/g;
176 # substitute , preceded by an even number of \
177 s/(?<!\\)((?:\\\\)*),/$1|/g;
183 substr $rest, 0, 1, '';
185 my ($re, $bracket, $prefix) = ('');
186 while (($bracket, $rest, $prefix) = extract $rest and $bracket) {
187 $re .= do_commas($prefix) . do_brackets($bracket);
189 $re .= do_commas($rest);
190 return '(?:' . $re . ')';
195 my ($re, $bracket, $prefix) = ('');
196 while (($bracket, $rest, $prefix) = extract $rest and $bracket) {
197 return undef if $prefix =~ /(?<!\\)((?:\\\\)*),/;
198 $re .= $prefix . do_brackets($bracket);
200 return undef if $rest =~ /(?<!\\)((?:\\\\)*),/;
202 $re =~ s/(?<!\\)((?:\\\\)*[\{\}])/\\$1/g;
206 1; # End of Regexp::Wildcards