package Hash::Normalize;
Hash::Normalize - Automatically normalize Unicode hash keys.
    use Hash::Normalize qw<normalize>;
    normalize my %hash, 'NFC';
    $hash{café} = 'coffee'; # NFD, "cafe\x{301}"
    print $hash{café}; # NFD, "cafe\x{301}"
    print $hash{café}; # NFC, "caf\x{e9}"
    # 'coffee' is also printed
This module provides a utility routine that augments a given Perl hash so that its keys are automatically normalized following one of the Unicode normalization schemes.
All subsequent operations on this hash then give the same result regardless of how the key used for the operation is normalized.
Since this module does not use the C<tie> mechanism, normalized hashes are indistinguishable from regular hashes as far as Perl is concerned, but this module also provides L</get_normalization> to identify them if necessary.
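To illustrate the problem being solved, the following minimal sketch uses only the core L<Unicode::Normalize> functions C<NFC> and C<NFD>. It does not use this module at all; the variable names are made up for the example. It shows that the two spellings of the same word are distinct keys in a plain hash, and that normalizing by hand before each access makes them coincide, which is roughly the step a normalized hash performs implicitly.

```perl
use strict;
use warnings;
use Unicode::Normalize qw<NFC NFD>;

# Two spellings of the same word: precomposed and decomposed.
my $nfc = "caf\x{e9}";    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $nfd = "cafe\x{301}";  # "e" followed by U+0301 COMBINING ACUTE ACCENT

# In a plain hash these are two distinct keys.
my %plain = ($nfc => 'coffee');
my $found_without_normalization = exists $plain{$nfd} ? 1 : 0;

# Normalizing every key by hand before each access makes them coincide.
my %manual;
$manual{ NFC($nfd) } = 'coffee';
my $found_with_normalization = exists $manual{ NFC($nfc) } ? 1 : 0;

# Both spellings also decompose to the same NFD string.
my $same_nfd = NFD($nfc) eq NFD($nfd) ? 1 : 0;
```
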
use Unicode::Normalize ();
    normalize %hash, $mode;
Applies the Unicode normalization scheme C<$mode> to C<%hash>.
C<$mode> defaults to C<'NFC'> if omitted, and should match C</^(?:(?:nf)?k?|fc)[cd]$/i> otherwise.
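As a quick check of which mode strings the documented pattern accepts, here is a small sketch that applies that regular expression to a handful of candidates. The sample strings and variable names are chosen only for illustration; how the module treats a mode after matching is not shown here.

```perl
use strict;
use warnings;

# The pattern quoted in the documentation above.
my $mode_rx = qr/^(?:(?:nf)?k?|fc)[cd]$/i;

# Long and short spellings, in any letter case, are expected to match.
my @accepted = grep { $_ =~ $mode_rx } qw<NFC nfd KC kd NFKC FCD fcc c D>;

# Anything else, including the empty string, is expected to be rejected.
my @rejected = grep { $_ !~ $mode_rx } qw<NFX utf8 NFFC nf>, '';
```
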
C<normalize> will first try to forcefully normalize the existing keys in C<%hash> to the new mode, but it will throw an exception if there are distinct keys that have the same normalization.
All the keys subsequently used for fetches, stores, existence tests, deletes and list assignments are then first passed through the corresponding normalization procedure.
C<keys %hash> will also return the list of normalized keys.
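If you would rather detect colliding keys yourself before calling C<normalize>, something along these lines may help. This is only a sketch: C<colliding_keys> is a hypothetical helper that is not part of this module, and it relies solely on C<Unicode::Normalize::normalize>, which takes a form name followed by the string.

```perl
use strict;
use warnings;
use Unicode::Normalize ();

# Hypothetical helper: returns the normalized keys that more than one
# of the given keys maps to under the normalization form $form.
sub colliding_keys {
    my ($form, @keys) = @_;
    my %seen;
    $seen{ Unicode::Normalize::normalize($form, $_) }++ for @keys;
    my @dups = grep { $seen{$_} > 1 } keys %seen;
    return sort { $a cmp $b } @dups;
}

# The NFC and NFD spellings collapse to one key under form C.
my @clash = colliding_keys('C', "caf\x{e9}", "cafe\x{301}", 'tea');

# With a single spelling there is nothing to report.
my @clean = colliding_keys('C', "caf\x{e9}", 'tea');
```

An empty return list suggests that normalizing a hash holding exactly those keys should not hit the key collision exception described above.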
sub _remap { $_[2] = Unicode::Normalize::normalize($_[1], "$_[2]"); undef }
my $wiz = Variable::Magic::wizard(
data => sub { $_[1] },
$mode = 'nfc' unless defined $mode;
if ($mode =~ /^(?:nf)?(k?[cd])$/i) {
} elsif ($mode =~ /^(fc[cd])$/i) {
Carp::croak('Invalid normalization');
sub normalize (\%;$) {
my ($hash, $mode) = @_;
my $previous_mode = &get_normalization($hash);
my $new_mode = _validate_mode($mode);
return $hash if defined $previous_mode and $previous_mode eq $new_mode;
&Variable::Magic::dispell($hash, $wiz);
for my $key (keys %$hash) {
my $norm = Unicode::Normalize::normalize($new_mode, $key);
if (exists $dup{$norm}) {
Carp::croak('Key collision after normalization');
$dup{$norm} = $hash->{$key};
&Variable::Magic::cast($hash, $wiz, $new_mode);
=head2 C<get_normalization>
    my $mode = get_normalization %hash;
    normalize %hash, $mode;
Returns the current Unicode normalization scheme in use for C<%hash>, or C<undef> if it is a plain hash.
sub get_normalization (\%) { &Variable::Magic::getdata($_[0], $wiz) }
=head1 NORMALIZED SYMBOL LOOKUPS
Stashes (Perl symbol tables) are implemented as plain hashes, so one can call C<normalize %Pkg::> on them to make Unicode symbol lookups succeed regardless of how the symbol name is normalized.
    require Hash::Normalize;
    # Enforce NFC normalization
    Hash::Normalize::normalize(%Foo::, 'NFC')
    sub café { # NFD, "cafe\x{301}"
    café() # NFC, "caf\x{e9}"
    café() # NFD, "cafe\x{301}"
    # Both coffee_nfc() and coffee_nfd() return 'coffee'
Using a normalized hash is slightly slower than a plain hash, due to the normalization procedure and the overhead of magic.
If a hash is initialized from a normalized hash by list assignment (C<%new = %normalized>), then the normalization scheme will not be carried over to the new hash, although its keys will initially be normalized like the ones from the original hash.
The functions L</normalize> and L</get_normalization> are only exported on request by specifying their names in the module import list.
our %EXPORT_TAGS = ();
our @EXPORT_OK = qw<normalize get_normalization>;
L<Carp>, L<Exporter> (core since perl 5).
L<Unicode::Normalize> (core since perl 5.8).
L<Variable::Magic> 0.51.
Vincent Pit, C<< <perl at profvince.com> >>, L<http://www.profvince.com>.
You can contact me by mail or on C<irc.perl.org> (vincent).
Please report any bugs or feature requests to C<bug-hash-normalize at rt.cpan.org>, or through the web interface at L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Hash-Normalize>.
I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
    perldoc Hash::Normalize
=head1 COPYRIGHT & LICENSE
Copyright 2017 Vincent Pit, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
1; # End of Hash::Normalize