You can avoid the intermediate steps by just using a command that do the counts and the comparison at the same time:
find . -type f -exec perl -nle 'END { print $ARGV if $h{"{"} != $h{"}"} } $h{$_}++ for /([}{])/g' {}\;
This calls the Perl program once per file, the Perl program counts the number of each type curly brace and prints the name of the file if they counts don't match.
You must be careful with the /([}{]])/
section, find
will think it needs to do the replacement on {}
if you say /([{}]])/
.
WARNING: this code will have false positives and negatives if you are trying to run it against source code. Consider the following cases:
balanced, but curlies in strings:
if ($s eq '{') {
print "I saw a {\n"
}
unbalanced, but curlies in strings:
while (1) {
print "}";
You can expand the Perl command by using B::Deparse:
perl -MO=Deparse -nle 'END { print $ARGV if $h{"{"} != $h{"}"} } $h{$_}++ for /([}{])/g'
Which results in:
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
sub END {
print $ARGV if $h{'{'} != $h{'}'};
}
;
++$h{$_} foreach (/([}{])/g);
}
We can now look at each piece of the program:
BEGIN { $/ = "\n"; $\ = "\n"; }
This is caused by the -l
option. It sets both the input and output record separators to "\n". This means anything read in will be broken into records based "\n" and any print statement will have "\n" appended to it.
LINE: while (defined($_ = <ARGV>)) {
}
This is created by the -n
option. It loops over every file passed in via the commandline (or STDIN if no files are passed) reading each line of those files. This also happens to set $ARGV
to the last file read by <ARGV>
.
chomp $_;
This removes whatever is in the $/
variable from the line that was just read ($_
), it does nothing useful here. It was caused by the -l
option.
sub END {
print $ARGV if $h{'{'} != $h{'}'};
}
This is an END block, this code will run at the end of the program. It prints $ARGV
(the name of the file last read from, see above) if the values stored in %h
associated with the keys '{'
and '}'
are equal.
++$h{$_} foreach (/([}{])/g);
This needs to be broken down further:
/
( #begin capture
[}{] #match any of the '}' or '{' characters
) #end capture
/gx
Is a regex that returns a list of '{' and '}' characters that are in the string being matched. Since no string was specified the $_
variable (which holds the line last read from the file, see above) will be matched against. That list is fed into the foreach
statement which then runs the statement it is in front of for each item (hence the name) in the list. It also sets $_
(as you can see $_
is a popular variable in Perl) to be the item from the list.
++h{$_}
This line increments the value in $h that is associated with $_
(which will be either '{' or '}', see above) by one.