How can I find extended ASCII characters in a file using Perl? Can anyone provide a script?
Thanks in advance.
Since the extended ASCII characters have values of 128 and higher, you can just call ord on individual characters and handle those with a value >= 128. The following code reads from standard input (or from the files named on the command line) and prints only the extended ASCII characters:
while (<>) {
    while (/(.)/g) {
        print($1) if (ord($1) >= 128);
    }
}
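The same ord test can be wrapped in a reusable subroutine that returns the extended characters instead of printing them, which makes it easier to report their codes. This is a minimal sketch: the name extended_chars is hypothetical, and it assumes the input is read without a decoding layer, so each character is a single byte.

```perl
use strict;
use warnings;

# Hypothetical helper: return every character of $line whose code is 128 or higher.
# In scalar context it returns the number of such characters.
sub extended_chars {
    my ($line) = @_;
    return grep { ord($_) >= 128 } split //, $line;
}

# Usage: for each line of the files given as arguments, print the codes found.
if (@ARGV) {
    while (my $line = <>) {
        my @found = extended_chars($line);
        printf "line %d: %s\n", $., join(' ', map { sprintf '0x%02X', ord } @found) if @found;
    }
}
```

Run it as perl script.pl file.txt; lines without extended characters produce no output.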
Alternatively, unpack together with chr will also work. Example:
while (<>) {
    foreach (unpack("C*", $_)) {
        print(chr($_)) if ($_ >= 128);
    }
}
(I'm sure some Perl guru can condense both of these to two one-liners...)
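Because unpack("C*") yields one number per byte, it also makes it easy to report where each extended byte sits, which helps when the characters themselves do not display properly. A sketch under the same assumption of undecoded input; extended_byte_offsets is a hypothetical name, and offsets are 0-based byte positions within the line.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, value] pairs for every byte >= 128 in $line.
sub extended_byte_offsets {
    my ($line) = @_;
    my @bytes = unpack("C*", $line);    # one unsigned value per byte
    return map { [ $_, $bytes[$_] ] } grep { $bytes[$_] >= 128 } 0 .. $#bytes;
}

# Usage: report line number, byte offset and hex value for each hit.
if (@ARGV) {
    while (my $line = <>) {
        for my $hit (extended_byte_offsets($line)) {
            printf "line %d, byte %d: 0x%02X\n", $., @$hit;
        }
    }
}
```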
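To print the line numbers instead, you can use the following. Note that it prints a line's number once for every extended character on that line, so numbers can repeat, and it behaves oddly on Unicode input, because each UTF-8 multibyte character is read as several bytes >= 128: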
while (<>) {
    while (/(.)/g) {
        print($. . "\n") if (ord($1) >= 128);
    }
}
(Thanks to Yaakov Belch for the $. tip.)
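To print each line number only once, it is enough to test the whole line with a single match instead of looping over its characters. A minimal sketch, assuming undecoded input so that the character class [\x80-\xFF] covers every extended byte; has_extended is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $line contains at least one byte in 0x80..0xFF, else 0.
sub has_extended {
    my ($line) = @_;
    return $line =~ /[\x80-\xFF]/ ? 1 : 0;
}

# Print each matching line number once, however many extended bytes the line holds.
if (@ARGV) {
    while (my $line = <>) {
        print "$.\n" if has_extended($line);
    }
}
```

A UTF-8 line such as "café café" is therefore reported once rather than four times.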
The first printable ASCII character is space (32) and the last is ~ (126). So I'd probably use:
while (<>) {
    # Keep "\n" out of the match; otherwise every newline-terminated line is reported.
    print "$.\n" if /[^\n -~]/;
}
although, admittedly, it will also report lines containing control characters (such as tabs), not only extended ASCII.
Edit: Changed to print the line number rather than the line itself.
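When several files are scanned, it helps to report which file each line number belongs to. This sketch applies the same printable-range test to each file named on the command line and prints file:line; nonprintable_lines is a hypothetical name. Each file gets its own lexical filehandle, so $. starts again from 1 for every file. Files with CRLF line endings will be flagged as well, since \r is a control character.

```perl
use strict;
use warnings;

# Hypothetical helper: return "file:line" for every line containing a character
# outside printable ASCII (space..~). The trailing newline is excluded from the test.
sub nonprintable_lines {
    my @files = @_;
    my @hits;
    for my $file (@files) {
        open my $fh, '<:raw', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            push @hits, "$file:$." if $line =~ /[^\n -~]/;
        }
        close $fh;
    }
    return @hits;
}

print "$_\n" for nonprintable_lines(@ARGV);
```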
One-liner (the extended range is bytes 0x80 to 0xFF):
perl -nE 'say $. if /[\x80-\xFF]/'
For older Perl versions (before 5.10, which introduced -E and say):
perl -lne 'print $. if /[\x80-\xFF]/'
A crucial question is whether the
use bytes;
pragma should be in effect. The poster should decide that. For picking characters with codes greater than 127, the following will suffice:
print grep 127 < ord, split // while <>;
or
print grep /[^[:ascii:]]/, split // while <>;
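The use bytes question boils down to whether you count characters or bytes. Rather than toggling the pragma, this sketch shows the underlying difference with the core Encode module: the same text "café" has two bytes above 127 when read raw, but only one character above 127 once decoded as UTF-8. count_high is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: count the characters above 127 in a string.
sub count_high { return scalar grep { ord($_) > 127 } split //, $_[0] }

my $bytes = "caf\xC3\xA9";              # raw UTF-8 bytes, as read without a decoding layer
my $chars = decode('UTF-8', $bytes);    # one character per code point

printf "bytes: %d high of %d\n", count_high($bytes), length $bytes;   # bytes: 2 high of 5
printf "chars: %d high of %d\n", count_high($chars), length $chars;   # chars: 1 high of 4
```

In practice you would get the decoded view by opening the file with an :encoding(UTF-8) layer instead of calling decode by hand.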