The answer you accepted doesn't produce the results you say you want in your question. Specifically, the POSIX character class [:alnum:] will not match punctuation characters, meaning that tokens such as 6t$, $eed5 and *jh will not be matched. To match punctuation characters you need to add [:punct:] to the character class. See the Regex cheat sheet.
For example, if you have the file tokens.txt, which contains:
aa df rrr5 4323 54 hjy 10 gj @fgf %d fr43 6t$ $eed5 *jh
and you run this Perl script (tokenize.pl):
#!/usr/bin/perl -w
use warnings;
use diagnostics;
use strict;
use Scalar::Util qw( looks_like_number );

# Read the first line of input and split it on whitespace.
my $str = <>;
my @temp = split(" ", $str);

# Sort the tokens into (overlapping) categories.
my @num = grep { looks_like_number($_) } @temp;
my @char = grep /^[[:alpha:]]+$/, @temp;
# Letters, digits and any punctuation character.
my @alphanum = grep /^[[:alnum:][:punct:]]+$/, @temp;

print "Numbers: " . join(' ', @num) . "\n";
print "Alpha: " . join(' ', @char) . "\n";
print "Alphanum: " . join(' ', @alphanum) . "\n";
like this:
cat tokens.txt | ./tokenize.pl
You get the output:
Numbers: 4323 54 10
Alpha: aa df hjy gj
Alphanum: aa df rrr5 4323 54 hjy 10 gj @fgf %d fr43 6t$ $eed5 *jh
However, your question suggests that you don't want to match all punctuation characters, such as @ and %, but only certain ones, such as $ and *.
If that's the case, just change the Alphanum match to:
my @alphanum = grep /^[[:alnum:]\$\*]+$/, @temp;
This then gives you the desired output:
Numbers: 4323 54 10
Alpha: aa df hjy gj
Alphanum: aa df rrr5 4323 54 hjy 10 gj fr43 6t$ $eed5 *jh
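To compare the three character classes discussed above side by side, here is a minimal sketch. The helper name filter_tokens and the labels are hypothetical and used only for illustration; the token list is hard-coded from tokens.txt instead of being read from standard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the tokens that match the given regex.
sub filter_tokens {
    my ($re, @tokens) = @_;
    return grep { $_ =~ $re } @tokens;
}

# Same tokens as in tokens.txt (qw does not interpolate $ or @).
my @tokens = qw(aa df rrr5 4323 54 hjy 10 gj @fgf %d fr43 6t$ $eed5 *jh);

# Each case pairs a label with an anchored character-class regex.
my @cases = (
    [ 'alnum only',      qr/^[[:alnum:]]+$/ ],          # letters and digits
    [ 'alnum + punct',   qr/^[[:alnum:][:punct:]]+$/ ], # plus all punctuation
    [ 'alnum + $ and *', qr/^[[:alnum:]\$*]+$/ ],       # plus only $ and *
);

for my $case (@cases) {
    my ($label, $re) = @$case;
    print "${label}: ", join(' ', filter_tokens($re, @tokens)), "\n";
}
```

The first line of output should leave out every token containing punctuation, the second should list all tokens, and the third should drop only @fgf and %d.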