I need to extract certain abbreviations from a file, such as ABS, TVS, and PERL: any abbreviation written in uppercase letters. I'd prefer to do this with a regular expression. Any help is appreciated.
+2
A:
Untested:
my %abbr;
open (my $input, "<", "filename")
    || die "open: $!";
# <$input> with no inner spaces is readline; "< $input >" would be a glob
while ( <$input> ) {
    while (s/([A-Z][A-Z]+)//) {
        $abbr{$1}++;
    }
}
Modified it to look for at least two consecutive capital letters.
Marius Kjeldahl
2009-07-08 08:09:28
no need to substitute there, nor to read in the whole file before processing any (though you've got a bug: that's a glob(), not a readline(), due to the extra spaces).
ysth
2009-07-08 09:16:01
You're probably right, but the editor didn't allow it without the spaces. I suspect the "lt dollar" sequence got cut out without the spaces.
Marius Kjeldahl
2009-07-08 09:25:43
You need to tell the editor that you're in charge - or perhaps get a different editor.
Telemachus
2009-07-08 10:03:35
+4
A:
It would have been nice to hear what part you were particularly having trouble with.
my %abbr;
open my $inputfh, '<', 'filename'
    or die "open error: $!\n";
while ( my $line = readline($inputfh) ) {
    while ( $line =~ /\b([A-Z]{2,})\b/g ) {
        $abbr{$1}++;
    }
}
for my $abbr ( sort keys %abbr ) {
    print "Found $abbr $abbr{$abbr} time(s)\n";
}
ysth
2009-07-08 09:18:08
+2
A:
#!/usr/bin/perl
use strict;
use warnings;
my %abbrs = ();
while (<>) {
    my @words = split ' ', $_;
    foreach my $word (@words) {
        $word =~ /([A-Z]{2,})/ && $abbrs{$1}++;
    }
}
# %abbrs now contains all abbreviations
dsm
2009-07-08 09:25:35
Missing a `$word=~` there. For kicks, you could say: `$word =~ y/A-Z//c or $abbrs{$word}++;`.
ysth
2009-07-08 09:44:32
I need to extract only the abbreviations, like ABC or BAV. For example, my document also contains ABC123 and CMV002, and it extracts those too. I want to extract only ABC and CMV in this case. Can you help me?
lokesh
2009-07-09 05:53:28
Alternatively, if the numbers always come after the abbreviation, you can use /^([A-Z]+)[0-9]*$/
dsm
2009-07-09 09:15:33
I have a problem: /^([A-Z]+)[0-9]*$/ extracts even the digits at the start, say for ex017_ABC_EFG.
lokesh
2009-08-05 09:06:50
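One possible way to handle both cases raised in these comments (trailing digits as in ABC123, and underscore-joined names that begin with other characters) is to replace the `^` and `$` anchors with lookarounds. This is only a minimal sketch and not dsm's method; the helper name `extract_abbrs` and the rule "not preceded by a letter or digit, not followed by a letter" are assumptions chosen for illustration, and the sample strings are taken from the comments above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of 2+ ASCII capitals that is
# not glued to a preceding letter or digit and not followed by a letter.
# Trailing digits (ABC123) and underscores (ABC_EFG) are tolerated.
sub extract_abbrs {
    my ($text) = @_;
    return $text =~ /(?<![A-Za-z0-9])([A-Z]{2,})(?![A-Za-z])/g;
}

my @found = map { extract_abbrs($_) } ('ABC123', 'CMV002', 'ex017_ABC_EFG');
print "@found\n";
```

With these three inputs the sketch should print `ABC CMV ABC EFG`. If abbreviations followed by digits should be rejected rather than trimmed, the final lookahead could be widened to `(?![A-Za-z0-9])`.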
+3
A:
Reading text to be searched from standard input and writing all abbreviations found to standard output, separated by spaces:
my $text;
# Slurp all text
{ local $/ = undef; $text = <>; }
# Extract all sequences of 2 or more uppercase characters
my @abbrevs = $text =~ /\b([[:upper:]]{2,})\b/g;
# Output separated by spaces
print join(" ", @abbrevs), "\n";
Note the use of the POSIX character class [:upper:], which can match all uppercase characters, not just the English ones (A-Z), provided the input is decoded as Unicode text or a suitable locale is in effect.
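To illustrate the difference between the two character classes, here is a small sketch comparing them on a string that contains Greek capitals. The sample text is an assumption made for the demonstration; the code points are written as `\x{...}` escapes above 255 so that Perl treats the string as Unicode text regardless of the source file's encoding.

```perl
use strict;
use warnings;

# Assumed sample: three Greek capitals (Alpha, Beta, Gamma), a space, then "ABC".
my $sample = "\x{391}\x{392}\x{393} ABC";

# POSIX class: matches uppercase letters from any script in Unicode text.
my @posix = $sample =~ /\b([[:upper:]]{2,})\b/g;

# ASCII range: matches only the letters A through Z.
my @ascii = $sample =~ /\b([A-Z]{2,})\b/g;

binmode(STDOUT, ':encoding(UTF-8)');    # avoid "Wide character" warnings
print "POSIX class found ", scalar(@posix), " item(s): @posix\n";
print "A-Z range found ",   scalar(@ascii), " item(s): @ascii\n";
```

When the text comes from a file or standard input instead, it would first need to be decoded (for example by opening the handle with an `:encoding(UTF-8)` layer) for the POSIX class to see non-ASCII uppercase letters.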
Lars Haugseth
2009-07-08 10:15:16