This would have been a lot easier if not for certain situations.

Sample data:

KENP989SD
KENP913E
KENPX189R
KENP913

What regular expression can I use to remove all characters from the string starting at the first non-alpha character? Basically, I want to find the first non-alpha character and chop everything off after that regardless of char type.

After regex is applied, these data should be:

KENP
KENP
KENPX
KENP
+5  A: 
s/([A-Za-z]*).*/$1/

... will work. It's not necessarily the best way of doing it, but it's a general case replace.

It only works if you just want alpha characters.

+2  A: 

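Maybe this:

s/(?<=^[A-Z]+).*//

Uses look-behind to replace everything after the starting alphas with blank. Note that Perl itself rejects a look-behind of unbounded length like this one at compile time, so this form only works in engines that allow variable-length look-behind (.NET, for example). In Perl 5.10 or later, \K gives the same effect:

s/^[A-Z]+\K.*//

Add an i flag for case-insensitive if necessary:

s/^[A-Z]+\K.*//i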
Peter Boughton
+11  A: 

$s =~ s/[^a-zA-Z].*$//;

Literally, find the first non-alpha char and chop everything off starting from it.

Igor Oks
Get rid of the dot.
Graeme Perrow
In his example he gets rid of all chars after the first non-alpha, and not of all non-alpha in the end.
Igor Oks
Sorry, you're right, I misunderstood the question.
Graeme Perrow
What might be a necessary caveat: What is considered as an alpha character? Depending on your input this might be more than /[a-zA-Z]/ ...
The trailing $ is useless because .* is greedy.
Hynek -Pichi- Vychodil
s/\P{alpha}.*// works for me fine ;-)
Hynek -Pichi- Vychodil
Yes, it works without $ too. Thanks :)
Igor Oks
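To make the "what counts as alpha" caveat from the comments concrete, here is a minimal sketch comparing the ASCII class with \P{Alpha}. The helper names chop_ascii and chop_unicode and the test string are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: cut at the first character outside ASCII a-z/A-Z.
sub chop_ascii {
    my ($s) = @_;
    $s =~ s/[^a-zA-Z].*//;
    return $s;
}

# Hypothetical helper: cut at the first character that lacks the
# Unicode Alphabetic property.
sub chop_unicode {
    my ($s) = @_;
    $s =~ s/\P{Alpha}.*//;
    return $s;
}

# "KEN", then GREEK CAPITAL LETTER PI (U+03A0), then digits and letters.
my $input = "KEN\x{3A0}989SD";

# Encode output as UTF-8 so printing the Greek letter does not warn.
binmode(STDOUT, ':encoding(UTF-8)');
print chop_ascii($input), "\n";     # stops before the Greek letter
print chop_unicode($input), "\n";   # keeps the Greek letter
```

With plain ASCII input like the sample data, both forms give the same result; they only diverge once letters outside a-z show up.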
+2  A: 

NOTE: I think Igor's is more efficient.

$str =~ s{^([A-Z]+).*}{$1};

Add the 'i' flag for case-insensitive matches

$str =~ s{^([A-Z]+).*}{$1}i;
Joe Casadonte
Actually I did a quick test, 1,000,000 iterations of 4 strings, my average was 15 seconds, Igor's was 3 :)
Joe Casadonte
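A timing comparison like the one in the comment above could be reproduced roughly as follows, using the core Benchmark module. This is only a sketch: the sub names by_capture and by_chop are made up here, and the numbers will vary by machine and Perl version, so don't expect the same 15-vs-3 ratio.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @samples = qw(KENP989SD KENP913E KENPX189R KENP913);

# Capture-and-replace form from this answer.
sub by_capture {
    my @out = @samples;
    s{^([A-Z]+).*}{$1} for @out;
    return @out;
}

# Chop-at-first-non-alpha form from Igor's answer.
sub by_chop {
    my @out = @samples;
    s/[^a-zA-Z].*// for @out;
    return @out;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    capture => \&by_capture,
    chop    => \&by_chop,
});
```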
+6  A: 

You phrased the request 2 ways:

  1. Get all the alpha chars off the front of these strings
  2. Find the last alpha char and chop everything off after

While the result is the same given your sample strings, I've found it pays to be more careful with regexes. So, I'd take the first item above as the real requirement, and write it as:

$str =~ s/^([a-z]*)[^a-z].*/$1/i;

The advantage in my mind is that unexpected strings (like "7KENP989SD") should result in a null string after substitution, instead of something unexpected like "7KENP". Of course, maybe that is what you wanted...
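A quick sketch of how that substitution behaves on expected and unexpected input. The wrapper name alpha_prefix is hypothetical, and the last two inputs are extra cases added for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub alpha_prefix {
    my ($str) = @_;
    $str =~ s/^([a-z]*)[^a-z].*/$1/i;
    return $str;
}

# Two of the sample strings, one with a leading digit, one with no
# non-alpha character at all.
for my $input (qw(KENP989SD KENPX189R 7KENP989SD KENP)) {
    printf "%-12s => '%s'\n", $input, alpha_prefix($input);
}
```

The string with the leading digit comes out empty, and the all-alpha string is left unchanged because the pattern needs at least one non-alpha character to match.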

jimtut
It's phrased 2 ways, but it's the same thing. By 'get all the alpha chars off' I meant separate them and store them in another var.
CheeseConQueso
+2  A: 

Here's my go at it.

/^([A-Za-z]*).*$/


EDIT I like Igor's approach better than mine ..


code:

#!/usr/bin/perl
#
# http://stackoverflow.com/questions/507941/perl-regex-remove-all-characters-from-string-after-last-alpha-character
#
use strict;
use warnings;
while (my $string = <DATA>) {
    # Capture the leading run of letters; the rest of the line is ignored.
    if ($string =~ /^([A-Za-z]*)/) {
        print "$1\n";
    }
}
__DATA__
KENP989SD
KENP913E
KENPX189R
KENP913
lexu
+2  A: 

If you don't need to modify the input line itself, I use this a little more:

my ( $alpha_prefix ) = ( $input_line =~ /^(\p{IsAlpha}*)/ );

The vast majority of my variables are lexicals, so a few more don't hurt, and they keep me from possibly misrepresenting the input. Plus, the captured value passes taint checks.
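As a minimal sketch of the capture-in-list-context idea (the sub name leading_alpha is made up for illustration), the original line stays untouched and the prefix lands in its own lexical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading letters without modifying
# the caller's string.
sub leading_alpha {
    my ($input_line) = @_;
    # List context makes the match return its captures.
    my ($alpha_prefix) = ( $input_line =~ /^(\p{IsAlpha}*)/ );
    return $alpha_prefix;
}

my $line   = "KENPX189R";
my $prefix = leading_alpha($line);
print "$prefix from $line\n";
```

Because the pattern uses * rather than +, it always matches, so the result is an empty string (not undef) when the line starts with a non-alpha character.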

Axeman
+2  A: 

s/\P{Alpha}.*// works for me fine:

perl -pe 's/\P{Alpha}.*//' <<EOF
KENP989SD
KENP913E
KENPX189R
KENP913
EOF
Hynek -Pichi- Vychodil