ansaurus

Question

How can I remove all characters from string starting at first non-alpha character?

Answer 1

+5 A:

s/([A-Za-z]*).*/$1/

... will work. It's not necessarily the best way of doing it, but it's a general case replace.

It only works if you just want alpha characters

2009-02-03 16:56:43

Answer 2

+2 A:

Maybe this:

s/(?<=^[A-Z]+).*//

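Uses look-behind to replace everything after the starting alphas with blank. This needs a regex engine that allows variable-length look-behind (.NET, for example); Perl rejects a look-behind containing an unbounded quantifier such as +.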

Add an i flag for case-insensitive if necessary:

s/(?<=^[A-Z]+).*//i

Peter Boughton 2009-02-03 17:02:39
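
For Perl specifically, one way to get the same effect without look-behind is \K (available from Perl 5.10), which keeps everything matched so far out of the replaced text. The following is only a sketch: the helper name keep_leading_alpha is made up for illustration, and [A-Za-z] is used instead of [A-Z] plus the i flag.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the leading run of ASCII letters, drop the rest.
sub keep_leading_alpha {
    my ($s) = @_;
    # \K discards what was matched so far from the text being replaced,
    # so only the part matched by .* is removed. /s lets . match newlines.
    $s =~ s/^[A-Za-z]+\K.*//s;
    return $s;
}

print keep_leading_alpha($_), "\n" for qw(KENP989SD KENP913E KENPX189R KENP913);
```

Because the pattern requires at least one leading letter, a string that starts with a non-letter is left unchanged.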

Answer 3

+11 A:

$s =~ s/[^a-zA-Z].*$//;

Literally, find the first non-alpha char and chop everything off starting from it.

Igor Oks 2009-02-03 17:10:58
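
As a quick illustration of the substitution above, here is a minimal sketch that applies it to the sample strings used elsewhere in this thread. The helper name chop_at_first_non_alpha is hypothetical; it works on a copy so the caller's string is not modified.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the answer above.
sub chop_at_first_non_alpha {
    my ($s) = @_;
    # Find the first character that is not an ASCII letter and remove
    # it together with everything after it.
    $s =~ s/[^a-zA-Z].*$//;
    return $s;
}

for my $code (qw(KENP989SD KENP913E KENPX189R KENP913)) {
    printf "%-10s -> %s\n", $code, chop_at_first_non_alpha($code);
}
```

A string made up only of letters does not match and comes back unchanged; a string that starts with a non-letter comes back empty.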

Get rid of the dot.

Graeme Perrow 2009-02-03 17:13:01

In his example he gets rid of all chars after the first non-alpha, not just of the non-alpha chars at the end.

Igor Oks 2009-02-03 17:16:57

Sorry, you're right, I misunderstood the question.

Graeme Perrow 2009-02-03 17:19:18

One caveat that might be necessary: what counts as an alpha character? Depending on your input, this might be more than /[a-zA-Z]/ ...

2009-02-04 07:35:14
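
To make the caveat above concrete, here is a small sketch comparing an ASCII-only class with the Unicode property \p{Alpha}. The input "Žilina42" is a made-up sample, and the helper names ascii_prefix and unicode_prefix are hypothetical.

```perl
use strict;
use warnings;

# Made-up sample: "Zilina42" with a caron on the Z, written with an escape
# so the source file encoding does not matter.
# \x{17D} is LATIN CAPITAL LETTER Z WITH CARON.
my $input = "\x{17D}ilina42";

# Hypothetical helpers for the two definitions of "alpha".
sub ascii_prefix   { my ($s) = @_; $s =~ s/[^a-zA-Z].*//s; return $s }
sub unicode_prefix { my ($s) = @_; $s =~ s/\P{Alpha}.*//s; return $s }

binmode STDOUT, ':encoding(UTF-8)';    # avoid "Wide character" warnings
printf "ASCII:   '%s'\n", ascii_prefix($input);
printf "Unicode: '%s'\n", unicode_prefix($input);
```

With the ASCII class the first character already counts as non-alpha, so the whole string is removed; with \P{Alpha} only the digits are cut off.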

The trailing $ is useless because .* is greedy.

Hynek -Pichi- Vychodil 2009-02-04 08:32:58

s/\P{alpha}.*// works for me fine ;-)

Hynek -Pichi- Vychodil 2009-02-04 08:36:13

Yes, it works without $ too. Thanks :)

Igor Oks 2009-02-04 08:42:34

Answer 4

+2 A:

NOTE: I think Igor's is more efficient.

$str =~ s{^([A-Z]+).*}{$1};

Add the 'i' flag for case-insensitive matches

$str =~ s{^([A-Z]+).*}{$1}i;

Joe Casadonte 2009-02-03 17:16:07

Actually I did a quick test, 1,000,000 iterations of 4 strings, my average was 15 seconds, Igor's was 3 :)

Joe Casadonte 2009-02-03 17:31:56
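
A comparison like the one described above could be set up with the core Benchmark module. This is only a sketch of one possible setup, not the test that was actually run: the helper names capture_style and chop_style are made up, and the timings will differ from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @strings = qw(KENP989SD KENP913E KENPX189R KENP913);

# The two substitutions being compared, wrapped in hypothetical helpers.
# Each works on a copy, so @strings is never modified.
sub capture_style { my ($s) = @_; $s =~ s{^([A-Z]+).*}{$1}; return $s }
sub chop_style    { my ($s) = @_; $s =~ s/[^a-zA-Z].*$//;   return $s }

# A negative count tells Benchmark to run each variant for at least that
# many CPU seconds; cmpthese then prints a comparison table.
cmpthese(-1, {
    capture => sub { capture_style($_) for @strings },
    chop    => sub { chop_style($_)    for @strings },
});
```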

Answer 5

+6 A:

You phrased the request 2 ways:

Get all the alpha chars off the front of these strings
Find the last alpha char and chop everything off after

While the result is the same given your sample strings, I've found it pays to be more careful with regexes. So, I'd take the first item above as the real requirement, and write it as:

$str =~ s/^([a-z]*)[^a-z].*/$1/i;

The advantage in my mind is that unexpected strings (like "7KENP989SD") should result in a null string after substitution, instead of something unexpected like "7KENP". Of course, maybe that is what you wanted...

jimtut 2009-02-03 17:28:31
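
The edge cases discussed above can be checked with a short sketch. The helper name strict_prefix is hypothetical, and "7KENP989SD" is the unexpected input mentioned in the answer.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the anchored substitution from the answer.
sub strict_prefix {
    my ($s) = @_;
    # Leading letters are captured, then one non-letter must follow;
    # the whole string is replaced by the captured prefix.
    $s =~ s/^([a-z]*)[^a-z].*/$1/i;
    return $s;
}

for my $code (qw(KENP989SD 7KENP989SD KENP)) {
    printf "%-10s -> '%s'\n", $code, strict_prefix($code);
}
```

Note that the pattern requires at least one non-letter, so an all-letter string such as "KENP" does not match and is returned as-is.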

It's phrased 2 ways, but it's the same thing. By 'get all the alpha chars off' I meant separate them and store them into another var.

CheeseConQueso 2009-02-03 17:43:14

Answer 6

+2 A:

Here's my go at it.

/^([A-Za-z]*).*$/

EDIT I like Igor's approach better than mine ..

code:

#!/usr/bin/perl
#
# http://stackoverflow.com/questions/507941/perl-regex-remove-all-characters-from-string-after-last-alpha-character
#
use strict;
use warnings;
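# Read the sample lines below __DATA__ one at a time.
while ( my $string = <DATA> ) {
    # Print the leading run of letters, but only when the match succeeded,
    # so a stale $1 from an earlier match is never printed.
    print "$1\n" if $string =~ /^([A-Za-z]*).*$/;
}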
__DATA__
KENP989SD
KENP913E
KENPX189R
KENP913

lexu 2009-02-03 17:29:55

Answer 7

+2 A:

If you don't need to modify the input line itself, I use this a little more:

my ( $alpha_prefix ) = ( $input_line =~ /^(\p{IsAlpha}*)/ );

Most of my variables are lexicals anyway, so a few more don't hurt, and leaving the original line intact keeps me from possibly misrepresenting the input. Plus, the captured value passes taint checks.

Axeman 2009-02-03 21:34:30
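
The non-destructive capture above can be tried out as follows. This is a minimal sketch: the helper name alpha_prefix is hypothetical, and the sample value is one of the strings used in the other answers.

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading alphabetic prefix of a line
# without modifying the line itself.
sub alpha_prefix {
    my ($input_line) = @_;
    # In list context a successful match returns its captures, so the
    # prefix lands directly in a new lexical variable.
    my ($prefix) = ( $input_line =~ /^(\p{IsAlpha}*)/ );
    return $prefix;
}

my $input_line = 'KENP989SD';
print "prefix: ", alpha_prefix($input_line), "\n";
print "input:  $input_line\n";    # still the full original string
```

Since the quantifier is *, the match always succeeds; a line that starts with a non-letter simply yields an empty prefix.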

Answer 8

+2 A:

s/\P{Alpha}.*// works for me fine:

perl -pe 's/\P{Alpha}.*//' <<EOF
KENP989SD
KENP913E
KENPX189R
KENP913
EOF

Hynek -Pichi- Vychodil 2009-02-04 08:38:54
