ansaurus

Question

Returning all characters before the first underscore

Answer 1

+6 A:

Even without re:

text.split('_', 1)[0].replace('.', '').upper()

eumiro 2010-09-21 16:33:53
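
A minimal sketch of the one-liner above wrapped in a function, mainly to show what happens when the input has no underscore. The function name `prefix_before_underscore` is hypothetical and not part of the original answer; the sample strings are taken from the other answers in this thread.

```python
def prefix_before_underscore(text):
    """Return the text before the first '_', with dots removed, uppercased."""
    # split('_', 1) splits at most once, so element 0 is everything before
    # the first underscore, or the whole string if there is no underscore.
    return text.split('_', 1)[0].replace('.', '').upper()


print(prefix_before_underscore('AG.av08_binloop_v6'))  # AGAV08
print(prefix_before_underscore('TL.av1_binloopv2'))    # TLAV1
print(prefix_before_underscore('abc'))                 # ABC
```

Note that this version strips only dots; any other punctuation before the underscore would be kept.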

Answer 2

A:

You don't have to use re for this. Simple string operations would be enough based on your requirements:

tests = """
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2   = TLAV1
"""

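# strip() drops the blank first line produced by the triple-quoted string.
# Note: find() returns -1 if there is no underscore, which would cut the last character.
for t in tests.strip().splitlines():
    print(t[:t.find('_')].replace('.', '').upper())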

# Returns:
# AGAV08
# TLAV1

Or if you absolutely must use re:

import re

pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)

# strip() avoids an IndexError on the blank first line, where findall() returns [].
for t in tests.strip().splitlines():
    print(re.sub(r'\.', '', pat_re.findall(t)[0]).upper())

# Returns:
# AGAV08
# TLAV1

jathanism 2010-09-21 16:36:59

While I agree that regexes don't cure cancer and are generally overused, they *are* a viable choice for tasks like this one.

delnan 2010-09-21 16:40:52

Viable, yes. But overcomplicated for such a simple task.

jathanism 2010-09-21 16:46:46

Gumbo's solution is a rather readable one-liner. If one knows regex basics, it's perfectly clear what it does. It's not like it's a 6k-character monster.

delnan 2010-09-21 16:52:46

Answer 3

+5 A:

Try this:

re.sub("[^A-Z\d]", "", re.search("^[^_]*", str).group(0).upper())

Gumbo 2010-09-21 16:37:13

+1 "on spec", looks right but I'm not a regex guru. Edit: I'd replace `re.search` with `re.match` and drop the initial `^` in the pattern (`match` always starts at the start of the string and is optimized for this).

delnan 2010-09-21 16:42:11
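
A small sketch of the `re.match` variant suggested in the comment above, under the assumption that the input is an ordinary string. The function name `prefix_with_match` and the variable `s` (used instead of `str`, to avoid shadowing the built-in) are illustrative only.

```python
import re


def prefix_with_match(s):
    # re.match only tries to match at the start of the string,
    # so the leading '^' from the original pattern can be dropped.
    # '[^_]*' can match an empty string, so match() does not return None here.
    head = re.match(r'[^_]*', s).group(0)
    # Uppercase first, then remove everything that is not A-Z or a digit.
    return re.sub(r'[^A-Z\d]', '', head.upper())


print(prefix_with_match('AG.av08_binloop_v6'))  # AGAV08
print(prefix_with_match('TL.av1_binloopv2'))    # TLAV1
```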

While I like the simplicity of the other answers, I also wanted whatever solution I went with to be useful for further regular expression exploration. This one fits the bill. Thanks Gumbo!

durandal 2010-09-21 16:48:59

Just for future reference, how would this be done with `re.compile`? I like the one-liner nature of this, but it would be good to know.

durandal 2010-09-21 16:58:13

@durandal: `a = re.compile(r'[^A-Z\d]')`, `b = re.compile(r'[^_]*')` and `re.sub(a, "", re.search(b, s).group(0).upper())` (substitute descriptive names for a and b). The raw strings (`r"..."`) are not needed here, but I prefer to always use them for regex patterns.

delnan 2010-09-21 17:05:26

Actually, creating regular expression objects with `re.compile` makes this clearer than what you wrote, delnan (you just substituted the compiled expression for the string one). You can call `a.sub("", b.search(s).group(0).upper())` directly instead.

poke 2010-09-21 18:11:07
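
The compiled-pattern form discussed in these comments might look roughly like the following. This is a sketch, assuming the first pattern is meant to be the character class `[^A-Z\d]` from Gumbo's answer; the names `NOT_UPPER_OR_DIGIT`, `BEFORE_UNDERSCORE` and `prefix_compiled` are hypothetical stand-ins for `a` and `b`.

```python
import re

# Compile once, reuse many times. Both names are illustrative only.
NOT_UPPER_OR_DIGIT = re.compile(r'[^A-Z\d]')
BEFORE_UNDERSCORE = re.compile(r'[^_]*')


def prefix_compiled(s):
    # Compiled pattern objects have their own search() and sub() methods,
    # so the module-level re.search / re.sub calls are not needed.
    head = BEFORE_UNDERSCORE.search(s).group(0)
    return NOT_UPPER_OR_DIGIT.sub('', head.upper())


for name in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
    print(prefix_compiled(name))
# AGAV08
# TLAV1
```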

Answer 4

A:

import re

re.sub("[^A-Z\d]", "", yourstr.split('_',1)[0].upper())

Daniel Lenkes 2010-09-21 17:15:39

Answer 5

A:

Since everyone is giving their favorite implementation, here's mine that doesn't use re:

>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
...     print(''.join(c for c in s.split('_',1)[0] if c.isalnum()).upper())
...
AGAV08
TLAV1

I put .upper() on the outside of the generator so it is only called once.

Steven Rumbalski 2010-09-21 18:02:30

Answer 6

A:

Heh, just for fun, another option to get the text before the first underscore is:

before_underscore, sep, after_underscore = str.partition('_')

So all in one line could be:

re.sub("[^A-Z\d]", "", str.partition('_')[0].upper())

Etienne 2010-09-21 18:50:33
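
To illustrate what `partition` returns, here is a short sketch using one of the sample strings from this thread. The variable `s` is used instead of `str` so the built-in type is not shadowed, and the helper name `prefix_with_partition` is hypothetical.

```python
import re


def prefix_with_partition(s):
    # partition('_') always returns a 3-tuple: (before, separator, after).
    # If '_' is absent, it returns (s, '', ''), so element 0 is still usable.
    before, sep, after = s.partition('_')
    return re.sub(r'[^A-Z\d]', '', before.upper())


s = 'AG.av08_binloop_v6'
print(s.partition('_'))            # ('AG.av08', '_', 'binloop_v6')
print('abc'.partition('_'))        # ('abc', '', '')
print(prefix_with_partition(s))    # AGAV08
```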
