Hello,

Using re in Python, I would like to return all of the characters in a string that precede the first appearance of an underscore. In addition, I would like the returned string to be in all uppercase and without any non-alphanumeric characters.

For example:

AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2   = TLAV1

I am pretty sure I know how to return a string in all uppercase using string.upper() but I'm sure there are several ways to remove the . efficiently. Any help would be greatly appreciated. I am still learning regular expressions slowly but surely. Each tip gets added to my notes for future use.

To further clarify, my above examples aren't the actual strings. The actual string would look like:

AG.av08_binloop_v6

With my desired output looking like:

AGAV08

And the next example would be the same. String:

TL.av1_binloopv2

Desired output:

TLAV1

Again, thanks all for the help!

+6  A: 

Even without re:

text.split('_', 1)[0].replace('.', '').upper()
eumiro
A: 

You don't have to use re for this. Simple string operations would be enough based on your requirements:

tests = """
AG.av08_binloop_v6 = AGAV08
TL.av1_binloopv2   = TLAV1
"""

# strip() drops the blank first line of the triple-quoted string
for t in tests.strip().splitlines():
    print(t[:t.find('_')].replace('.', '').upper())

# Returns:
# AGAV08
# TLAV1

Or if you absolutely must use re:

import re 

pat = r'([a-zA-Z0-9.]+)_.*'
pat_re = re.compile(pat)

# strip() drops the blank first line, which would otherwise raise IndexError on [0]
for t in tests.strip().splitlines():
    print(re.sub(r'\.', '', pat_re.findall(t)[0]).upper())

# Returns:
# AGAV08
# TLAV1
jathanism
While I agree that regexes don't cure cancer and are generally overused, they *are* a viable choice for tasks like this one.
delnan
Viable, yes. But overcomplicated for such a simple task.
jathanism
Gumbo's solution is a rather readable one-liner. If one knows regex basics, it's perfectly clear what it does. It's not like it's a 6k character monster.
delnan
+5  A: 

Try this:

re.sub(r"[^A-Z\d]", "", re.search(r"^[^_]*", s).group(0).upper())
Gumbo
+1 "on spec", looks right but I'm not a regex guru. Edit: I'd replace `re.search` with `re.match` and drop the initial `^` in the pattern (`match` always starts at the start of the string and is optimized for this).
delnan
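For reference, the `re.match` variant suggested in the comment above could look roughly like this. It is only a sketch, assuming Python 3; the function name `prefix_code` is made up for illustration and is not from the answer.

```python
import re

def prefix_code(s):
    # re.match only matches at the start of the string, so no leading ^ is needed.
    # [^_]* can match the empty string, so match() never returns None here.
    head = re.match(r"[^_]*", s).group(0)
    # Uppercase first, then drop everything that is not A-Z or a digit.
    return re.sub(r"[^A-Z\d]", "", head.upper())

print(prefix_code("AG.av08_binloop_v6"))  # AGAV08
print(prefix_code("TL.av1_binloopv2"))    # TLAV1
```

Uppercasing before the substitution is what lets the character class get away with `A-Z` only.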
While I like the simplicity of the other answers, I also wanted whatever solution I went with to be useful for further regular expression exploration. This one fits the bill. Thanks Gumbo!
durandal
Just for future reference, how would this be done with `re.compile`? I like the one-liner nature of this, but it would be good to know the compiled form as well.
durandal
@durandal: `a = re.compile(r'[^A-Z\d]')`, `b = re.compile(r'[^_]*')` and `re.sub(a, "", re.search(b, s).group(0).upper())` (substitute descriptive names for a and b). The raw strings (`r"..."`) are not needed here, but I prefer to always use them for regex patterns.
delnan
Actually, creating regular expression objects using `re.compile` makes it clearer than what you wrote, delnan (you just substituted the string expression with the compiled one). You can call the methods on the compiled objects directly: `a.sub("", b.search(s).group(0).upper())`.
poke
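Putting the two comments above together, a compiled version might be sketched like this. Python 3 is assumed, and the names `NOT_UPPER_ALNUM`, `BEFORE_UNDERSCORE` and `prefix_code_compiled` are hypothetical, chosen only to stand in for `a` and `b`.

```python
import re

# Hypothetical descriptive names for the patterns called a and b in the comments.
NOT_UPPER_ALNUM = re.compile(r"[^A-Z\d]")
BEFORE_UNDERSCORE = re.compile(r"[^_]*")

def prefix_code_compiled(s):
    # search() finds a (possibly empty) match at position 0, so it never returns None.
    head = BEFORE_UNDERSCORE.search(s).group(0)
    # Call sub() on the compiled object instead of passing it to re.sub().
    return NOT_UPPER_ALNUM.sub("", head.upper())

results = [prefix_code_compiled(s) for s in ("AG.av08_binloop_v6", "TL.av1_binloopv2")]
print(results)  # ['AGAV08', 'TLAV1']
```

Compiling mostly pays off when the same pattern is applied to many strings, as in the loop above.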
A: 

import re

re.sub(r"[^A-Z\d]", "", yourstr.split('_',1)[0].upper())

Daniel Lenkes
A: 

Since everyone is giving their favorite implementation, here's mine that doesn't use re:

>>> for s in ('AG.av08_binloop_v6', 'TL.av1_binloopv2'):
...     print(''.join(c for c in s.split('_',1)[0] if c.isalnum()).upper())
...
AGAV08
TLAV1

I put .upper() on the outside of the generator so it is only called once.

Steven Rumbalski
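One difference worth knowing, shown here as a small sketch assuming Python 3: `str.isalnum()` also accepts non-ASCII letters, while the `[^A-Z\d]` pattern from the regex answers strips them. The input string is invented for illustration.

```python
import re

s = "Ag.\u00e91_x"  # hypothetical input; \u00e9 is a non-ASCII letter (e with acute accent)
head = s.split("_", 1)[0]

# isalnum() is True for the accented letter, so it survives and gets uppercased.
via_isalnum = "".join(c for c in head if c.isalnum()).upper()
# The character class only allows A-Z and digits, so the accented letter is removed.
via_regex = re.sub(r"[^A-Z\d]", "", head.upper())

print(via_isalnum)  # AGÉ1
print(via_regex)    # AG1
```

For the plain ASCII names in the question, both approaches give the same result.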
A: 

Hey, just for fun, another option to get the text before the first underscore is:

before_underscore, sep, after_underscore = s.partition('_')

So all in one line could be:

re.sub(r"[^A-Z\d]", "", s.partition('_')[0].upper())
Etienne
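The approaches in this thread behave differently when the string has no underscore at all. The question does not say whether that can happen, so the input below is purely hypothetical, but a quick sketch shows the difference:

```python
s = "AG.av08"  # hypothetical input with no underscore

# find() returns -1 when the separator is missing, so the slice drops the last character.
via_find = s[:s.find("_")]
# split() and partition() both return the whole string in that case.
via_split = s.split("_", 1)[0]
via_partition = s.partition("_")[0]

print(via_find)       # AG.av0
print(via_split)      # AG.av08
print(via_partition)  # AG.av08
```

If underscore-free names are possible, `split` or `partition` look like the safer choice.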