ansaurus

Question

Is there a Pythonic way to make this logic more elegant?

Answer 1

+8 A:

[re.sub(r'^(Mr|Ms|Mrs)\.\s+', '', s) for s in test_csv_line]

Marcelo Cantos 2010-09-24 02:04:34
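
For anyone who wants to try the one-liner above, here is a minimal runnable sketch. The contents of test_csv_line are hypothetical, since the original list is not shown in the thread; the second entry has two spaces after the prefix to show that \s+ consumes all of them.

```python
import re

# Hypothetical sample row; the real test_csv_line is not shown in the thread.
test_csv_line = ['Mr. John Smith', 'Mrs.  Jane Doe', 'Alex Lee', 'Ms. Ann Ray']

# ^ anchors at the start of the string, \. matches a literal period,
# and \s+ consumes all whitespace that follows the prefix.
stripped = [re.sub(r'^(Mr|Ms|Mrs)\.\s+', '', s) for s in test_csv_line]
print(stripped)
```

Entries without a recognized prefix pass through unchanged, because re.sub returns the input string when nothing matches.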

Wow. Very cool. It does, however, leave a single space before the name when it strips out the prefixes.

paracaudex 2010-09-24 02:07:28

I will never be tired of seeing the beauty of regular expressions.

unkiwii 2010-09-24 02:27:36

@paracaudex: you might have seen my first version when commenting. The current version strips all whitespace after the prefix.

Marcelo Cantos 2010-09-24 02:37:48

@Marcelo: Got it, thanks.

paracaudex 2010-09-24 03:09:40

Answer 2

+1 A:

If the set of prefixes varies (for example, as part of localization), or if you would rather not use a regular expression for some other reason, you could do something like this (untested code):

def strip_title(string, prefixes):
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

stripped = (list(strip_title(cell, prefixes) for cell in line)
            for line in lines)

This is not particularly efficient, since the algorithm ends up doing a lot of redundant checking (e.g. checking three times whether the string starts with M). Avoiding that kind of repeated work is a big reason to use regular expressions.
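
As a rough sketch of how the loop-based version might be used: the prefixes and lines values below are made up for illustration, since the answer does not define them, and the result is built eagerly as nested lists (rather than the generator above) so that it can be printed directly.

```python
def strip_title(string, prefixes):
    """Return string without its leading 'prefix ' if any prefix matches."""
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

# Hypothetical data; neither name is defined in the original answer.
prefixes = ['Mr.', 'Ms.', 'Mrs.']
lines = [['Mr. John Smith', '42'], ['Mrs. Jane Doe', 'Alex Lee']]

# Nested list comprehension: one stripped list per input line.
stripped = [[strip_title(cell, prefixes) for cell in line] for line in lines]
print(stripped)
```

Because the check includes the trailing space, 'Mr.' does not accidentally match the start of 'Mrs. Jane Doe'.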

Alternatively, you could dynamically build a regular expression, by escaping each prefix and joining them with | branches:

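import re

def TitleStripper(prefixes):
    # Escape each prefix so characters such as '.' are matched literally.
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    # Build one anchored alternation, e.g. '^(Mr\.|Ms\.|Mrs\.) '.
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))
    def strip_title(string):
        # Remove at most one leading prefix plus its trailing space.
        return prefix_re.sub('', string, 1)
    return strip_title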

The function TitleStripper creates a closure function strip_title that works like the previous one but is built for a particular set of prefixes. After you call strip_title = TitleStripper(prefixes) you can just call strip_title(string).
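
A short usage sketch of the closure, assuming a hypothetical prefix list and a few made-up names (the function is repeated so the snippet runs on its own):

```python
import re

def TitleStripper(prefixes):
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))
    def strip_title(string):
        return prefix_re.sub('', string, count=1)
    return strip_title

# The regular expression is compiled once, when the stripper is created.
strip_title = TitleStripper(['Mr.', 'Ms.', 'Mrs.'])

# Hypothetical input values for illustration.
names = [strip_title(s) for s in ['Mr. John Smith', 'Mrs. Jane Doe', 'Alex Lee']]
print(names)
```

The compiled pattern lives in the closure, so repeated calls to strip_title reuse it instead of rebuilding the expression each time.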

Mostly because it uses a single precompiled regular expression, this should be a bit faster than the first method, perhaps at the expense of clarity.

If you really only ever need to check for three prefixes, either of these methods is overkill, and you should just use a static RE as explained in another answer.

intuited 2010-09-24 02:16:22

Why would I need to escape each prefix?

paracaudex 2010-09-24 02:26:23

For example, you'll need to escape a `.`, i.e. substitute `\.`, so that it doesn't match any character. You can do this with [re.escape](http://docs.python.org/library/re.html#re.escape).

intuited 2010-09-24 02:50:04
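
To make the difference concrete, here is a small sketch comparing an unescaped and an escaped prefix. The sample strings are only illustrative.

```python
import re

prefix = 'Mr.'

# Unescaped, '.' matches any character, so the pattern 'Mr.' also
# matches the first three characters of 'Mrs Smith'.
unescaped_hit = re.match(prefix, 'Mrs Smith') is not None

# Escaped, the period is literal, so only a real 'Mr.' matches.
escaped_hit = re.match(re.escape(prefix), 'Mrs Smith') is not None
literal_hit = re.match(re.escape(prefix), 'Mr. Smith') is not None

print(unescaped_hit, escaped_hit, literal_hit)
```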

Ah, I see. I thought you meant escape the entire thing - like \Mr. I didn't realize re had an escape function.

paracaudex 2010-09-24 03:10:34

Answer 3

+1 A:

A more Pythonic approach would be to replace the "end of list" check with an else: clause on the for title_prefix in title_prefixes: loop. The else block runs only if the for loop completes without hitting a break:

# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                # The +1 also skips the character (space) after the prefix.
                new_csv_line.append(item[len(title_prefix)+1:])
                break
        else:
            # No prefix matched: keep the item unchanged.
            new_csv_line.append(item)
    return new_csv_line

The logic is otherwise the same as yours.

Just Some Guy 2010-09-24 02:24:04
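
For readers unfamiliar with for/else, here is a stripped-down sketch of the same control flow. The helper first_matching_prefix is hypothetical and exists only to isolate the behaviour of the else clause.

```python
def first_matching_prefix(item, title_prefixes):
    """Return the first prefix that item starts with, or None."""
    for title_prefix in title_prefixes:
        if item.startswith(title_prefix):
            result = title_prefix
            break  # leaving via break skips the else clause
    else:
        # Reached only when the loop ran to completion without a break.
        result = None
    return result

print(first_matching_prefix('Mr. John Smith', ['Mr.', 'Ms.']))
print(first_matching_prefix('Alex Lee', ['Mr.', 'Ms.']))
```

The else clause also runs when the sequence is empty, since the loop then finishes without ever reaching a break.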
