ansaurus

Question

Delete the \n and the following letters at the end of words in a list

Answer 1

+4 A:

>>> import re
>>> wordlist = ['Schreiben\nEs', 'Schreiben',
...     'Schreiben\nEventuell', 'Schreiben\nHaruki']
>>> [ re.sub("\n.*", "", word) for word in wordlist ]
['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']

Done via re.sub:

>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.
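The help text mentions `count=0`. With that default, `re.sub` replaces every match, not only the leftmost one, which matters for a word containing more than one newline. A minimal sketch using a made-up word (the sample string is an assumption, not from the question):

```python
import re

s = "Schreiben\nEs\nHaruki"  # hypothetical word with two embedded newlines

# count=0 (the default) replaces every non-overlapping match of "\n.*"
all_removed = re.sub(r"\n.*", "", s)
# count=1 stops after the leftmost match, so the last line survives
first_only = re.sub(r"\n.*", "", s, count=1)

print(repr(all_removed))  # 'Schreiben'
print(repr(first_only))   # 'Schreiben\nHaruki'
```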

The MYYN 2009-12-27 17:40:56

I am not sure, but I guess he also wants to remove the characters that follow \n.

JCasso 2009-12-27 17:42:18

Yes, already corrected.

The MYYN 2009-12-27 17:43:00

But I also want to delete the letters following \n and ""! Thanks again.

kame 2009-12-27 17:44:41

Yes, I corrected the initial solution.

The MYYN 2009-12-27 17:46:29

I am very glad that you are helping me. There are so many functions in Python. You are a great help. Of course I will read more about the re module now. :)

kame 2009-12-27 17:48:15

Answer 2

+1 A:

You could use a regular expression to do so:

import re
wordlist = [re.sub("\n.*", "", word) for word in wordlist]

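The regular expression \n.* matches a \n together with everything after it up to the next line break (. does not match a newline by default), and re.sub replaces every such match with nothing, so only the text before the first \n remains.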
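To see the individual matches, `re.findall` can list them. With the `re.DOTALL` flag, `.` also matches newlines, so a single match covers everything after the first `\n`; the substitution result is the same either way. A small sketch with a made-up word containing two newlines:

```python
import re

word = "Schreiben\nEventuell\nEs"  # hypothetical input with two newlines

# Default: "." stops at a newline, so "\n.*" matches each remaining line separately
print(re.findall(r"\n.*", word))                   # ['\nEventuell', '\nEs']
# re.DOTALL lets "." match newlines, so one match spans everything after the first "\n"
print(re.findall(r"\n.*", word, flags=re.DOTALL))  # ['\nEventuell\nEs']

# Either way, replacing the matches with "" leaves only the first line
print(re.sub(r"\n.*", "", word))                   # Schreiben
```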

Gumbo 2009-12-27 17:42:52

Answer 3

+3 A:

[w[:w.find('\n')] for w in wordlist]

A few timing tests:

$ python -m timeit -s "wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[w[:w.find('\n')] for w in wordlist]"
100000 loops, best of 3: 2.03 usec per loop
$ python -m timeit -s "import re; wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[re.sub('\n.*', '', w) for w in wordlist]"
10000 loops, best of 3: 17.5 usec per loop
$ python -m timeit -s "import re; RE = re.compile('\n.*'); wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[RE.sub('', w) for w in wordlist]"
100000 loops, best of 3: 6.76 usec per loop
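The shell timings above can also be reproduced from inside Python with the `timeit` module. A sketch, assuming a hypothetical `approaches` dict: it first checks that every variant returns the expected list (a check that would have caught the `find()` problem raised in the comments), and only then times them. Here the `find()` variant is written with a guard so that it passes the check; absolute numbers will differ by machine and Python version.

```python
import re
import timeit

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
PATTERN = re.compile(r"\n.*")
NUMBER = 10_000  # loops per approach

# Candidate implementations (labels are illustrative, not from the thread)
approaches = {
    "find/slice": lambda: [w[:w.find('\n')] if '\n' in w else w for w in wordlist],
    "re.sub": lambda: [re.sub(r"\n.*", "", w) for w in wordlist],
    "compiled": lambda: [PATTERN.sub("", w) for w in wordlist],
}

# Verify correctness before comparing speed
expected = ['Schreiben'] * 4
for name, fn in approaches.items():
    assert fn() == expected, name

# timeit.timeit accepts a callable and returns total seconds for NUMBER calls
for name, fn in approaches.items():
    seconds = timeit.timeit(fn, number=NUMBER)
    print(f"{name:12s} {seconds / NUMBER * 1e6:.2f} usec per loop")
```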

Edit:

The solution above is completely wrong (see the comment from Peter Hansen). Here is the corrected one:

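def truncate(words, s):
    """Yield each word cut off at the first occurrence of s (unchanged if s is absent)."""
    for w in words:
        i = w.find(s)  # -1 when s does not occur in w
        yield w[:i] if i != -1 else w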
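A usage sketch for the generator, assuming the question's sample `wordlist`. Since `truncate` yields lazily, wrapping it in `list()` gives a list back, and the separator does not have to be a newline:

```python
def truncate(words, s):
    for w in words:
        i = w.find(s)  # -1 when s does not occur in w
        yield w[:i] if i != -1 else w

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
print(list(truncate(wordlist, '\n')))  # ['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
print(list(truncate(['a,b', 'c'], ',')))  # ['a', 'c']
```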

mg 2009-12-27 17:52:58

What an incredibly bad (i.e. totally untested) answer, given that it quietly truncates words that do NOT have a newline in them. str.find() returns -1 in the case where there is no match, and slicing with [:-1] will return everything up to but not including the last character. Please remove.
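A short demonstration of the failure described above, followed by a one-line variant that stays timeit-friendly. The fix uses an assignment expression, which assumes Python 3.8 or later; it is an illustration, not code from the thread:

```python
word = 'Schreiben'  # contains no newline

i = word.find('\n')
print(i)               # -1: find() signals "not found" with -1
print(repr(word[:i]))  # 'Schreibe': [:-1] silently drops the last character

# One-line fix (Python 3.8+): keep the whole word when find() returns -1
wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
fixed = [w[:j] if (j := w.find('\n')) != -1 else w for w in wordlist]
print(fixed)           # ['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
```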

Peter Hansen 2009-12-27 18:43:19

@Peter Hansen: thanks for your report. I was thinking about how to fit it on one line for timeit and forgot about correctness.

mg 2009-12-27 19:18:02

@mg, okay... just fix the for loop in the edited part now please. "For w in in words:" has an extra "in".

Peter Hansen 2009-12-27 19:19:42

You could get a small speedup (~10 %) by using `RE=re.compile(…).sub` and `[RE('', w)…]`: there is no need to look up the `sub()` method for each word.
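A sketch of the bound-method idea, assuming the question's sample list. The name `remove_tail` is hypothetical, and the size of the speedup is the commenter's estimate, not something measured here:

```python
import re

# Bind the compiled pattern's .sub method once, so the attribute lookup
# is not repeated for every word (remove_tail is a hypothetical name)
remove_tail = re.compile(r"\n.*").sub

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
cleaned = [remove_tail('', w) for w in wordlist]
print(cleaned)  # ['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
```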

EOL 2009-12-27 19:28:30

Answer 4

A:

>>> wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
>>> [ i.split("\n")[0] for i in wordlist ]
['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
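`split("\n")` breaks the word at every newline even though only the first piece is kept. A sketch of the same idea with `maxsplit=1`, plus `str.partition` as a swapped-in alternative, assuming the same sample list:

```python
wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']

# maxsplit=1 stops after the first "\n"; [0] is the text before it
# (or the whole word when there is no newline)
via_split = [w.split('\n', 1)[0] for w in wordlist]

# str.partition always returns a 3-tuple (head, sep, tail); head is what we want
via_partition = [w.partition('\n')[0] for w in wordlist]

print(via_split)      # ['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
print(via_partition)  # ['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
```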

ghostdog74 2009-12-27 23:55:15
