ansaurus

Question

String separation in required format, Pythonic way? (with or w/o Regex)

Answer 1

+3 A:

 [i.strip('@') for i in t.split(' ', 2)[:2]]     # for a fixed number of @def
 a = [i.strip('@') for i in t.split(' ') if i.startswith('@')]
 s = ' '.join(i for i in t.split(' ') if not i.startwith('@'))

SilentGhost 2009-02-17 18:37:22

The initial @elements could be any in number. This doesnt work

Lakshman Prasad 2009-02-17 18:40:52

that wasn't specified in your original question, but here you go.

SilentGhost 2009-02-17 18:45:08

Answer 2

+3 A:

You might also use regular expressions:

import re
rx = re.compile("@([\w]+) @([\w]+) (.*)")
t='@abc @def Hello this part is text and my email is [email protected]'
a,b,s = rx.match(t).groups()

But this all depends on how your data can look like. So you might need to adjust it. What it does is basically creating group via () and checking for what's allowed in them.

MrTopf 2009-02-17 18:40:23

OP says that number of @names is variable

SilentGhost 2009-02-17 18:54:51

Answer 3

+4 A:

How about this:

Splitting by space.
foreach word, check

2.1. if word starts with @ then Push to first list

2.2. otherwise just join the remaining words by spaces.

Osama ALASSIRY 2009-02-17 19:03:38

Answer 4

+3 A:

[edit: this is implementing what was suggested by Osama above]

This will create L based on the @ variables from the beginning of the string, and then once a non @ var is found, just grab the rest of the string.

t = '@one @two @three some text   afterward with @ symbols@ meow@meow'

words = t.split(' ')         # split into list of words based on spaces
L = []
s = ''
for i in range(len(words)):  # go through each word
    word = words[i]
    if word[0] == '@':       # grab @'s from beginning of string
        L.append(word[1:])
        continue
    s = ' '.join(words[i:])  # put spaces back in
    break                    # you can ignore the rest of the words

You can refactor this to be less code, but I'm trying to make what is going on obvious.

jcoon 2009-02-17 19:21:39

Answer 5

+7 A:

Building unashamedly on MrTopf's effort:

import re
rx = re.compile("((?:@\w+ +)+)(.*)")
t='@abc   @def  @xyz Hello this part is text and my email is [email protected]'
a,s = rx.match(t).groups()
l = re.split('[@ ]+',a)[1:-1]
print l
print s

prints:

['abc', 'def', 'xyz']
Hello this part is text and my email is [email protected]

Justly called to account by hasen j, let me clarify how this works:

/@\w+ +/

matches a single tag - @ followed by at least one alphanumeric or _ followed by at least one space character. + is greedy, so if there is more than one space, it will grab them all.

To match any number of these tags, we need to add a plus (one or more things) to the pattern for tag; so we need to group it with parentheses:

/(@\w+ +)+/

which matches one-or-more tags, and, being greedy, matches all of them. However, those parentheses now fiddle around with our capture groups, so we undo that by making them into an anonymous group:

/(?:@\w+ +)+/

Finally, we make that into a capture group and add another to sweep up the rest:

/((?:@\w+ +)+)(.*)/

A last breakdown to sum up:

((?:@\w+ +)+)(.*)
 (?:@\w+ +)+
 (  @\w+ +)
    @\w+ +

Note that in reviewing this, I've improved it - \w didn't need to be in a set, and it now allows for multiple spaces between tags. Thanks, hasen-j!

Brent.Longborough 2009-02-17 19:32:42

thanks for extending it :-) It wasn't clear to me at first that it can be any number of words. But I also had troubles to find the right syntax for the regexp when trying again actually. So I see that the anonymous group is now inside, I had it outside.

MrTopf 2009-02-17 23:22:00

would you bother to explain the regex? why does it find variable number of "tags" or whatever @thing is called?

hasen j 2009-02-18 00:57:16

Well-played Sir. Thanks for the thorough explanation.

Adam Bernier 2009-02-22 07:17:10

Answer 6

+7 A:

t='@abc @def Hello this part is text'

words = t.split(' ')

names = []
while words:
    w = words.pop(0)
    if w.startswith('@'):
        names.append(w[1:])
    else:
        break

text = ' '.join(words)

print names
print text

Ricardo Reyes 2009-02-17 19:32:42

I like this solution better then mine! voted up

jcoon 2009-02-17 19:44:59

Answer 7

+1 A:

Here's just another variation that uses split() and no regexpes:

t='@abc @def My email is [email protected]'
tags = []
words = iter(t.split())

# iterate over words until first non-tag word
for w in words:
  if not w.startswith("@"):
    # join this word and all the following
    s = w + " " + (" ".join(words))
    break
  tags.append(w[1:])
else:
  s = "" # handle string with only tags

print tags, s

Here's a shorter but perhaps a bit cryptic version that uses a regexp to find the first space followed by a non-@ character:

import re
t = '@abc @def My email is [email protected] @extra bye'
m = re.search(r"\s([^@].*)$", t)
tags = [tag[1:] for tag in t[:m.start()].split()]
s = m.group(1)
print tags, s # ['abc', 'def'] My email is [email protected] @extra bye

This doesn't work properly if there are no tags or no text. The format is underspecified. You'll need to provide more test cases to validate.

Martin Vilcans 2009-02-18 23:14:31

ansaurus

tags:

views:

answers:

String separation in required format, Pythonic way? (with or w/o Regex)

related questions