ansaurus

Question

Split tags in python

Answer 1

+3 A:

Try templatemaker, a reverse-template maker. It can actually learn templates automatically from examples!

ʞɔıu 2009-02-20 21:08:00

It is equivalent to `re.search('{{(.*?)}}', thedata).groups()` in this case.

J.F. Sebastian 2009-02-20 22:02:34

Answer 2

+6 A:

Mmkay, well here's a generator solution that seems to work well for me. You can also provide different open and close tags if you like.

def get_tags(s, open_delim  ='{{', 
                close_delim ='}}' ):

   while True:

      # Search for the next two delimiters in the source text
      start = s.find(open_delim)
      end   = s.find(close_delim)

      # We found a non-empty match
      if -1 < start < end:

         # Skip the length of the open delimiter
         start += len(open_delim)

         # Spit out the tag
         yield s[start:end].strip()

         # Truncate string to start from last match
         s = s[end+len(close_delim):]

      else:
         return

Run against your target input like so:

# prints: today, runner_up, blabla, oooo
for tag in get_tags(html):
    print(tag)

Edit: it also works against your new example :). In my obviously quick testing, it also seemed to handle malformed tags in a reasonable way, though I make no guarantees of its robustness!

Triptych 2009-02-20 21:09:48

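Consider s = '}} {{ tag }}': the stray close delimiter is found before the open one, so the generator stops without yielding anything. Use `s.find(close_delim, start)`.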

J.F. Sebastian 2009-02-20 21:48:43
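
A minimal sketch of the generator with that suggestion applied, for illustration only. The name `get_tags_fixed` and the `pos` cursor are assumptions and not part of the original answer. The search for the close delimiter starts just past the open delimiter, a slightly stricter variant of `s.find(close_delim, start)`:

```python
def get_tags_fixed(s, open_delim='{{', close_delim='}}'):
    """Yield the stripped text between each open/close delimiter pair."""
    pos = 0  # hypothetical cursor; avoids re-slicing the string each round
    while True:
        # Find the next open delimiter at or after the cursor
        start = s.find(open_delim, pos)
        if start == -1:
            return

        # Only accept a close delimiter that comes after the open one
        content_start = start + len(open_delim)
        end = s.find(close_delim, content_start)
        if end == -1:
            return

        yield s[content_start:end].strip()

        # Continue scanning after the close delimiter
        pos = end + len(close_delim)


print(list(get_tags_fixed('}} {{ tag }}')))  # ['tag']
```

An unclosed `{{` at the end of the input simply ends the iteration, which matches the original's behaviour of stopping quietly on malformed input.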

Answer 3

A:

If the data is that straightforward, a simple regex would do the trick.

Harper Shelby 2009-02-20 21:10:17

Answer 4

+2 A:

I know you said no regex/split, but I couldn't help but try for a one-liner solution:

import re
for s in re.findall(r"\{\{.*\}\}", thedata):
    print(s.replace("{", "").replace("}", ""))

EDIT (by J.F. Sebastian):

Compare:

>>> re.findall(r'\{\{.*\}\}', '{{a}}b{{c}}')
['{{a}}b{{c}}']
>>> re.findall('{{(.+?)}}', '{{a}}b{{c}}')
['a', 'c']

Ryan 2009-02-20 21:14:03

Am I crazy, or are you checking for runs of non-space characters in the source data? In the first example, the tags are padded with spaces. Won't this break?

Triptych 2009-02-20 21:18:44

Yeah, I'm right. Doesn't work against the first example. -1

Triptych 2009-02-20 21:21:12

Aha, you're right. I changed it to match {{ absolutely anything }}. If you don't want it to match {{}} then replace .* with .+

Ryan 2009-02-20 21:24:26
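
To illustrate the `.*` versus `.+` point, here is a small sketch run against an input that contains only an empty tag. The variable names are made up for the example:

```python
import re

empty = '{{}}'

# .* accepts zero characters between the braces, so the empty tag matches
with_star = re.findall(r'\{\{.*\}\}', empty)

# .+ needs at least one character between the braces, so nothing matches
with_plus = re.findall(r'\{\{.+\}\}', empty)

print(with_star)  # ['{{}}']
print(with_plus)  # []
```

This only covers a lone empty tag. Both patterns are still greedy, so on input with several tags either one can span from the first `{{` to the last `}}`.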

That's pretty cool! Thanks!

Jon Romero 2009-02-20 21:26:13

It is greedy. It eats too much. Use: `print(re.findall(r'{{(.+?)}}', thedata))`

J.F. Sebastian 2009-02-20 21:29:00

I thought so too @JF but it didn't seem to break when I tested. @Ryan, I removed the downvote.

Triptych 2009-02-20 21:29:58

@Triptych: I've added an example for 'greedy'.

J.F. Sebastian 2009-02-20 21:36:07

Answer 5

+1 A:

J.F. Sebastian wrote this in a comment but I thought it was good enough to deserve its own answer:

re.findall(r'{{(.+?)}}', thestring)

I know the OP was asking for a way that didn't involve splits or regexes - so maybe this doesn't quite answer the question as stated. But this one line of code definitely gets my vote as the most Pythonic way to accomplish the task.

David Zaslavsky 2009-02-20 21:34:59
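
For tags padded with spaces, the one-liner can be wrapped so that it strips whitespace and accepts other delimiters. This is only a sketch: `find_tags`, its parameters, and the sample string are assumptions made for illustration, not something from the thread.

```python
import re


def find_tags(text, open_delim='{{', close_delim='}}'):
    """Return the stripped contents of every non-empty delimited tag."""
    # re.escape keeps delimiter characters from being read as regex syntax
    pattern = re.escape(open_delim) + r'(.+?)' + re.escape(close_delim)
    return [tag.strip() for tag in re.findall(pattern, text)]


sample = 'Hi {{ today }}, winner: {{runner_up}}'
print(find_tags(sample))  # ['today', 'runner_up']
```

Because `.` does not match a newline by default, a tag that spans several lines is skipped unless `re.DOTALL` is passed to `re.findall`.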

Actually, I was tempted to vote for it (I didn't have that in mind when I said regular expressions), but it would be unfair to the other answer. Too bad I can't vote for both :(

Jon Romero 2009-02-20 21:37:18
