I have a file that contains this:

<html>
  <head>
    <title> Hello! - {{ today }}</title>
  </head>
  <body>
    {{ runner_up }} 
         avasd
         {{ blabla }}
        sdvas
        {{ oooo }}
   </body>
</html>

What is the best, most Pythonic way to extract {{today}}, {{runner_up}}, etc.?

I know it can be done with splits or regular expressions, but I wonder if there is another way.

PS: assume the data is loaded in a variable called thedata :D

Thx

Edit: I think the HTML example was a bad choice, because it pointed some commenters toward BeautifulSoup. So, here is new input data:

Fix grammatical or {{spelling}} errors.

Clarify meaning without changing it.

Correct minor {{mistakes}}.

Add related resources or links.

Always respect the original {{author}}.

Expected output:

spelling
mistakes
author
+3  A: 

Try templatemaker, a reverse-template maker. It can actually learn templates automatically from examples!

ʞɔıu
It is equivalent to `re.search('{{(.*?)}}', thedata).groups()` in this case.
J.F. Sebastian
+6  A: 

Mmkay, well here's a generator solution that seems to work well for me. You can also provide different open and close tags if you like.

def get_tags(s, open_delim  ='{{', 
                close_delim ='}}' ):

   while True:

      # Search for the next open delimiter, then for the first
      # close delimiter at or after it
      start = s.find(open_delim)
      end   = s.find(close_delim, start)

      # We found a non-empty match
      if -1 < start < end:

         # Skip the length of the open delimiter
         start += len(open_delim)

         # Spit out the tag
         yield s[start:end].strip()

         # Truncate string to start from last match
         s = s[end+len(close_delim):]

      else:
         return

Run against your target input like so:

# prints: today, runner_up, blabla, oooo (one per line)
for tag in get_tags(thedata):
    print(tag)

Edit: it also works against your new example :). In my obviously quick testing, it also seemed to handle malformed tags in a reasonable way, though I make no guarantees of its robustness!

Triptych
Consider, s = '}} {{ tag }}'. Use `s.find(close_delim, start)`
J.F. Sebastian
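To make the point of the comment above concrete, here is a minimal sketch of what `str.find` returns with and without the start argument on that input. The variable names `end_anywhere` and `end_after_open` are only illustrative.

```python
s = '}} {{ tag }}'
open_delim, close_delim = '{{', '}}'

start = s.find(open_delim)

# Without a start position, the stray '}}' at the front is found first,
# so end < start and a "-1 < start < end" check rejects the whole string.
end_anywhere = s.find(close_delim)

# Searching from the open delimiter skips the stray close delimiter.
end_after_open = s.find(close_delim, start)

tag = s[start + len(open_delim):end_after_open].strip()
print(start, end_anywhere, end_after_open, tag)
```

With the start argument, the close delimiter that is matched always comes after the open delimiter it is paired with.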
A: 

If the data is that straightforward, a simple regex would do the trick.

Harper Shelby
+2  A: 

I know you said no regex/split, but I couldn't help but try for a one-liner solution:

import re
for s in re.findall(r"\{\{.*\}\}", thedata):
    print(s.replace("{", "").replace("}", ""))

EDIT (J.F. Sebastian):

Compare:

>>> re.findall('\{\{.*\}\}', '{{a}}b{{c}}')
['{{a}}b{{c}}']
>>> re.findall('{{(.+?)}}', '{{a}}b{{c}}')
['a', 'c']
Ryan
Am I crazy, or are you checking for runs of non-space characters in the source data? In the first example, the tags are padded with spaces. Won't this break?
Triptych
Yeah, I'm right. Doesn't work against the first example. -1
Triptych
Aha, you're right. I changed it to match {{ absolutely anything }}. If you don't want it to match {{}} then replace .* with .+
Ryan
that's pretty cool! thx!
Jon Romero
It is greedy. It eats too much. Use: `print re.findall(r'{{(.+?)}}', thedata)`
J.F. Sebastian
I thought so too @JF but it didn't seem to break when I tested. @Ryan, I removed the downvote.
Triptych
@Triptych: I've added an example for 'greedy'.
J.F. Sebastian
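A likely reason the greedy pattern did not appear to break in testing: `.` does not match a newline by default, and the sample data has only one tag per line. The sketch below (sample strings and variable names are made up for illustration) shows both patterns agreeing on such input and diverging as soon as two tags share a line.

```python
import re

greedy = r'\{\{(.*)\}\}'
lazy = r'\{\{(.*?)\}\}'

# One tag per line: '.' stops at the newline, so greedy cannot overrun.
one_per_line = "Fix {{spelling}} errors.\nCorrect minor {{mistakes}}."
greedy_lines = re.findall(greedy, one_per_line)
lazy_lines = re.findall(lazy, one_per_line)

# Two tags on the same line: greedy swallows everything between
# the first '{{' and the last '}}'.
two_on_a_line = "{{a}}b{{c}}"
greedy_same_line = re.findall(greedy, two_on_a_line)
lazy_same_line = re.findall(lazy, two_on_a_line)

print(greedy_lines, lazy_lines)
print(greedy_same_line, lazy_same_line)
```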
+1  A: 

J.F. Sebastian wrote this in a comment but I thought it was good enough to deserve its own answer:

re.findall(r'{{(.+?)}}', thestring)

I know the OP was asking for a way that didn't involve splits or regexes - so maybe this doesn't quite answer the question as stated. But this one line of code definitely gets my vote as the most Pythonic way to accomplish the task.

David Zaslavsky
Actually, I was tempted to vote for it (I didn't have that in mind when I said regular expressions), but it would be unfair to the other answer. Too bad I can't vote for both :(
Jon Romero
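One detail the thread leaves open: in the first (HTML) example the tags are padded with spaces, so a bare `(.+?)` group captures `' today '` rather than `'today'`. A possible variation, offered only as a sketch, lets `\s*` absorb the padding. The names `TAG_PATTERN` and `extract_tags` are hypothetical and not from any answer above.

```python
import re

# Lazy capture, with optional whitespace just inside the delimiters.
TAG_PATTERN = re.compile(r'\{\{\s*(.+?)\s*\}\}')

def extract_tags(text):
    """Return the tag names found in text, in order of appearance."""
    return TAG_PATTERN.findall(text)

thedata = """<html>
  <head>
    <title> Hello! - {{ today }}</title>
  </head>
  <body>
    {{ runner_up }}
         avasd
         {{ blabla }}
        sdvas
        {{ oooo }}
   </body>
</html>"""

tags = extract_tags(thedata)
for tag in tags:
    print(tag)
```

The same function also handles unpadded tags such as `{{spelling}}`, since `\s*` can match nothing.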