ansaurus

Question

python search from tag

Answer 1

+2 A:

Load the text file into a string.
Search the string for the first occurrence of <concept> using pos1 = s.find('<concept>')
Search for </concept> using pos2 = s.find('</concept>', pos1)

The words you seek are then s[pos1+len('<concept>'):pos2]
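
The steps above can be sketched as a small function. This is only an illustration: the name extract_between, its default arguments, and the sample string are hypothetical and not part of the original answer. It also handles only the first occurrence and only exact tag spellings.

```python
def extract_between(s, open_tag="<concept>", close_tag="</concept>"):
    """Return the text between the first open_tag and the next close_tag, or None."""
    pos1 = s.find(open_tag)
    if pos1 == -1:
        return None  # opening tag not present
    start = pos1 + len(open_tag)
    pos2 = s.find(close_tag, start)
    if pos2 == -1:
        return None  # closing tag not present after the opening tag
    return s[start:pos2]


# Hypothetical sample text, standing in for the contents of the loaded file.
sample = "intro <concept>binary search</concept> outro"
print(extract_between(sample))
```

Checking for -1 matters because str.find returns -1 when nothing is found, and slicing with that value would silently give the wrong result.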

Aaron Digulla 2010-06-25 07:16:05

This method does not account for comments or for tags that contain whitespace, if the question's author means XML.

nailxx 2010-06-25 07:21:53

+1 for simplicity

jensgram 2010-06-25 07:22:42

Answer 2

+1 A:

Have a look at regular expressions. http://docs.python.org/library/re.html

If you want, for example, the text inside the <i> tag, try

import re
text = "text to search. <i>this</i> is the word and also <i>that</i> end"
re.findall(r"<i>(.*?)</i>", text)  # returns ['this', 'that']

Here's a short explanation of how findall works: it searches the given string for every non-overlapping match of the given regular expression and, because the pattern contains a group, returns the text captured by that group. The regular expression is <i>(.*?)</i>:

<i> denotes just the opening tag
(.*?) creates a group and matches as little as possible, so it stops at the first
</i>, which concludes the tag
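
To see why the ? in the group matters, here is a small comparison on the same sample text. The variable names lazy and greedy are just for illustration.

```python
import re

text = "text to search. <i>this</i> is the word and also <i>that</i> end"

# Non-greedy group: stops at the first closing tag after each opening tag.
lazy = re.findall(r"<i>(.*?)</i>", text)

# Greedy group: runs on to the last closing tag on the line.
greedy = re.findall(r"<i>(.*)</i>", text)

print(lazy)    # ['this', 'that']
print(greedy)  # ['this</i> is the word and also <i>that']
```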

Note that the above solution does not match something like

<i> here's a line
break </i>

This is because . does not match a newline by default, which is fine if you just want to extract words.

However, it is of course possible to do so:

re.findall("<i>(.*?)</i>",text,re.DOTALL)
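
A quick side-by-side check of the two calls, using a made-up string that contains both a single-line tag and one with a line break:

```python
import re

# Hypothetical sample: one tag on a single line, one spanning a line break.
text = "<i>one</i> and <i> here's a line\nbreak </i>"
pattern = r"<i>(.*?)</i>"

# Default: "." does not match "\n", so the second tag is skipped.
single_line = re.findall(pattern, text)

# re.DOTALL: "." also matches "\n", so both tags are found.
multi_line = re.findall(pattern, text, re.DOTALL)

print(single_line)  # ['one']
print(multi_line)   # ['one', " here's a line\nbreak "]
```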

phimuemue 2010-06-25 07:16:38

Answer 3

+2 A:

There is a great library for HTML/XML traversing named BeautifulSoup. With it:

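# BeautifulSoup 4 (the bs4 package); the answer originally used BeautifulSoup 3's BeautifulStoneSoup
from bs4 import BeautifulSoup
with open('myfile.xml') as f:
    soup = BeautifulSoup(f, 'xml')  # the 'xml' parser requires lxml to be installed
for t in soup.find_all('concept'):
    print(t.string)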

nailxx 2010-06-25 07:18:35
