Hi,
I need help with Python programming.
I need a command that can find all the words between tags in a text file.
For example, if the text file contains <concept> food </concept>, I need to find all the words between <concept> and </concept> and display them.
Can anybody help, please?
+2
A:
- Load the text file into a string s.
- Search the string for the first occurrence of <concept> using pos1 = s.find('<concept>')
- Search for </concept> using pos2 = s.find('</concept>', pos1)
The words you seek are then s[pos1+len('<concept>'):pos2]
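The steps above find only the first pair of tags. A minimal sketch of how they might be repeated to collect every occurrence follows; the function name find_between and the sample text are hypothetical, and the surrounding whitespace is stripped on the assumption that only the words themselves are wanted.

```python
def find_between(s, open_tag='<concept>', close_tag='</concept>'):
    """Return every substring of s found between open_tag and close_tag."""
    results = []
    pos = 0
    while True:
        # Find the next opening tag, starting where the last match ended.
        pos1 = s.find(open_tag, pos)
        if pos1 == -1:
            break
        # Find the closing tag that follows it.
        pos2 = s.find(close_tag, pos1 + len(open_tag))
        if pos2 == -1:
            break
        # Slice out the text between the tags and drop surrounding spaces.
        results.append(s[pos1 + len(open_tag):pos2].strip())
        pos = pos2 + len(close_tag)
    return results


# In practice s would come from the file, e.g. open('myfile.txt').read()
sample = "I like <concept> food </concept> and <concept>drink</concept>."
found = find_between(sample)
print(found)
```

As the comments on this answer point out, this plain string search assumes the tags appear exactly as written, with no attributes or extra whitespace inside them.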
Aaron Digulla
2010-06-25 07:16:05
This method does not account for comments or for tags containing whitespace, if the question's author means XML.
nailxx
2010-06-25 07:21:53
+1 for simplicity
jensgram
2010-06-25 07:22:42
+1
A:
Have a look at regular expressions. http://docs.python.org/library/re.html
If you want to extract, for example, the contents of the tag <i>, try
text = "text to search. <i>this</i> is the word and also <i>that</i> end"
import re
re.findall("<i>(.*?)</i>",text)
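Here's a short explanation of how findall works: it searches the given string for every match of the given regular expression and, when the expression contains one group, returns the text captured by that group for each match. The regular expression is <i>(.*?)</i>:
<i> matches just the opening tag <i>
(.*?) creates a group and, because of the ?, matches as few characters as possible until it reaches the first </i>
</i> matches the closing tag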
Note that the above solution does not match something like
<i> here's a line
break </i>
because . does not match a newline by default, which is fine since you just wanted to extract words.
However, it is of course possible to match across lines:
re.findall("<i>(.*?)</i>",text,re.DOTALL)
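Applied to the <concept> tag from the question, the same pattern might look like the sketch below. The names CONCEPT_RE and extract_concepts and the sample text are hypothetical; re.DOTALL is included so that text spanning a line break is also found, and the results are stripped of surrounding whitespace.

```python
import re

# Non-greedy group between the tags; DOTALL lets "." match newlines too.
CONCEPT_RE = re.compile(r"<concept>(.*?)</concept>", re.DOTALL)


def extract_concepts(text):
    """Return the text between each <concept>...</concept> pair."""
    return [match.strip() for match in CONCEPT_RE.findall(text)]


# In practice text would come from the file, e.g. open('myfile.txt').read()
sample = "<concept> food </concept> text <concept>line\nbreak</concept>"
concepts = extract_concepts(sample)
print(concepts)
```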
phimuemue
2010-06-25 07:16:38
+2
A:
There is a great library for traversing HTML/XML named BeautifulSoup. With it:
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(open('myfile.xml', 'rt').read())
for t in soup.findAll('concept'):
    print t.string
nailxx
2010-06-25 07:18:35