Hi all,
I'm new to Python and programming. I need to read in chunks of a large text file whose format looks like the following:
<word id="8" form="hibernis" lemma="hibernus1" postag="n-p---nb-" head-"7" relation="ADV"/>
I need the form, lemma and postag information, e.g. for the line above I need hibernis, hibernus1 and n-p---nb-.
How do I tell Python to read until it reaches form, then read forward until it reaches the quote mark " and then read the information between the quote marks ("hibernis")? I'm really struggling with this.
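For reference, one possible way to express "find form, skip to the opening quote, read up to the closing quote" is a regular expression. This is only a minimal sketch: the helper name get_attr is made up for illustration, and it assumes each attribute is written as name="value" with double quotes, as in the sample line.

```python
import re

# Sample line copied from the question (including the head-"7" part as posted).
line = '<word id="8" form="hibernis" lemma="hibernus1" postag="n-p---nb-" head-"7" relation="ADV"/>'

def get_attr(name, text):
    """Hypothetical helper: return the text between the quotes after name=, or None."""
    # \b stops 'form' from matching inside a longer attribute name;
    # [^"]* reads forward up to (but not including) the closing quote mark.
    match = re.search(r'\b' + re.escape(name) + r'="([^"]*)"', text)
    return match.group(1) if match else None

print(get_attr("form", line))    # hibernis
print(get_attr("lemma", line))   # hibernus1
print(get_attr("postag", line))  # n-p---nb-
```

Returning None for a missing attribute means the caller can decide whether to skip that line or treat it as an error.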
My attempts so far have been to remove the punctuation, split the line and then pull the info I need from the resulting list. I'm having trouble getting Python to iterate over the whole file though; I can only get this working for one line. My code is below:
f=open('blank.txt','r')
quotes=f.read()
noquotes=quotes.replace('"','')
f.close()
rf=open('blank.txt','w')
rf.write(noquotes)
rf.close()
f=open('blank.txt','r')
finished = False
postag=[]
while not finished:
    line=f.readline()
    words=line.split()
    postag.append(words[4])
    postag.append(words[6])
    postag.append(words[8])
    finished=True
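The loop above stops after one line because finished is set to True on the first pass. A rough sketch of iterating over every line instead is below. It assumes the quotes are left in the file (so the quote-stripping step is skipped) and that form, lemma and postag always appear in that order on one line. The names WORD_RE and read_tags are made up for illustration, and the second sample line is invented just to show that non-matching lines are skipped.

```python
import io
import re

# Captures the three values in the order they appear in the sample line.
# Assumption: form, lemma and postag are adjacent and always in this order.
WORD_RE = re.compile(r'form="([^"]*)"\s+lemma="([^"]*)"\s+postag="([^"]*)"')

def read_tags(lines):
    """Collect (form, lemma, postag) tuples from any iterable of lines."""
    results = []
    for line in lines:       # a file object yields one line per iteration
        match = WORD_RE.search(line)
        if match:            # lines without the three attributes are skipped
            results.append(match.groups())
    return results

# Stand-in for a real file so the sketch runs on its own.
sample = io.StringIO(
    '<word id="8" form="hibernis" lemma="hibernus1" postag="n-p---nb-" head-"7" relation="ADV"/>\n'
    '<sentence id="2">\n'
)
print(read_tags(sample))  # [('hibernis', 'hibernus1', 'n-p---nb-')]

# With the real file it might look like this:
# with open('blank.txt') as f:
#     tags = read_tags(f)
```

Looping with for line in f reads the file one line at a time, so it should cope with a large file without loading it all into memory, and the with block closes the file automatically.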
Would appreciate any feedback or criticism.
Thanks