views:

316

answers:

2

Hello, I have to do an assignment where i have a .txt file that contains something like this
p
There is no one who loves pain itself, who seeks after it and wants to have it, simply because it is pain...

h1
this is another example of what this text file looks like

i am suppose to write a python code that parses this text file and creates and xhtml file
I need to find a starting point for this project because i am very new to python and not familiar with alot of this stuff.
This python code is suppose to take each of these "tags" from this text file and put them into an xhtml file I hope that what i ask makes sense to you.
Any help is greatly appreciated,
Thanks in advance!
-bojan

+1  A: 

Rather than going directly from the text file you describe to an XHTML file, I would transform it into an intermediate in-memory representation first.

So I would build classes to represent the p and h1 tags, and then go through the text file and build those objects and put them into a list (or even a more complex object, but from the looks of your file a list should be sufficient). Then I would pass the list to another function that would loop through the p and h1 objects and output them as XHTML.

As an added bonus, I would make each tag object (say, Paragraph and Heading1 classes) implement an as_xhtml() method, and delegate the actual formatting to that method. Then the XHTML output loop could be something like:

for tag in input_tags:
    xhtml_file.write(tag.as_xhtml())
Daniel Pryden
+1  A: 

You say you're very new to Python, so I'll start at the very low-level. You can iterate over the lines in a file very simply in Python

fyle = open("contents.txt")
for lyne in fyle :
    # Do string processing here
fyle.close()

Now how to parse it. If each formatting directive (e.g. p, h1), is on a separate line, you can check that easily. I'd build up a dictionary of handlers and get the handler like so:

handlers= {"p": # p tag handler
           "h1": # h1 tag handler
          }

# ... in the loop
    if lyne.rstrip() in handlers :  # strip to remove trailing whitespace
        # close current handler?
        # start new handler?
    else :
        # pass string to current handler

You could do what Daniel Pryden suggested and create an in-memory data structure first, and then serialize that the XHTML. In that case, the handlers would know how to build the objects corresponding to each tag. But I think the simpler solution, especially if you don't have lots of time, you have is just to go straight to XHTML, keeping a stack of the current enclosed tags. In that case your "handler" may just be some simple logic to write the tags to the output file/string.

I can't say more without knowing the specifics of your problem. And besides, I don't want to do all your homework for you. This should give you a good start.

AFoglia
+1 nice answer. I would make a point of calling `fyle.close()`, though (or, even better, using `with open("contents.txt") as fyle:`). It's a good habit to get into -- you can often get away with letting the garbage collector take care of open files, but you really shouldn't.
Daniel Pryden
You're right, and I've added `fyle.close()` as you've suggested. `with ... as` is better, but this is simpler to understand for a beginner.
AFoglia