ansaurus

Question

Answer 1

+1 A:

Rather than going directly from the text file you describe to an XHTML file, I would transform it into an intermediate in-memory representation first.

So I would build classes to represent the p and h1 tags, and then go through the text file and build those objects and put them into a list (or even a more complex object, but from the looks of your file a list should be sufficient). Then I would pass the list to another function that would loop through the p and h1 objects and output them as XHTML.

As an added bonus, I would make each tag object (say, Paragraph and Heading1 classes) implement an as_xhtml() method, and delegate the actual formatting to that method. Then the XHTML output loop could be something like:

for tag in input_tags:
    xhtml_file.write(tag.as_xhtml())
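A minimal sketch of what those classes might look like. The class names and `as_xhtml()` come from the description above; the constructor signature, the `text` attribute, and the use of `html.escape` are assumptions made for illustration:

```python
from html import escape


class Paragraph:
    """A <p> element holding plain text."""

    def __init__(self, text):
        self.text = text

    def as_xhtml(self):
        # Escape &, <, > and quotes so the text cannot break the markup.
        return "<p>{}</p>\n".format(escape(self.text))


class Heading1:
    """An <h1> element holding plain text."""

    def __init__(self, text):
        self.text = text

    def as_xhtml(self):
        return "<h1>{}</h1>\n".format(escape(self.text))


# Hypothetical input; in practice this list would be built from the text file.
input_tags = [Heading1("Title"), Paragraph("Fish & chips")]
xhtml = "".join(tag.as_xhtml() for tag in input_tags)
```

Because every class exposes the same method, the output loop never needs to know which kind of tag it is writing.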

Daniel Pryden 2009-10-21 22:52:25

Answer 2

+1 A:

You say you're very new to Python, so I'll start at a very low level. You can iterate over the lines in a file very simply in Python:

fyle = open("contents.txt")
for lyne in fyle:
    # Do string processing here
    pass
fyle.close()
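For comparison, here is a sketch of the same loop using a `with` block, which closes the file even if the processing raises an exception. The `count_lines` function and the counting itself are placeholders for illustration, standing in for whatever string processing you need:

```python
def count_lines(path):
    """Return the number of lines in the file at `path`."""
    count = 0
    with open(path) as fyle:  # closed automatically when the block exits
        for lyne in fyle:
            count += 1  # stand-in for real string processing
    return count
```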

Now, how to parse it. If each formatting directive (e.g. p, h1) is on a separate line, you can check for that easily. I'd build up a dictionary of handlers and get the handler like so:

handlers = {"p": None,   # replace None with the p tag handler
            "h1": None,  # replace None with the h1 tag handler
           }

# ... in the loop
    if lyne.rstrip() in handlers:  # strip to remove trailing whitespace
        # close current handler?
        # start new handler?
        pass
    else:
        # pass string to current handler
        pass

You could do what Daniel Pryden suggested and create an in-memory data structure first, and then serialize that to XHTML. In that case, the handlers would know how to build the objects corresponding to each tag. But I think the simpler solution, especially if you don't have lots of time, is just to go straight to XHTML, keeping a stack of the currently open tags. In that case your "handler" may just be some simple logic to write the tags to the output file/string.
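A rough sketch of the straight-to-XHTML approach, under an assumed input format: a line containing only `p` or `h1` opens that tag, and the lines that follow are its text until the next directive. The `TAGS` set and the `to_xhtml` function are hypothetical names, and your real file may need different rules:

```python
from html import escape

TAGS = {"p", "h1"}  # directives recognised in this sketch


def to_xhtml(lines):
    """Convert directive/text lines to an XHTML fragment."""
    out = []
    stack = []  # currently open tags, innermost last
    text = []   # text lines collected for the open tag

    def close_open_tags():
        # Flush the collected text, then close every open tag.
        out.append(escape(" ".join(text)))
        del text[:]
        while stack:
            out.append("</{}>\n".format(stack.pop()))

    for line in lines:
        word = line.strip()
        if word in TAGS:
            # A new directive closes whatever is open, then opens itself.
            close_open_tags()
            out.append("<{}>".format(word))
            stack.append(word)
        elif word and stack:
            # Ordinary text belongs to the currently open tag.
            text.append(word)

    close_open_tags()  # close anything still open at end of input
    return "".join(out)
```

With flat directives like these the stack never holds more than one tag; it only earns its keep if tags can nest. Note also that a text line consisting solely of `p` or `h1` would be mistaken for a directive.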

I can't say more without knowing the specifics of your problem. And besides, I don't want to do all your homework for you. This should give you a good start.

AFoglia 2009-10-22 03:31:21

+1 nice answer. I would make a point of calling `fyle.close()`, though (or, even better, using `with open("contents.txt") as fyle:`). It's a good habit to get into -- you can often get away with letting the garbage collector take care of open files, but you really shouldn't.

Daniel Pryden 2009-10-22 15:48:36

You're right, and I've added `fyle.close()` as you've suggested. `with ... as` is better, but this is simpler to understand for a beginner.

AFoglia 2009-10-23 02:06:07


Parsing a text file with Python?
