I am just trying to write a small web page that can parse some text using a regular expression and return the resulting matches in a table. This is the first time I've used Python for web development, and I have to say, it looks messy.

My question is: why do I only get output for the last match in my data set? I suspect it's because the nested loops aren't formatted correctly.

Here's the data I provide:

groups is just an ID corresponding to each regex group, plus its name, which provides the header for the table.

pattern is something like:

(\d+)\s(\S+)\s(\S+)$

and data:

12345 SOME USER
09876 SOMEONE ELSE
54678 ANOTHER USER

My simple page:

<%
import re
pattern = form['pattern']
p = re.compile(pattern)
data = form['data']

matches = p.finditer(data)

lines = form['groups'].split("\n")
groupids ={}
for line in lines:
  key, val = line.split(' ')
  groupids[int(key.strip())] = val.strip()

%>
<html>
<table style="border-width:1px;border-style:solid;width:60%;">
<tr>
<%
for k,v in groupids.iteritems():%>
<th style="width:30px;text-align:center"><%= v %></th>
<%
# end
%>
</tr>
<%
for match in matches:
  #begin
%><tr>
<%
for i in range(1, len(match.groups())+1):
  #begin
%>
  <td style="border-style:solid;border-width:1px;border-spacing:0px;text-align:center;"><%= match.group(i) %></td>
<%
  #end
# end
%>
</tr>

</table>
</html>

Edit

Below is the test I ran

Code:

import re
pattern = r"(\d\d\d\d\d)\s(\S+)\s(\S+)"  # raw string so the backslashes reach the regex engine unchanged

p = re.compile(pattern)

data = """12345 TESTS USERS
34567 TESTS USERS
56789 TESTS USERS"""

groups = """1 PIN
2 FNAME
3 LNAME"""

matches = p.finditer(data)

lines = groups.split("\n")

print lines
groupids ={}
for line in lines:
  key, val = line.split(' ')
  groupids[int(key.strip())] = val.strip()


for k,v in groupids.iteritems():
  print "%s\t" % v,
print ''

for match in matches:
  for i in range(1, len(match.groups())+1):
    print "%s\t" % match.group(i),
  print ''

Output:

PIN     FNAME   LNAME
12345   TESTS   USERS
34567   TESTS   USERS
56789   TESTS   USERS
A: 

I'm not sure about the interaction with the templating engine, but Python would expect the inner loop to be indented under the containing loop.

Try formatting it that way and see if it works.

<%
for match in matches:
    %><tr><%
    for i in range(1, len(match.groups())+1):
        %><td style="border-style:solid;border-width:1px;border-spacing:0px;text-align:center;"><%= match.group(i) %></td><%
%>

Or some such. The above produces "IndentationError: unindent does not match any outer indentation level" so try:

<%
for match in matches:
    %><tr><%
    for i in range(1, len(match.groups())+1):
        %><td style="border-style:solid;border-width:1px;border-spacing:0px;text-align:center;"><%= match.group(i) %></td><%

%>

or

<%
for match in matches:
    %><tr><%
    for i in range(1, len(match.groups())+1):
        %><td style="border-style:solid;border-width:1px;border-spacing:0px;text-align:center;"><%= match.group(i) %></td><%
pass
%>

or some combination. Your problem is in indicating to Python where the loop ends. To do this you must figure out a way to make the templating engine produce valid Python with the right indentation.

Also, if you can get at the generated code you could split the problem in half: first tinker with the generated code to find out what Python will accept, and then tinker with the template to get it to produce that.

MarkusQ
That's what I thought, but when I indent like that I get "IndentationError: unindent does not match any outer indentation level"
scottm
Then that's a sign you're on the right track: it is, in fact, taking note of the indentation in the template file. So now the only question is how the morphing is working. I'll edit my answer to include a second suggestion.
MarkusQ
I'm not using a template file. The code for the page is what I've typed myself.
scottm
@scotty2012 What you typed is a template (with the <%...%> metabracket notation) that is being expanded / inverted to produce Python. "Real" Python doesn't have the metabrackets.
MarkusQ
@Markus, in that sense I understand. However, both suggestions you offered give the same results. I know the problem is with defining where the loop ends; that's why I need the right syntax. I've tried what seems like 50 different combinations and nothing seems to work.
scottm
A: 
<%
for match in matches:
  #begin
%><tr>
<%
for i in range(1, len(match.groups())+1):
  #begin
%>
  <td style="border-style:solid;border-width:1px;border-spacing:0px;text-align:center;"><%= match.group(i) %></td>
<%
  #end
# end
%>

Yeah, you haven't got a nested loop there. Instead you've got a loop over matches that outputs “<tr>\n”, then a second loop over range(...) that only runs after the first has finished. The second is not inside the first because it isn't indented to say so.
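
To see why that shape prints only the last match, here is a minimal sketch in plain Python (written for Python 3, with no PSP involved). The function names render_flat and render_nested are made up for illustration; the pattern and data are the ones from the test script in the question.

```python
import re

# Pattern and data copied from the asker's test script.
pattern = re.compile(r"(\d\d\d\d\d)\s(\S+)\s(\S+)")
data = """12345 TESTS USERS
34567 TESTS USERS
56789 TESTS USERS"""


def render_flat(matches):
    """Two sibling loops: the shape the un-indented template produces."""
    out = []
    for match in matches:
        out.append("<tr>")
    # This loop starts only after the loop above has finished, so
    # `match` is still bound to the last item it saw.
    for i in range(1, len(match.groups()) + 1):
        out.append("<td>%s</td>" % match.group(i))
    return out


def render_nested(matches):
    """Inner loop indented under the outer one: one row per match."""
    out = []
    for match in matches:
        out.append("<tr>")
        for group in match.groups():
            out.append("<td>%s</td>" % group)
        out.append("</tr>")
    return out


flat = render_flat(pattern.finditer(data))
nested = render_nested(pattern.finditer(data))
print("".join(flat))
print("".join(nested))
```

The flat version emits three <tr> openers followed by cells for 56789 TESTS USERS only, which matches the symptom described in the question. The nested version emits a complete row for each of the three matches.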

From the doc, I think what you need to be saying is:

<%
for match in matches:
    # begin
%><tr><%
    for group in match.groups():
        # begin
%><td style="border-style:solid;border-width:1px;border-spacing:0px;text-align:center;"><%= group %></td><%
    # end
%></tr><%
# end
%>

But I can only agree with your “messy” comment: if PSP is requiring that you torture the indenting of your HTML to fit the structure of your Python like this, it is really Doing It Wrong and you should look for another, less awful templating syntax. There are many, many templating languages for Python that have a more sensible syntax for control structures. As an example, in the one I use the above would look like:

<px:for item="match" in="matches"><tr>
    <px:for item="group" in="match.groups()">
        <td style="border-style:solid;border-width:1px;border-spacing:0px;text-align:center;">
            <?_ group ?>
        </td>
    </px:for>
</tr></px:for>
bobince
Well done! I swear I tried that pattern but one lowly space must have been hiding somewhere. What templating language are you using in your example?
scottm
It's an XML-based system of my own invention! http://www.doxdesk.com/pxtl/, but I'm not trying to push one particular solution here (I hate those promote-your-favourite-framework topics); I think it'd be fair to say that any of the popular templating libraries would be more pleasant than the above!
bobince