
I'm trying to parse an HTML form using mechanize. The form has an arbitrary number of hidden fields, and the field names and IDs are randomly generated, so I have no obvious way to select them directly. Selecting by name or ID is out, and because the number of hidden fields varies, I can't select by sequence number either, since that always changes too.

However, there are always two TextControl fields right after each other, followed by a TextareaControl. These are the three fields I need access to; basically, I just need to parse their names and all is well. I've spent the past couple of hours looking through the mechanize documentation and haven't found anything that can do this, however simple it seems (to me, anyway).

I have come up with an alternate solution: make a list of the form controls, iterate through it to find the controls whose string contains 'Text' (returning a new list of those), and finally strip out each name with a regular expression. While this works, it seems unnecessary, and I'm wondering if there's a more elegant solution. Thanks, guys.
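One possibly tidier route skips the string conversion and regex entirely: filter the selected form's controls by their type. This is a minimal sketch, assuming `form` behaves like a mechanize `HTMLForm` (e.g. `browser.form`) whose `controls` list holds objects with `type` and `name` attributes. `text_field_names` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the sketch runs without fetching a page.

```python
from types import SimpleNamespace


def text_field_names(form):
    """Return the names of the form's text and textarea controls, in order.

    Assumes `form` behaves like a mechanize HTMLForm (e.g. browser.form):
    its `controls` list holds objects with `type` and `name` attributes.
    """
    return [c.name for c in form.controls if c.type in ("text", "textarea")]


# Hypothetical stand-ins for mechanize controls, so the sketch runs offline.
demo_form = SimpleNamespace(controls=[
    SimpleNamespace(type="hidden", name="x8f2"),
    SimpleNamespace(type="hidden", name="q01z"),
    SimpleNamespace(type="text", name="a7k"),
    SimpleNamespace(type="text", name="b3m"),
    SimpleNamespace(type="textarea", name="c9p"),
    SimpleNamespace(type="submit", name="go"),
])

title, location, description = text_field_names(demo_form)
print(title, location, description)  # a7k b3m c9p
```

With a real browser you would pass `br.form` after something like `br.select_form(nr=0)`. mechanize's `HTMLForm.find_control` also accepts `type` and `nr` keyword arguments, so `br.form.find_control(type="text", nr=1)` may pick out the second text control directly; check that against the documentation for your mechanize version.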

edit: Here's what I'm currently doing to extract that info, if anyone's curious. I'll probably just stick with this. It seems unnecessary, but it gets the job done, and it's nothing intensive, so I'm not worried about efficiency.

import re

def formtextFieldParse(browser):
    '''Expects a mechanize.Browser object with a form already selected. Parses
    through the fields and returns a tuple of the names of those fields. There
    SHOULD only be 3 fields: 2 text followed by 1 textarea, corresponding to
    Posting Title, Specific Location, and Posting Description.'''
    # str(browser) lists the selected form's controls one per line,
    # e.g. "<TextControl(name=value)>"
    pattern = r'\(.*\)'
    fields = str(browser).split('\n')
    # Keep only TextControl and TextareaControl lines (both contain 'Text')
    textfields = [field for field in fields if 'Text' in field]
    # [1:-2] drops the leading "(" and trailing "=)"; assumes each value is empty
    titleFieldName = re.findall(pattern, textfields[0])[0][1:-2]
    locationFieldName = re.findall(pattern, textfields[1])[0][1:-2]
    descriptionFieldName = re.findall(pattern, textfields[2])[0][1:-2]
    return (titleFieldName, locationFieldName, descriptionFieldName)
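If you keep the string-parsing route, a tighter regular expression can pull all the names out in one pass instead of the greedy `\(.*\)` plus `[1:-2]` slicing, which mangles a name when the field already holds a value (e.g. `(b3m=Downtown)` becomes `b3m=Downtow`). This is a sketch: `CONTROL_NAME` and `text_field_names_from_str` are hypothetical names, and the sample text is an assumed rendering of `str(browser.form)` modeled on mechanize's `<TextControl(name=value)>` format, which may differ between versions.

```python
import re

# Captures the control name from lines such as "<TextControl(title=)>" or
# "<TextareaControl(desc=some text)>"; [^=]* stops at the first "=", so the
# field's current value (and any "(readonly)" suffix) does not matter.
CONTROL_NAME = re.compile(r"<Text(?:area)?Control\(([^=]*)=")


def text_field_names_from_str(form_text):
    """Return text/textarea control names found in str(form) output, in order."""
    return CONTROL_NAME.findall(form_text)


# Assumed sample of what str(browser.form) might look like.
sample = """<POST http://example.com/post application/x-www-form-urlencoded
  <HiddenControl(x8f2=1) (readonly)>
  <TextControl(a7k=)>
  <TextControl(b3m=Downtown)>
  <TextareaControl(c9p=)>
  <SubmitControl(go=Continue) (readonly)>>"""

print(text_field_names_from_str(sample))  # ['a7k', 'b3m', 'c9p']
```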
A:

I don't think mechanize has the exact functionality you require. Could you use mechanize to fetch the HTML page and then parse it with, for example, BeautifulSoup?

Alex Martelli
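The fetch-then-parse idea can be sketched without extra dependencies. To keep it runnable anywhere, this sketch swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`; `TextFieldCollector` and `text_field_names_from_html` are hypothetical names, and the sample page is made up to match the layout described in the question.

```python
from html.parser import HTMLParser


class TextFieldCollector(HTMLParser):
    """Collect the name attributes of <input type="text"> and <textarea> tags in order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        # An <input> without a type attribute defaults to type="text" in HTML.
        is_text_input = tag == "input" and (attrs.get("type") or "text").lower() == "text"
        if name and (is_text_input or tag == "textarea"):
            self.names.append(name)


def text_field_names_from_html(html):
    parser = TextFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.names


page = """
<form method="post">
  <input type="hidden" name="x8f2" value="1">
  <input type="hidden" name="q01z" value="2">
  <input type="text" name="a7k">
  <input type="text" name="b3m">
  <textarea name="c9p"></textarea>
  <input type="submit" name="go" value="Continue">
</form>
"""

print(text_field_names_from_html(page))  # ['a7k', 'b3m', 'c9p']
```

With mechanize, the page source could come from `br.response().read()`, which returns bytes on Python 3 and needs decoding first. With BeautifulSoup the equivalent lookups would be roughly `soup.find_all("input", type="text")` and `soup.find_all("textarea")`.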
Ah, well, thanks. I probably won't look into it for this project, as it took long enough to learn mechanize (really not that long at all, but it's the first third-party library I've really learned to use so far). Depending on what I do next, I may well explore it, though.
kryptobs2000