The opposite may be achieved using pyparsing as follows:

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
removeText = replaceWith("")
scriptOpen, scriptClose = makeHTMLTags("script")
scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose
scriptBody.setParseAction(removeText)
data = (scriptBody).transformString(data)

How could I keep the contents of the tag "table"?

UPDATE 0:

I tried:

# keep only the tables
tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
f = replaceWith(tableBody)
tableBody.setParseAction(f)
data = (tableBody).transformString(data)
print(data)

and I get something like this...

garbages
<input type="hidden" name="cassstx"   value="en_US:frontend"></form></td></tr></table></span></td></tr></table> 

{<"table"> SkipTo:(</"table">) </"table">} 
<div id="asbnav" style="padding-bottom: 10px;">{<"table"> SkipTo:(</"table">) </"table">} 
</div> 
even more garbages
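
The `{<"table"> SkipTo:(</"table">) </"table">}` text in that output looks like the printed form of the parser expression itself: replaceWith substitutes a fixed value for every match, and here that value is the tableBody expression rather than the matched text. Below is a minimal sketch of one way to collect just the tables instead, assuming pyparsing's originalTextFor helper is available in the installed version. The sample data string is made up for illustration, and nested tables are not handled because SkipTo stops at the first closing tag.

```python
from pyparsing import makeHTMLTags, SkipTo, originalTextFor

# Hypothetical sample input, not the page from the question
data = 'garbage<table><tr><td>x</td></tr></table>more<div><table>y</table></div>end'

tableOpen, tableClose = makeHTMLTags("table")
# originalTextFor returns the matched source text as a single string token
tableBody = originalTextFor(tableOpen + SkipTo(tableClose) + tableClose)

# searchString yields one result per match; token 0 is the table markup
tables = [tokens[0] for tokens in tableBody.searchString(data)]

# Everything outside the tables is dropped
print("\n".join(tables))
```

Using searchString rather than transformString matters here: transformString copies all unmatched text through unchanged, so it cannot discard what lies outside the tables.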

UPDATE 2:

Thanks Martelli. What I need is:

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
data = 'before<script>ciao<table>buh</table>bye</script>after'

tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
thetable = (tableBody).searchString(data)[0][2]

print(thetable)
A:

You could first extract the table (much as you are now matching the script, but without the removal of course ;-), obtaining a thetable string; then match the script and use replaceWith(thetable) instead of replaceWith(''). Alternatively, you could prepare a more elaborate parse action, but the simple two-phase approach looks more straightforward to me. E.g. (to preserve specifically the contents of the table, not the table tags):

from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo
#...
data = 'before<script>ciao<table>buh</table>bye</script>after'

# Phase 1: find the first table; token [2] is the text between the tags
tableOpen, tableClose = makeHTMLTags("table")
tableBody = tableOpen + SkipTo(tableClose) + tableClose
thetable = (tableBody).searchString(data)[0][2]

# Phase 2: replace the whole script element with the table contents
removeText = replaceWith(thetable)
scriptOpen, scriptClose = makeHTMLTags("script")
scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose
scriptBody.setParseAction(removeText)
data = (scriptBody).transformString(data)

print(data)

This prints beforebuhafter (what's outside the script tag, with the contents of the table tag sandwiched inside), hopefully "as desired".
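
For comparison, here is a rough sketch of what the "more elaborate parse action" alternative might look like: a single pass in which the action attached to the script expression searches the matched script body for tables. The results names `body` and `contents` and the function name `keep_table_contents` are made up for this illustration, and the sketch assumes the same camelCase pyparsing API used above.

```python
from pyparsing import makeHTMLTags, SkipTo

data = 'before<script>ciao<table>buh</table>bye</script>after'

tableOpen, tableClose = makeHTMLTags("table")
# Name the skipped text so it can be read back as match.contents
tableBody = tableOpen + SkipTo(tableClose)("contents") + tableClose

scriptOpen, scriptClose = makeHTMLTags("script")
# Name the script body so the parse action can read tokens.body
scriptBody = scriptOpen + SkipTo(scriptClose)("body") + scriptClose

def keep_table_contents(tokens):
    # Look for tables only inside the text of this script element
    found = tableBody.searchString(tokens.body)
    # Replace the whole script element with the joined table contents
    return "".join(match.contents for match in found)

scriptBody.setParseAction(keep_table_contents)
result = scriptBody.transformString(data)

print(result)
```

Unlike the two-phase version, this looks up the tables separately for each script element instead of reusing the first table found anywhere in the input, at the cost of a slightly less obvious parse action.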

Alex Martelli
The problem is that I don't know how to obtain the `thetable` string.
myle
@myle, see my edit with an example. Yeah, `pyparsing` docs are really scarce (unless maybe you're willing to pay for the O'Reilly book I guess) -- only serious defect of this neat package!
Alex Martelli