sax

How do I configure Java's SAXParserFactory to disable entity checking?

I am writing a screen-scraping app that reads various pages and extracts the data. I'm using the SAXParserFactory to get a SAXParser, which in turn gets me an XMLReader. I have configured the factory like this: spf = SAXParserFactory.newInstance(); spf.setValidating(false); spf.setFeature("http://xml.org/sax/features/validation", fal...

Parsing XML with SAX/Python + no validation

Hi all, I am new to Python and I'm trying to parse an XML file with SAX without validating it. The head of my XML file is: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE n:document SYSTEM "schema.dtd"> <n:document.... and I've tried to parse it with Python 2.5.2: from xml.sax import make_parser, handler import sys parser = mak...
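
A minimal sketch in Python 3 (the question uses 2.5.2, whose defaults may differ): the expat-based reader returned by `xml.sax.make_parser()` never validates, and switching off `feature_external_ges` stops it from trying to load external entities. In CPython's expat reader this also covers the external DTD subset, so the missing `schema.dtd` is simply skipped. The document and the `NameCollector` class are hypothetical stand-ins.

```python
import io
import xml.sax
from xml.sax import handler

# Hypothetical, trimmed document; schema.dtd does not need to exist.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE n:document SYSTEM "schema.dtd">
<n:document><n:title>Hello</n:title></n:document>
"""

class NameCollector(handler.ContentHandler):
    """Records element names in document order."""
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def parse_without_dtd(data):
    parser = xml.sax.make_parser()
    # expat does not validate; this also keeps it from fetching schema.dtd.
    parser.setFeature(handler.feature_external_ges, False)
    collector = NameCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(data))
    return collector.names

print(parse_without_dtd(DOC))  # ['n:document', 'n:title']
```

Because namespace processing is off by default, `n:document` is treated as a plain element name here, so no `xmlns:n` declaration is required.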

Validating XML in Java as the document is built

I am working on converting an Excel spreadsheet into an XML document that needs to be validated against a schema. I am currently building the XML document using the DOM API, and validating at the end using SAX and a custom error handler. However, I would really like to be able to validate the XML produced from each Cell as I parse the e...

Java: How to display an XML file in a JTree

Hi, I would like to have a way to display the contents of an XML file in a JTree. I have already accomplished this using DOM, by implementing a custom TreeModel (and TreeCellRenderer). However it is very clunky (much workaround-ery and hackery) and rather rough around the edges. Is anyone aware of a way to get a JTree to display the co...

SAX, StringBuilder and memory leak

Hi all. I have a strange problem. I'm parsing a document with a large text field. In my characters() method I'm using a StringBuilder: currentStory.append(ch, start, length); then in my endElement I'm assigning it to the appropriate field on my object. if (name.equals(tagDesc)) { inDesc = false; if (currentItem != null ) { ...

Can't read some attributes with SAX

Hi all, I'm trying to parse this document with SAX: <scxml version="1.0" initialstate="start" name="calc"> <datamodel> <data id="expr" expr="0" /> <data id="res" expr="0" /> </datamodel> <state id="start"> <transition event="OPER" target="opEntered" /> <transition event="DIGIT" target="operand" /> ...
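
The rest of the question is cut off, so the actual cause isn't visible here. As a point of comparison, here is a minimal Python `xml.sax` sketch that reads every attribute of every element from a trimmed copy of that document (closing tags added so it is well-formed). Python's documentation notes that the parser may reuse the `attrs` object, so the handler copies the values inside `startElement`. `AttributeDumper` is a hypothetical name.

```python
import xml.sax

# Trimmed copy of the document from the question, closed so it parses.
SCXML = b"""<scxml version="1.0" initialstate="start" name="calc">
  <datamodel>
    <data id="expr" expr="0" />
    <data id="res" expr="0" />
  </datamodel>
  <state id="start">
    <transition event="OPER" target="opEntered" />
    <transition event="DIGIT" target="operand" />
  </state>
</scxml>"""

class AttributeDumper(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []  # (element name, dict of its attributes)

    def startElement(self, name, attrs):
        # Copy into a plain dict now; the parser may reuse attrs afterwards.
        self.seen.append((name, dict(attrs.items())))

handler = AttributeDumper()
xml.sax.parseString(SCXML, handler)
for name, attrs in handler.seen:
    print(name, attrs)
```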

Lazy SAX XML parser with stop/resume

I am pretty sure the answer is no, but of course there are cleverer guys than me! Is there a way to construct a lazy SAX-based XML parser that can be stopped (e.g. raising an exception is one way of doing this) but also resumed? I am looking for a possible solution for Python >= 2.6 with standard XML libraries. The "lazy" part...
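
One way to get stop/resume behaviour with only the standard library, sketched in Python 3 (so not the Python 2.6 the question targets): drive the expat reader incrementally with `feed()` and wrap it in a generator. The generator is suspended between `next()` calls, so no further input is read until the consumer asks for more. `lazy_events`, `EventBuffer` and `chunk_size` are hypothetical names.

```python
import io
import xml.sax

class EventBuffer(xml.sax.ContentHandler):
    """Collects (event, name) tuples until the generator hands them out."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

def lazy_events(stream, chunk_size=64):
    """Yield SAX events lazily; stop iterating to pause, call next() to resume."""
    parser = xml.sax.make_parser()
    buffer = EventBuffer()
    parser.setContentHandler(buffer)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)          # parse only the input handed over so far
        yield from buffer.events
        buffer.events.clear()
    parser.close()                  # flushes the parser and checks the document is complete
    yield from buffer.events

doc = io.BytesIO(b"<feed>" + b"<item/>" * 3 + b"</feed>")
events = lazy_events(doc, chunk_size=8)
first = next(events)                # parsing pauses here...
rest = list(events)                 # ...and picks up where it left off
```

How many events are available after each `feed()` depends on the parser's internal buffering, but the overall event sequence is the same however the input is chunked.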

SAX parsing - efficient way to get text nodes

Given this XML snippet <?xml version="1.0"?> <catalog> <book id="bk101"> <author>Gambardella, Matthew</author> In SAX, it is easy to get attribute values: @Override public void startElement (String uri, String localName, String qName, Attributes attributes) throws SAXException{ if(qName.equals("book")){ ...

How can I process xml asynchronously in python?

I have a large XML data file (>160M) to process, and it seems like SAX/expat/pulldom parsing is the way to go. I'd like to have a thread that sifts through the nodes and pushes nodes to be processed onto a queue, and then other worker threads pull the next available node off the queue and process it. I have the following (it should have...
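
A rough sketch of the producer/consumer layout described, assuming the nodes of interest are hypothetical `<record>` elements whose text is the unit of work: a SAX handler pushes each completed record onto a bounded `queue.Queue`, and worker threads pull from it. In CPython, threads only speed this up if the per-record work releases the GIL (I/O, many C extensions); for CPU-bound Python code, `multiprocessing` is usually a better fit.

```python
import io
import queue
import threading
import xml.sax

class RecordProducer(xml.sax.ContentHandler):
    """Puts the text of each <record> element on a queue as soon as it closes."""
    def __init__(self, out_queue, tag="record"):
        super().__init__()
        self.out_queue = out_queue
        self.tag = tag
        self._chunks = None          # text chunks while inside a record, else None

    def startElement(self, name, attrs):
        if name == self.tag:
            self._chunks = []

    def characters(self, content):
        if self._chunks is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self.tag:
            self.out_queue.put("".join(self._chunks))
            self._chunks = None

_DONE = object()                     # sentinel: tells one worker to exit

def process_in_parallel(source, work, num_workers=4):
    q = queue.Queue(maxsize=100)     # bounded, so parsing can't race far ahead of the workers
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is _DONE:
                return
            try:
                value = work(item)
            except Exception as exc:  # keep the worker alive; report the error as a result
                value = exc
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    try:
        xml.sax.parse(source, RecordProducer(q))   # parsing runs in the calling thread
    finally:
        for _ in threads:
            q.put(_DONE)
        for t in threads:
            t.join()
    return results

data = b"<records>" + b"".join(b"<record>%d</record>" % i for i in range(10)) + b"</records>"
doubled = process_in_parallel(io.BytesIO(data), lambda text: int(text) * 2)
print(sorted(doubled))   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Results arrive in whatever order the workers finish, hence the `sorted()`.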

Getting values from SAX attributes when namespaces are involved.

I'm using SAX to parse some XML. In my handler's startElement() method I'm trying to read the value of an attribute named xsi:type with something like: String type = attributes.getValue("xsi:type"); However, it always returns null. This works fine for everything else so I'm assuming that it's due to the namespace prefix. How can I get...
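
Looking the attribute up by namespace URI rather than by the `xsi:` prefix is more robust, since a document may bind any prefix to that namespace. In Java that corresponds to `attributes.getValue("http://www.w3.org/2001/XMLSchema-instance", "type")` on a namespace-aware parser (`SAXParserFactory.setNamespaceAware(true)`). The same idea with Python's `xml.sax`, as a minimal sketch on a made-up document (`XsiTypeReader` and `read_xsi_types` are hypothetical names):

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

class XsiTypeReader(ContentHandler):
    def __init__(self):
        super().__init__()
        self.types = []

    # With namespace processing on, the parser calls startElementNS instead of
    # startElement, and attribute keys are (namespace URI, local name) pairs.
    def startElementNS(self, name, qname, attrs):
        self.types.append(attrs.get((XSI_NS, "type")))   # None if absent

def read_xsi_types(data):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    reader = XsiTypeReader()
    parser.setContentHandler(reader)
    parser.parse(io.BytesIO(data))
    return reader.types

DOC = b"""<shapes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <shape xsi:type="Circle"/>
  <shape/>
</shapes>"""

print(read_xsi_types(DOC))   # [None, 'Circle', None]
```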

VBScript: Error 10023 in : Array index out of range (trouble when reusing an array variable)

Using Sax ActiveX Scripting (long story), I have 3 nested if statements which reuse the same return variable. Script looks roughly like: Dim rtnArray As Variant If variable1 <> "" Then ' Perform SQL query against DB2 database rtnArray = DB2SQLSearch(Query) If UBound(rtnArray) = 0 Then ' ditto rtnArray = DB2SQ...

Capturing mixed content in XML using a SAX Parser

Is a SAX Parser capable of capturing mixed content within an XML document (see example below)? <element>here is some <b>mixed content</b></element> ...
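
SAX does report mixed content, as interleaved events: the text before the child, the child's own start/text/end, then any trailing text. A small Python `xml.sax` sketch that records the event stream for the example above; the handler merges adjacent text chunks because a parser may deliver one run of text in several `characters()` calls. `EventRecorder` is a hypothetical name.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """Records start/end/text events; adjacent text chunks are merged."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # One run of text may arrive split across several calls.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))

recorder = EventRecorder()
xml.sax.parseString(b"<element>here is some <b>mixed content</b></element>", recorder)
print(recorder.events)
# [('start', 'element'), ('text', 'here is some '), ('start', 'b'),
#  ('text', 'mixed content'), ('end', 'b'), ('end', 'element')]
```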

Android SAX parsing exception for the "»" character

Hi friends, I'm using a SAX parser to parse XML files that I receive from the internet. Normal XML is parsed fine, except for files that have the "»" symbol in their attributes. Every time I try to parse such a file I get the following error: 02-11 16:57:35.547: INFO/System.out(754): org.apache.harmony.xml.Expat...
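
The log is cut off, so the exact cause isn't visible here. One common cause of this kind of failure is an encoding mismatch: the server sends ISO-8859-1 bytes (in which "»" is the single byte 0xBB) without declaring that encoding, and the parser decodes them as UTF-8. The effect can be reproduced with Python's expat-backed `xml.sax` (not Android's parser); the document, the `title` attribute and `TitleGrabber` are made up.

```python
import xml.sax

class TitleGrabber(xml.sax.ContentHandler):
    """Remembers the title attribute of the most recent element."""
    def __init__(self):
        super().__init__()
        self.title = None

    def startElement(self, name, attrs):
        self.title = attrs.get("title")

def read_title(data):
    grabber = TitleGrabber()
    xml.sax.parseString(data, grabber)
    return grabber.title

# "»" is the single byte 0xBB in ISO-8859-1, which is not valid UTF-8 on its own.
body = '<item title="Next »"/>'.encode("iso-8859-1")

try:
    read_title(body)   # no XML declaration, so the parser assumes UTF-8
except xml.sax.SAXParseException as exc:
    print("undeclared ISO-8859-1 fails:", exc.getMessage())

declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
print(read_title(declared))   # Next »
```

If this is the cause, the usual fix is to make the real encoding known to the parser, either through the XML declaration or, in Java, via `InputSource.setEncoding` on the source you hand to the parser.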

How can I get the text between tags using a Python SAX parser?

What I need is just to get the text of the corresponding tag and persist it into a database. Since the XML file is big (4.5GB) I'm using SAX. I used the characters method to get the text and put it in a dictionary. However, when I'm printing the text in the endElement method I'm getting a new line instead of the text. Here is my code: def ch...
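
A likely explanation, though the handler code is truncated: `characters()` can be called several times for one element, and also for the whitespace between elements, so assigning the text (rather than appending it) keeps only the last chunk, often a newline. A minimal Python sketch that buffers chunks per element and joins them in `endElement`; `TextCollector` and the `title`/`text` tags are hypothetical.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Collects (tag, text) pairs for the given tags, joining every characters() chunk."""
    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.records = []
        self._current = None     # target tag we are currently inside, if any
        self._chunks = []

    def startElement(self, name, attrs):
        if name in self.tags:
            self._current = name
            self._chunks = []    # fresh buffer for every element

    def characters(self, content):
        if self._current is not None:
            self._chunks.append(content)   # append; don't overwrite

    def endElement(self, name):
        if name == self._current:
            text = "".join(self._chunks).strip()
            self.records.append((name, text))   # a database insert could go here instead
            self._current = None

DOC = b"""<pages>
  <page>
    <title>First</title>
    <text>Some body text</text>
  </page>
</pages>"""

collector = TextCollector(["title", "text"])
xml.sax.parseString(DOC, collector)
print(collector.records)   # [('title', 'First'), ('text', 'Some body text')]
```

For a multi-gigabyte file, `xml.sax.parse(filename, collector)` streams from disk; writing each record out in `endElement` instead of keeping `records` in memory keeps the footprint small.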

Ignore some XML tags in SAX

Hi all, I'm parsing an XML document using SAX in Java. I'm working with XML that describes research publications in different fields. Among others, there are elements like "abstract" that briefly describe what the research paper is about. Basic HTML formatting is allowed in that field, but I don't want SAX to treat the HTML ta...
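
A sketch of one approach, assuming the HTML inside `<abstract>` is well-formed XML (otherwise the parser rejects the document before any handler runs): while inside an abstract, write nested start and end tags back out as text instead of treating them as structure. Shown with Python's `xml.sax` rather than Java; `AbstractCollector` and the sample document are hypothetical.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class AbstractCollector(xml.sax.ContentHandler):
    """Collects each <abstract> as a string, keeping nested HTML tags as markup."""
    def __init__(self):
        super().__init__()
        self.abstracts = []
        self._depth = 0      # > 0 while inside an <abstract>
        self._parts = []

    def startElement(self, name, attrs):
        if self._depth:
            # Nested tag: re-serialize it as text rather than as structure.
            attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
            self._parts.append(f"<{name}{attr_text}>")
            self._depth += 1
        elif name == "abstract":
            self._depth = 1
            self._parts = []

    def endElement(self, name):
        if not self._depth:
            return
        self._depth -= 1
        if self._depth:
            self._parts.append(f"</{name}>")          # closing a nested HTML tag
        else:
            self.abstracts.append("".join(self._parts))  # closing </abstract>

    def characters(self, content):
        if self._depth:
            self._parts.append(escape(content))      # re-escape &, < and >

DOC = b'<paper><abstract>Uses <i>SAX</i> &amp; <a href="x">DOM</a>.</abstract></paper>'
collector = AbstractCollector()
xml.sax.parseString(DOC, collector)
print(collector.abstracts[0])   # Uses <i>SAX</i> &amp; <a href="x">DOM</a>.
```

Empty elements such as `<br/>` come back as `<br></br>` with this approach.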

Efficient merging of multiple, large xml files into one

I searched the web and searched Stack Overflow up and down, with no solution, although I did find solutions for doing this in pure XSLT here. But the problem is that the resulting XML will be several hundred MB, so I must do this with SAX in Java. (Please no XSLT solution, although I tagged it with xslt ;-)) Let me explain with more...
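
A minimal streaming sketch, assuming every input has a single root element whose children should be copied under one new root: each file is parsed with a handler that forwards everything below its root to a single `XMLGenerator`, so memory use should stay roughly constant regardless of file size. It is written with Python's `xml.sax` for brevity; in Java, the analogous output side could be a `TransformerHandler` from `SAXTransformerFactory`. The names `merge`, `ChildForwarder` and `root_name` are hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

class ChildForwarder(xml.sax.ContentHandler):
    """Forwards everything below an input's root element to an XMLGenerator."""
    def __init__(self, out):
        super().__init__()
        self.out = out
        self.depth = 0

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth > 1:            # skip the input's own root element
            self.out.startElement(name, attrs)

    def endElement(self, name):
        if self.depth > 1:
            self.out.endElement(name)
        self.depth -= 1

    def characters(self, content):
        if self.depth > 1:            # text directly under the root is dropped
            self.out.characters(content)

def merge(sources, out, root_name):
    """Stream the children of every source document under one new root."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement(root_name, {})
    for source in sources:
        xml.sax.parse(source, ChildForwarder(gen))
    gen.endElement(root_name)
    gen.endDocument()

a = io.BytesIO(b'<items><item id="1">a</item></items>')
b = io.BytesIO(b'<items><item id="2">b</item></items>')
out = io.StringIO()
merge([a, b], out, "items")
print(out.getvalue())
```

For real files, `sources` can be a list of filenames and `out` an open text file.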

DOM vs. SAX: creating XML

Hey, I know the difference between SAX and DOM is pretty substantial when it comes to parsing XML, but what about creating it? Is there even a way to create new XML using SAX, or do I have to use DOM if I want to create a new XML file based on the data in my program? Thanks ...
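
SAX is a parsing API, but the event style works for output too: a writer can accept the same start/characters/end calls and serialize them directly, without building a tree. Python ships one as `xml.sax.saxutils.XMLGenerator`; in Java, a `TransformerHandler` from `SAXTransformerFactory` plays a similar role. A minimal sketch with made-up data (`write_people` is a hypothetical name):

```python
import io
from xml.sax.saxutils import XMLGenerator

def write_people(people, out):
    """Write XML as a stream of SAX-style events; no document tree is built."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()                       # writes the <?xml ...?> declaration
    gen.startElement("people", {})
    for name, age in people:
        gen.startElement("person", {"age": str(age)})
        gen.characters(name)                  # &, < and > are escaped for us
        gen.endElement("person")
    gen.endElement("people")
    gen.endDocument()

out = io.StringIO()
write_people([("Ada", 36), ("Tom & Jerry", 7)], out)
print(out.getvalue())
```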

Parsing badly formatted HTML in PHP

In my code I convert some styled XLS documents to HTML using OpenOffice. I then parse the tables using xml_parser_create. The problem is that OpenOffice creates old-school HTML with unclosed <BR> and <HR> tags; it doesn't create doctypes and doesn't quote attributes (<TABLE WIDTH=4>). The PHP parsers I know of don't like this, and yield xml ...

Parsing broken XML with lxml.etree.iterparse

Hi, I'm trying to parse a huge XML file with lxml in a memory-efficient manner (i.e. streaming lazily from disk instead of loading the whole file into memory). Unfortunately, the file contains some bad ASCII characters that break the default parser. The parser works if I set recover=True, but the iterparse method doesn't take the recover ...
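
A different technique from `recover=True`, sketched under the assumption that the "bad ASCII characters" are C0 control characters that XML 1.0 forbids: filter them out of the byte stream before the parser sees it. The wrapper only needs a `read()` method, so it is shown with the standard library's `xml.etree.ElementTree.iterparse`; that it works unchanged as a source for lxml's `iterparse` is an assumption. `ControlCharFilter`, `stream_items` and the `item` tag are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Bytes XML 1.0 forbids outright: C0 controls except tab, LF and CR.
_FORBIDDEN = bytes(range(0x00, 0x09)) + b"\x0b\x0c" + bytes(range(0x0E, 0x20))

class ControlCharFilter:
    """Read-only file-like wrapper that drops forbidden control bytes on the fly.

    Safe for ASCII and UTF-8 input: these byte values never occur inside a
    multi-byte UTF-8 sequence.
    """
    def __init__(self, raw):
        self._raw = raw

    def read(self, size=-1):
        while True:
            chunk = self._raw.read(size)
            if not chunk:
                return b""                  # genuine end of input
            cleaned = chunk.translate(None, _FORBIDDEN)
            if cleaned:                     # never return b"" before EOF
                return cleaned

def stream_items(raw, tag="item"):
    """Yield the text of each <tag> element while parsing incrementally."""
    for _event, elem in ET.iterparse(ControlCharFilter(raw), events=("end",)):
        if elem.tag == tag:
            yield elem.text
            elem.clear()                    # release the element's contents

data = b"<root><item>a\x01b</item><item>c\x1f</item></root>"
print(list(stream_items(io.BytesIO(data))))   # ['ab', 'c']
```

The `read()` loop matters: `iterparse` treats an empty read as end of file, so a chunk made entirely of forbidden bytes must not be returned as `b""`.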

How can I force a SAX parser to use a DTD if one is not specified in the input file?

How can I force a SAX parser (specifically, Xerces in Java) to use a DTD when parsing a document without having any doctype in the input document? Is this even possible? Here are some more details of my scenario: We have a bunch of XML documents that conform to the same DTD that are generated by multiple different systems (none of whi...