parsing

Validating XML parser for Flex/ActionScript?

Is there a validating XML parser for Flex/ActionScript? The XML class verifies that a document is well-formed XML, but not that it follows the rules of the DTD. Java has a validating XML parser, but is there one for Flex/ActionScript? ...

Is there an open source Python library for sanitizing HTML and removing all JavaScript?

I want to write a web application that allows users to enter any HTML that can occur inside a <div> element. This HTML will then end up being displayed to other users, so I want to make sure that the site doesn't open people up to XSS attacks. Is there a nice library in Python that will clean out all the event handler attributes, <scri...

Flexible numeric string parsing in Python

Are there any Python libraries that help parse and validate numeric strings beyond what is supported by the built-in float() function? For example, in addition to simple numbers (1234.56) and scientific notation (3.2e15), I would like to be able to parse formats like: Numbers with commas: 2,147,483,647 Named large numbers: 5.5 billion ...
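As a rough illustration of the formats listed in the question, here is a minimal sketch using only the standard library. The function name `parse_number` and the multiplier table are hypothetical, the named numbers assume the short scale (1 billion = 10^9), and commas are only accepted as strict three-digit group separators.

```python
import re

# Hypothetical multiplier table (short scale); extend as needed.
_MULTIPLIERS = {
    "thousand": 10**3,
    "million": 10**6,
    "billion": 10**9,
    "trillion": 10**12,
}

# Accepts an optional sign, comma-grouped digits, and an optional fraction.
_GROUPED = re.compile(r"[+-]?\d{1,3}(,\d{3})+(\.\d+)?")


def parse_number(text):
    """Parse '1234.56', '3.2e15', '2,147,483,647' or '5.5 billion' into a float."""
    cleaned = text.strip().lower()
    multiplier = 1

    # Split off a trailing word such as "billion" if it is a known multiplier.
    match = re.fullmatch(r"(.+?)\s+([a-z]+)", cleaned)
    if match and match.group(2) in _MULTIPLIERS:
        cleaned = match.group(1)
        multiplier = _MULTIPLIERS[match.group(2)]

    # Remove commas only when they form valid thousands groups.
    if "," in cleaned:
        if not _GROUPED.fullmatch(cleaned):
            raise ValueError(f"badly grouped number: {text!r}")
        cleaned = cleaned.replace(",", "")

    # float() handles plain decimals and scientific notation.
    return float(cleaned) * multiplier
```

Anything that `float()` rejects after this normalisation raises `ValueError`, so the function validates as well as parses. Locale-specific separators are not handled here.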

What is the best LALR parser generator for C++ that can generate meaningful error messages?

I am looking for the best solution for a LALR parser generator for C++ that will allow me to generate really good error messages. I really hate the syntax errors that MySQL generates and I want to take the parser in it and replace it with a "lint" checker that will tell me more than just ERROR 1064 (42000): You have an error in your S...

Check that a SQL script is valid

As part of a release we run a load of PL/SQL scripts against a database. Recently someone left the ; off the end of a line in one script that called another script, which meant that the second script did not get run. Because this did not cause an error, it just didn't get run, it took quite a while to track down what had happened. I want to c...

Parse string to DateTime

Hi! I have a source where dates come in this string form: Sat Sep 22 13:15:03 2018 Is there any easy way I can parse that as a datetime in C#? I've tried with DateTime.(Try)Parse, but it doesn't seem to recognize this specific format... ...

Python: How do I read and parse a unicode utf-8 text file?

I am exporting UTF-8 text from Excel and I want to read and parse the incoming data using Python. I've read all the online info so I've already tried this, for example: txtFile = codecs.open( 'halout.txt', 'r', 'utf-8' ) for line in txtFile: print repr( line ) The error I am getting is: UnicodeDecodeError: 'utf8' codec can't dec...

Re-format items inside list read from CSV file in Python

I have some lines in a CSV file like this: 1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00",Some string,Some string 2 If you notice, some numbers are enclosed in " " and have a thousands separator ",". I want to remove the thousands separator and the double quote enclosure. For the quote enclosure, I'm t...
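One possible approach, sketched here under assumptions: let the standard `csv` module strip the quote enclosure, then remove commas only from fields that look like comma-grouped numbers. The helper name `clean_row` and the grouping pattern are illustrative, not taken from the question.

```python
import csv
import io
import re

# A field counts as a grouped number if it looks like 3,711.32 or 20,000.00.
GROUPED_NUMBER = re.compile(r"[+-]?\d{1,3}(,\d{3})+(\.\d+)?")


def clean_row(row):
    """Drop thousands separators from numeric fields; leave other fields alone."""
    return [
        field.replace(",", "") if GROUPED_NUMBER.fullmatch(field) else field
        for field in row
    ]


line = (
    '1000001234,Account Name,0,0,"3,711.32",0,0,"18,629.64",'
    '"22,340.96",COD,"20,000.00",Some string,Some string 2'
)

# csv.reader removes the surrounding double quotes while parsing.
rows = [clean_row(row) for row in csv.reader(io.StringIO(line))]
```

Because the pattern must match the whole field, a text field that happens to contain a comma is left unchanged.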

Parsing line and selecting values corresponding to a key

There is a set of data arranged in a specific manner (as a tree), as given below. Each line is basically a key=value pair, with some additional values at the end that indicate how many children the branch has and some junk value. 11=1 123 2 11=1>1=45 234 1 11=1>1=45>9=16 345 1 11=1>1=45>9=16>2=34 222 1 11=1>1=45>9=16>2=34>7=0 2234...

How to extract the RDF snippet out of an HTML page?

I want to extract the RDF snippet of a web page. Since it can even be inside an HTML comment, I'm at a loss here. Can anybody point me in the right direction, what libraries or classes to use or something like that? The goal is to have the trackback URL to be able to send trackbacks. ...

Regexp. Big text with hierarchy

I have a text of law, with Chapters and Articles. Chapter 1. Something Article 1. trata-trata Article 2. trata-trata Article 3. trata-trata Chapter 2. Something Article 4. trata-trata Article 5. trata-trata Article 6. trata-trata I need a regexp to find Articles within Chapters, and to know which articles belong to which Chapter. (...
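A single regexp rarely captures nesting well, so one alternative is to match each heading with a regexp and track the current chapter in a loop. The sketch below assumes every "Chapter N." and "Article N." heading starts its own line, which the flattened excerpt does not confirm; `group_articles` is a hypothetical name.

```python
import re

# Matches headings such as "Chapter 1. Something" or "Article 4. trata-trata".
HEADING = re.compile(r"^(Chapter|Article)\s+(\d+)\.\s*(.*)$")


def group_articles(text):
    """Map each chapter number to the list of article numbers that follow it."""
    chapters = {}
    current = None
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if not match:
            continue  # body text, not a heading
        kind, number, _title = match.groups()
        if kind == "Chapter":
            current = int(number)
            chapters[current] = []
        elif current is not None:
            # An article belongs to the most recent chapter seen.
            chapters[current].append(int(number))
    return chapters


sample = """Chapter 1. Something
Article 1. trata-trata
Article 2. trata-trata
Article 3. trata-trata
Chapter 2. Something
Article 4. trata-trata
Article 5. trata-trata
Article 6. trata-trata"""

result = group_articles(sample)
```

Articles that appear before any chapter heading are ignored in this sketch.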

Parsing amount strings into numbers

I am working on a system that recognizes paper documents using OCR engines. These documents are invoices containing amounts such as total, VAT and net amounts. I need to parse these amount strings into numbers, but they are coming in many formats and flavors using different symbols for decimal and thousands separation in the number i...

What is the fastest way to deconstruct a fixed length binary/alpha message?

What would you suggest as the fastest or best way to parse a fixed length message in C++ which has fields defined like field = 'type', length = 2, type = 'alphanumeric' field = 'length', length = 2, type = 'binary' (edit: length = 2 means 16 bit) ... ... and so on I read about making a struct and then using reinterpret_cast but I'm no...

How to parse time stamps with Unicode characters in Java or Perl?

I'm trying to make my code as generic as possible. I'm trying to parse the install time of a product installation. I will have two files in the product, one that has the time stamp I need to parse and another that tells the language of the installation. This is how I'm parsing the timestamp public class ts { public static void main (String[]...

java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration incompatible with org.apache.xerces.xni.parser.XMLParserConfiguration

I get this error when deploying the ear file on to WLS 10.3 on AIX platform. (The same ear works fine on Windows / Linux platforms) Caused by: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration incompatible with org.apache.xerces.xni.parser.XMLParserConfiguration at org.apache.xerces.parsers.DOM...

How do I parse out n-bit elements from a byte addressable array

I have a data stream that is addressable only in 8-bit bytes. I want to parse it out into 6-bit elements and store them in an array. Are there any well-known methods to do this? 11110000 10101010 11001100 into an array like 111100|001010|101011|001100 (can have zero padding, just needs to be addressable this way) and the dat...
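The regrouping in the example can be sketched as follows, in Python for illustration only (the question does not name a language). The function name `split_bits` is hypothetical, and the sketch assumes big-endian bit order with zero padding on the right when the bit count is not a multiple of the element width.

```python
def split_bits(data, width=6):
    """Regroup a byte string into `width`-bit integers, most significant bits first."""
    total_bits = len(data) * 8
    # Treat the whole buffer as one big-endian integer.
    value = int.from_bytes(data, "big")
    # Zero-pad on the right so the bit count is a multiple of `width`.
    pad = (-total_bits) % width
    value <<= pad
    count = (total_bits + pad) // width
    mask = (1 << width) - 1
    # Extract elements from the most significant end down to the least.
    return [(value >> (width * (count - 1 - i))) & mask for i in range(count)]


elements = split_bits(bytes([0b11110000, 0b10101010, 0b11001100]))
```

Building one large integer keeps the sketch short. For a long or unbounded stream, a running bit accumulator that emits an element whenever it holds at least `width` bits would avoid holding the whole buffer as a single number.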

Parsing BBCode with xslt 2.0

Hi. I need help finding a viable solution to convert BBCode to HTML. This is where I've come so far, but it fails when BBCodes get nested. Src: [quote id="ohoh81"]asdasda [quote id="ohoh80"]adsad [quote id="ohoh79"]asdad[/quote] [/quote] [/quote] Code: <xsl:variable name="rules"> <code check="&#xD;" >&lt;br/&g...

.NET DateTime.Parse

When trying to use the Parse method on the DateTime class I get an exception thrown: String was not recognized as a valid DateTime. The string reads as "26/10/2009 8:47:39 AM" when output. This string is obtained from a group on a match from a regex. None of the strings obtained from this match group will parse to datetime. (WT...

How to identify a JSON object or JSON array in parsed JSON text?

Hi, I have a problem: I parsed XML using JSON parsing and got JSON text. Now I have to get the values from the XML. How can I identify a JSONObject, JSONArray, etc. in that JSON text? ...

Can PDFBox, iTextSharp or PDFsharp read a corrupted PDF file?

I recently downloaded PDF libraries (PDFBox, PDFsharp, iTextSharp), and I am trying to figure out whether I can parse corrupted PDF files within ASP.NET. Which library is best for reading a corrupted PDF file? ...