parsing

Java simple sentence parser

Hi guys, is there any simple way to create a sentence parser in plain Java without adding any libs and jars? The parser should not just take care of blanks between words, but be smarter and parse . ! ?, recognize when a sentence has ended, etc. After parsing, only real words should be stored in a db or file, not any special chars. tha...
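One possible shape for such a parser, sketched in Python rather than Java purely to illustrate the logic (the same two regular expressions carry over to `java.util.regex`). The function names `split_sentences` and `words` are made up for this sketch, and it assumes a sentence ends at `.`, `!` or `?` followed by whitespace, so abbreviations such as "e.g. this" would be split incorrectly.

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def words(sentence):
    """Return only runs of letters, dropping digits and punctuation."""
    return re.findall(r'[^\W\d_]+', sentence)

if __name__ == "__main__":
    for s in split_sentences("Hi guys! Is this simple? Yes."):
        print(s, words(s))
```

Only the word lists would then be written to the db or file; the sentence boundaries are kept separate in case they are needed later.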

Parsing iCalendar data with XPath/XSLT

I am working with an XML-driven CMS, and before I run off and either write or implement a module that parses the iCal format, I was wondering if there was any way to parse it using just XSLT or ideally just an XPath expression, as this is a built-in function of the CMS. ...

How to parse a page that returns a 302 header?

I have to parse a page in PHP. The URL of the page returns a 302 Moved Temporarily header and redirects to a not-found page. Its data can be retrieved manually through the console option of the Firebug add-on for Mozilla, but if I try to parse it using PHP it gives me that not-found page in return. How can I parse that page? Please suggest. edit: i...

Looking for a library for simple protocol implementation

Hello, I need to implement a simple over-the-network interaction in C++ and I've been wondering whether there are libraries that do that already. My protocol basically sends messages and receives responses. Each message is just a set of 3-4 values of basic data types. I would like to find a library (or libraries) that can do one or more...

any html/css parsing library for ruby & PHP?

I am about to finish my script that parses/scrapes a website using mechanize & ruby. I need to port my script to PHP in the future. My question is whether there is any library available for both ruby and PHP, or if anybody can recommend any other approach to this? ...

parsing a text string for dates - not the standard convert problem!

Does anyone know of a library, ideally Python, that can have a stab at pulling dates out of text? "Shall we go to the library today" -> 21 Jan 10 "Starting on the 1st of January" -> 1 Jan 10 "Anytime between 3nd and 5th of Feb 2009" -> 3 Feb 09, 5 Feb 09 It's a tough problem and probably why I haven't found anything! Already using N...

Parsing XML Textlist

Hi there, I'm trying to parse an XML file. I'm able to parse a normal text node, but how do I parse a textlist? I'm getting the firstChild of the textlist and sadly that's all. If I try to do elem.nextSibling(); it is always null, which can't be right, since I know there are two other values left. Can someone provide me with an example? Thanks! ...

SQL to find first non-numeric character in a string

I inherited a table with identifiers in a format [nonnumericprefix][number]. For example (ABC123; R2D2456778; etc). I was wondering if there was a good way to split this in SQL into two fields, the largest integer formed from the right side, and the prefix, for example (ABC, 123; R2D, 2456778; etc). I know I can do this with a cursor,...
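The split being asked for (a prefix plus the longest run of trailing digits) can be sketched outside the database. This is Python, not SQL, and only illustrates the rule; `split_identifier` is a hypothetical helper name, and it assumes identifiers without a numeric suffix should come back with `None` as the number.

```python
import re

# Lazy prefix, then the longest run of digits that reaches the end of the string.
_ID_PATTERN = re.compile(r'(.*?)(\d+)')

def split_identifier(identifier):
    """Return (prefix, number) where number is the trailing integer, or None."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        return identifier, None  # no trailing digits at all
    return match.group(1), int(match.group(2))

if __name__ == "__main__":
    for ident in ("ABC123", "R2D2456778"):
        print(ident, split_identifier(ident))
```

Because the prefix is matched lazily and the digits must extend to the end, a digit inside the prefix (the `2` in `R2D`) stays with the prefix.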

EDI Inquiry: Is this an acceptable way to retrieve EDI encoded data?

I'm new to EDI, and I have a question. I have read that you can get most of what you need about an EDI format by looking at the last 3 characters of the ISA line. This is fine if every EDI used line breaks to separate entities, but I have found that many are single line files with any number of characters used as breaks. I have notice...

Handling numbers with leading zeros in Tcl

I am having trouble in Tcl using numbers with leading zeros. I am parsing some numbers that can have leading zeros, such as "0012", which should be interpreted as the integer "twelve". $ tclsh % set a 8 8 % set b 08 08 % expr $a - 1 7 % expr $b - 1 expected integer but got "08" (looks like invalid octal number) What is the best way ...

Not Another Parse-HTML-With-Regex Question

Hello, I've read a few questions on here re parsing HTML with regex, and I understand that this is, on the whole, a terrible idea. Having said this, I have a very specific problem that I think Regex might be the answer to. I've been fumbling around trying to work out the answer but I'm new (today) to Regex, and I was hoping some kind ...

Converting hexadecimal numbers in strings to negative numbers, in Perl

I have a bunch of numbers represented as hexadecimal strings in logfiles that are being parsed by a Perl script, and I'm relatively inexperienced with Perl. Some of these numbers are actually signed negative numbers, i.e. 0xFFFE == -2 when represented as a 16-bit signed integer. Can somebody please tell me the canonical way of getting the...
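The underlying conversion is plain two's-complement arithmetic: if the top bit of the field is set, subtract 2^bits. A minimal sketch in Python (not Perl), assuming a fixed 16-bit width by default; `hex_to_signed` is a name invented here. With 16 bits, 0xFFFF maps to -1 and 0xFFFE to -2.

```python
def hex_to_signed(text, bits=16):
    """Interpret a hex string such as '0xFFFE' as a two's-complement integer."""
    value = int(text, 16)  # accepts an optional 0x prefix
    if value < 0 or value >= (1 << bits):
        raise ValueError(f"{text!r} does not fit in {bits} bits")
    if value & (1 << (bits - 1)):  # sign bit set
        value -= 1 << bits
    return value

if __name__ == "__main__":
    for sample in ("0xFFFF", "0xFFFE", "0x7FFF", "0x8000"):
        print(sample, hex_to_signed(sample))
```

The same subtraction works in any language once the string has been converted to an unsigned integer.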

atom feed xmlns attribute messes up AS3's XML-parsing?

Wanna see something interesting? var xml:XML = XML(<feed><entry /><entry /><entry /></feed>); trace(xml.entry.length()) // returns 3 Makes sense, right? Now let's add this attribute... var xml:XML = XML(<feed xmlns="http://www.w3.org/2005/Atom"><entry /><entry /><entry /></feed>); trace(xml.entry.length()) // returns 0 We...

On-the-fly C parser

I am looking for a dynamic C-based parser/framework. It must be dynamic because the EBNF is constantly changing, so something like bison is not applicable in this situation. And boost::spirit is practically useless to me because it requires C++. Does anyone have an idea? ...

Read XML with multiple top-level items using Python ElementTree?

How can I read an XML file using Python ElementTree, if the XML has multiple top-level items? I have an XML file that I would like to read using Python ElementTree. Unfortunately, it has multiple top-level tags. I would wrap <doc>...</doc> around the XML, except I have to put the <doc> after the <?xml> and <!DOCTYPE> fields. But figuri...
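One way to approach this, sketched under some assumptions: strip the leading `<?xml ...?>` declaration and a simple `<!DOCTYPE ...>` line, then wrap what remains in a synthetic root before handing it to ElementTree. `parse_multi_root` is a hypothetical helper, not part of ElementTree, and the sketch does not handle a DOCTYPE with an internal subset (`[...]`). Dropping the DOCTYPE also means any entities it declares are no longer available.

```python
import re
import xml.etree.ElementTree as ET

# Optional XML declaration followed by an optional DOCTYPE without internal subset.
_PROLOG = re.compile(r'\s*(<\?xml[^>]*\?>)?\s*(<!DOCTYPE[^>\[]*>)?')

def parse_multi_root(text, wrapper="doc"):
    """Parse XML text with several top-level elements under a synthetic root."""
    prolog = _PROLOG.match(text)  # always matches, possibly with zero length
    body = text[prolog.end():]
    return ET.fromstring(f"<{wrapper}>{body}</{wrapper}>")

if __name__ == "__main__":
    sample = '<?xml version="1.0"?>\n<!DOCTYPE a SYSTEM "a.dtd">\n<a>1</a><b>2</b>'
    root = parse_multi_root(sample)
    print([child.tag for child in root])
```

The returned element is the wrapper itself, so iterating over it yields the original top-level items in document order.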

How can I use HTML Agility Pack to retrieve all the images from a website?

I just downloaded the HTMLAgilityPack and the documentation doesn't have any examples. I'm looking for a way to download all the images from a website. The address strings, not the physical image. <img src="blabalbalbal.jpeg" /> I need to pull the source of each img tag. I just want to get a feel for the library and what it can offer...

extract single string from html using ruby/mechanize (and nokogiri)

I am extracting data from a forum. My script based on is working fine. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post. I cannot get it to work. I used FireXPath to determine the xpath. Sample code: require 'rubygems' require 'mechanize' post_agent = WWW::Mechanize.new post_page = post_agent.get('http:/...

How to write a bison file to automatically use a token enumeration list defined in a C header file?

Hi everyone, I am trying to build a parser with Bison/Yacc to be able to parse a stream of tokens produced by another module. The different token ids are already listed in an enumeration type as follows: // C++ header file enum token_id { TokenType1 = 0x10000000, TokenType2 = 0x11000000, TokenType3 = 0x1110000...

Best XML parser for C

Hi, We have to add a new interface to our existing C application. The new interface system's requests to our C application, and the responses to the interface, will be XML files. We need to find a way to read and write XML files. It seems there are many mapping tools available for Java and C++. I did not find any for C. Please let me know if ther...

parse string with tags

Hi all, I am receiving a chunk of data from a PBX as a string with tags included. Something like this: </response><rid>2</rid><name>2101<name><PeerList></PeerList><status>UNKNOWN</status> cont...till it fetches all the names/users from the PBX. What I need to do is parse this string data to retrieve name & status and update i...