Hi guys,
Is there a simple way to create a sentence parser in plain Java
without adding any libraries or JARs?
The parser should not just handle the blanks between words,
but be smarter and also handle . ! ?,
recognize where a sentence ends, etc.
After parsing, only real words should be stored in a DB or file, not any special characters.
tha...
I am working with an XML-driven CMS, and before I run off and either write or implement a module that parses the iCal format, I was wondering whether there is any way to parse it using just XSLT, or ideally just an XPath expression, as this is a built-in function of the CMS.
...
I have to parse a page in PHP. The page's URL returns a 302 Moved Temporarily header and redirects to a not-found page. Its data can be retrieved manually through the console of the Firebug add-on for Mozilla. But if I try to parse it using PHP, I get that not-found page in return. How can I parse that page? Please suggest.
edit:
i...
Hello,
I need to implement a simple over-the-network interaction in C++ and I've been wondering whether there are libraries that do that already. My protocol basically sends messages and receives responses. Each message is just a set of 3-4 values of basic data types. I would like to find a library (or libraries) that can do one or more...
I am about to finish my script that parses/scrapes a website using Mechanize and Ruby.
I need to port my script to PHP in the future.
My question is
whether there is any library available for both Ruby and PHP, or
whether anybody can recommend another approach to this.
...
Does anyone know of a library, ideally in Python, that can have a stab at pulling dates out of text?
"Shall we go to the library today" -> 21 Jan 10
"Starting on the 1st of January" -> 1 Jan 10
"Anytime between 3rd and 5th of Feb 2009" -> 3 Feb 09, 5 Feb 09
It's a tough problem, which is probably why I haven't found anything!
Already using N...
Hi there,
I'm trying to parse an XML file. I can parse a normal text node, but how do I parse a textlist? I only get the firstChild of the textlist, and sadly that's all. If I try
elem.nextSibling();
it is always null, which can't be right: I know there are two other values left.
Can someone provide an example, maybe?
Thanks!
...
I inherited a table with identifiers in a format [nonnumericprefix][number]. For example (ABC123; R2D2456778; etc). I was wondering if there was a good way to split this in SQL into two fields, the largest integer formed from the right side, and the prefix, for example (ABC, 123; R2D, 2456778; etc). I know I can do this with a cursor,...
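Setting the SQL mechanics aside, the splitting rule itself (shortest prefix, longest run of digits on the right) can be sketched in a few lines. This is only an illustration in Python, not a SQL answer; the helper name split_identifier is hypothetical, and it assumes every identifier ends in at least one digit.

```python
import re

# Lazy prefix, then a run of digits that must reach the end of the string.
# Because the prefix is lazy, the digit run is as long as possible.
_ID_PATTERN = re.compile(r"(.*?)(\d+)")


def split_identifier(identifier):
    """Split an identifier such as 'R2D2456778' into ('R2D', 2456778)."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        # Assumption: identifiers without a trailing number are invalid.
        raise ValueError(f"no trailing number in {identifier!r}")
    prefix, digits = match.groups()
    return prefix, int(digits)


if __name__ == "__main__":
    for raw in ("ABC123", "R2D2456778"):
        print(raw, "->", split_identifier(raw))
```

Note that converting the digits with int() drops any leading zeros, so the original string cannot always be rebuilt from the two parts.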
I'm new to EDI, and I have a question.
I have read that you can get most of what you need about an EDI format by looking at the last 3 characters of the ISA line. This is fine if every EDI used line breaks to separate entities, but I have found that many are single line files with any number of characters used as breaks. I have notice...
I am having trouble in Tcl using numbers with leading zeros. I am parsing some numbers that can have leading zeros, such as "0012", which should be interpreted as the integer "twelve".
$ tclsh
% set a 8
8
% set b 08
08
% expr $a - 1
7
% expr $b - 1
expected integer but got "08" (looks like invalid octal number)
What is the best way ...
Hello,
I've read a few questions on here re parsing HTML with regex, and I understand that this is, on the whole, a terrible idea.
Having said this, I have a very specific problem that I think Regex might be the answer to. I've been fumbling around trying to work out the answer but I'm new (today) to Regex, and I was hoping some kind ...
I have a bunch of numbers represented as hexadecimal strings in logfiles that are being parsed by a Perl script, and I'm relatively inexperienced with Perl. Some of these numbers are actually signed negative numbers, i.e. 0xFFFE == -2 when represented as a 16-bit signed integer. Can somebody please tell me the canonical way of getting the...
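The arithmetic behind this conversion is language-independent: in 16-bit two's complement, 0xFFFF decodes to -1 and 0xFFFE to -2. Below is a minimal sketch in Python rather than Perl, so it shows the idea and not the canonical Perl idiom; the function name hex_to_signed and the bits parameter are made up for illustration.

```python
def hex_to_signed(text, bits=16):
    """Interpret a hex string as a two's-complement integer of the given width."""
    value = int(text, 16)  # int() accepts an optional "0x" prefix in base 16
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{text!r} does not fit in {bits} bits")
    sign_bit = 1 << (bits - 1)
    # Flipping the sign bit and then subtracting it maps the upper half of
    # the unsigned range onto the negative numbers.
    return (value ^ sign_bit) - sign_bit


if __name__ == "__main__":
    for sample in ("0xFFFE", "0xFFFF", "0x7FFF", "0x8000"):
        print(sample, "->", hex_to_signed(sample))
```

The same XOR-and-subtract step should carry over to any language with integer bit operations, as long as the width is known in advance.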
Wanna see something interesting?
var xml:XML = XML(<feed><entry /><entry /><entry /></feed>);
trace(xml.entry.length()) // returns 3
Makes sense, right? Now let's add this attribute...
var xml:XML = XML(<feed xmlns="http://www.w3.org/2005/Atom"><entry /><entry /><entry /></feed>);
trace(xml.entry.length()) // returns 0
We...
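A common cause of this kind of result is that an xmlns attribute puts every unprefixed element into that namespace, so a lookup by the bare name "entry" no longer matches. The sketch below reproduces the effect with Python's xml.etree.ElementTree instead of ActionScript, so it is an analogue and not the original code; count_entries is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

plain = ET.fromstring("<feed><entry /><entry /><entry /></feed>")
namespaced = ET.fromstring(
    f'<feed xmlns="{ATOM_NS}"><entry /><entry /><entry /></feed>'
)


def count_entries(root, namespace=None):
    """Count direct <entry> children, optionally qualified by a namespace URI."""
    # ElementTree spells a qualified name as "{namespace-uri}localname".
    tag = f"{{{namespace}}}entry" if namespace else "entry"
    return len(root.findall(tag))


if __name__ == "__main__":
    print(count_entries(plain))                # bare name, no namespace
    print(count_entries(namespaced))           # bare name misses the elements
    print(count_entries(namespaced, ATOM_NS))  # qualified name finds them
```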
I am looking for a dynamic C-based parser/framework.
It must be dynamic because the EBNF is constantly changing, so something like Bison is not applicable in this situation. And boost::spirit is practically useless to me because it requires C++.
Does anyone have an idea?
...
How can I read an XML file using Python ElementTree, if the XML has multiple top-level items?
I have an XML file that I would like to read using Python ElementTree.
Unfortunately, it has multiple top-level tags. I would wrap <doc>...</doc> around the XML, except I have to put the <doc> after the <?xml> and <!DOCTYPE> fields. But figuri...
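One way to approach this is to peel the <?xml> declaration and <!DOCTYPE> off the front of the text and wrap only what remains. The sketch below assumes the file fits in memory, that the DOCTYPE has no internal subset (no [...] block), and that no entities declared in the DTD are needed, since the DOCTYPE is discarded; parse_fragments and the default wrapper name doc are illustrative choices.

```python
import re
import xml.etree.ElementTree as ET

# Optional XML declaration, then an optional simple DOCTYPE, at the start.
_PROLOG = re.compile(
    r"\s*(?:<\?xml[^>]*\?>)?\s*(?:<!DOCTYPE[^>\[]*>)?",
    re.IGNORECASE,
)


def parse_fragments(text, wrapper="doc"):
    """Parse XML text with several top-level elements under one synthetic root."""
    prolog = _PROLOG.match(text)  # always matches, possibly an empty string
    body = text[prolog.end():]
    return ET.fromstring(f"<{wrapper}>{body}</{wrapper}>")


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE items SYSTEM "items.dtd">\n'
        "<item>a</item>\n"
        "<item>b</item>\n"
    )
    root = parse_fragments(sample)
    print([child.text for child in root])
```

The top-level items then appear as children of the synthetic root, so iterating over root visits them in document order.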
I just downloaded the HTMLAgilityPack and the documentation doesn't have any examples.
I'm looking for a way to download all the images from a website. The address strings, not the physical image.
<img src="blabalbalbal.jpeg" />
I need to pull the source of each img tag. I just want to get a feel for the library and what it can offer...
I am extracting data from a forum. My script is working fine. Now I need to extract the date and time (21 Dec 2009, 20:39) from a single post. I cannot get it to work. I used FireXPath to determine the XPath.
Sample code:
require 'rubygems'
require 'mechanize'
post_agent = WWW::Mechanize.new
post_page = post_agent.get('http:/...
Hi everyone,
I am trying to build a parser with Bison/Yacc to parse a stream of tokens produced by another module. The different token IDs are already listed in an enumeration type as follows:
// C++ header file
enum token_id {
TokenType1 = 0x10000000,
TokenType2 = 0x11000000,
TokenType3 = 0x1110000...
Hi,
We have to add a new interface to our existing C application. The requests from the new interface system to our C application, and the responses back to the interface, will be XML files. We need to find a way to read and write XML files. It seems there are many mapping tools available for Java and C++, but I did not find any for C.
Please let me know if ther...
Hi all,
I am receiving a chunk of data from a PBX as a string with tags included.
Something like this:
</response><rid>2</rid><name>2101<name><PeerList></PeerList><status>UNKNOWN</status>
This continues until it has fetched all the names/users from the PBX.
What I need to do is parse this string data to retrieve name & status and update i...