parsing

How to parse JavaScript for links with Java?

I'm writing a program (in Java) that needs to extract links from webpages. I'm using htmlParser (http://htmlparser.sourceforge.net/), but I'm only able to extract HTML links (defined with <a href="...">), and I don't know how to handle JavaScript code to extract links from it. Can you help me? ...

example for using streamhtmlparser

Can anyone give me an example of how to use http://code.google.com/p/streamhtmlparser to parse out all the A tag hrefs from an HTML document? (Either C++ or Python code is OK, but I would prefer an example using the Python bindings.) I can see how it works in the Python tests, but they expect special tokens already in the HTML at w...

Multi-purpose Parser.

I'm thinking of implementing a parser framework that would use a set of interfaces to make it easy to adapt to different types of data formats. I want to create structure around the way my controller object interacts with this parser and have come up with the following simple structure. I was hoping the community could provide any com...

Ping servers and check the result from C program?

Hi. This is my last question. Now my new requirement is to ping a set of servers and check whether they are replying. I am currently calling system("ping xxx.xx.xx.xx >out.txt"); and then parsing out.txt for the string "Request timed out.". This gives me good results, but is there a better way to do this from a C program? Non p...

Parse and query SOAP in C#

I am trying to parse a heavily namespaced SOAP message (source can be found also here): <?xml version="1.0" encoding="UTF-8"?> <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <soapenv:Header> <ns1:Transac...

Antlr tree rewrite rules.

I'm trying to parse an expression like a IN [3 .. 5[, where the direction of the angle brackets determine whether the interval is inclusive or exclusive. I want this to be rewritten to an AST like NODE-TYPE | +------------+-----------+ | | | variable lower-bound upper-bound ...

How can I merge CSS definitions in files into inline style attributes, using Perl?

Many email clients don't like linked CSS stylesheets, or even the embedded <style> tag, but rather want the CSS to appear inline as style attributes on all your markup. BAD: <link rel=stylesheet type="text/css" href="/style.css"> BAD: <style type="text/css">...</style> WORKS: <h1 style="margin: 0">...</h1> However this inline style a...

parsing CSV files backwards

I have csv files with the following format: CSV FILE "a" , "b" , "c" , "d" hello, world , 1 , 2 , 3 1,2,3,4,5,6,7 , 2 , 456 , 87 h,1231232,3 , 3 , 45 , 44 The problem is that the first field has commas "," in it. I have no control over file generation, as that's the format I receive them i...
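One way to read the "backwards" idea in this question is to split each line from the right, so that only the last separators count. The sketch below is in Python and assumes (this is not confirmed by the excerpt) that the three trailing columns never contain commas; `split_from_right` is a hypothetical helper name.

```python
def split_from_right(line, trailing_fields=3):
    """Split a line whose first field may itself contain commas.

    Assumption: the last `trailing_fields` columns never contain commas,
    so splitting from the right at most that many times is unambiguous.
    """
    parts = line.rsplit(",", trailing_fields)
    return [part.strip() for part in parts]


# Data rows taken from the question's sample file.
rows = [
    "hello, world , 1 , 2 , 3",
    "1,2,3,4,5,6,7 , 2 , 456 , 87",
    "h,1231232,3 , 3 , 45 , 44",
]
parsed = [split_from_right(row) for row in rows]
```

Everything left of the third comma from the end is kept as the first field, commas included.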

Generating JSON Object

I'm trying to parse the rows in a table that I generate using JavaScript by adding items to a cart, then create a JSON object of all the items when the user hits "save order" and pass it to a PHP script using $.post in jQuery. The only trouble I'm having is understanding JSON objects and how to push more items onto the object. I get an...

What Java regular expression do I need to match this text?

Hi, I'm trying to match the following using a regular expression in Java - I have some data separated by the two characters 'ZZ'. Each record starts with 'ZZ' and finishes with 'ZZ' - I want to match a record with no ending 'ZZ' for example, I want to match the trailing 'ZZanychars' below (Note: the *'s are not included in the string - ...

PHP Parse Date String

If I've got a date string: $date = "08/20/2009"; And I want to separate each part of the date: $m = "08"; $d = "20"; $y = "2009"; How would I do so? Is there a special date function I should be using? Or something else? Thanks! ...

Can Perl be "statically" parsed?

An article called "Perl cannot be parsed, a formal proof" is doing the rounds. So, does Perl decide the meaning of its parsed code at "run-time" or "compile-time"? In some discussions I've read, I get the impression the arguments stem from imprecise terminology, so please try to define your technical terms in your answer. I have deliber...

Handling error conditions in Lex rather than Yacc?

Suppose I have a lex regular expression like [aA][0-9]{2,2}[pP][sS][nN]? { return TOKEN; } If a user enters A75PsN or A75PS, it will match. But if a user enters something like A75PKN, I would like it to raise an error saying "Character K not recognized, expecting S". What I am doing right now is just writing it like let [a-zA-Z] num [0-9] {l...

.NET Html Parser

This must be the 20th duplicate or so, here is one: Looking for C# HTML parser I'm looking for an open source, fast, w3c-equivalent html/xhtml parser for C# without native dlls. Thanks. ...

Extracting link display text as well as href attribute with PHP 5

$oldSetting = libxml_use_internal_errors( true ); libxml_clear_errors(); I have seen many examples on the web on how to extract the URLs from HTML with PHP 5's DOM functions, but I need to get the link text as well as the link. If I use the code below to extract the link "http//X.com" from the "href" attribute in the anchor tag YYYYY, h...
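The question asks about PHP 5's DOM functions; as a language-neutral illustration of the same idea (record the href when an anchor opens, then gather its text until it closes), here is a minimal sketch using Python's standard `html.parser` module instead. `LinkCollector` and `extract_links` are hypothetical names, and the sample markup is made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the currently open <a>, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


sample = '<p>See <a href="http://example.com/">the <b>example</b> site</a>.</p>'
links = extract_links(sample)
```

Text inside nested tags such as `<b>` is included because every data fragment seen while the anchor is open is appended.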

What is the most efficient data structure to hold keywords?

I decided to write a small parser to parse BBCode and return properly formatted HTML. I am having a hard time deciding what the most efficient way to represent the keywords would be. I could always use separate strings to hold them, but I feel like there must be some unknown data structure (to me) that would allow for efficient lookup. ...

PHP Parse error: syntax error, unexpected T_STRING, expecting T_FUNCTION

PHP Parse error: syntax error, unexpected T_STRING, expecting T_FUNCTION in C:\Inetpub\wwwroot\webroot\www.novotempo.org.br\lib\Twitter.php on line 54 Hi, I'm Douglas from Brazil, and the error above is my problem. The line is just a define, this one: define('DEBUG',false); Searching the net I found that this usually occurs when yo...

Funny CSV format help

I've been given a large file with a funny CSV format to parse into a database. The separator character is a semicolon (;). If one of the fields contains a semicolon it is "escaped" by wrapping it in doublequotes, like this ";". I have been assured that there will never be two adjacent fields with trailing/ leading doublequotes, so this...
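Given the stated guarantee that two adjacent fields never have trailing and leading doublequotes, the three-character sequence `";"` can only be an escaped semicolon. A minimal Python sketch that relies on this: protect the escaped sequences with a placeholder, split on the remaining semicolons, then restore them. The placeholder character and the sample line are assumptions for illustration, not part of the original file.

```python
PLACEHOLDER = "\x00"  # assumption: this character never occurs in the data


def split_funny_line(line):
    """Split a semicolon-separated line where a literal ';' is written as '";"'."""
    # Step 1: hide every escaped semicolon so it cannot act as a separator.
    protected = line.replace('";"', PLACEHOLDER)
    # Step 2: split on the real separators, then put the semicolons back.
    return [field.replace(PLACEHOLDER, ";") for field in protected.split(";")]


# Hypothetical sample line with one escaped semicolon in the second field.
fields = split_funny_line('id42;Smith";" John;London')
```

The standard `csv` module is not used here because it only treats a doublequote as a quote at the start of a field, which does not match this format.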

trying to parse weird formatted xml in php

I am trying to chop XML data into usable strings to reuse them later in my script. I am receiving the data via a cURL request, and this works fine. Now, chopping up the data is what kills me. This is part of the XML I am receiving (the whole data part is about 90 lines) <professions> <skill key="IT Specialist" maxage="40" group="IT" worked="...

LaTeX Failed to Parse(Unknown Error) on MediaWiki

On my wiki, which runs on MediaWiki, I am receiving a Failed to Parse (Unknown Error) for the LaTeX in the page. I checked the LocalSettings.php file, and I have set the proper variable ($wgUseTeX) to true. If it helps, the error message before this was a Failed to Parse (Missing texvc executable), but I "fixed" it to the be...