I can successfully run the crawl command via Cygwin on Windows XP, and I can also run web searches using Tomcat.
But I also want to save the parsed pages during the crawl.
So when I start crawling like this:
bin/nutch crawl urls -dir crawled -depth 3
I also want to save the parsed HTML files as text files.
i mean during this period which ...
I am attempting to parse, not evaluate, Rails ERB files in an Hpricot/Nokogiri-style manner. The files I am attempting to parse contain HTML fragments intermixed with dynamic content generated using ERB (standard Rails view files). I am looking for a library that will not only parse the surrounding content, much the way that Hpricot or No...
I want to parse the Yahoo! Weather API and I want to save these element attributes to variables for later use:
<yweather:location city="Zebulon" region="NC" country="US"/>
<yweather:astronomy sunrise="6:52 am" sunset="7:39 pm"/>
<yweather:forecast day="Wed" date="7 Apr 2010" low="61" high="96" text="Partly Cloudy" code="29" />
H...
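A minimal sketch of reading those attributes into variables, written in Python with the standard library's ElementTree rather than whatever language the question targets. The wrapping <channel> element and the namespace URI bound to the yweather prefix are assumptions for illustration; the real URI should be taken from the feed's own xmlns:yweather declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the yweather prefix; check the real feed's declaration.
YWEATHER_NS = "http://xml.weather.yahoo.com/ns/rss/1.0"

# Hypothetical wrapper around the three elements shown in the question.
SAMPLE = f"""<channel xmlns:yweather="{YWEATHER_NS}">
  <yweather:location city="Zebulon" region="NC" country="US"/>
  <yweather:astronomy sunrise="6:52 am" sunset="7:39 pm"/>
  <yweather:forecast day="Wed" date="7 Apr 2010" low="61" high="96" text="Partly Cloudy" code="29" />
</channel>"""


def read_weather(xml_text):
    """Return selected yweather attributes as a plain dict of strings."""
    root = ET.fromstring(xml_text)
    ns = {"yweather": YWEATHER_NS}  # maps the prefix used in the paths below
    location = root.find(".//yweather:location", ns)
    astronomy = root.find(".//yweather:astronomy", ns)
    forecast = root.find(".//yweather:forecast", ns)
    return {
        "city": location.get("city"),
        "region": location.get("region"),
        "country": location.get("country"),
        "sunrise": astronomy.get("sunrise"),
        "sunset": astronomy.get("sunset"),
        "low": forecast.get("low"),
        "high": forecast.get("high"),
        "text": forecast.get("text"),
    }


weather = read_weather(SAMPLE)
```

The attribute values come back as strings, so numeric fields such as low and high would still need an int() conversion before arithmetic.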
I would like to write an (HTML) parser based on a state machine, but I have doubts about how to actually read and use the input. I decided to load the whole input into one string, then work with it as an array and keep an index as the current parsing position.
There would be no problems with a single-byte encoding, but in a multi-byte encoding each ...
Hello,
I want to parse tags from a mixed-content string. The string looks like this:
"<PERSON>yasir arafat</PERSON> , the president of the <LOCATION>palestinian authority</LOCATION> , on the defensive , mr . sharon believes , a government official"
I only want to use JAXP. Does anybody have an idea for this? Maybe there is an easy way with expressions....
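One approach that should carry over to JAXP is to wrap the fragment in a single root element so that it becomes well-formed XML, then walk the child elements. The sketch below shows the idea in Python's ElementTree, not in JAXP itself; the root element name and the function name are hypothetical, and it assumes the text contains no stray "<" or "&" characters.

```python
import xml.etree.ElementTree as ET

TEXT = (
    "<PERSON>yasir arafat</PERSON> , the president of the "
    "<LOCATION>palestinian authority</LOCATION> , on the defensive , "
    "mr . sharon believes , a government official"
)


def extract_tags(fragment):
    """Return (tag, text) pairs for each top-level element in the fragment."""
    # A mixed-content fragment has no single root, so add a throwaway one.
    root = ET.fromstring(f"<root>{fragment}</root>")
    return [(child.tag, child.text) for child in root]


entities = extract_tags(TEXT)
```

The plain text between the tags is still available through root.text and each child's tail attribute if it is needed as well.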
I have an array of JSON objects like so:
[{"a":"b"},{"c":"d"},{"e":"f"}]
What is the best way to turn this into a PHP array?
json_decode will not handle the array part and returns NULL for this string.
...
For example, these are valid math expressions:
a * b + c
-a * (b / 1.50)
(apple + (-0.5)) * (boy - 1)
And these are invalid math expressions:
--a *+ b @ 1.5.0 // two consecutive signs, two consecutive operators, invalid operator, invalid number
-a * b + 1) // unmatched parentheses
a) * (b + c) / (d // unmatched parentheses
I hav...
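A minimal recursive-descent validator for expressions like these, sketched in Python. The grammar is an assumption inferred from the examples: a single leading sign is allowed only at the start of an expression or right after an opening parenthesis, operands are identifiers or decimal numbers, and the only operators are + - * /.

```python
import re

# One token per match: a number, an identifier, or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([-+*/()]))")
KINDS = ("num", "id", "op")


def tokenize(text):
    """Return a list of (kind, value) pairs, or None on an unknown character."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            return None  # e.g. '@' or the second '.' in 1.5.0
        tokens.append((KINDS[match.lastindex - 1], match.group(match.lastindex)))
        pos = match.end()
    return tokens


def is_valid(text):
    """Check text against: expr := [sign] term {(+|-) term}; term := primary {(*|/) primary}."""
    tokens = tokenize(text)
    if tokens is None:
        return False
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        pos += 1

    def primary():
        kind, value = peek()
        if kind in ("num", "id"):
            advance()
            return True
        if value == "(":
            advance()
            if not expr() or peek()[1] != ")":
                return False  # bad inner expression or missing ')'
            advance()
            return True
        return False

    def term():
        if not primary():
            return False
        while peek()[1] in ("*", "/"):
            advance()
            if not primary():
                return False
        return True

    def expr():
        if peek()[1] in ("+", "-"):
            advance()  # a single optional leading sign
        if not term():
            return False
        while peek()[1] in ("+", "-"):
            advance()
            if not term():
                return False
        return True

    # Leftover tokens (such as a stray ')') make the whole input invalid.
    return expr() and pos == len(tokens)
```

Calling is_valid("-a * (b / 1.50)") accepts the expression, while is_valid("-a * b + 1)") rejects it because a closing parenthesis is left over after the expression ends.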
Are there any libraries to parse Textile (Textile to HTML) which will work in an Objective-C iPhone app? C libraries will work too.
Update: I couldn't find any sufficiently developed libraries in C/Obj-C, but I did find one written in JavaScript, which I used through an invisible UIWebView.
Link: JavaScript Textile parser
...
I'm working on one of those projects where there are a million better ways to accomplish what I need but I have no choice and I have to do it this way. Here it is:
There is a web form. When the user fills it out and hits submit, a human-readable text file is created using the form data. It looks like this:
field_1: value for field one...
I'm using C#, and if I do
DateTime.ParseExact("4/4/2010 4:20:00 PM", "M'/'d'/'yyyy H':'mm':'ss' 'tt", null)
The return value is always 4:20 AM -- what am I doing wrong with using tt?
Thanks!
...
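A likely cause, offered as an assumption since no answer is part of this excerpt: in .NET format strings "H" is the 24-hour specifier and "h" is the 12-hour one, so the hour is read as 4 on a 24-hour clock and the AM/PM designator cannot shift it. Python's strptime behaves the same way with %H versus %I, which makes for a short runnable illustration (in Python, not C#):

```python
from datetime import datetime

STAMP = "4/4/2010 4:20:00 PM"

# %H is the 24-hour field, so %p is parsed but does not change the hour.
wrong = datetime.strptime(STAMP, "%m/%d/%Y %H:%M:%S %p")

# %I is the 12-hour field, so %p moves 4 PM to hour 16.
right = datetime.strptime(STAMP, "%m/%d/%Y %I:%M:%S %p")
```

If the same reasoning holds for the C# call, replacing H with h in the format string would be the corresponding change.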
I have been trying to parse the below XML file (kuler rss feed). I have read the various posts on this site but am unable to piece them together.
I specifically want to extract the child (or sibling) nodes of the element <kuler:themeItem>.
However, I am getting an exception:
Namespace Manager or XsltContext needed. This query has a prefi...
I am using IXMLDOMNodeListPtr, IXMLDOMNodePtr, IXMLDOMElementPtr and IXMLDOMDocPtr. I am a little confused here: do I have to call Release() on these pointers before they go out of scope?
Thanks.
...
I have some bison grammar:
input: /* empty */
| input command
;
command:
builtin
| external
;
builtin:
CD { printf("Changing to home directory...\n"); }
| CD WORD { printf("Changing to directory %s\n", $2); }
;
I'm wondering how I get Bison to not accept (YYACCEPT?) something as a command until...
Hello to all, I'm writing an HTML text feature extractor in C++. The program needs to be REALLY fast: I need to extract these features within milliseconds per HTML page, memory usage needs to be low, and finally Unicode support would be nice.
I know how difficult it is to have all of these things, but I want a parser that at least comes close.
...
I've read the other posts here about this topic, but I can't seem to get what I want.
This is the original HTML:
<div class="add-to-cart"><form class=" ajax-cart-form ajax-cart-form-kit" id="uc-product-add-to-cart-form-20" method="post" accept-charset="UTF-8" action="/product/rainbox-river-lodge-guides-salomon-selection">
<div><div cl...
I'm using java.text.SimpleDateFormat to parse string representations of date/time values inside an XML document. I'm seeing all times that have an hour value of 12 shifted by 12 hours into the future, i.e. 20 minutes past noon is parsed as 20 minutes past midnight the following day.
I wrote a unit test which seems to confirm tha...
The SimpleDateFormat:
SimpleDateFormat pdf = new SimpleDateFormat("MM dd yyyy hh:mm:ss:SSSaa");
The exception thrown by pdf.parse("Mar 30 2010 5:27:40:140PM");:
java.text.ParseException: Unparseable date: "Mar 30 2010 5:27:40:140PM"
Any ideas?
Edit: thanks for the fast answers. You were all correct, I just missed that one key se...
I have a string that looks like this:
"1AL||9CA||34CO||196WY||..."
I want to use a for loop or while loop which, given an integer, parses this string and deletes the value matching that integer.
Example for the above string:
string = "1AL||9CA||34CO||196WY||..."
integer = 34
for
...
loop
new string = "1AL||9CA||196WY||......
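A minimal sketch of that loop in Python, assuming every entry is a run of digits followed by letters and that entries are separated by "||". The function name is hypothetical, and the sample string below stops at 196WY because the rest of the original is elided.

```python
import re


def remove_entry(data, number):
    """Drop every '||'-separated entry whose leading integer equals number."""
    kept = []
    for part in data.split("||"):
        match = re.match(r"\d+", part)  # leading digits, e.g. '34' in '34CO'
        if match and int(match.group()) == number:
            continue  # skip the entry being deleted
        kept.append(part)
    return "||".join(kept)


result = remove_entry("1AL||9CA||34CO||196WY", 34)
```

Comparing the parsed integer rather than searching for the substring "34" avoids accidentally removing entries such as 134XX or 340YY.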
I have a simple ANTLR grammar, which I have stripped down to its bare essentials to demonstrate the problem I'm having. I am using ANTLRWorks 1.3.1.
grammar sample;
assignment : IDENT ':=' NUM ';' ;
IDENT : ('a'..'z')+ ;
NUM : ('0'..'9')+ ;
WS : (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ;
Obviously, thi...
Hi,
I have an array produced by componentsSeparatedByString: that looks like the following when I print it with po in GDB:
"\n\t\t <b>Suburb,
</b> BAIRNSDALE",
"\n\t\t <b>Address,
</b> 15K NW BAIRNSDALE",
"\n\t\t <b>Reference,
</b> MELWOOD/SCHO...