parsing

Checking emptiness of an element in hpricot

Let's say this is the location element: <location>blah...</location>. It can be empty, like this: <location/>. Is there a way to detect the slash in the empty element in order to not return it? ...

emails as email.Message class objects in Python

How do I use poplib to download mails as message instances of the email.Message class from the email module in Python? I am writing a program that analyzes all emails for specific information, storing parts of each message in a database. I can download the entire mail as text; however, walking through text searching for attachments is dif...
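A minimal sketch of one possible approach, not a definitive answer: `poplib.POP3.retr()` returns a `(response, lines, octets)` tuple whose `lines` are bytes without line endings, and `email.message_from_bytes()` can turn the joined lines into a message object whose `walk()` method visits each MIME part. The server connection is left out here; `sample_lines` is a made-up stand-in for what `retr()` would return, and the helper names are hypothetical.

```python
import email

def message_from_retr_lines(lines):
    """Build a message object from the 'lines' element of a POP3.retr() result."""
    # retr() strips line terminators, so re-join with CRLF before parsing.
    return email.message_from_bytes(b"\r\n".join(lines))

def attachment_names(msg):
    """Return the filenames of all MIME parts that declare one."""
    return [part.get_filename() for part in msg.walk() if part.get_filename()]

# Invented sample data standing in for the lines returned by retr().
sample_lines = [
    b"From: alice@example.com",
    b"Subject: report",
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="XYZ"',
    b"",
    b"--XYZ",
    b"Content-Type: text/plain",
    b"",
    b"See attachment.",
    b"--XYZ",
    b"Content-Type: application/octet-stream",
    b'Content-Disposition: attachment; filename="data.bin"',
    b"Content-Transfer-Encoding: base64",
    b"",
    b"AAEC",
    b"--XYZ--",
]

msg = message_from_retr_lines(sample_lines)
print(msg["Subject"], attachment_names(msg))
```

With a real mailbox, the same helper would be fed the second element of each `retr()` result instead of `sample_lines`.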

ColdFusion - XML formatting a string returned from API call

We call an API that returns a string of XML-formatted data. We'd like to convert this string into a ColdFusion XML object, via XMLParse(). A problem occurs when special characters show up in the data values. For example, characters like this: &nbsp; &mdash; &ndash; (yes, the raw data contains them in their HTML encoded equivalent). Whe...

Using XmlSlurper: How to select sub-elements while iterating over a GPathResult

I am writing an HTML parser, which uses TagSoup to pass a well-formed structure to XmlSlurper. Here's the generalised code: def htmlText = """ <html> <body> <div id="divId" class="divclass"> <h2>Heading 2</h2> <ol> <li><h3><a class="box" href="#href1">href1 link text</a> <span>extra stuff</span></h3><address>Here is the address<span>Te...

Randomize external RSS feed order in PHP

I'm using a third-party AJAX slideshow for a website that takes an RSS feed as its picture source. I would like to randomize the order of the pictures, but that's not a feature of the slideshow (or the RSS feed I'm pulling from). Surely it shouldn't be difficult to write a short function in PHP that takes an external RSS feed, randomiz...

flex/lex yacc/bison multithreaded environment

Can I use the code generated by flex/bison|lex/yacc in a multithreaded environment? I'm afraid there are a lot of global variables. How can it be fixed? ...

split select statement with regular expression

Hello, I need to split any MySQL SELECT statement into its main parts: SELECT, FROM, all the JOINs (if there are any), WHERE (if it exists), GROUP BY (if it exists), HAVING (if it exists), ORDER BY (if it exists), LIMIT (if it exists)... I tried using regular expressions, but I'm not very good with them... For the SELECT the regex was simple (an...

how to format atom date time

Hi, I'm getting dates from a feed in this format: 2009-11-04T19:55:41Z. I'm trying to format it using the date() function in PHP, but I get an error saying: date() expects parameter 2 to be long, object given in /bla/bla.php. I tried using preg_replace() to remove the T and the Z but still can't get it to work. Any ideas on this? ...

Problem querying an HTML file using HTMLEditorKit in Java

My HTML contains tags of the following form: <div class="author"><a href="/user/1" title="View user profile.">Apple</a> - October 22, 2009 - 01:07</div> I'd like to extract the date, "October 22, 2009 - 01:07" in this example, from each tag. I've implemented javax.swing.text.html.HTMLEditorKit.ParserCallback as follows: class HTMLP...

php parse urls with multiple variables with get request

I'm sending a PHP script multiple URLs (about 15) at once, all containing about 5 URL variables. In my script, I'm parsing the chunk of URLs into individual ones by splitting them on two backslashes (which I add before sending them to the script), and then curling each individual URL. However, when I run my script, it only accepts a URL up to ...

Parsing data (PHP, MySQL)

Hi guys! I have a file with data in the following format: <user> <fname>Anthony</fname> <lname>Smith</lname> <accid>3874918</accid> </user> <user> ... </user> I'm trying to parse this data and store it in a MySQL database with the following fields: fname, lname, accid. Now the problem is how to determine <user> and </...

Utilities to generate an XML representation of a Java package/class?

I'm looking to generate an XML representation of the AST for a given Java class (by parsing its source). Overall, what I want to do is write XSLT queries to find meta patterns in the source code, very much like PMD does. There was an open source utility that started this, but it went stale. Does anyone know of a utility to do this? -Mike ...

Parse HTML Page For Links With Regex Using Perl

Possible Duplicate: How can I remove external links from HTML using Perl? Alright, I'm working on a job for a client right now who just switched up his language choice to Perl. I'm not the best in Perl, but I've done stuff like this before with it, albeit a while ago. There are lots of links like this: <a href="/en/subtitles/35...

Find Text Between 2 Quotes with jQuery

OK, so I have this small block of text: function onfocus(event) { if ($(this).val() == "Some Arbitrary Text") {$(this).val("");} } Using jQuery or JavaScript, I would like to find the "Arbitrary Text". This text block is constant, with the exception of the "Arbitrary Text". Ideally, I would like a way to parse it without using com...

BioPython: extracting sequence IDs from a Blast output file

Hi, I have a BLAST output file in XML format. It contains 22 query sequences with 50 hits reported for each sequence, and I want to extract all the 50x22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query. from Bio.Blast import NCBIXML blast_records = NCBIXML.parse(result_handle) blast_record = bl...
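A hedged sketch of the general idea: if only one record is taken from the iterator that `NCBIXML.parse()` returns, only the first query's hits are seen, so every record needs to be looped over. The example below deliberately avoids Biopython and uses only the standard library's `xml.etree.ElementTree` on a tiny hand-written sample that follows the BLAST XML element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_id`). The sample data and the function name `hit_ids_by_query` are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Invented miniature sample: two queries with two and one hits respectively.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_id>gi|1</Hit_id></Hit>
        <Hit><Hit_id>gi|2</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>query2</Iteration_query-def>
      <Iteration_hits>
        <Hit><Hit_id>gi|3</Hit_id></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

def hit_ids_by_query(handle):
    """Map each query definition to the list of hit IDs reported for it."""
    tree = ET.parse(handle)
    result = {}
    # Visit every Iteration (one per query), not just the first one.
    for iteration in tree.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        result[query] = [hit.findtext("Hit_id") for hit in iteration.iter("Hit")]
    return result

ids = hit_ids_by_query(io.StringIO(SAMPLE))
print(ids)
```

With Biopython itself, the equivalent shape would be a `for` loop over the records yielded by `NCBIXML.parse(result_handle)`.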

Text processing / comparison engine

Hi, I'm looking to compare two documents to determine what percentage of their text matches based on keywords. To do this I could easily chop them into a set of sanitised words and compare, but I would like something a bit smarter, something that can match words based on their root, i.e. even if their tense or plurality is differen...

Can anyone tell me in detail about xmlHashScan function in the libxml2 library?

Can anyone tell me in detail about the xmlHashScan function in the libxml2 library? ...

PHP DomDocument XML Load with Broken XML Data

Hi, how do you deal with broken data in XML files? For example, if I had <text>Some &improper; text here.</text> I'm trying to do: $doc = new DOMDocument(); $doc->validateOnParse = false; $doc->formatOutput = false; $doc->load(...xml'); and it fails miserably, because there's an unknown entity. Note, I can't use CDATA due to t...

parse XML string .net

I have the following XML string that I need to parse (break apart). Does anybody know what code to use? I can provide my code if needed. <?xml version='1.0' encoding='ISO-8859-1'?> <SystemGenerator-Document> <TN>42</TN> <OC>CR</OC> <HN>738</HN> <USERID>xxx</USERID> <WS>FACTORY</WS> <OBJID>254209</OBJID> <Sys...

XML Parse .net c#

Hi, I am new to XML. I am receiving the following file/string. How can I break it up in C# so I can put each of the fields in my SQL Server database? BTW, I don't know how to format XML in StackOverflow; if somebody can tell me how to do it, I'll do it. <?xml version='1.0' encoding='ISO-8859-1'?> <SystemGenerator-Document> <TN>42</TN> ...