parsing

Fastest way to get YouTube videos for over 100,000 songs

What is the fastest way to get YouTube videos for over 100,000 songs? I'm currently using PHP and SimpleXML to parse YouTube's feed, but it seems to be pretty slow. Any other ideas? ...

How to find Title case phrases from a passage or bunch of paragraphs

How do I parse sentence case phrases from a passage? For example, from this passage: Conan Doyle said that the character of Holmes was inspired by Dr. Joseph Bell, for whom Doyle had worked as a clerk at the Edinburgh Royal Infirmary. Like Holmes, Bell was noted for drawing large conclusions from the smallest observations.[1] Michael Har...
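The excerpt is cut off before the expected output, so the following is only a minimal Python sketch under one assumption: a "Title case phrase" means a run of two or more consecutive capitalized words. The function name `find_title_phrases` is made up for illustration.

```python
import re

# Assumption: a phrase is two or more consecutive words that each start
# with an uppercase letter followed by lowercase letters.
_TITLE_PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

def find_title_phrases(text):
    """Return every run of two or more capitalized words found in text."""
    return _TITLE_PHRASE.findall(text)

if __name__ == "__main__":
    passage = (
        "Conan Doyle said that the character of Holmes was inspired by "
        "Dr. Joseph Bell, for whom Doyle had worked as a clerk at the "
        "Edinburgh Royal Infirmary."
    )
    for phrase in find_title_phrases(passage):
        print(phrase)
```

A plain regex like this cannot tell a name from a capitalized sentence opener, so "Like Holmes" in the second sentence of the passage would also be reported. Single capitalized words such as "Holmes" are skipped by design.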

BigInteger.valueOf() for very big numbers?

What would be the best way to convert a 50-digit String to a BigInteger in Java? It doesn't have a valueOf(String) method, and I can't convert to Long because it's too small. ...

Parsing an XML File and Replacing Chosen Node With Values From Text File

I wrote a C# WinForms program to take an XML file and load values from a text file into each occurrence of the field that the user specifies in the UI. For whatever reason, the program inserts a carriage return in any nodes which don't contain a value. For example, it will do that to <example></example> whereas it will not misbehave on ...

Ready-made parser in Java

I have some user-defined tags, for example data here , jssj. I have a file (not XML) which contains some data embedded in tags. I need a parser that will identify my tags and extract the data in the proper format. E.g. <newpage> thix text </newpage> <tagD> <tagA> kk</tagA> </tagD> Tags can also have some attributes as simlar...

command line arg parsing through introspection

I'm developing a management script that does a fairly large amount of work via a plethora of command-line options. The first few iterations of the script have used optparse to collect user input and then just run down the page, testing the value of each option in the appropriate order, and doing the action if necessary. This has resulted...
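The excerpt stops before the actual question, so this is only a sketch of one way introspection could replace a long chain of option tests. It assumes each action is a method named `do_<option>` on a hypothetical `Actions` class, and it uses `argparse` rather than the `optparse` mentioned above.

```python
import argparse
import inspect

class Actions:
    """Hypothetical actions; every do_* method becomes a --flag."""

    def do_backup(self):
        """Back up the data."""
        return "backup done"

    def do_cleanup(self):
        """Remove temporary files."""
        return "cleanup done"

def _action_methods(actions):
    # inspect.getmembers returns (name, value) pairs sorted by name.
    return [(name, method)
            for name, method in inspect.getmembers(actions, inspect.ismethod)
            if name.startswith("do_")]

def build_parser(actions):
    """Create one boolean flag per do_* method, using its docstring as help."""
    parser = argparse.ArgumentParser()
    for name, method in _action_methods(actions):
        parser.add_argument("--" + name[3:], action="store_true",
                            help=inspect.getdoc(method))
    return parser

def run(actions, argv):
    """Call every do_* method whose flag was given and collect the results."""
    args = build_parser(actions).parse_args(argv)
    return [method() for name, method in _action_methods(actions)
            if getattr(args, name[3:])]

if __name__ == "__main__":
    print(run(Actions(), ["--cleanup", "--backup"]))
```

Note that the methods run in alphabetical order of their names, not in the order the flags were typed. If the actions must run in a specific order, that order would need to be encoded separately, for example in an explicit list.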

Hints on parsing

I want to implement a minimal templating language like Template Toolkit but much simpler. I don't want to use an existing implementation/library, but start from scratch because I want to learn something from it and I want to completely understand it in order to adapt it to my needs. The end product should be in C but I will probably ...

How to parse an XML document in Java using the SAX approach

I want to parse a file which is similar to an HTML file. It's not exactly an HTML file. It can contain some user-defined tags. I don't know in advance how the tags are nested in one another. The tags may also have attributes. I think I should use a SAX parser. Does Java have a built-in SAX parser? Can I call a function when I encounter eac...

Scala Parser Issues

Hi I am having issues testing out the Scala Parser Combinator functionality for a simple Book DSL. Firstly there is a book class: case class Book (name:String,isbn:String) { def getNiceName():String = name+" : "+isbn } Next, there is the simple parser: object BookParser extends StandardTokenParsers { lexical.reserved += ("book","...

How big is the speed difference between XPathNavigator and XmlReader, really?

I've got a fairly big XML file that I need to parse into a .NET class structure (to be mapped to a fixed-length record format and transmitted via MQ). Performance is important, but not absolutely critical. I almost always use XPathNavigator to read XML files because it's much easier than XmlReader. On the other hand, I know XmlReader i...

Parse an HTTP request Authorization header with Python

I need to take a header like this: Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41" And parse it into this using Python: {'protocol':'Digest', 'qop':'chap', 'realm':'[email protected]', '...
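A minimal Python sketch of the transformation described above, assuming the parameters are comma-separated `key="value"` or `key=token` pairs and that quoted values contain no escaped quotes. The function name `parse_authorization` and the realm value in the demo are placeholders, not taken from the question.

```python
import re

# Matches key="quoted value" or key=bare-token pairs.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|([^\s,]+))')

def parse_authorization(header):
    """Split an Authorization header into a dict with a 'protocol' key."""
    # Drop an optional leading "Authorization:" field name.
    name, sep, rest = header.partition(":")
    if sep and name.strip().lower() == "authorization":
        value = rest.strip()
    else:
        value = header.strip()
    # The first word is the scheme; the remainder holds the parameters.
    protocol, _, params = value.partition(" ")
    result = {"protocol": protocol}
    for key, quoted, bare in _PAIR.findall(params):
        result[key] = quoted if quoted else bare
    return result

if __name__ == "__main__":
    # The realm below is a placeholder value for illustration only.
    sample = ('Authorization: Digest qop="chap", '
              'realm="testrealm@example.invalid", username="Foobear"')
    print(parse_authorization(sample))
```

This regex approach ignores backslash-escaped quotes inside values, which the HTTP grammar permits, so a stricter tokenizer would be needed for untrusted input.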

Write a C# script to test hundreds of domain names

A client has given me a spreadsheet of hundreds of domain names. My task is to determine the following about each: Which domains are connected to a web server / website. Of those that are, which redirect to another site. What is the server software running (ASP, ASP.NET, Apache, etc) ...and output the results in an organized fashion...

Parsing a non-XML file in Java

I want to parse a document that is not pure XML. Example 1: my name is <j> <b> mike</b> </j> Example 2: my name is <mytag1 attribute="val" >mike</mytag1> and yours is <mytag2> john</mytag2> That is, my input is not pure XML. It's similar to HTML, but the tags are not HTML. How can I parse it in Java? ...
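The question asks about Java, but the event-driven approach can be illustrated with a short Python sketch. It uses the standard library's lenient `html.parser.HTMLParser`, which accepts unknown tag names and text outside any root element. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collect (tag, attributes, text) for text found inside any tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.found = []   # collected (tag, attrs, text) triples

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Text outside every tag is ignored in this sketch.
        if text and self.stack:
            tag, attrs = self.stack[-1]
            self.found.append((tag, attrs, text))

def extract(text):
    """Return the tagged text fragments found in a loosely tagged string."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.found

if __name__ == "__main__":
    sample = ('my name is <mytag1 attribute="val" >mike</mytag1> '
              'and yours is <mytag2> john</mytag2>')
    for item in extract(sample):
        print(item)
```

`HTMLParser` lowercases tag and attribute names, so this sketch would not suit input where case matters.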

.Net WikiText to HTML Parser

I know, I know, it sounds silly, but it seems that there are no robust open-source .NET libraries out there for parsing Wikitext to HTML. Does anybody know of a stable, robust .NET Wikitext to HTML parser (i.e. CodePlex projects that are still in beta mode do not count)? ...

DOM parser for non-XML

I want to parse the following type of text. Example 1: <root>my name is <j> <b> mike</b> </j> </root> Example 2: <root> my name is <mytag1 attribute="val" >mike</mytag1> and yours is <mytag2> john</mytag2> </root> Can I parse it using a DOM parser? I will not have the same format every time. I can have different formats in which the ...

C#: How do I parse a string with a decimal point to a double?

I guess this is a very easy question, but I wasn't able to find a question with Google or the MSDN examples. I want to parse a string like "3.5" to a double. However, double.Parse("3.5") yields 35 and double.Parse("3.5", System.Globalization.NumberStyles.AllowDecimalPoint) (something I tried in my desperation ;) ) which compiles...

SAX parser in Java

Can a SAX parser handle self-closing tags, or will they cause an error? E.g. <br/> or <hr/> ...
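Empty-element tags such as <br/> are well-formed XML, so a conforming SAX parser should report a start event followed immediately by an end event. The sketch below checks this with Python's `xml.sax` rather than the Java API the question is about; the handler and function names are made up for illustration.

```python
import xml.sax

class TagLogger(xml.sax.ContentHandler):
    """Record every start and end element event in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

def sax_events(xml_text):
    """Parse a well-formed XML string and return the element events."""
    handler = TagLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

if __name__ == "__main__":
    for event in sax_events("<p>a<br/>b<hr/></p>"):
        print(event)
```

The input still needs a single root element and matching tags everywhere else. An unclosed <br> in HTML style would make the parser raise an error.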

Will Interning strings help performance in a parser?

If you are parsing, let's just say HTML, once you read the element name, will it be beneficial to intern it? The logic here is that this parser will parse the same strings (element names) over and over again, and several documents will be parsed. Theory: // elemName is checked for null. MarkupNode node = new MarkupNode() { Name = St...

Parsing unstructured documents into XML

I am parsing unstructured documents into a structured representation (XML) using a template to describe the intended result. A simple typical problem might be a list of strings: "Chapter 1" "Section background" "this is something" "this is another" "Section methods" "take some xxx" "do yyy" "and some..." "Chapter apparatus" "we created....

[Ruby] open-uri + hpricot & nokogiri don't parse html correctly

I'm trying to parse a webpage using open-uri + hpricot, but there seems to be a problem in the parsing process, as the gems don't return what I want. Specifically, I want to get this div (whose id is 'pasajes') from this URL: http://www.despegar.com.ar I wrote this code: require 'nokogiri' require 'hpricot' require 'open-uri' docu...