parsing

multilevel parsing algorithm

For now I have a pretty simple way of parsing out some commands through ActionScript. I'm using regexp to find tags, commands and operands using... +key_word+ // any text surrounded by + [ifempty +val_1+]+val_2+[/ifempty] //simple conditional [ifisnot={`true,yes`} +ShowTitle+]+val_3+[/ifisnot] // conditional with operands my curre...

HTML processing in Java: Convert HTML to other formats

OK, there are many HTML/XML parsers for Java. What I want to do is a bit more than just knowing how to parse it. I want to filter the content and have it in suitable form. More precisely, I want to keep only the text and images. However, I want to preserve some of the text formatting, too, like: italic, bold, alignment, etc. All this i...

Is it possible to have a grammar where a "keyword" can also be treated as a "non-keyword"?

I have the following grammar in ANTLRWorks 1.4. I'm playing around with ideas for implementation of a parser in a text-adventure game creator, where the user will specify the various allowable commands for his game. grammar test; parse : cmd EOF; cmd : putSyn1 gameObject inSyn1 gameObject; putSyn1 : Put | Pla...

PHP application running out of Memory

Hi Guys, I am writing a set of classes for a crawler. It crawls a start page, pulls three links based on parameters (found using Simple Html Dom Parser, which allows jQuery-like selectors), crawls those pages, then goes to page 2 and picks the next 3 pages. The current maximum number of pages is 57. Needless to say I am getting: Allowed memory si...

checking an integer to see if it contains a zero

Given an integer, how could you check if it contains a 0, using Java? 1 = Good 2 = Good ... 9 = Good 10 = BAD! 101 = BAD! 1026 = BAD! 1111 = Good How can this be done? ...
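One possible approach, sketched here under assumptions: peel off decimal digits with `% 10` and `/ 10` and stop at the first zero digit. The class name `ZeroDigitCheck` and method name `containsZero` are hypothetical, and treating the integer 0 itself as "contains a zero" is an assumption the question does not settle.

```java
public class ZeroDigitCheck {

    // Returns true if the decimal representation of n contains the digit 0.
    // Assumption: the value 0 itself counts as containing a zero.
    static boolean containsZero(int n) {
        if (n == 0) {
            return true;
        }
        while (n != 0) {
            // n % 10 is 0 exactly when the last decimal digit is 0;
            // this also holds for negative n in Java.
            if (n % 10 == 0) {
                return true;
            }
            n /= 10; // drop the last digit
        }
        return false;
    }

    public static void main(String[] args) {
        int[] samples = {1, 9, 10, 101, 1026, 1111};
        for (int s : samples) {
            System.out.println(s + " = " + (containsZero(s) ? "BAD!" : "Good"));
        }
    }
}
```

A string-based check such as `String.valueOf(n).contains("0")` would also work; the arithmetic version just avoids allocating a string.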

JSON parsing with Newtonsoft.JSON

I have a file in JSON format with records for individual users. Some of the users have a comment field stuck in the middle of their records. I just want to parse top-level items (fullName, contributorName, email) using the Newtonsoft.JSON parser, but I can't seem to get it to recognize individual objects. When I parse the whole string int...

C - Parsing a command line with an unknown number of parameters

Possible Duplicate: Parse string into argv/argc I'm trying to write a fake shell using C, so I need to be able to take a command and then x number of arguments for the command. When I actually run the command, I'm just using execvp(), so I just need to parse the arguments into an array. But I wasn't really sure how to do this ...

Viewing of stack during yacc parsing

Is there a way to see the stack (for a better understanding of how yacc works) during each step of yacc parsing? ...

Ruby/Python - generating and parsing C/C++ code

Hi, I need to generate C structs and arrays from data stored in a db table, and alternately parse similar info. I use both Ruby and Python for this task, and was wondering if anyone has heard of a module/lib that handles this for either/both languages? I could do this on my own with some string processing, but wanted to check if there's a k...

Java: splitting up a large XML file with SAXParser

Hi All, I am trying to split a large XML file into smaller files using Java's SAXParser (specifically the Wikipedia dump, which is about 28GB uncompressed). I have a PageHandler class which extends DefaultHandler: private class PageHandler extends DefaultHandler { private StringBuffer text; ... @Override public void startEl...

How to simplify this method (splitting unquoted, unbracketed, unescaped commas)?

Curious if this can be simplified... internal static IEnumerable<string> Split(string str, char sep = ',') { int lastIndex = 0; bool quoted = false; bool escaped = false; bool bracketed = false; char lastQuote = '\0'; for (int i = 0; i < str.Length; ++i) { if (str[i] == '[') { if ...

How do I look into a Starcraft 2 replay?

I am interested in building a parser for my own enjoyment using PHP. What do I need to know? What suggestions would you have for me? How do I even open a Starcraft 2 replay using PHP? ...

Using C#, how do I validate an HTML file?

Hi, I have a C# application that receives an HTML file. I want to parse it and validate it, so that the output is either a list of errors or a confirmation that my HTML is valid. Does anyone have any idea how I can do this? Thanks. ...

Demarshalling XStream errors

Hi everyone, I have successfully created an XML file using XStream and a custom converter. The custom converter marshalls an object so it is nicely readable by humans as follows: <Header clip="16:01:02:22 -&gt; 16:01:52:00" match clock="961"> <match filename="MatchName" name="Jets vs Giants" date="2009-11-29 16:00:00.0"/> </Header> ...

Using integers in text-query

I am indexing a table of companies, where a lot of them have names starting with an integer, e.g. 2partner, 3m, etc. But when I try to do a simple Solr query like "2partner" (in Solr's web interface), the integer "2" is removed by the query parser. Here's the debug: <lst name="debug"> <str name="rawquerystring">2partner</str> <str name="...

HTML of a parsed page

Hi, I am using the Cobra parsing engine and I wish to get the HTML code of an already parsed page (e.g. after JavaScript execution). Is it possible to do this? Cobra may be replaced with another open-source Java web parser if needed. ...

How to write a tag deleter script in Python

I want to implement a file reader script (covering folders and subfolders) which detects some tags and deletes those tags from the files. The files are .cpp, .h, .txt and .xml, and there are hundreds of files under the same folder. I have no idea about Python, but people told me that I can do it easily. EXAMPLE: My main folder is A: C:\A Inside A, I...

Xstream duplicate attribute problem

Hi Everyone, I've been playing about with XStream XML parsing and I have a bit of a problem. In a file I need to parse, I have a node with several arbitrary attributes of the same name. The node is a football team and the attributes are the names of each player. <team home="Arsenal"> <squad player="Manuel Almunia Rivero" player="Abou...

Is there any XML parser that comes by default with g++?

PHP and C# are bundled with such parsers, which are available in their default libraries. Does g++ have any such libraries? ...

How to create a .Net programming language?

I have created a few different full programming languages using some of the various parsing tools available. However, how would someone create a programming language that runs on the .Net framework? Would I have to output the .Net IL and compile that or is there a higher level of abstraction? Also, is there an easy way to get the language ...