parsing

Tool to parse text for possible Wikipedia links.

Does a tool exist that can parse text and output that text, hyperlinked to Wikipedia entries for words of interest? For example, I'd like a tool that could turn something like: The most popular search algorithm on a sorted list is the binary search. into: The most popular search algorithm on a sorted list is the binary se...
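One way such a tool could work, sketched in Python: match a list of terms of interest and wrap each hit in a link. The `TERMS` list and the `link_terms` helper are hypothetical. The sketch also assumes an article URL can be formed by replacing spaces with underscores, and it does not check that the article exists.

```python
import re
from urllib.parse import quote

# Hypothetical glossary of terms of interest; a real tool would need a much
# larger list (for example, one derived from Wikipedia article titles).
TERMS = ["binary search", "sorted list", "search algorithm"]

def link_terms(text, terms=TERMS):
    """Wrap each occurrence of a known term in a link to a Wikipedia article.

    Assumes the article title is the term with spaces replaced by underscores;
    this sketch does not verify that the article exists.
    """
    # Longest terms first, so a longer term wins over a shorter one
    # that starts at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b", re.IGNORECASE
    )

    def to_link(match):
        term = match.group(0)
        title = quote(term.replace(" ", "_"))
        return f'<a href="https://en.wikipedia.org/wiki/{title}">{term}</a>'

    return pattern.sub(to_link, text)
```

The sentence from the question comes back with "search algorithm", "sorted list" and "binary search" each wrapped in an anchor. Choosing which words are "of interest" is the hard part, and this sketch leaves it entirely to the term list.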

Option Parsers for C/C++?

I've done some looking and there are a whole lot of libraries for command line option parsing, but it is difficult to differentiate between them. Does anyone have any experience with any of them? Is one harder/better/faster/easier/whatever than any of the others? Or should I just grow my own? ...

Regex to get all possible matches for a pattern in C#

I'm learning regex and need some help getting all possible matches for a pattern out of a string. If my input is: case a when cond1 then stmt1; when cond2 then stmt2; end case; I need to get matches with groups as follows: Group1: "cond1" "stmt1;" and Group2: "cond2" "stmt2;" Is it possible to get such groups using...
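With a global "find all" call, one match per WHEN branch gives exactly the grouping asked for. This Python sketch assumes each condition is a single token and each statement ends at the first `;`. The pattern uses only features that .NET's `Regex.Matches` also supports, so it should carry over to C#; `case_branches` is a hypothetical helper name.

```python
import re

# Each WHEN branch: the condition is the token after "when", and the statement
# is everything after "then" up to and including the next ";".
BRANCH = re.compile(r"\bwhen\s+(\S+)\s+then\s+(.*?;)", re.IGNORECASE)

def case_branches(text):
    """Return (condition, statement) pairs for every WHEN ... THEN ...; branch."""
    return BRANCH.findall(text)
```

For the input in the question this yields `[("cond1", "stmt1;"), ("cond2", "stmt2;")]`.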

Parsing scientific notation sensibly?

I want to be able to write a function which receives a number in scientific notation as a string and splits out of it the coefficient and the exponent as separate items. I could just use a regular expression, but the incoming number may not be normalised and I'd prefer to be able to normalise and then break the parts out. A colleague ha...
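Instead of a regular expression, a decimal type that already understands exponents can do the normalising. A minimal sketch using Python's standard `decimal` module (the helper name `split_scientific` is hypothetical):

```python
from decimal import Decimal

def split_scientific(s):
    """Split a number string into (coefficient, exponent) with 1 <= |coefficient| < 10.

    Decimal parses both normalised ("1.5e3") and non-normalised ("150e1",
    "0.015E5") input, so normalising happens before the parts are split out.
    """
    d = Decimal(s.strip())
    if d == 0:
        return Decimal(0), 0
    # adjusted() is the exponent of the most significant digit,
    # i.e. the exponent the value has in normalised form.
    exponent = d.adjusted()
    # Shift the decimal point so exactly one digit precedes it.
    coefficient = d.scaleb(-exponent)
    return coefficient, exponent
```

Trailing zeros survive (`"150e1"` gives a coefficient of `1.50`); call `.normalize()` on the coefficient if you want them removed.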

parse error

I need a fresh set of eyes on this. I got this code from someone who said it worked. Here is the error: PHP Parse error: parse error, expecting `T_STRING' or `T_VARIABLE' or `T_NUM_STRING' in C:\Inetpub\wwwroot\2am\employment\send-email-form.php on line 19 Line 19 is the first line of GetUploadedFileInfo(). I get the error for line...

Parse text and return similarities

Let's say I have several URLs and I return the basename from each URL, like so: http://www.test.com/the.code.r00 would return the.code.r00. I have several basenames, extracted from several URLs, to work on: the.code.r00, the.code.r01, ..., the.code.r12, and together with those I have the following basenames from other URLs ...
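The question is cut off, so this sketch assumes "similar" means "same name apart from the last dot-separated part", as with the.code.r00 through the.code.r12. It is written in Python, and `group_by_stem` is a hypothetical helper:

```python
from collections import defaultdict
from posixpath import basename
from urllib.parse import urlparse

def group_by_stem(urls):
    """Group URL basenames by the part before the final dot.

    Assumption: names like the.code.r00 and the.code.r01 count as similar
    because they share the stem "the.code".
    """
    groups = defaultdict(list)
    for url in urls:
        name = basename(urlparse(url).path)
        stem, _, _ = name.rpartition(".")
        # A name without any dot is its own stem.
        groups[stem or name].append(name)
    return dict(groups)
```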

C# regex to match a string which has a delimiter

I need some help with regex. I want to get the string that has a delimiter in it between two specific words. For example, I need a regex that matches: Statements1 start Statements2 ; Statements3 end fun; There can be multiple occurrences of ' ; ' between 'start' and 'end'. Statements are multiple words where (.*) can be used in the regex f...
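A sketch in Python, assuming the goal is the text between `start` and `end fun;` whenever it contains at least one `;`. The tempered `(?:(?!...).)` construct stops a match from running past one `end fun;` into the next block. The same pattern should carry over to C# with `RegexOptions.Singleline`; `blocks_with_delimiter` is a hypothetical helper name.

```python
import re

# Text between "start" and "end fun;" that contains at least one ";".
# The tempered dot (?:(?!\bend\s+fun;).) never crosses an "end fun;",
# and DOTALL lets the statements span several lines.
BLOCK = re.compile(
    r"\bstart\b"
    r"((?:(?!\bend\s+fun;).)*?;(?:(?!\bend\s+fun;).)*?)"
    r"\bend\s+fun;",
    re.DOTALL,
)

def blocks_with_delimiter(text):
    """Return the body of every start ... end fun; block that contains a ';'."""
    return [m.group(1).strip() for m in BLOCK.finditer(text)]
```

A block without any `;` (such as `start a end fun;`) is skipped rather than merged with the block after it.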

Are there any Java Frameworks for binary file parsing?

My problem is that I want to parse binary files of different types with a generic parser implemented in Java, perhaps by describing the file format in a configuration file that the parser reads, or by creating Java classes that parse the files according to some sort of parsing rules. I have searched quite a bit on the internet...
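The configuration-driven idea can be illustrated in a few lines. This uses Python's `struct` module rather than a Java framework, and the `HEADER_FORMAT` description and `parse_record` helper are hypothetical; a Java version could follow the same shape with `java.nio.ByteBuffer`.

```python
import struct

# Hypothetical format description: an ordered list of (field name, struct code).
# In a real tool this could be loaded from a configuration file.
HEADER_FORMAT = [
    ("magic", "4s"),         # 4 raw bytes
    ("version", "H"),        # unsigned 16-bit integer
    ("record_count", "I"),   # unsigned 32-bit integer
]

def parse_record(data, fields, byte_order="<", offset=0):
    """Decode the described fields in order, returning (values, next offset)."""
    values = {}
    for name, code in fields:
        fmt = byte_order + code
        (values[name],) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
    return values, offset
```

Prefixing each code with `<` selects little-endian byte order with standard sizes and no alignment padding, so fields are read back to back; the returned offset lets the caller continue with the next structure in the file.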

Have you ever effectively used a lexer/parser in a real-world application?

I recently started learning ANTLR and learned that a lexer and parser together can be used to construct programming languages. Other than DSLs and programming languages, have you ever directly or indirectly used lexer/parser tools (and knowledge) to solve a real-world problem? Is it possible to solve the same problem by an average progr...

Grammar for Arithmetic Expressions

I was assigned the task of creating a parser for arithmetic expressions (with parentheses and unary operators). I want to know whether this grammar is correct and whether it is in LL(1) form; I am also having real problems constructing the parse table for it:
S -> TS'
S' -> +TS' | -TS' | epsilon
T -> UT'
T' -> *UT' | /UT' | epsilon
U -> VX...
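The grammar in the next question continues with V -> (W) | -W | W | epsilon and W -> S | number. Because V -> W -> S lets S derive a string that starts with S again, that version is left-recursive and cannot be LL(1). This Python sketch therefore assumes a different primary rule, V -> ( S ) | - V | number, and implements the resulting LL(1) grammar as a recursive-descent evaluator. The class and method names are hypothetical.

```python
import re

# Assumed LL(1) grammar (same shape as the question, but with a new V rule):
#   S -> T S'     S' -> + T S' | - T S' | epsilon
#   T -> U T'     T' -> * U T' | / U T' | epsilon
#   U -> V X      X  -> ^ U | epsilon
#   V -> ( S ) | - V | number

def tokenize(text):
    """Split text into numbers (as floats) and single-character operators."""
    return [float(num) if num else op
            for num, op in re.findall(r"(\d+(?:\.\d+)?)|(\S)", text)]

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        value = self.S()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return value

    def S(self):  # S -> T S'; the loop plays the role of S' (left-associative)
        value = self.T()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.T()
            else:
                value -= self.T()
        return value

    def T(self):  # T -> U T'; the loop plays the role of T'
        value = self.U()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.U()
            else:
                value /= self.U()
        return value

    def U(self):  # U -> V X, X -> ^ U | epsilon (right-associative power)
        base = self.V()
        if self.peek() == "^":
            self.take()
            return base ** self.U()
        return base

    def V(self):  # V -> ( S ) | - V | number
        tok = self.peek()
        if tok == "(":
            self.take()
            value = self.S()
            self.take(")")
            return value
        if tok == "-":
            self.take()
            return -self.V()
        if isinstance(tok, float):
            return self.take()
        raise SyntaxError(f"unexpected {tok!r}")
```

Implementing S' and T' as loops keeps `-` and `/` left-associative, so `10 - 4 - 3` is 3. Because unary minus lives in V here, it binds tighter than `^`, and `-2 ^ 2` evaluates to 4; placing the unary rule elsewhere would change that.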

FIRST and FOLLOW Sets for Arithmetic Expressions

I want to know whether the FIRST and FOLLOW sets I made for this grammar are correct:
S -> TS'
S' -> +TS' | -TS' | epsilon
T -> UT'
T' -> *UT' | /UT' | epsilon
U -> VX
X -> ^U | epsilon
V -> (W) | -W | W | epsilon
W -> S | number
FIRST(S) = FIRST(T) = FIRST(U) = FIRST(V) = FIRST(W) = { ( , - , + , number , epsilon }
FIRST(T') = {...
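Hand-computed FIRST sets can be checked mechanically by iterating until nothing changes. Below is a Python sketch with the grammar above transcribed as data; the `epsilon` marker and the helper names are choices made for this sketch. FOLLOW sets can be computed with a similar fixed-point loop.

```python
EPS = "epsilon"

# The grammar from the question: nonterminal -> list of alternatives,
# each alternative a list of symbols ([] stands for epsilon).
GRAMMAR = {
    "S":  [["T", "S'"]],
    "S'": [["+", "T", "S'"], ["-", "T", "S'"], []],
    "T":  [["U", "T'"]],
    "T'": [["*", "U", "T'"], ["/", "U", "T'"], []],
    "U":  [["V", "X"]],
    "X":  [["^", "U"], []],
    "V":  [["(", "W", ")"], ["-", "W"], ["W"], []],
    "W":  [["S"], ["number"]],
}

def first_of_sequence(symbols, first, grammar):
    """FIRST of a symbol string, given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        if sym not in grammar:          # terminal: it starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:       # sym is not nullable: stop here
            return result
    result.add(EPS)                     # every symbol (or none) was nullable
    return result

def first_sets(grammar):
    """Compute FIRST for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                first[nt] |= first_of_sequence(alt, first, grammar)
                changed |= len(first[nt]) != before
    return first
```

For the grammar as written, V -> epsilon makes U, T and S nullable, so the computed FIRST(S) also contains `*`, `/` and `^`.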

Merge Two XML Files in Java

I have two XML files of similar structure that I wish to merge into one file. Currently I am using EL4J XML Merge, which I came across in this tutorial. However, it does not merge as I expect it to. For instance, the main problem is that it is not merging the from both files into one element, i.e. one that contains 1, 2, 3 and 4. Instea...

Parsing basic math equations for children's educational software?

Inspired by a recent TED talk, I want to write a small piece of educational software. The researcher created tiny computers in the shape of blocks called "Siftables". [David Merrill, inventor, with Siftables in the background.] He used the blocks in many applications, but my favorite was when each block was a n...

How to split a text file into words?

I am working on an assignment where I am supposed to read a file and count both the number of lines and the number of words in it. I tried a combination of getline and strtok inside a while loop, which did not work. File example.txt (the file to be read): Hi, hello what a pleasant surprise. Welcome to this place. M...
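The counting logic itself is short. It is sketched here in Python; in C++ the same shape is a `std::getline` loop over the file, with an `std::istringstream` extracting words via `>>`, which avoids `strtok` altogether. The helper name is hypothetical.

```python
def count_lines_and_words(path):
    """Return (line_count, word_count) for a text file.

    Words are runs of non-whitespace characters, so "Hi," counts as one word;
    strip punctuation first if the assignment defines words differently.
    """
    lines = words = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            lines += 1
            words += len(line.split())   # split() with no argument splits on any whitespace
    return lines, words
```

For a file containing the two sentences quoted above on separate lines, this returns `(2, 10)`.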

XML parsing in JavaScript

I have a variable string that contains well-formed and valid XML. I need to use JavaScript to parse this feed. How can I accomplish this using (browser-compatible) JavaScript? ...

Regular expression to categorize the parts of a service address?

The app I am writing deals with utility service addresses, and right now I am forcing the user to know enough to separate the parts of the address and put them in the appropriate fields before adding to the database. It has to be done this way for sorting purposes because a straight alphabetical sort isn't always right when there is a p...

I need to parse an HTML formatted country list into SQL inserts. Is there an easier way to do this?

There are about 2000 lines of this, so doing it manually would probably take more work than figuring out a way to do this programmatically. It only needs to work once, so I'm not concerned with performance or anything. <tr><td>Canada (CA)</td><td>Alberta (AB)</td></tr> <tr><td>Canada (CA)</td><td>British Columbia (BC)</td></tr> <tr><td>Canada (CA)<...
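Since this only has to run once, a regular expression over the fixed row format is enough. A Python sketch; the `regions` table and its column names are hypothetical, and it assumes every row looks exactly like the examples above (a real HTML parser would be more forgiving).

```python
import re

# One row such as <tr><td>Canada (CA)</td><td>Alberta (AB)</td></tr>,
# capturing country name, country code, region name and region code.
ROW = re.compile(r"<tr><td>(.+?) \((\w+)\)</td><td>(.+?) \((\w+)\)</td></tr>")

def sql_quote(value):
    """Quote a value as an SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"

def rows_to_inserts(html):
    """Turn every matching table row into an INSERT statement."""
    statements = []
    for row in ROW.findall(html):
        values = ", ".join(sql_quote(v) for v in row)
        # Hypothetical table and column names; adjust to your schema.
        statements.append(
            "INSERT INTO regions (country, country_code, region, region_code) "
            f"VALUES ({values});"
        )
    return statements
```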

Parsing XML with CDATA with jQuery

Edit: I was missing two things here. The lack of "Content-Type:text/xml" in the header returned by the AJAX call was preventing jQuery from treating the returned data as a document. Once that was handled correctly, this code parsed correctly and output just the index and project name. $("a.getprojects").click(function(d){ d.preventD...

Combine regex in Ruby

Given this text: /* F004 (0309)00 */ /* field 1 */ /* field 2 */ /* F004 (0409)00 */ /* field 1 */ /* field 2 */ How do I parse it into this array: [ ["F004"],["0309"],["/* field 1 */\n/* field 2 */"], ["F004"],["0409"],["/* field 1 */\n/* field 2 */"] ] I got code working to parse the fir...
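A single regex with a lookahead can split the text at each header comment. The sketch is in Python. It assumes each comment sits on its own line, which the newline in the desired output suggests. It also assumes headers always look like `/* NAME (DIGITS)DIGITS */`. Ruby's `String#scan` with the `/m` flag should accept the same pattern; `parse_records` is a hypothetical helper name.

```python
import re

# A header comment like "/* F004 (0309)00 */" followed by the field comments
# up to the next header (or the end of the text). DOTALL lets (.*?) span lines.
RECORD = re.compile(
    r"/\* (\w+) \((\d+)\)\d* \*/\s*(.*?)\s*(?=/\* \w+ \(\d+\)|\Z)",
    re.DOTALL,
)

def parse_records(text):
    """Return [[name], [code], [fields], ...] for every header in the text."""
    result = []
    for name, code, fields in RECORD.findall(text):
        result.extend([[name], [code], [fields]])
    return result
```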

C# Regex to get the comments block out of pl/sql code

I want to extract the comments out of a string as a block. For example, I have this PL/SQL code: --comment1 select * from t_table; --i want comment 2; /*i want comment 3 */ --i want comment 4 OPEN data_cur; Here, I want all the single-line and multiline comments before OPEN data_cur; but after select * from t_table; i.e. I want a full co...
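One approach is to slice out the text between the two statements, then collect every comment in that slice. A Python sketch (`comments_between` is a hypothetical helper). A regex does not understand string literals, so a `'--'` inside quotes would be mistaken for a comment. The same pattern should work in C# with `RegexOptions.Singleline`.

```python
import re

# Single-line (-- ...) and multi-line (/* ... */) PL/SQL comments;
# DOTALL lets a /* ... */ comment span several lines.
COMMENT = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)

def comments_between(code, after, before):
    """Return the comments between two statements (given as literal text), one per line."""
    start = code.index(after) + len(after)
    end = code.index(before, start)
    # A compiled pattern's finditer accepts pos/endpos to search only that slice.
    return "\n".join(m.group(0) for m in COMMENT.finditer(code, start, end))
```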