parsing

Parsing text into sentences?

I am trying to parse text from a PDF page into sentences, but it is much more difficult than I had anticipated. There are a lot of special cases to consider, such as initials, decimals, quotations, etc., which contain periods but do not necessarily end the sentence. I was curious if anyone here was familiar with an NLP library for ...

How to write my own config format

Hi there, I've developed my own file format for configuration files (plaintext and line based -> EOL = one configuration) for an application. This format is nothing very special, and the only reason I do this is to learn something! The reader and writer functions will be implemented in C (with GLib because it should be a UTF8 encoded fi...

JSON conversion in JavaScript

I'm trying to stringify a multi-array variable into a JSON string in Javascript. The //i'm using functions from http://www.json.org/json2.js var info = new Array(max); for (var i=0; i<max; i++) { var coordinate = [25 , 32]; info[i] = coordinate; } var result = JSON.stringify(info); But result doesn't look like a JSON string at a...

Parsing SQL text

Does anyone know how to parse SQL text with VB.NET? For example, I have a SQL file containing "CREATE TABLE..." and I want to get an array of columns and an array of data types. ...

How would I go about parsing the following log?

I need to parse a log in the following format:
===== Item 5483/14800 =====
This is the item title
Info: some note
===== Item 5483/14800 (Update 1/3) =====
This is the item title
Info: some other note
===== Item 5483/14800 (Update 2/3) =====
This is the item title
Info: some more notes
===== Item 5483/14800 (Update 3/3) =====
This is th...
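One possible approach to a log like this, sketched in Python: match each `=====` header line with a regular expression and attach the lines that follow to that record. The function name `parse_log`, the dictionary keys, and the assumption that every record is a header line followed by a title line and optional `Info:` lines are illustrative guesses, since the excerpt is truncated.

```python
import re

# Header lines look like "===== Item 5483/14800 =====", optionally with
# "(Update 1/3)". The group names are hypothetical field names.
HEADER = re.compile(
    r"^===== Item (?P<item>\d+)/(?P<total>\d+)"
    r"(?: \(Update (?P<update>\d+)/(?P<updates>\d+)\))? =====$"
)

def parse_log(text):
    """Return a list of dicts, one per '===== Item ... =====' block."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            update = match.group("update")
            current = {
                "item": int(match.group("item")),
                "total": int(match.group("total")),
                "update": int(update) if update else None,
                "title": None,
                "info": [],
            }
            records.append(current)
        elif current is None:
            continue  # ignore anything before the first header
        elif line.startswith("Info:"):
            current["info"].append(line[len("Info:"):].strip())
        elif current["title"] is None:
            current["title"] = line  # first non-Info line is taken as the title
    return records
```

Records that share an item number (the original entry and its updates) could then be grouped by the `item` key, if that is what the real log requires.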

Is it more efficient to parse external XML or to hit the database?

When dealing with a web service API that returns XML, I was wondering whether it's better (faster) to call the external service each time and parse the XML (using ElementTree) for display on your site, or to save the records into the database (after parsing it once or however many times you need to each day) and make database calls i...

How to test a program processing large amounts of data stored in an unpredictable format

What I have to do: I'm trying to manipulate some rather large amounts of data stored in Excel files (one of the workbooks has as many as 150 spreadsheets). The result of these manipulations may yield approximately 800,000 rows in a database table. The problem: Data stored in the spreadsheets has an unpredictable format. The company that ge...

How can I parse and normalize HTML from different HTML generators?

This is an extension of this question. I'm trying to parse HTML snippets embedded in an XML backup of a Blogger blog and retag them with InDesign tags. Blogger doesn't standardize the HTML for any of its posts, and the posts can be written in Word, Windows Live Writer, the native Blogger interface, or text editors, resulting in tons of ...

Something like TryParse for dates/times in C++ (non-Windows)?

Does anyone know of a library that offers something similar to .NET's Parse/TryParse for dates and times that can be used on Linux from C++? I've looked at the Boost date/time code but I'm not sure that I can do it without specifying the particular input format before attempting to parse. Basically, I might have dates in any number o...
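The "try to parse without knowing the format in advance" idea can be illustrated by attempting a list of candidate formats in turn and reporting failure instead of raising. The sketch below uses Python's standard `datetime.strptime` purely to show the technique; it is not a C++ library recommendation, and the function name `try_parse_date` and the format list are hypothetical.

```python
from datetime import datetime

# Hypothetical candidate formats; a real list depends on the inputs expected.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
)

def try_parse_date(text, formats=CANDIDATE_FORMATS):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not fit; try the next one
    return None
```

The order of the formats matters for ambiguous inputs: a string such as `01/06/2009` is read as day/month/year here only because no month/day/year format is tried first.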

jQuery XML parsing

I have an xml like this <resultGroups> <subGroups> <results> </results> <results> </results> </subGroups> <subGroups> <results> </results> <results> </results> </subGroups> <name> </name> </resultGroups> <resultGroups> <subGroups> <results> </results> <results...

C++ - How can I extract a valid string within a string?

The Problem: I am trying to extract a valid game mode for Defense of the Ancients (DotA) from a game name using C++. Details: Game names can be, at most, 31 characters long. There are three game mode categories: primary, secondary, and miscellaneous. There can only be 1 primary game mode selected. Certain primary game modes are incompat...

Is there any tool that can generate 'sample' using the grammar definition?

I would like to test the syntax highlighting rules, and for that purpose I would like to have sample data generated from the formal grammar. Is there any tool that can generate either a random sample for the grammar or the full grammar sample (as an example - generate all the possible SELECT clauses with the 'valid' ...

Is there anything like Hpricot or Beautiful Soup for PHP?

Possible Duplicate: Robust, Mature HTML Parser for PHP. I am looking for a good way to parse and modify HTML documents server side in PHP. Beautiful Soup and Hpricot look like very good tools but they are not available for PHP. Are there any good libraries that can do this in PHP? Tidy appears to be partially what I am looking fo...

Parse error: syntax error, unexpected '<' in - Fix?

Newb here trying to fix my php code. Getting an error at line 89. <?php /** * @version $Id: index.php 10381 2008-06-01 03:35:53Z pasamio $ * @package Joomla * @copyright Copyright (C) 2005 - 2008 Open Source Matters. All rights reserved. * @license GNU/GPL, see LICENSE.php * Joomla! is free software. This version may have been ...

Splitting /proc/cmdline arguments with spaces

Most scripts that parse /proc/cmdline break it up into words and then filter out arguments with a case statement, for example:
CMDLINE="quiet union=aufs wlan=FOO"
for x in $CMDLINE
do
    case $x in
        wlan=*)
        echo "${x//wlan=}"
        ;;
    esac
done
The problem is when the WLAN ESSID has spaces. Users expect to set wlan='...
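One way to see what quote-aware splitting looks like is Python's standard `shlex.split`, which follows POSIX shell quoting rules, so a quoted value containing spaces stays in one token. This is a swapped-in illustration in Python, not a fix for the shell loop itself; the function name `parse_cmdline` and the choice to store bare flags as `True` are assumptions.

```python
import shlex

def parse_cmdline(cmdline):
    """Split a kernel-style command line into a dict, honouring quotes."""
    args = {}
    for token in shlex.split(cmdline):
        # "key=value" becomes an entry; a bare word such as "quiet" is a flag.
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args

# Example usage with a quoted ESSID containing a space.
print(parse_cmdline("quiet union=aufs wlan='FOO BAR'"))
```

This only helps if the quotes are still present in the string being split, which is assumed here.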

JavaCC ParseException... lookahead problem?

I'm writing a parser for a very simple grammar in JavaCC. It's beginning to come together but at the moment I'm completely stuck on this error: ParseException: Encountered "" at line 4, column 15. Was expecting one of: The line of input in question is z = y + z + 5 and the production that is giving me problems is my expression w...

MP3 length in milliseconds

I need a script or command-line tool to get an MP3's length in milliseconds. The files are 64 kbit/s mono CBR, encoded with LAME. (I looked for a libmad binding for Ruby, my language of choice, but found nothing noteworthy...) ...

How can I parse a C header file with Perl?

Hi, I have a header file in which there is a large struct. I need to read this structure using some program and make some operations on each member of the structure and write them back. For example I have some structure like const BYTE Some_Idx[] = { 4,7,10,15,17,19,24,29, 31,32,35,45,49,51,52,54, 55,58,60,64,65,66,67,69, 70,72,76,7...

Parsing an existing config file

I have a config file that is in the following form: protocol sample_thread { { AUTOSTART 0 } { BITMAP thread.gif } { COORDS {0 0} } { DATAFORMAT { { TYPE hl7 } { PREPROCS { { ARGS {{}} } { PROCS sample_proc } } } } } } The real file may not have these exact fields, a...
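A nested-brace format like this can be read with a small tokenizer plus a stack, turning each `{ ... }` group into a nested list. The sketch below is in Python and assumes that values never contain braces or quoted whitespace, which the truncated excerpt does not confirm; `tokenize` and `parse` are hypothetical helper names.

```python
import re

def tokenize(text):
    """Split into '{', '}' and runs of non-space, non-brace characters."""
    return re.findall(r"[{}]|[^\s{}]+", text)

def parse(tokens):
    """Turn a flat token list into nested Python lists, one list per {...}."""
    stack = [[]]  # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == "{":
            stack.append([])
        elif tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return stack[0]

# Example usage on a shortened version of the sample config.
tree = parse(tokenize("protocol sample_thread { { AUTOSTART 0 } { COORDS {0 0} } }"))
print(tree)
```

Each inner list whose first element is a string can then be treated as a key followed by its value, if a dictionary view is more convenient.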

Ruby Parser

I want to know whether it is possible to parse the Ruby language using only a deterministic parser with no backtracking at all? ...