I am trying to parse text from a PDF page into sentences, but it is much more difficult than I had anticipated. There are many special cases to consider, such as initials, decimals, and quotations, which contain periods but do not necessarily end the sentence.
I was curious if anyone here was familiar with an NLP library for ...
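To make the special cases above concrete, here is a minimal heuristic sketch in Python. It is not an NLP library and not a robust solution; the abbreviation list, the `INITIAL` pattern, and the `split_sentences` name are all assumptions made up for illustration.

```python
import re

# Hypothetical, deliberately short abbreviation list; a real one would be longer.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "etc.", "e.g.", "i.e."}
# A single capital letter followed by a period, e.g. the "J." in "J. Smith".
INITIAL = re.compile(r"[A-Z]\.")

def split_sentences(text):
    """Naive sentence splitter: works on whitespace-separated words."""
    words = text.split()
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        # Ignore closing quotes/brackets when looking for the end mark.
        core = word.rstrip("\"')")
        if not core.endswith((".", "!", "?")):
            continue  # decimals such as "3.50" end in a digit, so they pass through
        if core.lower() in ABBREVIATIONS or INITIAL.fullmatch(core):
            continue  # abbreviation or initial, not a sentence end
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if nxt and not (nxt[0].isupper() or nxt[0] in "\"'("):
            continue  # next word is lower-case, so probably mid-sentence
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

sentences = split_sentences(
    'Dr. J. Smith paid 3.50 dollars. He said "Stop." Then he left.'
)
```

This still gets some inputs wrong (a sentence ending in the word "I." is mistaken for an initial, for instance), which is why a trained sentence tokenizer is usually the better choice.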
Hi there,
I've developed my own file format for configuration files (plain text and line-based, so each line is one configuration) for an application. The format is nothing special, and the only reason I'm doing this is to learn something! The reader and writer functions will be implemented in C (with GLib because it should be a UTF8 encoded fi...
I'm trying to stringify a multi-array variable into a JSON string in JavaScript. The
// I'm using functions from http://www.json.org/json2.js
var info = new Array(max);
for (var i = 0; i < max; i++) {
    var coordinate = [25, 32];
    info[i] = coordinate;
}
var result = JSON.stringify(info);
But result doesn't look like a JSON string at a...
Does anyone know how to parse SQL Text with VB.NET?
For example, I have a SQL file containing "CREATE TABLE..." and I want to get an array of columns and an array of data types.
...
I need to parse a log in the following format:
===== Item 5483/14800 =====
This is the item title
Info: some note
===== Item 5483/14800 (Update 1/3) =====
This is the item title
Info: some other note
===== Item 5483/14800 (Update 2/3) =====
This is the item title
Info: some more notes
===== Item 5483/14800 (Update 3/3) =====
This is th...
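Since the excerpt is cut off, the desired output is unknown. As a rough sketch only, assuming each header line starts a record made up of a title line and an `Info:` line, the format could be read in Python like this (`HEADER` and `parse_log` are names invented for this example):

```python
import re

# Matches headers such as "===== Item 5483/14800 =====" and
# "===== Item 5483/14800 (Update 1/3) =====".
HEADER = re.compile(
    r"^===== Item (\d+)/(\d+)(?: \(Update (\d+)/(\d+)\))? =====$"
)

def parse_log(text):
    """Split the log into records, one per header line."""
    records = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            item, total, update, update_total = match.groups()
            current = {
                "item": int(item),
                "total": int(total),
                # None for the initial entry, which has no "(Update n/m)" part.
                "update": int(update) if update else None,
                "update_total": int(update_total) if update_total else None,
                "title": None,
                "info": None,
            }
            records.append(current)
        elif current is not None and stripped:
            if stripped.startswith("Info: "):
                current["info"] = stripped[len("Info: "):]
            elif current["title"] is None:
                # Assumption: the first non-Info line after a header is the title.
                current["title"] = stripped
    return records

SAMPLE = """\
===== Item 5483/14800 =====
This is the item title
Info: some note
===== Item 5483/14800 (Update 1/3) =====
This is the item title
Info: some other note
"""

records = parse_log(SAMPLE)
```

Grouping the updates under their item afterwards would be a matter of keying on `item`, but whether that is wanted depends on the part of the question that is missing here.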
I was wondering when dealing with a web service API that returns XML, whether it's better (faster) to just call the external service each time and parse the XML (using ElementTree) for display on your site or to save the records into the database (after parsing it once or however many times you need to each day) and make database calls i...
What I have to do
I'm trying to manipulate some rather large amounts of data stored in Excel files (one of the workbooks has as many as 150 spreadsheets). These manipulations may yield approximately 800,000 rows in a database table.
The problem
Data stored in the spreadsheets has unpredictable format. The company that ge...
This is an extension of this question. I'm trying to parse HTML snippets embedded in an XML backup of a Blogger blog and retag them with InDesign tags.
Blogger doesn't standardize the HTML for any of its posts, and the posts can be written in Word, Windows Live Writer, the native Blogger interface, or text editors, resulting in tons of ...
Does anyone know of a library that offers something similar to .NET's Parse/TryParse for dates and times that can be used on Linux from C++?
I've looked at the Boost date/time code but I'm not sure that I can do it without specifying the particular input format before attempting to parse. Basically, I might have dates in any number o...
I have XML like this:
<resultGroups>
    <subGroups>
        <results> </results>
        <results> </results>
    </subGroups>
    <subGroups>
        <results> </results>
        <results> </results>
    </subGroups>
    <name> </name>
</resultGroups>
<resultGroups>
    <subGroups>
        <results> </results>
        <results...
The Problem: I am trying to extract a valid game mode for Defense of the Ancients (DotA) from a game name using C++.
Details:
Game names can be, at most, 31 characters long
There are three game mode categories: primary, secondary, and miscellaneous
There can only be 1 primary game mode selected
Certain primary game modes are incompat...
I would like to test my syntax highlighting rules, and for that purpose I would like to have sample data generated from the formal grammar.
Is there any tool that can generate either a random sample for the grammar or the full grammar sample (as an example - generate all the possible SELECT clauses with the 'valid' ...
Possible Duplicate:
Robust, Mature HTML Parser for PHP
I am looking for a good way to parse and modify HTML documents server-side in PHP. Beautiful Soup and Hpricot look like very good tools, but they are not available for PHP. Are there any good libraries that can do this in PHP? Tidy appears to be partially what I am looking fo...
Newb here trying to fix my PHP code. I'm getting an error at line 89.
<?php
/**
* @version $Id: index.php 10381 2008-06-01 03:35:53Z pasamio $
* @package Joomla
* @copyright Copyright (C) 2005 - 2008 Open Source Matters. All rights reserved.
* @license GNU/GPL, see LICENSE.php
* Joomla! is free software. This version may have been ...
Most scripts that parse /proc/cmdline break it up into words and then filter out arguments with a case statement, for example:
CMDLINE="quiet union=aufs wlan=FOO"
for x in $CMDLINE
do
    case $x in
        wlan=*)
            echo "${x//wlan=}"
            ;;
    esac
done
The problem is when the WLAN ESSID has spaces. Users expect to set wlan='...
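One way to see the difference between plain word splitting and quote-aware splitting is the sketch below. It is written in Python rather than shell, and the quoted value `'FOO BAR'` is an assumed example because the excerpt is cut off before showing one; `parse_cmdline` is a name made up for this illustration. `shlex.split` follows POSIX shell quoting rules, so a quoted value containing spaces stays in one word.

```python
import shlex

def parse_cmdline(cmdline):
    """Split a kernel-style command line into a dict, honouring quotes."""
    params = {}
    for word in shlex.split(cmdline):
        key, sep, value = word.partition("=")
        # Bare flags such as "quiet" have no "=", so store True for them.
        params[key] = value if sep else True
    return params

# Assumed input: the same line as above, but with a quoted ESSID containing a space.
params = parse_cmdline("quiet union=aufs wlan='FOO BAR'")
```

A plain `for x in $CMDLINE` loop would instead yield `wlan='FOO` and `BAR'` as two separate words.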
I'm writing a parser for a very simple grammar in JavaCC. It's beginning to come together, but at the moment I'm completely stuck on this error:
ParseException: Encountered "" at line 4, column 15.
Was expecting one of:
The line of input in question is z = y + z + 5
and the production that is giving me problems is my expression w...
I need a script or command-line tool to get an MP3's length in milliseconds. The files are 64 kbit/s mono CBR, encoded with LAME.
(I looked for a libmad for ruby, my language of choice, but found nothing noteworthy...)
...
Hi,
I have a header file that contains a large struct. I need a program that reads this structure, performs some operations on each member, and writes the results back.
For example, I have a structure like:
const BYTE Some_Idx[] = {
4,7,10,15,17,19,24,29,
31,32,35,45,49,51,52,54,
55,58,60,64,65,66,67,69,
70,72,76,7...
I have a config file that is in the following form:
protocol sample_thread {
{ AUTOSTART 0 }
{ BITMAP thread.gif }
{ COORDS {0 0} }
{ DATAFORMAT {
{ TYPE hl7 }
{ PREPROCS {
{ ARGS {{}} }
{ PROCS sample_proc }
} }
} }
}
The real file may not have these exact fields, a...
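Because the fields can vary, a parser that only understands the nesting is one possible starting point. The following is a minimal sketch in Python, assuming that braces and whitespace are the only syntax (no quoting or escapes, which the real format may well have); `TOKEN` and `parse_config` are hypothetical names.

```python
import re

# A token is either a single brace or a run of non-space, non-brace characters.
TOKEN = re.compile(r"[{}]|[^\s{}]+")

def parse_config(text):
    """Parse brace-delimited text into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_items(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            token = tokens[pos]
            pos += 1
            if token == "{":
                # Recurse: everything up to the matching "}" becomes a sub-list.
                items.append(parse_items(depth + 1))
            elif token == "}":
                if depth == 0:
                    raise ValueError("unexpected '}'")
                return items
            else:
                items.append(token)
        if depth > 0:
            raise ValueError("missing '}'")
        return items

    return parse_items(0)

SAMPLE = """
protocol sample_thread {
    { AUTOSTART 0 }
    { COORDS {0 0} }
    { ARGS {{}} }
}
"""

tree = parse_config(SAMPLE)
```

Each `{ KEY value }` group comes back as a list whose first element is the key, so the result can be walked without knowing the field names in advance.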
I want to know whether it is possible to parse the Ruby language using just a deterministic parser, with no backtracking at all.
...