parsing

What would be the ideal script language for parsing text files?

My code reads in a file, usually HTML but it could be any plain text. I was thinking of having each piece as a separate module loaded externally at run time so I don't have to maintain it. I would like to use a scripting language to parse the text/strings and call the appropriate C or C++ functions. What scripting language would be good...

Read xlsx file in Java

I need to read an Excel 2007 xlsx file in a Java application. Does anyone know of a good API to accomplish this task? Thanks in advance for any advice given. -MrPortico ...

Can anyone recommend a good SQL parser?

I am trying to write a tool that can compare a database’s schema to the SQL in an install script. Getting the information from the database is pretty straightforward but I am having a little trouble parsing the install scripts. I have played with a few of the parsers that show up on Google but they seemed somewhat incomplete. Ideall...

Parse a file using C++, load the value to a structure

Hi, I have the following file/line: pc=1 ct=1 av=112 cv=1100 cp=1700 rec=2 p=10001 g=0 a=0 sz=5 cr=200 pc=1 ct=1 av=113 cv=1110 cp=1800 rec=2 p=10001 g=0 a=10 sz=5 cr=200 and so on. I wish to parse this and take the key value pairs and put them in a structure: struct pky { pky() : a_id(0), sz_id(0), cr_id(0), ...
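One way to approach the key=value parsing described above, as a minimal sketch only: split each line on whitespace, split each token at the first '=', then copy the fields of interest into a struct. The `Record` type and the helper names `parseKeyValues`, `intField` and `toRecord` are hypothetical and are not the asker's (truncated) `pky` struct.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record type; the field names are assumptions chosen to
// mirror three of the keys in the sample line (a, sz, cr).
struct Record {
    int a = 0;
    int sz = 0;
    int cr = 0;
};

// Split one line of whitespace-separated key=value tokens into a map.
// Tokens without '=' are skipped; values are kept as strings.
std::map<std::string, std::string> parseKeyValues(const std::string& line) {
    std::map<std::string, std::string> out;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        const std::string::size_type eq = token.find('=');
        if (eq == std::string::npos) continue;
        out[token.substr(0, eq)] = token.substr(eq + 1);
    }
    return out;
}

// Look up an integer field; return `fallback` when the key is absent
// or std::stoi cannot convert the value.
int intField(const std::map<std::string, std::string>& kv,
             const std::string& key, int fallback = 0) {
    const auto it = kv.find(key);
    if (it == kv.end()) return fallback;
    try {
        return std::stoi(it->second);
    } catch (const std::exception&) {
        return fallback;
    }
}

// Build a Record from one input line.
Record toRecord(const std::string& line) {
    const auto kv = parseKeyValues(line);
    Record r;
    r.a = intField(kv, "a");
    r.sz = intField(kv, "sz");
    r.cr = intField(kv, "cr");
    return r;
}
```

Going through an intermediate map keeps the tokenizing separate from the struct layout, so keys may appear in any order or be missing altogether.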

Remove the Query String from a URL in HTML with a Regular Expression

Given an HTML document, what is the most correct and concise regular expression pattern to remove the query strings from each URL in the document? ...

Regex for HTML parsing (in C#)

Hello, I'm trying to parse an HTML page and extract 2 values from a table row. The HTML for the table row is as follows: - <tr> <td title="Associated temperature in (ºC)" class="TABLEDATACELL" nowrap="nowrap" align="Left" colspan="1" rowspan="1">Max Temperature (ºC)</td> <td class="TABLEDATACELLNOTT" nowrap="nowrap" align="Center" colsp...

How to parse logs written by multiple threads?

I have an interesting problem and would appreciate your thoughts for the best solution. I need to parse a set of logs. The logs are produced by a multi-threaded program and a single process cycle produces several lines of logs. When parsing these logs I need to pull out specific pieces of information from each process - naturally this i...

J2ME Properties

J2ME lacks the java.util.Properties class. Although it is possible to put application settings in the JAD file, this is not recommended for many properties (since some platforms limit the size of the JAD file). I want to put a configuration file inside my jar file and parse it. And I do not want to go with XML because it will be overshooti...

How can I convert these strings to a hash in Perl?

I wish to convert a single string with multiple delimiters into a key=>value hash structure. Is there a simple way to accomplish this? My current implementation is: sub readConfigFile() { my %CONFIG; my $index = 0; open(CON_FILE, "config"); my @lines = <CON_FILE>; close(CON_FILE); my @array = split(/>/, $lines[0]); my $total = @...

Best approach to parsing *.c/*.h files using C# for declarations and data type definitions

Hello, I need to parse .c/.h files for the data declarations and extract the type declarations. For example, I might need to extract a variable declaration and its corresponding data type, which might look like: typedef union { struct { unsigned char OG15 : 1, ... OG0 : 1; } Bits; u...

C++: converting a MAC ID string into an array of uint8_t

I want to read a MAC ID from the command line and convert it to an array of uint8_t values to use in a struct, but I cannot get it to work. I have a vector of strings for the MAC ID, split on ':', and I want to use stringstream to convert them, with no luck. Can anyone point out what I am missing? int parseHex(const string &num){ stringstre...
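A possible sketch of the conversion, assuming the usual six-group "aa:bb:cc:dd:ee:ff" form. The function name `parseMac` is hypothetical. Since the asker's code is truncated, the cause of the failure is unknown; one common pitfall with this approach (an assumption about what may be going wrong) is extracting from a stream directly into a `uint8_t`, which reads a single character rather than a number, so this sketch converts through `std::stoul` with base 16 instead.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Parse "aa:bb:cc:dd:ee:ff" into six bytes.
// Returns false on malformed input; `out` may be partially written then.
bool parseMac(const std::string& text, std::array<std::uint8_t, 6>& out) {
    // Reject empty input and a trailing ':' (getline would silently
    // drop the empty final group).
    if (text.empty() || text.back() == ':') return false;

    std::istringstream in(text);
    std::string part;
    std::size_t count = 0;
    while (std::getline(in, part, ':')) {
        if (count == out.size()) return false;              // too many groups
        if (part.empty() || part.size() > 2) return false;  // 1 or 2 hex digits
        if (part.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos)
            return false;                                   // non-hex character
        // Convert through unsigned long, then narrow to one byte.
        out[count++] = static_cast<std::uint8_t>(std::stoul(part, nullptr, 16));
    }
    return count == out.size();
}
```

Validating each group before calling `std::stoul` means the conversion itself cannot throw here, so the function reports every failure through its return value.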

Parsing PlainText Emails from HTML Content (ASP.NET)

Hi All, Right, in short we basically already have a system in place where the HTML content for emails is generated. It's not perfect, but it works. From this, we need to be able to derive a plaintext alternative for the email. I was thinking of instantly jumping on and creating a RegEx to strip the <*> tags from the message - but then ...

JavaScript: Dynamic Field Names

Hi all, I have a piece of JavaScript that removes commas from a provided string (in my case, currency values). It is: function replaceCommaInCurrency(myField, val) { var re = /,/g; document.net1003Form.myField.value=val.replace(re, ''); } 'myField' was my attempt to dynamically have this work on any field that I pass in, but i...

email body from a parsed email object in jython

I have an object. fp = open(self.currentEmailPath, "rb") p = email.Parser.Parser() self._currentEmailParsedInstance= p.parse(fp) fp.close() self.currentEmailParsedInstance, from this object I want to get the body of an email, text only no html.... How do I do it? something like this? newmsg=self._current...

Parsing XML/XHTML in ActionScript

Is there anything similar to getElementById in ActionScript? I'm trying to make a prototype of a Flash page which gets its data from an XHTML file. I want to have both an accessible HTML version (for search engines, text readers and people without Flash) and a Flash version (because the customer insists on using Flash even though a html-cs...

How would you prefer <script> elements handled?

Imagine the following: HTML is parsed into a DOM tree. DOM nodes become available programmatically. DOM nodes may or may not be augmented programmatically. Augmented nodes are reserialised to HTML. I have primarily a question on how one would want the "script" tag to behave. my $tree = someparser( $source ); .... print $somenode->...

Buffering data for delimiter separated blocks

Hi, there is a question I have been wondering about for ages, and I was hoping someone could give me an answer to put my mind at rest. Let's assume that I have an input stream (like a file/socket/pipe) and want to parse the incoming data. Let's assume that each block of incoming data is split by a newline, like most common internet protocols. T...
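Because the question is cut off, the following is only a sketch of one common way to buffer newline-delimited blocks: append every incoming chunk to a pending string, hand back each complete line, and keep the unfinished tail for the next read. The class name `LineBuffer` and its members are hypothetical, and stripping a trailing '\r' is an assumption that suits CRLF-terminated protocols.

```cpp
#include <string>
#include <vector>

// Accumulates arbitrary chunks and yields complete newline-terminated lines.
class LineBuffer {
public:
    // Append a chunk and return every line it completes, without the
    // terminating '\n' (and without a preceding '\r', if present).
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> lines;
        std::string::size_type start = 0;
        std::string::size_type nl;
        while ((nl = pending_.find('\n', start)) != std::string::npos) {
            std::string line = pending_.substr(start, nl - start);
            if (!line.empty() && line.back() == '\r') line.pop_back();
            lines.push_back(line);
            start = nl + 1;
        }
        // Keep only the unfinished tail for the next call.
        pending_.erase(0, start);
        return lines;
    }

    // Data received so far that is not yet terminated by '\n'.
    const std::string& pending() const { return pending_; }

private:
    std::string pending_;
};
```

A caller would pass each buffer read from the file, socket or pipe to `feed` and process the returned lines; a real implementation would probably also cap the size of the pending data so that a peer that never sends a newline cannot exhaust memory.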

Can Boost Spirit be used to parse byte stream data?

Can Spirit (part of Boost C++ library) be used to parse out binary data coming from a stream? For example, can it be used to parse data coming from a socket into structures, bytes, and individual bit flags? Thanks! ...

Parse HTML via XPath

In .NET, I found this great library, HtmlAgilityPack, that allows you to easily parse non-well-formed HTML using XPath. I've used this for a couple of years in my .NET sites, but I've had to settle for more painful libraries for my Python, Ruby and other projects. Is anyone aware of similar libraries for other languages? ...

Initializing struct, using an array

I have a couple of arrays: const string a_strs[] = {"cr=1", "ag=2", "gnd=U", "prl=12", "av=123", "sz=345", "rc=6", "pc=12345"}; const string b_strs[] = {"cr=2", "sz=345", "ag=10", "gnd=M", "prl=11", "rc=6", "cp=34", "cv=54", "av=654", "ct=77", "pc=12345"}; which I then need to parse out for '=' and then put the values in the struct. ...