data-processing

What is the best way to reduce cyclomatic complexity when validating data?

Right now I'm working on a web application that receives a significant amount of data from a database that can return null results. When measuring the cyclomatic complexity of the application, a number of functions weigh in between 10 and 30. For the most part the majority of the functions with the high numbers ha...

Processing CSV files from the Web using embedded Java database

Short version: assuming I don't want to keep the data for long, how do I create a database programmatically in HSQLDB and load some CSV data into it? My schema will match the files exactly and the files do have adequate column names. This is an unattended process. Details: I need to apply some simple SQL techniques to three CSV files...

Word Anagram Hashing Algorithm?

Given a set of words, we need to find the anagrams and display each category on its own using the best algorithm. Input: man car kile arc none like. Output: man; car arc; kile like; none. The best solution I am developing now is based on a hashtable, but I am thinking about an equation to convert an anagram word into an integer value. Example: man...
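A minimal sketch of the hashtable approach the question mentions, assuming Python and assuming the sorted letters of each word serve as the hash key (the question's integer "equation" is truncated and is not reproduced here). The function name `group_anagrams` is hypothetical.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other.

    Assumption: two words are anagrams exactly when their sorted
    letters are equal, so the sorted string is used as the dict key.
    """
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))  # e.g. "car" and "arc" both map to "acr"
        groups[key].append(word)
    # Dicts preserve insertion order, so groups appear in first-seen order.
    return list(groups.values())


if __name__ == "__main__":
    for group in group_anagrams("man car kile arc none like".split()):
        print(" ".join(group))
```

Sorting each word costs O(k log k) for a word of length k; a letter-count tuple would work as a key too and avoids the sort.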

free secure distributed make system for linux

Are there any good language-agnostic distributed make systems for linux that are secure and free? Background Information: I run scientific experiments (computer-science ones) that sometimes have large dependency trees, occasionally on the order of thousands or tens of thousands of tree nodes. This dependency tree is over the data file...

Fast min on span

Given a list of arrays and lots of setup time, I need to quickly find the smallest value in some sub-span of each array. In concept: class SpanThing { int[][] Data; SpanThing(int[][] data) /// must be rectangular { Data = data; //// process, can take a while } int[] MinsBruteForce(int from, int to...
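One common way to trade setup time for fast span queries is a sparse table per array. The sketch below is an illustration in Python, not the asker's code: the class name `SpanMin`, the method `mins`, and the half-open span convention `[start, stop)` are all assumptions, since the original signature is truncated.

```python
class SpanMin:
    """Precompute a sparse table per row so span minimums take O(1) each."""

    def __init__(self, data):
        # Setup costs O(n log n) time and memory per row.
        self.tables = [self._build(row) for row in data]

    @staticmethod
    def _build(row):
        # table[j][i] holds min(row[i : i + 2**j]).
        table = [list(row)]
        k = 1  # width covered by the most recent level
        while 2 * k <= len(row):
            prev = table[-1]
            table.append(
                [min(prev[i], prev[i + k]) for i in range(len(row) - 2 * k + 1)]
            )
            k *= 2
        return table

    def mins(self, start, stop):
        """Return the minimum of row[start:stop] for every row.

        Assumption: the span is half-open and non-empty (start < stop).
        """
        level = (stop - start).bit_length() - 1
        width = 1 << level
        # Two overlapping power-of-two windows cover the whole span.
        return [min(t[level][start], t[level][stop - width]) for t in self.tables]
```

For example, `SpanMin([[5, 2, 7, 1, 9]]).mins(0, 3)` looks only at the values 5, 2, 7. If the arrays change after setup, a segment tree would be a better fit than a sparse table.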

Optimal update in resultset on Sybase IQ

I'm looking to rewrite code that updates a table on a Sybase IQ database v14 and does the following: selects all the records in the table and extracts some data to file; updates the extracted-to-file flag for each record in the table. Currently, once a record is written to file, its extraction flag is updated. Currently there are 40 00...

What is so wrong with extract()?

Hey everyone, I was recently reading this thread on some of the worst PHP practices. In the second answer there is a mini discussion on the use of extract(), and I'm just wondering what all the huff is about. I personally use it to chop up a given array such as $_GET or $_POST where I then sanitize the variables later, as they have bee...

Patterns for non-layered applications

In Patterns of Enterprise Application Architecture, Martin Fowler writes: This book is thus about how you decompose an enterprise application into layers and how those layers work together. Most nontrivial enterprise applications use a layered architecture of some form, but in some situations other approaches, such as p...

What are some good Perl modules for flow-based programming on files?

What are some good Perl modules to process files based on configurations? Basically I am working on taking data files, splitting them into columns, removing some rows based on some columns, removing unnecessary columns, comparing them to a baseline (writing where changes have occurred) and saving a CSV of the data and the comments as metadata. Sample...

Best way to transpose a grid of data in a file

I have large data files of values on a 2D grid. They are organized such that subsequent rows of data in the grid are subsequent lines in the file. Each column is separated by a tab character. Essentially, this is a CSV file, but with tabs instead of commas. I need to transpose the data (first row becomes first column) and output it to...
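A minimal sketch of the transpose itself, assuming Python and assuming the whole grid fits in memory (the question says the files are large, so this may not hold). The function name `transpose_tsv` is hypothetical; it takes open file-like objects rather than paths.

```python
import csv
import io


def transpose_tsv(src, dst):
    """Read a tab-separated grid from src and write its transpose to dst.

    Assumptions: the grid is rectangular and small enough to hold in
    memory; zip() would silently truncate to the shortest row otherwise.
    """
    rows = list(csv.reader(src, delimiter="\t"))
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(*rows))  # first row becomes first column


if __name__ == "__main__":
    source = io.StringIO("1\t2\t3\n4\t5\t6\n")
    target = io.StringIO()
    transpose_tsv(source, target)
    print(target.getvalue(), end="")
```

For files too large for memory, one option is to process the input in column blocks over several passes, at the cost of rereading the file.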

CPU bound applications vs. IO bound

For 'number-crunching' style applications that use a lot of data (read: "hundreds of MB, but not into GB", i.e., it will fit nicely into memory beside the OS), does it make sense to read all your data into memory first before starting processing to avoid potentially making your program IO bound while reading large related datasets, instead ...

Web service for live Forex data for Eastern European currencies?

Is there a Web service for live Forex data for Eastern European currencies? The Yahoo data is updated with a couple of minutes' delay, so I don't want to use that. I've seen some Java applets, but they are no use, as I can't extract any data from them. As a specific request, I'm looking for quotes on the Romanian currency, RON. ...

Intensive file I/O and data processing in C#

I'm writing an app which needs to process a large text file (comma-separated with several different types of records - I do not have the power or inclination to change the data storage format). It reads in records (often all the records in the file sequentially, but not always), then the data for each record is passed off for some proce...

Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?)

I have some very big delimited data files and I want to process only certain columns in R without taking the time and memory to create a data.frame for the whole file. The only options I know of are read.table which is very wasteful when I only want a couple of columns or scan which seems too low level for what I want. Is there a bette...

Regular Expressions to insert "\r" every n characters in a line and before a complete word (basically a wordwrap feature)

I'm new to JavaScript and regular expressions. I'm trying to automatically format a text document to a specific number of characters per line, or put a "\r" before the word. This is functionally similar to the word wrap found in numerous text editors. E.g., I want 10 characters per line. Original: My name is Davey Blue. Modified: My name \ris Dave...

Using Hibernate to load 20K products, modify the entity and update the db

I am using Hibernate to update 20K products in my database. As of now I am pulling in the 20K products, looping through them and modifying some properties, and then updating the database. So: load products; foreach product: session begintransaction; productDao.MakePersistant(p); session commit(); As of now things are pretty s...

C# - Data Clustering approach

Hi all, I am writing a program in C# in which I have a set of 200 points displayed on an image. However, the points tend to cluster in various regions, and I am looking to find a way to "cluster." In other words, maybe draw a circle/ellipse around the clustered points. Has anyone seen any way to do this? I have heard about K-means clu...

Processing XML data for JSP

I have a form/calculator, which posts to itself some data, this data is then calculated by dispatching a servlet and the results are output as xml. The dispatcher code is shown below: //create instance ServletContext sc = this.getServletContext(); //create dispatcher RequestDispatcher rd = sc.getRequestDispatcher("/ProCalcServlet"); rd...

How can I read specific data columns from a file in C

Good day all, I am a beginner in C programming. I have this problem and have spent quite a huge amount of time on it without any considerable progress. My problem is stated thus: I have a series of files with the extension (.msr); they contain measured numerical values of more than ten parameters, which range from date,time,temper...

How to read different files stored in a directory and store some data from them to one file

This is a follow-up to the question I asked earlier. With the help of some people here I was able to start on the function I want to write, but I am yet to complete it. Here is my earlier question: I have a series of files with the extension (.msr); they contain measured numerical values of more than ten parameters, which range fr...