I have a number of rather large binary files (fixed-length records, the layout of which is described in a separate text file). Data files can get as big as 6 GB. Layout files (COBOL copybooks) are small, usually less than 5 KB.

All data files are stored on a GNU/Linux server (although they were generated on a mainframe).

I need to provide the testers with the means to edit those binary files. There is a free product called RecordEdit (http://record-editor.sourceforge.net/), but it has two severe drawbacks:

  1. It forces the testers to download the huge files through SFTP, only to upload them once again every time a slight change has been made. Very inefficient.

  2. It loads the entire file into working memory, rendering it useless for all but the relatively small data files.

What I have in mind is a client/server architecture based on Java:

  • The server would run a permanent process, listening for editing requests coming from the client. Such requests would include things like

    • return the list of available files

    • lock a certain file for editing

    • modify this data in that record

    • return the n-th page of records

    and so on…

  • The client could take any form (RCP-based on a desktop, which is my first candidate; ncurses on the same server; a middle-tier web application…) as long as it is able to send requests to the server.
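
To make the "lock a certain file for editing" request concrete, here is a minimal sketch of how the server process might track who is editing what. The class name `EditLockRegistry` and the idea of identifying files and users by plain strings are assumptions made for illustration only; the locks live in memory inside the single server process and are not OS-level file locks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory lock table kept by the single server process. */
final class EditLockRegistry {
    // Maps a data file name to the user currently editing it.
    private final Map<String, String> ownerByFile = new ConcurrentHashMap<>();

    /** Returns true if the user now holds (or already held) the lock on the file. */
    boolean tryLock(String fileName, String user) {
        String current = ownerByFile.putIfAbsent(fileName, user);
        return current == null || current.equals(user);
    }

    /** Releases the lock only if it is held by this user; returns whether it was released. */
    boolean unlock(String fileName, String user) {
        return ownerByFile.remove(fileName, user);
    }
}
```

With a table like this, a second tester asking to lock the same file simply gets `false` back until the first one unlocks it.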

I've been exploring NIO (because of its buffers) and MINA (because of protocol transparency) in order to implement the scheme. However, before going any further, I would like to collect your expert opinions.
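
As a rough illustration of why NIO looks suitable, the "return the n-th page of records" request could be served with positional reads on a `FileChannel`, so only one page is ever held in memory. This is a sketch under assumptions: the class `RecordPager` is hypothetical, and the record length is taken as a parameter that would in practice be derived from the copybook.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads one page of fixed-length records without loading the whole file. */
final class RecordPager {
    private RecordPager() {}

    /**
     * Returns up to pageSize records of recordLength bytes each,
     * starting at record number pageIndex * pageSize (both zero-based).
     */
    static List<byte[]> readPage(Path file, int recordLength, int pageIndex, int pageSize)
            throws IOException {
        if (recordLength <= 0 || pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("recordLength and pageSize must be positive, pageIndex non-negative");
        }
        List<byte[]> records = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long totalRecords = channel.size() / recordLength;
            long first = (long) pageIndex * pageSize;
            long end = Math.min(first + pageSize, totalRecords);
            for (long n = first; n < end; n++) {
                ByteBuffer buffer = ByteBuffer.allocate(recordLength);
                long position = n * recordLength;
                // A positional read may return fewer bytes than requested, so loop until full.
                while (buffer.hasRemaining()) {
                    int read = channel.read(buffer, position + buffer.position());
                    if (read < 0) {
                        throw new IOException("Unexpected end of file in record " + n);
                    }
                }
                records.add(buffer.array());
            }
        }
        return records;
    }
}
```

Because the byte offset of a record is just its number times the record length, the cost of fetching a page does not depend on the size of the file.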

Is mine a reasonable way to frame the problem?

Is it feasible to do it using the language and frameworks I'm thinking of? Is it convenient?

Do you know of any patterns, blueprints, success cases or open projects that resemble or have to do with what I'm trying to do?

+2  A: 

As I see it, the tricky thing here is decoding the files on the server. Once you've written that, it should be pretty easy.
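
To give a feel for what that decoding involves, here is a small sketch for one field type that commonly appears in mainframe data: packed decimal (COMP-3), where each byte holds two decimal digits and the final half-byte holds the sign. Whether these particular files use COMP-3 is an assumption, as is the class name `PackedDecimal`; the scale (number of implied decimal places) would come from the field's PIC clause in the copybook.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical decoder for one packed-decimal (COMP-3) field. */
final class PackedDecimal {
    private PackedDecimal() {}

    /** Decodes the raw field bytes; scale is the number of implied decimal places. */
    static BigDecimal decode(byte[] field, int scale) {
        if (field.length == 0) {
            throw new IllegalArgumentException("Empty field");
        }
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < field.length; i++) {
            int high = (field[i] >> 4) & 0x0F;
            int low = field[i] & 0x0F;
            boolean lastByte = (i == field.length - 1);
            // Every half-byte is a digit, except the very last one, which is the sign.
            if (high > 9 || (!lastByte && low > 9)) {
                throw new IllegalArgumentException("Invalid digit in byte " + i);
            }
            digits.append(high);
            if (!lastByte) {
                digits.append(low);
            }
        }
        int sign = field[field.length - 1] & 0x0F;
        BigInteger unscaled = new BigInteger(digits.toString());
        if (sign == 0x0D) { // 0xD marks a negative value; 0xC and 0xF are positive or unsigned
            unscaled = unscaled.negate();
        }
        return new BigDecimal(unscaled, scale);
    }
}
```

For example, the three bytes `0x12 0x34 0x5C` with a scale of 2 decode to 123.45. Text fields would additionally need an EBCDIC-to-Unicode conversion, which is not shown here.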

I would suggest that, whatever the thing you use client-side is, it should basically upload a 'diff' of the person's changes.
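
A sketch of what such a 'diff' could look like for fixed-length records: the client sends only the record number, the offset inside the record, and the replacement bytes, and the server writes them in place. The names `RecordPatch` and `PatchApplier` are made up for illustration, and the record length is assumed to be known from the layout file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical description of one change: which record, where inside it, and the new bytes. */
final class RecordPatch {
    final long recordNumber;   // zero-based record index
    final int offsetInRecord;  // byte offset within the record
    final byte[] newBytes;     // replacement bytes (same length as the bytes they overwrite)

    RecordPatch(long recordNumber, int offsetInRecord, byte[] newBytes) {
        this.recordNumber = recordNumber;
        this.offsetInRecord = offsetInRecord;
        this.newBytes = newBytes;
    }
}

/** Hypothetical server-side helper that applies a patch in place. */
final class PatchApplier {
    private PatchApplier() {}

    static void apply(Path file, int recordLength, RecordPatch patch) throws IOException {
        if (patch.recordNumber < 0 || patch.offsetInRecord < 0
                || patch.offsetInRecord + patch.newBytes.length > recordLength) {
            throw new IllegalArgumentException("Patch does not fit inside one record");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            long position = patch.recordNumber * recordLength + patch.offsetInRecord;
            if (position + patch.newBytes.length > channel.size()) {
                throw new IllegalArgumentException("Record " + patch.recordNumber + " is past the end of the file");
            }
            ByteBuffer buffer = ByteBuffer.wrap(patch.newBytes);
            // A positional write may be partial, so loop until every byte is written.
            while (buffer.hasRemaining()) {
                channel.write(buffer, position + buffer.position());
            }
        }
    }
}
```

Since records are fixed length, a patch never shifts the rest of the file, so changing a few bytes in a 6 GB file costs a few bytes of network traffic and one small write.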

Might it make sense to make something that acts like a database (or use an existing database) for this data? Or is there just too much of it?

Depending on how many people need to do this, the quick-and-dirty solution is to run the program via X forwarding. That eliminates a number of the issues, as long as the server has quite a lot of RAM free.

zebediah49
Moving only the changes: good point. As to the database idea, at one point I thought of converting every file into a temporary table in a database and using conventional DB-client tools. But it's not practical, because of the long translation times and because the data is binary and would have to be interpreted in the client. A DB-like wrapper could be written, but that's basically the approach I describe, by another name :) X forwarding… as in X11? In that case, I would be forced to install an X11 server on each client, and they will not allow me to do that.
Bruno Unna
+1 for recommending a database. This is what databases are for.
emory
+1  A: 

Have you considered using a distributed file system like OpenAFS? That should be able to handle very large files. Then you can write a client-side app for editing the files as if they are local.

Didn't know about it, I'll take a look at it ASAP. Thanks for the reference.
Bruno Unna
+1  A: 

Is mine a reasonable way to frame the problem?

IMO, yes.

Is it feasible to do it using the language and frameworks I'm thinking of?

I think so. But there are other alternatives. For example:

  • Put the records into a database, and access by a key consisting of a filename + a record number. Could be a full RDBMS, or a more lightweight solution.

  • Implement as a RESTful web service with a UI implemented in HTML + javascript.

  • Implement using a scalable distributed file-system.

Also, from your description there doesn't seem to be a pressing need to use a highly scalable / transport independent layer ... unless you need to support hundreds of simultaneous users.

Is it convenient?

Convenient for who? If you are talking about you the developer, it depends if you are already familiar with those frameworks.

Stephen C
A: 

Try PilotEdit Lite, it is free and can edit huge files through FTP. http://www.pilotedit.com

Dracoder
Although it is an interesting program, PilotEdit doesn't solve the problem, for two reasons: a) the data that needs to be edited is binary and must be interpreted according to an external layout file, and b) it seemingly only runs on Windows.
Bruno Unna