Morning all,

I've gone and told a customer I could migrate some of their old data out of a DOS-based system into the new system I've developed for them. However, I said that without actually looking at the files that stored the data in the old system. I just figured a quick Google search would solve all my problems... I was wrong!

Anyway, this program has a folder with hundreds... well, 800 files with all sorts of file extensions: .ave, .bak, .brw, .dat, .001, .002, ..., .007, .dbf, .dbe and .his.

.Bak obviously isn't a SQL backup file.

Does anyone have any programming experience using any of those file types who may be able to point me in the direction of some way to read and extract the data?

I can't mention the program name because I don't think the original developer would allow it...

Thanks.

+8  A: 

I'm willing to bet that the .dbf file is in DBase format, which is really straightforward. The contents of that might provide clues to the rest of them.
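To show what "straightforward" means here, below is a minimal sketch that reads the header of a .dbf file. It assumes the widely documented dBASE III layout (a 32-byte header followed by 32-byte field descriptors ending in a 0x0D byte); the function name `read_dbf_header` is made up for illustration, and a file from a different dBASE-family product may deviate from this layout.

```python
import struct


def read_dbf_header(data: bytes) -> dict:
    """Parse the 32-byte header and field descriptors (assumed dBASE III layout)."""
    # Byte 0: version, bytes 1-3: last update (YY MM DD), then record count,
    # header length and record length, all little-endian.
    version, yy, mm, dd, n_records, header_len, record_len = struct.unpack_from(
        "<4BIHH", data, 0
    )
    fields = []
    pos = 32
    # Each field descriptor is 32 bytes; a 0x0D byte terminates the list.
    while pos < header_len - 1 and data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii", "replace")
        ftype = chr(data[pos + 11])   # e.g. C = character, N = numeric, D = date
        length = data[pos + 16]
        decimals = data[pos + 17]
        fields.append((name, ftype, length, decimals))
        pos += 32
    return {
        "version": version,
        "last_update": (yy, mm, dd),
        "records": n_records,
        "header_len": header_len,
        "record_len": record_len,
        "fields": fields,
    }
```

You would call it on the raw bytes of the file, e.g. `read_dbf_header(open("SOMEFILE.DBF", "rb").read())` (the filename is hypothetical). If the field list that comes back looks sensible, the file very likely is a dBASE-style table.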

Greg Hewgill
+2  A: 

I think Greg is right about the .dbf file. You should try to find some information about the other file formats using sites like http://filext.com and http://dotwhat.net. The .bak file is usually a copy of another file with the same name but a different extension. For example, there may be a database.dbf file and a database.bak file that is a backup of it. If possible, you should also ask your customer for any details, documentation or source code of the application that used those files.

Dmitriy Matveev
Also try wotsit.org
Blorgbeard
+1  A: 

Back in the DOS days, programmers used to make up their own file extensions pretty much as they saw fit. The DBF might well be a DBase file, which is easy enough to read, and the .BAK is probably a backup of one of the other important files, or just a backup left by a text editor.

For the remaining files, first thing I would do is check if they are in a readable ASCII format by opening them in a text editor.
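With 800 files, opening each one by hand gets old quickly. A rough sketch of the same check in code: measure what fraction of a file's bytes are printable ASCII. The function name and any cut-off you pick (say, above 0.95 means "probably text") are assumptions for illustration, not a standard.

```python
def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes that are printable ASCII, tab, CR or LF."""
    if not data:
        return 0.0
    ok = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return ok / len(data)
```

Running this over the first few kilobytes of each file should be enough to sort the folder into "look at it in a text editor" and "needs a hex editor".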

If this doesn't give you a good result, try opening them in a binary editor that shows hex and ASCII side by side with control characters blanked out. Look for repeating patterns that might correspond to record fields. For example, say the .HIS was something like an order history file; it might contain embedded product codes or names. If this is the case, count the number of bytes between such fields. If it is a regular number, you probably have a flat binary file of records. This is best decoded by opening the file in the app, looking for values in a given record, and searching for the corresponding values in the binary file. Time consuming, and a pain in the ass, but workable enough once you get the hang of it.
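The byte counting can be sketched in a few lines: find every offset where a known value occurs and look at the gaps between them. The helper names and the "PRD" marker in the usage note are hypothetical, and taking the greatest common divisor of the gaps is just one heuristic for a candidate record length; it only means something if the value really sits in the same field of each record.

```python
from functools import reduce
from math import gcd


def find_all(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def guess_record_length(offsets: list[int]):
    """Guess a record length as the GCD of the gaps between offsets."""
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    return reduce(gcd, gaps) if gaps else None
```

For example, if `find_all(data, b"PRD")` returns offsets that are all a multiple of 24 apart, 24 bytes (or a divisor of it) is a good first guess for the record size.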

Happy hacking!

Shane MacLaughlin
+3  A: 

Could be anything. Best bet is to open them with a hex editor and see what you can see.

Most older systems used a basic ISAM which had one file per table that contained a set of fixed length data records. The other files would probably be indexes

As you only need the data, not the index, just look for the files with repeating data patterns (it often looks like pretty patterns on the hex editor screen)

When you find the file with the data, try to locate a known record, e.g. "Mr Smith", and see if you can work out the other fields. Integers are often stored byte for byte, dates are often encoded as days from a known start date, and money could be in BCD.
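As a rough illustration of those three encodings, here is a sketch of decoders you could try against bytes copied out of the hex editor. Everything specific is an assumption: the integer is taken to be 16-bit little-endian, the start date (epoch) has to be guessed per system, and the BCD decoder handles plain unsigned packed digits only (COBOL-style packed fields with a trailing sign nibble would need extra handling).

```python
import struct
from datetime import date, timedelta
from decimal import Decimal


def decode_int16_le(raw: bytes) -> int:
    """Interpret two bytes as a signed little-endian integer."""
    return struct.unpack("<h", raw)[0]


def decode_days(n: int, epoch: date) -> date:
    """Turn a day count into a date, given a guessed start date."""
    return epoch + timedelta(days=n)


def decode_packed_bcd(raw: bytes, decimals: int = 2) -> Decimal:
    """Decode unsigned packed BCD: each nibble holds one decimal digit."""
    digits = []
    for b in raw:
        hi, lo = b >> 4, b & 0x0F
        if hi > 9 or lo > 9:
            raise ValueError(f"byte {b:#04x} is not two BCD digits")
        digits.append(f"{hi}{lo}")
    return Decimal(int("".join(digits) or "0")).scaleb(-decimals)
```

The way to use these is trial and error: pick a record whose values you can see in the old app, then check which decoder and which epoch reproduce them.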

If you see a strong pattern, then most likely each record is a fixed length. There will probably be a header block on the file, say 128 or 256 bytes, followed by the fixed-length records.
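Once you have a guess for the header size and record length, slicing the file is simple. A minimal sketch, where `header_len` and `record_len` are values you have worked out yourself from the hex dump (nothing here detects them for you):

```python
def split_records(data: bytes, header_len: int, record_len: int) -> list[bytes]:
    """Skip the header and cut the rest into fixed-length records.

    A trailing partial record, if any, is dropped.
    """
    body = data[header_len:]
    return [
        body[i:i + record_len]
        for i in range(0, len(body) - record_len + 1, record_len)
    ]
```

If the guess is right, the same field (a name, a product code) lines up at the same position in every record; if it drifts, the record length or the header size is off.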

Many old systems were written in COBOL. There is plenty of info on the net about COBOL formats, and some companies even sell COBOL ODBC drivers!

TFD
+5  A: 

The Unix 'file' utility can be used to recognize many file types by their 'magic number'. It examines the file's contents and compares them with thousands of known formats. If the files are in any kind of common format, this can probably save you a good amount of work.

If they're NOT in a common format, it may send you chasing after red herrings. Take its suggestions as just that: suggestions.

TokenMacGuy
+3  A: 

In complement to the sites suggested by Greg and Dmitriy, there's also the repository of file formats at http://www.wotsit.org ("What's its format?").

If that doesn't help, a good hex editor (with dump display) is your friend... I've always found it amazing how easy it can be to read and recognize many file formats.

Martijn
A: 

As others have suggested, I recommend a hex editor if you can't figure out what those files are, and the .dbf is probably DBase.

BAK seems to be a backup file. I'm thinking that *.001, *.002, etc might be part of the backup. Are they all the same size? Maybe the backup was broken up into smaller pieces so that it could fit onto removable media?

Finally, take this as a life lesson. Before sending that Statement of Work over, if the customer asks you to import data from System A to System B, always ask for the sample schema, sample data and sample files. Lots of times, things that seem straightforward end up being nightmares.

Good luck!

Giovanni Galbo
A: 

Be sure to use the modified date on the files as a clue. If the .001, .002, etc. all have similar timestamps, maybe along with the .BAK, they might be part of the backup. Also, there may be some old cruft in the directory that you can (somewhat safely) ignore. Look for .BAT files and try to dissect them as well.
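To compare timestamps (and sizes) across 800 files without clicking through them, a small inventory script helps. This is only a sketch; the function name `inventory` is made up, and it assumes the old system's folder has been copied somewhere Python can read it with the original modified dates preserved.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def inventory(folder) -> dict:
    """Group files by lower-cased extension as (name, size, modified) tuples."""
    groups = defaultdict(list)
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            st = p.stat()
            groups[p.suffix.lower()].append(
                (p.name, st.st_size, datetime.fromtimestamp(st.st_mtime))
            )
    return dict(groups)
```

Printing each group makes it easy to spot sets of files with identical sizes or near-identical timestamps, which is what you would expect from pieces of one backup.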

SqlACID
+1  A: 

.DBF is a dBASE or early FoxPro database.

.DAT was used by Btrieve, and IIRC Paradox for DOS.

The .DBE and .00x files are probably either temporary or index files related to the .DAT files.

.DBF is easy. They'll open with MS Access or Excel (pre-2007 versions of Office, anyway), or with ADO or ODBC.

If the .DAT files are indeed Btrieve, you're in a world of hurt. They're a mess, even if you can get your hands on the right version of the data dictionary and a copy of the Btrieve structure. (Been there, done that, wore out the t-shirt before I got done.)

Ken White
You can open .dbf files with Excel 2007; it's just not listed in the supported formats.
Tim
A: 

One hint: if the .dbf files are DBase, FoxPro, or one of the other products that used that format, then you may be able to read them using ODBC. My system still has the ODBC driver for .dbf (Vista, with VS 2008; how it got there I'd have to hunt up, but I'd guess it was MDAC, Microsoft Data Access Components, which put it there). So you may not have a "world of unpicking to do" if the ODBC driver will read the .dbf files.

I seem to remember (with only a little confidence, from DBase III tinkering 20+ years ago) that DBase used .001, .002, ... files for memo (big text) fields.

Good luck trying to salvage the data.

Aussie Craig
A: 

The DBF format is fairly common.

The other files are puzzling. I'm guessing that either you're dealing with old BTrieve files (bad), or (hopefully) with the results of some ill-conceived backup scheme where someone backed up his database into the same directory rather than into the hard drive, in which case you could ignore these.

Uri
A: 

It's now part of Pervasive, but I used, years ago, Data Junction to migrate data from lots of file types to others. Have a look, unless you want to write a parser.

A: 

.dat can also be old Clarion 2.1 files... It works on an ISAM basis also, with key/index files