tags:

views:

113

answers:

4

Here's something that's really irked me over the years. I've never used any software that, when importing data from a column-aligned text file, can figure out the column breaks in a correct manner.

Excel 2K3 and a lot of other Microsoft components that seem to share a common codebase (like the import options for SQL2K) attempt to figure out the column breaks for you. Unfortunately, they only look at the first n rows, and are often completely wrong.

OpenOffice.Org 3.1 has a import dialog almost exactly like Excel 2K3 but it doesn't even attempt to guess the column breaks for you. And the latest version of Numbers doesn't appear to handle column-aligned imports at all.

Obviously column-aligned data is undesirable for a number of reasons, but a lot of older software (particularly in-house software various companies have floating around) exports data in this format so I do need to handle it every so often. Surely, somewhere, SOME software imports it well without me coding an import utility myself or manually specifying where twelve zillion columns start and stop?

OSX, Windows, whatever. I'm open to suggestions. Ultimate goal is to get it into a SQL Server table, but simply getting it into a Excel/XML/tab-delimited/etc file in the meantime would be fine because it's easy enough to get into SQL Server from there.

+1  A: 

Well obviously no software can be entirely correct in guessing the layout of a fixed column file, since there is no seperator (though variable width columns with higher maximum lengths will often produce enough space on the end to start guessing). For example the following could be anywhere from 1-9 columns (I have personally had to figure out some super packed fixed column layouts like this, only much longer)

135464876
647873159
345467575

If SQL Server is the ultimate destination, have you looked into the SQL Server import wizard?

Right click your database in Management Studio and select Tasks->Import Data. Proceed through and select "Flat File" as your data source. In the format dropdown change from Delimited to Fixed Width. On the left you can now use the Columns screen to draw the column seperators. There is also an advanced and preview screen.

David
Yes. When columns are packed like that with no spaces or other delimiter, I would not expect for any software to be able to automatically determine the column layout.
John Booty
Regarding SQL Server's import wizard, it's similar to Excel 2003's importer and doesn't make very intelligent guesses.In this particular file there's one longish text column for which the first 40 rows or so have the same value. Excel 2003 and SQL Server 2K's import wizards only look at the first few rows and guess that there's a column break after every individual word, which is wrong.Of course those are older versions of the software. Has it gotten better? Importing column-align data is a minor enough feature to fly under the devs' radar, I'd guess...
John Booty
you could "help" the guessing algorithm by moving a row with representative values to the top of the file, assuming order of the rows doesn't matter.
Peter Recore
+2  A: 

I tend to normalize such data with awk -- perhaps generating a csv file -- before trying to import it into Excel.

See the awk user's manual.

+1  A: 

I don't think there is a silver bullet for your request. I think the best you can hope for is to define your input format once and be able to reuse that format when you receive a file with the same format again.

As one poster mentioned you could use awk or, if .NET is more your thing, then you could use FileHelpers. It's an open source .NET library that does a good job reading and writing both Fixed length and delimited files. The downside is that you would be creating a .NET application to do the work (either inserting directly into a DB or perhaps creating an output file. On the plus side, once created, you could reuse the mapping classes again if you get the same file format.

Tuzo
A: 

Try out this demo (I was on development team):

Personator 4

Install, run the program, go to Tools | ASCII Conversion | Import from ASCII.

The import will be to DBF/FoxPro, but you can then export that file into one of the formats you mentioned.

The start/stop guesser uses a few statistical formulas to try to get the boundaries correct; you get to verify and/or correct with a graphical editor after analysis.

Marc Bernier