I have a file of names and addresses as follows (example line)

OSCAR    ,CANNONS      ,8     ,STIEGLITZ CIRCUIT

And I want to read it into a dictionary mapping field names to values. Here self.field_list is a list of (start point, length, name) tuples describing the fixed fields in the file. What ways are there to speed up this method? (Python 2.6)

def line_to_dictionary(self, file_line, rec_num):
  file_line = file_line.lower()  # Make it all lowercase

  return_rec = {}  # Return record as a dictionary

  for (field_start, field_length, field_name) in self.field_list:

    field_data = file_line[field_start:field_start + field_length]

    if self.strip_fields:  # Strip off whitespace first
      field_data = field_data.strip()

    if field_data != '':  # Only add non-empty fields to the dictionary
      return_rec[field_name] = field_data

  # Set hidden fields
  return_rec['_rec_num_'] = rec_num
  return_rec['_dataset_name_'] = self.name
  return return_rec
+2  A: 

struct.unpack(), combined with width-prefixed s specifiers (e.g. 9s), will tear the string apart faster than slicing.
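A minimal sketch, assuming the field widths of the example line (9, 13, 6 and 17 characters); in practice the format string would be built from self.field_list:

import struct

# 'x' is a pad byte that skips the comma between fields.
unpacker = struct.Struct('9sx13sx6sx17s')

line = 'OSCAR    ,CANNONS      ,8     ,STIEGLITZ CIRCUIT'
first_name, surname, number, street = unpacker.unpack(line)
# Fields keep their padding, so they still need .strip()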

Ignacio Vazquez-Abrams
Tried this out but not sure how to deal with overlapping fields
Martlark
... Overlapping fields? Who came up with that one?
Ignacio Vazquez-Abrams
A: 

If you want to get some speedup, you can also store field_start + field_length directly in self.field_list, instead of storing field_length.
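A minimal sketch, assuming self.field_list can be rebuilt once up front (e.g. in __init__):

# Precompute the slice end points once...
self.field_list = [(start, start + length, name)
                   for (start, length, name) in self.field_list]

# ...so the loop in line_to_dictionary() saves one addition per field:
for (field_start, field_end, field_name) in self.field_list:
  field_data = file_line[field_start:field_end]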

I would say that your method is quite fast, compared to what standard Python can do (i.e., without using non-standard, dedicated modules).

EOL
A: 

If your lines include commas like the example, you can use line.split(',') instead of several slices. This may prove to be faster.
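A minimal sketch (the field names here are hypothetical):

fields = [f.strip() for f in file_line.split(',')]
return_rec = dict(zip(('first_name', 'surname', 'number', 'street'), fields))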

lunixbochs
As long as none of the records ever have a comma...
eswald
A: 

You'll want to use the csv module.

It handles not only CSV, but any CSV-like format, which yours seems to be.
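A minimal sketch, assuming a hypothetical file name and that the commas really are delimiters:

import csv

# 'rb' mode, as the Python 2 csv documentation recommends.
for row in csv.reader(open('file_name.txt', 'rb')):
  print [field.strip() for field in row]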

e-satis
Unfortunately "CSV-like" may not be enough. It may be possible for fields to contain embedded commas, at which point both `csv` and `line.split(',')` will fail horribly.
Ignacio Vazquez-Abrams
+1  A: 

Edit: Just saw your remark below about commas. The approach below is fast when it comes to file reading, but it is delimiter-based, and would fail in your case. It's useful in other cases, though.

If you want to read the file really fast, you can use a dedicated module, such as the almost standard Numpy:

data = numpy.loadtxt('file_name.txt', dtype='S10,S8', delimiter=',')   # dtype must be adapted to your column sizes

loadtxt() also allows you to process fields on the fly (with the converters argument). Numpy also allows you to give names to columns (see the doc), so that you can do:

data['name'][42]  # Name # 42

The structure obtained is like an Excel array; it is quite memory efficient, compared to a dictionary.

If you really need a dictionary, you can loop over the data array that Numpy read quickly, in a way similar to what you have done.
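A minimal sketch, assuming named columns in the dtype (the names and sizes here are hypothetical):

import numpy

data = numpy.loadtxt('file_name.txt', delimiter=',',
                     dtype=[('name', 'S10'), ('street', 'S20')])

records = []
for rec_num, row in enumerate(data):
  # Build the same kind of dictionary as in the question.
  rec = dict((col, row[col].strip()) for col in data.dtype.names
             if row[col].strip())
  rec['_rec_num_'] = rec_num
  records.append(rec)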

EOL