What is the simplest and most-pythonic way to parse a DICOM file?

A native Python implementation without the use of non-Python libraries would be much preferred. DICOM is the standard file format in digital medical imaging (look here for more information).

There are some C/C++ libraries that support reading (a subset of) DICOM files. Two or three of them even have Python bindings. A native Python parser would serve two purposes for me:

  1. No need to build any external C/C++ libraries.
  2. Learn about the DICOM file format.
A: 

I wonder what the original poster tried, and which methods did and did not work for him. I have never worked with DICOM, but a quick Google search for "DICOM python" gave several interesting results. It seems that this project: http://www.creatis.univ-lyon1.fr/Public/Gdcm/ should deliver what you want. It has Python bindings and a pretty active mailing list.

bgbg
Is this a "Please do my homework for me" question?
S.Lott
No, IMHO I did my homework: There are some C/C++ libraries that support reading (a subset of) DICOM files. Two or three of them even have Python bindings. A native Python parser would serve two purposes for me: 1. No need to build any C/C++ libraries. 2. Learn about the DICOM file format.
Tom Pohl
A: 

There are some libraries (most often implemented in C/C++) with Python bindings, e.g.:

However, I'm looking for a native Python implementation to learn more about the DICOM file format.

Tom Pohl
+2  A: 

Some years ago I was looking for the same thing and found this: Python DICOM lib

I wasn't too impressed with the code, but it is native Python reading DICOM files.

A: 

Hmm... you might be out of luck... but it's a good opportunity to make your own DICOM lib in Python ;-)

DICOM is a pain, haha.

unforgiven3
+1  A: 

And as of today there's another pure Python package reading DICOM files available on Google code: pydicom

Looks interesting.

HTH

+1  A: 

If you want to learn about the DICOM format, "Digital Imaging and Communications in Medicine (DICOM): A Practical Introduction and Survival Guide" by Oleg Pianykh is quite readable and gives a good introduction to key DICOM concepts. Springer-Verlag is the publisher of this book. The full DICOM standard is, of course, the ultimate reference although it is somewhat more intimidating. It is available from NEMA (http://medical.nema.org).

The file format is actually less esoteric than you might imagine: it consists of a preamble followed by a sequence of data elements. The preamble is 128 reserved bytes that are usually unused, immediately followed by the ASCII text "DICM". After that comes the sequence of data elements. Each data element consists of a DICOM tag, a two-character ASCII code indicating the value representation (present when the file uses an explicit-VR encoding), the length of the value, and the value itself. Data elements in the file are ordered by their DICOM tag numbers. The image itself is just another data element with a tag, value representation, length, etc.
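To make that layout concrete, here is a minimal sketch of a reader in plain Python. It assumes the explicit VR little endian encoding, with each element laid out as tag, VR, length, value, and it does not handle undefined lengths or nested sequences. In a real file, the encoding of everything after the group 0002 header depends on the transfer syntax, so treat this as a learning aid rather than a parser. The names read_elements and make_element are made up for this example, and the sample bytes are synthetic.

```python
import io
import struct

# VRs whose length field is 4 bytes, preceded by 2 reserved bytes
# (explicit VR encoding). All other VRs use a 2-byte length.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}


def read_elements(stream):
    """Yield (group, element, vr, raw_value) from an explicit VR
    little endian stream. Hypothetical helper, not a library API."""
    preamble = stream.read(128)
    if len(preamble) != 128 or stream.read(4) != b"DICM":
        raise ValueError("missing 128-byte preamble or 'DICM' marker")
    while True:
        header = stream.read(6)  # group (2) + element (2) + VR (2)
        if len(header) < 6:
            break  # end of data
        group, element, vr = struct.unpack("<HH2s", header)
        if vr in LONG_VRS:
            stream.read(2)  # reserved bytes
            (length,) = struct.unpack("<I", stream.read(4))
        else:
            (length,) = struct.unpack("<H", stream.read(2))
        if length == 0xFFFFFFFF:
            raise NotImplementedError("undefined length is not handled here")
        yield group, element, vr.decode("ascii"), stream.read(length)


def make_element(group, element, vr, value):
    """Encode one short-form element for the demo (hypothetical helper)."""
    if len(value) % 2:
        value += b" "  # values are padded to an even number of bytes
    return struct.pack("<HH2sH", group, element, vr, len(value)) + value


# Synthetic example data: preamble, marker, and two patient elements.
sample = (b"\x00" * 128 + b"DICM"
          + make_element(0x0010, 0x0010, b"PN", b"Doe^John")
          + make_element(0x0010, 0x0020, b"LO", b"12345"))

elements = list(read_elements(io.BytesIO(sample)))
for group, element, vr, raw in elements:
    print("(%04X,%04X) %s %r" % (group, element, vr, raw))
```

Reading from a stream rather than a byte string keeps the sketch usable on an opened file object as well as on in-memory test data.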

Value representations specify exactly how to interpret the value. Is it a number? Is it a character string? If it's a character string, is it a short one or a long one and which characters are permitted? The value representation code tells you this.
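As a rough illustration of how the value representation drives decoding, the sketch below handles only a handful of VR codes and assumes little endian byte order and plain ASCII text. The function decode_value and the TEXT_VRS set are invented for this example; the full list of VRs and their rules is in the standard.

```python
import struct

# A few text-like VRs; values may carry trailing padding.
TEXT_VRS = {"AE", "CS", "DA", "LO", "PN", "SH", "TM", "UI"}


def decode_value(vr, raw):
    """Turn the raw bytes of one element into a Python value.
    Hypothetical helper covering only a few VRs."""
    if vr == "US":  # unsigned 16-bit integers
        return list(struct.unpack("<%dH" % (len(raw) // 2), raw))
    if vr == "UL":  # unsigned 32-bit integers
        return list(struct.unpack("<%dI" % (len(raw) // 4), raw))
    if vr in ("IS", "DS"):  # numbers stored as text, "\" separates values
        convert = int if vr == "IS" else float
        text = raw.decode("ascii").strip(" \x00")
        return [convert(part) for part in text.split("\\") if part.strip()]
    if vr in TEXT_VRS:
        return raw.decode("ascii").rstrip(" \x00")
    return raw  # anything else is left as bytes


print(decode_value("US", b"\x00\x02"))
print(decode_value("IS", b"12\\34 "))
print(decode_value("LO", b"12345 "))
```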

A DICOM tag is a 4 byte hexadecimal code composed of a 2 byte "group" number and a 2 byte "element" number. The group number is an identifier that tells you what information entity the tag applies to (for example, group 0010 refers to the patient and group 0020 refers to the study). The element number identifies the interpretation of the value (items such as the patient's ID number, the series description, etc.). To find out how you should interpret the value, your code looks up the DICOM tag in a dictionary file.
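The tag lookup can be sketched in a few lines. The dictionary below is a tiny hand-typed excerpt for illustration only; a real parser would load the complete data dictionary published with the standard. The names DATA_DICTIONARY, split_tag and describe are made up for this example.

```python
# Tiny excerpt of a data dictionary: (group, element) -> (VR, name)
DATA_DICTIONARY = {
    (0x0010, 0x0010): ("PN", "Patient's Name"),
    (0x0010, 0x0020): ("LO", "Patient ID"),
    (0x0008, 0x103E): ("LO", "Series Description"),
}


def split_tag(tag):
    """Split a 4-byte tag into its 2-byte group and element numbers."""
    return tag >> 16, tag & 0xFFFF


def describe(tag):
    """Look a tag up in the dictionary and format it for display."""
    group, element = split_tag(tag)
    vr, name = DATA_DICTIONARY.get((group, element), ("UN", "Unknown"))
    return "(%04X,%04X) %s %s" % (group, element, vr, name)


print(describe(0x00100020))
print(describe(0x0008103E))
```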

There are some other details involved, but that's the essence of it. Probably the most instructive thing you can do to learn about the file format is to take an example DICOM file, look at it with a hex editor, and go through the process of parsing it mentally. I would advise against trying to learn about DICOM by looking at existing open source implementations, at least initially. That is more likely to confuse than enlighten. Getting the big picture is more important. Once you have the big picture, you can descend into subtleties.
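If you don't have a hex editor at hand, a few lines of Python give a similar view. This is just a sketch: hexdump is a made-up helper, and the sample bytes are synthetic (an empty preamble, the "DICM" marker and the start of one element) rather than taken from a real file.

```python
def hexdump(data, start=0, count=64, width=16):
    """Return hex-editor style lines for data[start:start + count]."""
    lines = []
    for offset in range(start, min(start + count, len(data)), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % byte for byte in chunk)
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        lines.append("%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text))
    return lines


# Synthetic bytes: 128-byte preamble, "DICM", then the header of one element.
sample = b"\x00" * 128 + b"DICM" + b"\x02\x00\x00\x00UL\x04\x00"

# Skip the preamble and show what follows it.
lines = hexdump(sample, start=128, count=16)
for line in lines:
    print(line)
```

For a real file you would read the first few hundred bytes in binary mode and pass them in place of sample.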

+1  A: 

Newer gdcm development now happens here:

http://gdcm.sourceforge.net/

It supports Java and C# in addition to Python.

Why write yet another DICOM implementation when you can centralize on a single C++ implementation and have it accessible from so many different languages?

+1  A: 

The library pydicom mentioned above seems like a great library for accessing the DICOM data structures. To use it to access e.g. RT DOSE data, I guess one would do something like

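import numpy
import pydicom  # older releases were imported as "dicom" and used dicom.ReadFile

dose = pydicom.dcmread("RTDOSE.dcm")
# Assumes 16-bit pixel data; frames come first, then rows, then columns
d = numpy.frombuffer(dose.PixelData, dtype=numpy.int16)
d = d.reshape((int(dose.NumberOfFrames), dose.Rows, dose.Columns))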

and then, if you're in mayavi,

from mayavi import mlab  # older installs: from enthought.mayavi import mlab
mlab.pipeline.scalar_field(d)

This gives wrong coordinates and dose scaling, but the principle ought to be sound.

CT data should be very similar.

+1  A: 

I'm using pydicom heavily these days, and it rocks.

It's pretty easy to start playing with it:

import pydicom  # older releases were imported as "dicom" and used dicom.ReadFile
data = pydicom.dcmread("yourdicomfile.dcm")

To get the interesting stuff out of that "data" object, somewhat resembling dcmdump output:

for key in data.dir():
    value = getattr(data, key, '')
    # Skip UIDs and the raw pixel buffer, which are not human-readable
    if isinstance(value, pydicom.uid.UID) or key == "PixelData":
        continue

    print("%s: %s" % (key, value))

I think a great way to learn more about the dicom format is to open similar files and write code to compare them according to various aspects: study description, window width and center, pixel representation and so on.
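A comparison like that could be sketched as below. The helper diff_attributes is made up for this example and works on any objects that expose attributes, so I'm using SimpleNamespace stand-ins with invented values here; with pydicom you would presumably pass the two dataset objects you read from disk instead.

```python
from types import SimpleNamespace


def diff_attributes(first, second, keywords):
    """Return {keyword: (value_in_first, value_in_second)} for every
    keyword whose value differs. Missing attributes count as None."""
    differences = {}
    for keyword in keywords:
        a = getattr(first, keyword, None)
        b = getattr(second, keyword, None)
        if a != b:
            differences[keyword] = (a, b)
    return differences


# Stand-ins for two loaded files (values invented for the example).
scan_a = SimpleNamespace(StudyDescription="CHEST", WindowWidth=400,
                         WindowCenter=40, PixelRepresentation=0)
scan_b = SimpleNamespace(StudyDescription="CHEST", WindowWidth=1500,
                         WindowCenter=-600, PixelRepresentation=0)

differences = diff_attributes(
    scan_a, scan_b,
    ["StudyDescription", "WindowWidth", "WindowCenter", "PixelRepresentation"],
)
for keyword, (a, b) in differences.items():
    print("%s: %s != %s" % (keyword, a, b))
```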

Have fun! :)

ema