I am working with large TIFF images of about 1 GB each, around 20000 x 20000 pixels. I need to extract several tiles (of about 300 x 300 pixels) from the images, at random positions.

I tried the following solutions:

  • Libtiff (the only low-level library I could find) offers TIFFReadScanline(), but that means reading in around 19700 unnecessary pixels per line.

  • I implemented my own TIFF reader which extracts a tile out of the image without reading in unnecessary pixels. I expected it to be faster, but doing a seekg for every line of the tile makes it very slow. I also tried reading into a buffer all the lines of the file that include my tile, and then extracting the tile from the buffer, but the results are more or less the same.

I'd like to receive suggestions that would improve my tile extraction tool!

Everything is welcome, maybe you can propose a more efficient library I could use, some tips about C/C++ I/O, some higher level strategy for my needs, etc.

Regards, Juan

+2  A: 

Just mmap your file.

http://www.kernel.org/doc/man-pages/online/pages/man2/mmap.2.html
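To make the suggestion concrete, here is a minimal sketch of pulling one tile out of a memory-mapped file on a POSIX system. It assumes the pixel data is uncompressed and stored as one contiguous block of full-width rows starting at a known byte offset; a real TIFF has to be parsed first to confirm that and to find the offset. The name `extract_tile_mmap` and the `data_offset` parameter are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: copies a tile_w x tile_h tile whose top-left pixel is
// (x, y) out of an uncompressed raster that starts at data_offset in the file.
std::vector<unsigned char> extract_tile_mmap(const char* path,
                                             std::size_t data_offset,
                                             std::size_t image_width,
                                             std::size_t bytes_per_pixel,
                                             std::size_t x, std::size_t y,
                                             std::size_t tile_w,
                                             std::size_t tile_h)
{
    assert(tile_w > 0 && tile_h > 0 && x + tile_w <= image_width);

    int fd = open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed");

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); throw std::runtime_error("fstat failed"); }
    const std::size_t file_size = static_cast<std::size_t>(st.st_size);

    // Map the whole file read-only; pages are only loaded when touched.
    void* map = mmap(nullptr, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (map == MAP_FAILED) throw std::runtime_error("mmap failed");

    const unsigned char* base = static_cast<const unsigned char*>(map);
    const std::size_t row_bytes = image_width * bytes_per_pixel;
    const std::size_t tile_row_bytes = tile_w * bytes_per_pixel;

    // Refuse to read past the end of the file.
    const std::size_t end = data_offset + (y + tile_h - 1) * row_bytes
                          + x * bytes_per_pixel + tile_row_bytes;
    if (end > file_size) { munmap(map, file_size); throw std::runtime_error("tile out of range"); }

    std::vector<unsigned char> tile(tile_row_bytes * tile_h);
    for (std::size_t r = 0; r < tile_h; ++r) {
        const unsigned char* src = base + data_offset
                                 + (y + r) * row_bytes + x * bytes_per_pixel;
        std::memcpy(tile.data() + r * tile_row_bytes, src, tile_row_bytes);
    }

    munmap(map, file_size);
    return tile;
}
```

If many tiles come from the same file, it would probably be better to keep the mapping open between calls instead of mapping and unmapping each time.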

Gaetano Mendola
I'm currently testing this option. Thanks for your reply.
Juan
Interesting on 64-bit operating systems. Large TIFF files easily go past 32-bit boundaries. On my XP machine I have problems reading bitmaps of 400 MByte and above because of 'virtual memory' fragmentation. That is, I can't find a 400 MByte chunk of contiguous address space, even with 2 GByte of free (!) RAM.
Adriaan
+1  A: 

[Major edit 14 Jan 10]

I was a bit confused by your mention of tiles, when the tiff is not tiled.

I do use tiled/pyramidal TIFF images. I've created those with VIPS:

    vips im_vips2tiff source_image output_image.tif:none,tile:256x256,pyramid

I think you can do with:

    vips im_vips2tiff source_image output_image.tif:none,tile:256x256,flat

You may want to experiment with tile size. Then you can read using TIFFReadEncodedTile.
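TIFFReadEncodedTile takes a tile number, so you need to know which tiles a requested region touches. As a rough sketch, assuming a single-plane (contiguous) 2D image where tiles are numbered left to right, then top to bottom, the arithmetic looks like this. libtiff's TIFFComputeTile does the same job for you; the helper names below are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of tile columns needed to cover the image width. Rounds up,
// because the last column may be only partly filled.
std::size_t tiles_across(std::size_t image_width, std::size_t tile_w)
{
    assert(tile_w > 0);
    return (image_width + tile_w - 1) / tile_w;
}

// Index of the tile that contains pixel (x, y), counting row by row.
std::size_t tile_index(std::size_t x, std::size_t y,
                       std::size_t image_width,
                       std::size_t tile_w, std::size_t tile_h)
{
    assert(tile_h > 0 && x < image_width);
    return (y / tile_h) * tiles_across(image_width, tile_w) + (x / tile_w);
}

// Indices of every tile overlapped by the region with top-left (x, y)
// and size region_w x region_h.
std::vector<std::size_t> tiles_for_region(std::size_t x, std::size_t y,
                                          std::size_t region_w,
                                          std::size_t region_h,
                                          std::size_t image_width,
                                          std::size_t tile_w,
                                          std::size_t tile_h)
{
    assert(region_w > 0 && region_h > 0 && x + region_w <= image_width);
    const std::size_t across = tiles_across(image_width, tile_w);
    std::vector<std::size_t> result;
    for (std::size_t row = y / tile_h; row <= (y + region_h - 1) / tile_h; ++row)
        for (std::size_t col = x / tile_w; col <= (x + region_w - 1) / tile_w; ++col)
            result.push_back(row * across + col);
    return result;
}
```

For a 20000 pixel wide image with 256 x 256 tiles, a 300 x 300 region at (100, 100) overlaps tiles 0, 1, 79 and 80, so four tile reads replace 300 full-width scanline reads.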

Multi-resolution storage using pyramidal TIFFs is much faster if you need to zoom in/out. You may also want to use this to show a coarse image nearly immediately, followed by a detailed picture.

After switching to (appropriately sized) tiled storage (which will bring you MASSIVE performance improvements for random access!), your bottleneck will be disk I/O. File reads are much faster when done in sequence. Here mmapping may be the solution.

Some useful links:

  • VIPS
  • IIPImage
  • LibTiff.NET
  • stackoverflow

VIPS is an image handling library which can do much more than just read/write. It has its own, very efficient internal format. It has good documentation on the algorithms. For one, it decouples processing from the filesystem, thereby allowing tiles to be cached.

IIPImage is a multi-zoom webserver/browser library. I found the documentation a very good source of information on multi-resolution imaging (like Google Maps).

The other solution on this page, using mmap, is efficient only for 'small' files. I've hit the 32-bit boundaries often. Generally, allocating a 1 GByte chunk of memory will fail on a 32-bit OS (with 4 GBytes RAM installed) because even virtual memory gets fragmented after one or two application runs. Still, there is sufficient memory to cache parts or the whole of the image. More memory = more performance.

Adriaan
A: 

I did something similar to this to handle an arbitrarily large TARGA(TGA) format file. The thing that made it simple for that kind of file is that the image is not compressed. You can calculate the position of any arbitrary pixel within the image and find it with a simple seek. You might consider targa format if you have the option to specify the image encoding.
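A minimal sketch of that position calculation, assuming an uncompressed true-color TGA with no color map and rows stored left to right: the pixel data then starts after the 18-byte header plus the optional image ID field. The function name and parameters are made up for illustration, and `y` is counted from the top of the image.

```cpp
#include <cassert>
#include <cstddef>

// Size of the fixed TGA file header in bytes.
const std::size_t kTgaHeaderBytes = 18;

// Byte offset of pixel (x, y) in the file.
// id_length is the value of the first header byte (length of the image ID).
// top_down is true when the image descriptor marks the origin as top-left;
// otherwise rows are stored bottom row first.
std::size_t tga_pixel_offset(std::size_t id_length,
                             std::size_t width, std::size_t height,
                             std::size_t bytes_per_pixel,
                             bool top_down,
                             std::size_t x, std::size_t y)
{
    assert(x < width && y < height);
    const std::size_t data_start = kTgaHeaderBytes + id_length;
    // In a bottom-up file the top row of the picture is the last row stored.
    const std::size_t stored_row = top_down ? y : (height - 1 - y);
    return data_start + (stored_row * width + x) * bytes_per_pixel;
}
```

Seeking to that offset and reading `tile_w * bytes_per_pixel` bytes gives one row of a tile; repeating it for each row gives the whole tile.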

If not, there are many varieties of TIFF formats. You probably want to use a library, since its authors have already gone through the pain of supporting all the different formats.

Jay
A: 

Thanks everyone for the replies.

Actually, a change in the way the tiles were required allowed me to extract them from the files on the hard disk sequentially instead of at random positions. This allowed me to load a part of the file into RAM and extract the tiles from there.

The efficiency gain was huge. Otherwise, if you need random access to a file, mmap is a good deal.
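For anyone wanting to try the same thing, here is a rough sketch of the idea, not my actual code. It assumes uncompressed pixel rows stored one after another starting at a known offset; `Band`, `load_band` and `cut_tile` are illustrative names. One large sequential read brings a band of rows into RAM, and every tile that falls inside that band is then copied out of memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// A run of consecutive full-width image rows held in memory.
struct Band {
    std::size_t first_row;             // image row stored at bytes[0]
    std::size_t row_bytes;             // bytes per full image row
    std::vector<unsigned char> bytes;  // row_count * row_bytes bytes
};

// Reads row_count rows starting at first_row with a single seek and read.
Band load_band(const std::string& path, std::size_t data_offset,
               std::size_t row_bytes, std::size_t first_row,
               std::size_t row_count)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    Band band;
    band.first_row = first_row;
    band.row_bytes = row_bytes;
    band.bytes.resize(row_bytes * row_count);

    in.seekg(static_cast<std::streamoff>(data_offset + first_row * row_bytes));
    in.read(reinterpret_cast<char*>(band.bytes.data()),
            static_cast<std::streamsize>(band.bytes.size()));
    if (static_cast<std::size_t>(in.gcount()) != band.bytes.size())
        throw std::runtime_error("short read");
    return band;
}

// Copies the tile with top-left pixel (x, y), in image coordinates,
// out of a band that is already in memory.
std::vector<unsigned char> cut_tile(const Band& band,
                                    std::size_t bytes_per_pixel,
                                    std::size_t x, std::size_t y,
                                    std::size_t tile_w, std::size_t tile_h)
{
    const std::size_t tile_row_bytes = tile_w * bytes_per_pixel;
    assert(y >= band.first_row);
    const std::size_t local_row = y - band.first_row;
    assert((local_row + tile_h) * band.row_bytes <= band.bytes.size());
    assert(x * bytes_per_pixel + tile_row_bytes <= band.row_bytes);

    std::vector<unsigned char> tile(tile_row_bytes * tile_h);
    for (std::size_t r = 0; r < tile_h; ++r)
        std::memcpy(tile.data() + r * tile_row_bytes,
                    band.bytes.data() + (local_row + r) * band.row_bytes
                                      + x * bytes_per_pixel,
                    tile_row_bytes);
    return tile;
}
```

Sorting the requested tiles by their `y` coordinate first means each band is loaded once and the file is only ever read front to back.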

Regards, Juan

Juan
A: 

Adriaan,

As you said, the command to create a tiled pyramidal TIFF is "vips im_vips2tiff source_image output_image.tif:none,tile:256x256,pyramid". It does not work with BigTIFF images. I tried it on a 30 GB TIFF file and it did not work.

Do you have any idea how to make it work on huge tiff images?

Thanks, Tejas Gajera

Tejas