I have a large file (4+ GB) of, let's just say, 4-byte floats. I would like to treat it as a list, in the sense that I would like to be able to use map, filter, foldl, etc. However, instead of producing a new list with the output, I would like to write the output back into the file, and thus only have to load a small portion of the file into memory. You could say I want a type called MutableFileList.

Has anyone run into this situation before? Instead of reinventing the wheel, I was wondering if there is a Hackish way of dealing with this?

+1  A: 

You might use mmap to map the file into memory and then process it. There is an mmap module that promises to read and write mmapped files and can even work with lazily mapped chunks of files, but I haven't tried it.

The interface for writing to the mapped file seems to be quite low level, so you'd have to build your own abstractions or work with Foreign.Ptr and the like.
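As a rough illustration of working with Foreign.Ptr here, the sketch below maps one window of the file read-write and updates the floats in place. It assumes the `mmap` package's `System.IO.MMap.mmapWithFilePtr`, floats stored in the machine's native byte order, and a byte offset that is a multiple of 4; `mapWindow` is a made-up name, not part of the package.

```haskell
import Control.Monad (forM_)
import Data.Int (Int64)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import System.IO.MMap (Mode (ReadWrite), mmapWithFilePtr)

-- Apply f in place to nFloats floats starting at the given byte offset.
-- Only this window is mapped, so the whole file never has to fit in the
-- address space at once. (Hypothetical helper, native-endian floats assumed.)
mapWindow :: (Float -> Float) -> FilePath -> Int64 -> Int -> IO ()
mapWindow f path offset nFloats =
  mmapWithFilePtr path ReadWrite (Just (offset, 4 * nFloats)) $ \(ptr, _size) -> do
    let floats = castPtr ptr :: Ptr Float
    -- Read each element, transform it, and write it back to the same slot.
    forM_ [0 .. nFloats - 1] $ \i ->
      peekElemOff floats i >>= pokeElemOff floats i . f
```

Calling `mapWindow` repeatedly with increasing offsets would walk through a large file one window at a time.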

sth
Won't work with files over 2 GB on Windows.
Jonathan Fischoff
@Jonathan: Are you sure it won't work? The documentation says the module uses `CreateFileMapping` and `MapViewOfFile`, both of which take 64 bits' worth of file size/offset parameters, so those API calls should work for files of any size (e.g. http://msdn.microsoft.com/en-us/library/aa366761%28VS.85,lightweight%29.aspx ). Does the module then somehow break this functionality?
sth
@sth Honestly, I don't know for sure. I'm going off what I read on the net. I got that limitation from a thread about memory-mapped files on this site. I didn't see anything on MSDN that specifies size requirements either way, but I don't think a program is going to be able to get more than 2 GB of memory no matter how you slice it. I kind of want the reverse, file-mapped memory :)
Jonathan Fischoff
+5  A: 

This should be quite helpful to you. You can use readFile and writeFile for what you need to do, and everything is done lazily. It only keeps things in memory while they are still being used, so you can read, process, and write the file out without blowing up your computer.
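A minimal sketch of that idea, using lazy ByteStrings rather than String-based readFile/writeFile since the data is binary. It assumes little-endian 4-byte floats and writes to a separate output file, because a file being read lazily cannot safely be overwritten at the same time. The names `decodeFloats`, `encodeFloats` and `mapFloatFile` are made up for illustration.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32, Word8)
import GHC.Float (castWord32ToFloat)

-- Assemble a Word32 from bytes given least-significant byte first.
word32LE :: [Word8] -> Word32
word32LE = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Lazily decode a byte stream into floats; trailing partial bytes are dropped.
decodeFloats :: BL.ByteString -> [Float]
decodeFloats bs
  | BL.length chunk < 4 = []
  | otherwise = castWord32ToFloat (word32LE (BL.unpack chunk)) : decodeFloats rest
  where
    (chunk, rest) = BL.splitAt 4 bs

-- Lazily encode floats back into little-endian bytes.
encodeFloats :: [Float] -> BL.ByteString
encodeFloats = B.toLazyByteString . foldMap B.floatLE

-- Stream src through f into dst; src and dst must be different files.
mapFloatFile :: (Float -> Float) -> FilePath -> FilePath -> IO ()
mapFloatFile f src dst =
  BL.readFile src >>= BL.writeFile dst . encodeFloats . map f . decodeFloats
```

Decoding element by element like this is slow, but the input is consumed as the output is produced, so only a few chunks should be live at any time. Renaming the output over the original afterwards gives the effect of an in-place update.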

Rayne
Hmm, I didn't realize that the values would be evicted from memory. Okay, I'll give that a shot.
Jonathan Fischoff
+6  A: 

You should not treat it as a [Double] or [Float] in memory. What you could do is use one of the list-like packed array types, such as uvector/vector/... in company with mmapFile or readFile to pull chunks of the file in at a time, and process them. Or use a lazy packed array type, equivalent to lazy bytestrings.
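To make the "chunks at a time" approach concrete, here is a sketch that reads a fixed-size chunk through a Handle, transforms it, seeks back, and overwrites it in place. For the sake of depending only on base and bytestring it uses strict ByteString chunks instead of vector/uvector, and it assumes little-endian floats; `mapInPlace` and the helper names are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)
import GHC.Float (castWord32ToFloat)
import System.IO

-- Decode a strict chunk of little-endian 4-byte floats.
decodeChunk :: BS.ByteString -> [Float]
decodeChunk bs
  | BS.length bs < 4 = []
  | otherwise = castWord32ToFloat w : decodeChunk (BS.drop 4 bs)
  where
    w = BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 (BS.take 4 bs) :: Word32

-- Encode floats back into a strict chunk of the same layout.
encodeChunk :: [Float] -> BS.ByteString
encodeChunk = BL.toStrict . B.toLazyByteString . foldMap B.floatLE

-- Apply f to every float in the file, holding one chunk in memory at a time.
mapInPlace :: Int -> (Float -> Float) -> FilePath -> IO ()
mapInPlace floatsPerChunk f path = withBinaryFile path ReadWriteMode loop
  where
    loop h = do
      pos <- hTell h
      chunk <- BS.hGet h (4 * floatsPerChunk)
      -- Ignore a trailing partial float at end of file.
      let usable = BS.take (BS.length chunk - BS.length chunk `mod` 4) chunk
      if BS.null usable
        then pure ()
        else do
          hSeek h AbsoluteSeek pos -- rewind to the start of this chunk
          BS.hPut h (encodeChunk (map f (decodeChunk usable)))
          loop h
```

For example, `mapInPlace 65536 (* 2) "data.f32"` would double every value while keeping roughly 256 KB of file data in memory at once. A packed array type would replace the intermediate list in `decodeChunk`/`encodeChunk` for better performance.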

Don Stewart
You're becoming as prominent as Jon Skeet in the Haskell community. You get upvoted just for posting. :p
Rayne