views:

5466

answers:

14

Please, provide code examples in a language of your choice.

Update: No constraints set on external storage.

Example: Integers are received/sent via network. There is a sufficient space on local disk for intermediate results.

+2  A: 

1 million 32-bit integers = 4 MB of memory.

You should sort them using some algorithm that uses external storage. Mergesort, for example.

gabr
+1  A: 

Smells like homework. Anyhow, Why don't you partition it into chunks that fit into memory, sort each of them using QuickSort and then merge the result together?

norheim.se
+7  A: 

Split the problem into pieces small enough to fit into available memory, then use merge sort to combine them.

moonshadow
probably best solution, exept you also want to have enough working space in memory to sort them...
Omar Kooheji
I'm interested in code examples (I've already read theoretical aspects in Knuth)
J.F. Sebastian
+3  A: 

You need to provide more information. What extra storage is available? Where are you supposed to store the result?

Otherwise, the most general answer: 1. load the fist half of data into memory (2MB), sort it by any method, output it to file. 2. load the second half of data into memory (2MB), sort it by any method, keep it in memory. 3. use merge algorithm to merge the two sorted halves and output the complete sorted data set to a file.

zvrba
I've updated the question.
J.F. Sebastian
+4  A: 

I don't know, but maybe you could ask a Google Engineer.

Ed Guiness
Wow, nice catch.
Bob Somers
A: 

As people above mention type int of 32bit 4 MB.

To fit as much "Number" as possible into as little of space as possible using the types int, short and char in C++. You could be slick(but have odd dirty code) by doing several types of casting to stuff things everywhere.

Here it is off the edge of my seat.

anything that is less than 2^8(0 - 255) gets stored as a char (1 byte data type)

anything that is less than 2^16(256 - 65535) and > 2^8 gets stored as a short ( 2 byte data type)

The rest of the values would be put into int. ( 4 byte data type)

You would want to specify where the char section starts and ends, where the short section starts and ends, and where the int section starts and ends.

J.J.
Wouldn't help much. Or wouldn't help at all if all numbers are bigger than 65535.
gabr
You'd waste more space keeping track of types than you'd save!
Adam Hawes
+2  A: 

A bubble sort is the wrong way to go!

Aidan
A: 

This wikipedia article on External Sorting have some useful information.

chakrit
+1  A: 

Dual tournament sort with polyphased merge

#!/usr/bin/env python
import random
from sort import Pickle, Polyphase


nrecords = 1000000
available_memory = 2000000 # number of bytes
    #NOTE: it doesn't count memory required by Python interpreter 

record_size = 24 # (20 + 4) number of bytes per element in a Python list
heap_size = available_memory / record_size 
p = Polyphase(compare=lambda x,y: cmp(y, x), # descending order
              file_maker=Pickle, 
              verbose=True,
              heap_size=heap_size,
              max_files=4 * (nrecords / heap_size + 1))

# put records
maxel = 1000000000
for _ in xrange(nrecords):
    p.put(random.randrange(maxel))

# get sorted records
last = maxel
for n, el in enumerate(p.get_all()):
    if el > last: # elements must be in descending order
        print "not sorted %d: %d %d" % (n, el ,last)
        break
    last = el

assert nrecords == (n + 1) # check all records read
J.F. Sebastian
A: 

No example, but Bucket Sort has relatively low complexity and is easy enough to implement

Harald Scheirich
+4  A: 

Sorting a million 32-bit integers in 2MB of RAM using Python by Guido van Rossum

J.F. Sebastian
That's suspiciously specific :)
skaffman
A: 
  • Um, store them all in a file.
  • Memory map the file (you said there was only 2M of RAM; let's assume the address space is large enough to memory map a file).
  • Sort them using the file backing store as if it were real memory now!
Adam Hawes
A: 

-=======Sorry..by mistake..I replied in the middle of the post above===========-

I know this is an age old thread. But I attended an interview last week and a similar question was asked.

How do you sort a billion rows of data in a file with only 640KB of memory in a 8080 processor based machine.

--No Virtual memory -- no external disk

I explicitly asked to if I could use a hardrive, so I can serialize trees as I sort them and then combine at the end. He said no. I tried many ways, different algorithms..nothing he agreed.

I gave up and asked him politely, how would you do that. He bluntly said, I would not tell you. (Well..call ended write after that. I didn't mean to offend him, as a developer, I got curious..moreover, it was an instinctive question, just as I would ask anyone at workplace..duh)

this interview was for a really..really big big bank.

Matty H