views:

804

answers:

2

Is there a way to do a disk-based binary search for a particular key in a text file in Javascript? The text file is too big to be loaded into memory, but sorted by the key values. In particular I am looking for a way to mimic Perl's Search::Dict functionality in Javascript.

For e.g. If I have a file foo.txt:

a 1
b 10
c 5
z 4

look(c,foo.txt) should return the line 'c 5', by doing a binary search and not traversing the file linearly.

+1  A: 

Not really, binary searches are really only possible when you can identify the record beginnings. You appear to have variable length records so, unless you create an array of line start offsets, it's not going to work.

As Nikhil rightly points out in a comment, one method would be to binary divide the file based on file size and then find the closest line beginning from there. That would still be relatively efficient (i.e., much better than a sequential search).

paxdiablo
Perl's Search::Dict does it: http://perldoc.perl.org/Search/Dict.html. I'm not exactly sure how it works internally, but potentially you can "jump" based on bytes and then find the nearest line break to do the comparison.
Nikhil
You could jump to the middle of the file but that's not necessarily the middle line. For example, if the first million lines are 100 bytes and the last million 2 bytes. It's not *quite* binary, but still better than a sequential search.
paxdiablo
It should still fit the time constraints for binary searching, though.
dash-tom-bang
+1  A: 

I don't know Javascript, but can if you can do random seeks, you can do a binary search by seeking to the midpoint of your current block (in bytes) and then march forward until you've consumed a newline, as long as you "know" that your key is against a newline.

There will be cases where you need to march backward, though, so you might do your seeks with knowledge of the file buffering so that back-steps are not expensive.

I suppose this could be a bit hairier if you're not dealing with ASCII files.

dash-tom-bang