views:

93

answers:

6

if I have a large javascript string array that has over 10,000 elements, how do I quickly search through it?

Right now I have a JavaScript string array that stores the description of a job, and I'm allowing the user to dynamically filter the returned list as they type into an input box.

So say I have a string array like so:
var descArr = ["flipping burgers", "pumping gas", "delivering mail"];

and the user wants to search for: "p"

How would I be able to search a string array that has 10000+ descriptions in it quickly? Obviously I can't sort the description array since they're descriptions, so binary search is out. And since the user can search by "p" or "pi" or any combination of letters, this partial search means that I can't use associative arrays (i.e. searchDescArray["pumping gas"] ) to speed up the search.

Any ideas anyone?

A: 

This may not be an answer for you, as I'm making some assumptions about your setup, but if you have server-side code and a database, you'd be far better off making an AJAX call back to get the cut-down list of results, and letting the database do the filtering (databases are very good at this sort of thing).

As well as the database benefit, you'd also avoid sending that much data (10,000 entries) to a web-based front end - if you only return the results you require, you'll save a fair bit of bandwidth.

Paddy
A database would need a fulltext index to be suitable for this job, and that is not something databases create by default, as it costs a lot of storage/memory. It may still be faster to use a normal database simply because it executes code significantly faster than the worst-case browser (IE6), but if it has to handle a lot of users then it needs a specialized index.
eBusiness
A: 

I suggest trying a ready-made JS function, for example the autocomplete from jQuery. It's fast and it has many options to configure.

Check out the jQuery autocomplete demo

medopal
Hi medopal, thanks for the suggestion, but jQuery autocomplete is only fast when there are relatively few entries; on the order of 10K+ it becomes slow as well.
TriFu
A: 

What you want is basically the equivalent of a fulltext index. JavaScript running inside a browser may get slow once your array spans 10,000+ entries.

As Paddy correctly pointed out, you are better off keeping the string list in a database and feeding the server with a search string via AJAX.

For example, you could define a 2-column MySQL table (primary key, string), create a fulltext index on the string column and use jQuery to search the database for matches. I guess that would pretty much solve your performance issues in case using a server-side database is an option.
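Sketching the client side of that idea (the /search endpoint, the q parameter name, and the 250 ms delay are illustrative assumptions, not part of the answer):

```javascript
// Build the query URL for the hypothetical /search endpoint.
function buildSearchUrl(term) {
    return '/search?q=' + encodeURIComponent(term);
}

// Fire the lookup only after the user pauses typing, so each
// keystroke does not trigger its own round trip to the server.
function debounce(fn, ms) {
    var timer = null;
    return function () {
        var args = arguments, self = this;
        clearTimeout(timer);
        timer = setTimeout(function () { fn.apply(self, args); }, ms);
    };
}

var runSearch = debounce(function (term) {
    // jQuery AJAX call against the server-side fulltext index.
    $.getJSON(buildSearchUrl(term), function (results) {
        // render the cut-down result list here...
    });
}, 250);
```

The debounce is what keeps the server load sane when the user types quickly.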

Edit: Although Google Gears is no longer being developed, it can help you with extending the default functionality of Javascript by a relational database that supports full-text search. The only downside is that the user has to install Google Gears.

Saul
@Saul, currently I want the user to be able to filter the list as they type. So as they type another letter, the list would further filter until only a few results are left. Do you think the AJAX call time + server response time would be quicker than 15 seconds? Because currently that's how long it's taking me to search through 10K+ elements on the client side.
TriFu
@TriFu: Last time I checked, running a LIKE '%str%' query against a table of ~15000 entries and a fulltext index took less than 0.1 seconds. With AJAX overhead, the response roundtrip time would be a maximum of 3 seconds in most cases, depending on the size of the result set and connection bandwidth.
Saul
15 seconds is too much. "Google Suggest" uses an Ajax call and returns 10 results each time. It's very fast and of course depends on your server and bandwidth.
medopal
+1  A: 

As regular expression engines in current browsers are going nuts in terms of speed, how about doing it that way? Instead of an array, pass a gigantic string and separate the entries with an identifier. Example:

  • String "flipping burgers""pumping gas""delivering mail"
  • Regex: "([^"]*ping[^"]*)"

With the switch /g for global you get all the matches. Make sure the user does not search for your string separator.

You can even add an id into the string with something like:

  • String "11 flipping burgers""12 pumping gas""13 delivering mail"
  • Regex: "(\d+) ([^"]*ping[^"]*)"

  • Example: http://jsfiddle.net/RnabN/4/ (30000 strings, limit results to 100)
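The bullet points above can be sketched as runnable code (the job list is illustrative, and the approach assumes descriptions never contain the quote character used as separator):

```javascript
// Join all descriptions into one big quoted string so a single regex
// pass can capture whole entries.
var jobs = ["flipping burgers", "pumping gas", "delivering mail"];
var haystack = '"' + jobs.join('""') + '"';

function search(term) {
    // Escape regex metacharacters in the user's input.
    var safe = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var re = new RegExp('"([^"]*' + safe + '[^"]*)"', 'g');
    var matches = [], m;
    while ((m = re.exec(haystack)) !== null) {
        matches.push(m[1]); // the capture group holds one description
    }
    return matches;
}

console.log(search('ping')); // ["flipping burgers", "pumping gas"]
```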

sod
The performance is not so much about modern browsers as it is about hardware and user habits. In real life, 2GB of RAM for the computer of an average user gives a different result when compared to a machine of a seasoned developer. IT people keep their computers in good shape.
Saul
Modern regular expression runtimes are nearly as fast as a precompiled C++ program. There are worlds of performance between old JavaScript engines (Firefox 2/Netscape/Internet Explorer) and the new just-in-time implementations. JavaScript on an average PC with Chrome runs multiple times faster than JavaScript on a high-end PC with Internet Explorer.
sod
A: 

There's no way to speed up the initial array lookup without making some changes. You can speed up consecutive lookups by caching results and mapping them to patterns dynamically.

1.) Adjust your data format. This makes initial lookups somewhat speedier. Basically, you precache.

var data = {
    a : ['Ant farm', 'Ant massage parlor'],
    b : ['Bat farm', 'Bat massage parlor']
    // etc
}

2.) Setup cache mechanics.

var searchFor = function(str, list, caseSensitive, reduce){
    str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
    var found = [];
    // no 'g' flag here: test() with /g is stateful across calls,
    // and the case flag needs its own parentheses in the expression
    var reg = new RegExp('^\\s?' + str, caseSensitive ? '' : 'i');
    var i = list.length;
    while(i--){
        if(reg.test(list[i])){
            found.push(list[i]);
            if(reduce) list.splice(i, 1); // remove matched entries only
        }
    }
    return found;
}

var lookUp = function(str, caseSensitive){
    str = str.replace(/(?:^\s*|\s*$)/g, ''); // trim whitespace
    if(data[str]) return data[str]; // already cached by an earlier lookup
    var firstChar = caseSensitive ? str[0] : str[0].toLowerCase();
    var list = data[firstChar];
    if(!list) return (data[str] = []);
    // we cache on data since it's already a caching object.
    return (data[str] = searchFor(str, list, caseSensitive)); 
}

3.) Use the following script to create a precache object. I suggest you run this once and use JSON.stringify to create a static cache object. (or do this on the backend)

// we need the searchFor function from above, this might take a while
var preCache = function(arr){
    var chars = "abcdefghijklmnopqrstuvwxyz".split('');
    var cache = {};
    var i = chars.length;
    while(i--){
        // reduce is true, so we're destroying the original list here.
        cache[chars[i]] = searchFor(chars[i], arr, false, true);
    }
    return cache;
}

Probably a bit more code than you expected, but optimisation and performance don't come for free.
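Here is a self-contained miniature of the same bucket-and-cache idea, runnable as-is (all names are illustrative):

```javascript
// Bucket descriptions by first letter, then memoise every prefix
// the user has already typed so repeat lookups are free.
function makeIndex(descriptions) {
    var buckets = {};
    descriptions.forEach(function (d) {
        var key = d[0].toLowerCase();
        (buckets[key] = buckets[key] || []).push(d);
    });
    var cache = {};
    return function lookUp(prefix) {
        prefix = prefix.toLowerCase();
        if (cache[prefix]) return cache[prefix];
        // Narrow using a previously cached shorter prefix if we have one,
        // otherwise fall back to the first-letter bucket.
        var base = cache[prefix.slice(0, -1)] || buckets[prefix[0]] || [];
        return (cache[prefix] = base.filter(function (d) {
            return d.toLowerCase().indexOf(prefix) === 0;
        }));
    };
}

var lookUp = makeIndex(['Ant farm', 'Ant massage parlor', 'Bat farm']);
console.log(lookUp('ant')); // ['Ant farm', 'Ant massage parlor']
```

Each extra letter the user types only has to filter the already-narrowed result of the previous prefix, which is what makes the typing case fast.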

BGerrissen
A: 

I can't reproduce the problem. I created a naive implementation, and most browsers do the search across 10,000 15-character strings in a single-digit number of milliseconds. I can't test in IE6, but I wouldn't expect it to be more than 100 times slower than the fastest browsers, which would still be virtually instant.

Try it yourself: http://ebusiness.hopto.org/test/stacktest8.htm (Note that the creation time is not relevant to the issue, that is just there to get some data to work on.)

One thing you could do wrong is trying to render all results, that would be quite a huge job when the user has only entered a single letter, or a common letter combination.
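For reference, such a naive implementation is just a linear scan over the array, something like this (the dummy data is generated only to have something to measure against):

```javascript
// Naive linear scan: even at 10,000 entries this runs in a few
// milliseconds in a modern JavaScript engine.
function filterList(list, term) {
    term = term.toLowerCase();
    var out = [];
    for (var i = 0; i < list.length; i++) {
        if (list[i].toLowerCase().indexOf(term) !== -1) out.push(list[i]);
    }
    return out;
}

// Generate 10,000 dummy descriptions to measure against.
var big = [];
for (var i = 0; i < 10000; i++) big.push('job description ' + i);

var t0 = Date.now();
var hits = filterList(big, '123');
console.log(hits.length + ' matches in ' + (Date.now() - t0) + ' ms');
```

If that is fast but the page still feels slow, the bottleneck is almost certainly the rendering of thousands of matches, not the search itself.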

eBusiness