How to search fast in a non-sorted array? I am unable to think of any other search mechanism apart from linear search.
Any pointers will be helpful.
How to search fast in a non-sorted array? I am unable to think of any other search mechanism apart from linear search.
Any pointers will be helpful.
Yes, without sorting the data a linear search is about as good as you can do.
Given a truly random non-sorted array, linear search is the best way.
You might be able to do better with some kind of heuristic approach if the data could be assumed to be roughly sorted.
There really isn't any kind of optimization to be done without your array being sorted. All optimizations hinge on having some kind of knowledge about your array being guaranteed. Without that, you can't assume anything and have to go through each item linearly.
Once your code is presented with an unsorted list, yes you will need to do a linear search.
There are a couple of things you can do however if it is liable to be searched more than once.
First off, of course, would be sorting the list. It would make the first lookup take quite a while, but subsequent ones would be faster.
Secondly, I'd suggest caching the result. For most usages I've found subsequent lookups have a high likelyhood to be either the same value, or one nearby.
You shouldn't bother with either of these though, unless the list is really large and slow to search. In general, the less code you can get away with writing, the better off everyone is.
If you are able, you might consider changing the way the data is stored/created in the first place. Rather than adding items to an array, create a full blown binary tree (or whatever) and work with that. It makes manipulating the data slightly slower but it shouldn't be as slow as re-sorting the data every time and would make your searches much faster.
As others have mentionned, a linear search is the way to go. However, depending upon your setup, the following may or may not be relevant to speeding things up.
If some items are searched for more often than others, a small cache might be useful. For example, if you are implementing a lookup() operation for a file system (where the data is ultimately stored on disk somewhere), a cache can help immensely.
Can you sort/sift the data at all so that the most commonly looked for items are near the start of the array?
Is it practical to store your data in two sets? One is sorted, one is unsorted? The sorted, you should be able to search quickly for your item. If not found, go to your unsorted list and search linearly through that. Why? Although you said it is not practical to insert after every insertion, is it practical/efficient to insert from the unsorted list into the sorted list periodically?
Must it all be in one unsorted array? Can you put your data into blocks of sorted data? That is, you might 1000 sorted entries per block. Searching each sorted block might be faster than searching the entire unsorted list.
Just some ideas. They might not be applicable to your situation, but hopefully they are helpful.
There are some clever approaches for optimizing linear search, but in the end none of them really pay of in terms of performance but make the code error-prone and hard to understand. So I'd recommend sticking to a simple linear search (i.e. simple iterating + comparing).