searching and sorting | ansaurus

tags:

search

Q:

searching and sorting

If the list has 1024 items (lg 1024 = 10), at what point (in number of searches) does sorting the list first and using binary search pay off, compared with using sequential search? How does your answer change if the list has 2048 items?

+1 A:

If your list is unsorted, each search takes O(n). Sorting with quicksort costs O(n * log n) on average, after which each binary search is O(log n). Let x be the number of searches. The break-even point is where x * n = x * logn + n * logn. By plugging in different values you can see how the two sides behave. My rough estimate is that for n = 1024, sorting first is more efficient once the number of searches is greater than ~10. Put 1024 in place of n and try it.

Andrey 2010-03-01 17:01:41

I think the equation should be: x * n = n * logn + x * logn

2010-03-01 19:49:46

I get ~10 when n=1024 and ~11 when n=2048.

2010-03-01 19:53:03

I solved the equation for x: x = (n * logn) / (n - logn)

2010-03-01 19:54:28
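
The formula from the comments can be checked numerically. The following is a minimal sketch, assuming a base-2 logarithm, a worst-case cost of n comparisons per sequential search, and no constant factors; the function name `break_even_searches` is made up for illustration.

```python
import math


def break_even_searches(n: int) -> float:
    """Solve x * n = n * log2(n) + x * log2(n) for x.

    Assumes worst-case sequential search (n steps per search) and
    ignores all constant factors, as the Big-O estimate above does.
    """
    log_n = math.log2(n)
    return (n * log_n) / (n - log_n)


for n in (1024, 2048):
    # Prints the number of searches at which both strategies cost the same.
    print(n, round(break_even_searches(n), 2))
```

Under these assumptions the result is slightly above 10 for n = 1024 and slightly above 11 for n = 2048, so sorting first starts to pay off from the 11th and 12th search respectively.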

+1 A:

Where the "linear access" curve crosses the "binary search" curve depends on how long it takes to access or insert a single item versus how many items there are. This will be different for every combination of compiler, memory and CPU architecture, type of data/node in the list, distribution of data values, choice of sort and insertion algorithms, and so on. But with a "large enough" set of items, the running time can be described by how its upper bound grows as the number of items increases, even though that "Big-O" bound may not precisely describe any particular run.

You can work out the crossover point precisely if you know the specific algorithm you will insert or search with, can determine the actual instructions that make up your list accesses, and can find out how many clock cycles they take to execute.

Then you can say for sure which one is faster, and at which point. And if you know your data values, you can model it. But if you don't know, you have to make assumptions (for example, what if your inserted data values are already ordered? How does that affect your sort or insertion function?).

For example, a single item retrieval may take 1 µs. Comparing two items may take 0.5 µs. Doing a sorted list insertion with 100 items in the list might require X retrievals, Y compares, and Z updates/writes, whereas an unordered list might require more or fewer depending on what's already there and what you're inserting.

Joe Koberg 2010-03-01 17:05:33
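
To see how per-operation costs move the crossover point, the simple break-even formula can be extended with one cost per step for each phase. This is only a sketch under stated assumptions: the function `break_even` and its parameters are hypothetical, the 1.5 µs search step is one retrieval plus one compare using the example figures above, and the 3.0 µs sort step is an invented number, not a measurement.

```python
import math


def break_even(n, linear_step=1.0, binary_step=1.0, sort_step=1.0):
    """Number of searches x at which both strategies cost the same:

        x * n * linear_step
            == n * log2(n) * sort_step + x * log2(n) * binary_step

    The *_step arguments are hypothetical per-step costs (e.g. in µs).
    """
    log_n = math.log2(n)
    saved_per_search = n * linear_step - log_n * binary_step
    if saved_per_search <= 0:
        # Binary search is never cheaper per search under these costs.
        return math.inf
    return (n * log_n * sort_step) / saved_per_search


# Equal step costs reduce to the plain formula x = n*logn / (n - logn).
print(break_even(1024))

# Assumed costs: 1.5 µs per search step (1 µs retrieval + 0.5 µs compare),
# and a made-up 3.0 µs per sort step to account for extra writes.
print(break_even(1024, linear_step=1.5, binary_step=1.5, sort_step=3.0))
```

With equal step costs the crossover for 1024 items stays just above 10 searches; doubling the relative cost of a sort step roughly doubles it, which is why the exact point can only be pinned down by measuring real operations.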
