
We are all talking about the efficiency of algorithms, and it basically depends on input size.

What about the system specifications of the computer that runs the algorithm? Does it make any difference to run a given sorting algorithm on a Core 2 Duo 2.6 GHz computer with 4 GB of RAM or on a Pentium II with 256 MB of RAM?

I am sure there must be a performance difference. But I want to know what the real relationship between algorithms and system specifications is...

+6  A: 

An increase in hardware performance gives you a constant factor C on the running time of your algorithm. If computer A is overall 2 times slower than computer B, then your algorithm will run twice as fast on computer B. Twice as fast, though, makes hardly any difference when you consider large input sizes.

In big-O notation, that is to say, you will have something like O(n) compared to C*O(n) = O(C*n) = O(n). The complexity of the algorithm, and the general running time for large values, will be about the same on both computer A and computer B.

If you analyze an algorithm's running time using something like big-O notation, then you will have a much better idea of how the algorithm really behaves. Computer performance won't give you any kind of advantage when you are comparing an O(log n) algorithm to an O(n^2) one.

Take a look at some of the data values for n:

I will assume 1 operation per second for the slow computer and 2 operations per second for the fast computer, and compare the better algorithm on the slow computer against the worse algorithm on the fast computer.

for n = 10:

  Algorithm 1, O(log n): 4 operations → slow computer: 4 seconds
  Algorithm 2, O(n^2): 100 operations → fast computer: 50 seconds

for n = 100:

  Algorithm 1, O(log n): 7 operations → slow computer: 7 seconds
  Algorithm 2, O(n^2): 10,000 operations → fast computer: 1.4 hours

Large difference

for n = 1,000:

  Algorithm 1, O(log n): 10 operations → slow computer: 10 seconds
  Algorithm 2, O(n^2): 1,000,000 operations → fast computer: 5.8 days

Huge difference


As n increases, the difference gets bigger and bigger.

Now if you tried to run each of these algorithms on a faster or slower computer for a large input size, it wouldn't matter: hands down, the O(log n) algorithm would be faster.
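
A quick sketch (my own, assuming base-2 logarithms, rounding up, and the 1 op/s vs. 2 op/s rates above) that reproduces these figures:

    import math

    # Assumed cost model from the comparison above: the slow computer does
    # 1 operation per second, the fast computer does 2 operations per second.
    SLOW_OPS_PER_SEC = 1.0
    FAST_OPS_PER_SEC = 2.0

    def readable(seconds):
        """Render a duration roughly the way the figures above do."""
        if seconds < 60:
            return "%g seconds" % seconds
        if seconds < 86400:
            return "%.1f hours" % (seconds / 3600.0)
        return "%.1f days" % (seconds / 86400.0)

    for n in (10, 100, 1000):
        log_ops = math.ceil(math.log(n, 2))      # O(log n) algorithm
        quad_ops = n ** 2                        # O(n^2) algorithm
        slow_time = log_ops / SLOW_OPS_PER_SEC   # better algorithm, slow machine
        fast_time = quad_ops / FAST_OPS_PER_SEC  # worse algorithm, fast machine
        print("n = %d: O(log n) on the slow machine: %s; O(n^2) on the fast machine: %s"
              % (n, readable(slow_time), readable(fast_time)))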

Brian R. Bondy
On the other hand, if you look at performance / dollar, and factor in the cost of re-engineering code to be faster, it may make sense to just solve it through hardware.
Joeri Sebrechts
no hardware can make up for a bad algorithm if you are dealing with large data sets.
Brian R. Bondy
+1  A: 

Yes, it does depend on system specification. One system might be 10 times faster than another, so it will run bubblesort and quicksort on a set of data 10 times faster than the other.

But when you do analysis of algorithms, you often ignore constant factors like that, which is one thing that big-O notation does. So bubblesort is O(n^2) and quicksort is O(n log n) (in the average case), and that holds no matter how fast your hardware is.

The interesting thing is when you start comparing apples and oranges. If you're running bubblesort on your fast hardware, you may find it's faster than quicksort on the slow hardware -- but only up to a point. Eventually, with a large enough input set, the quicksort on the slow hardware is going to be faster than bubblesort on the fast hardware.

If you want to start making comparisons like that, you need to do two things together: determine algorithmic complexity including the constant factors, and develop a speed model (e.g. how many iterations of a particular loop it can perform per second) for the actual hardware you're running on. One of the interesting things about Knuth's Art of Computer Programming, compared with other books on algorithms, is that he does both, so that for each algorithm he examines, he calculates how many units of execution time it will take for a given size of input on his (mythical) MIX computer. You could then adjust the calculation for faster or slower hardware -- something that big-O notation doesn't help with.
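
As a rough sketch of that kind of comparison (the constant factors and machine speeds below are invented numbers standing in for a Knuth-style cost model, not measurements):

    import math

    # Hypothetical cost model: abstract "units" of work per input size n.
    def bubblesort_units(n):
        return 0.5 * n * n               # assumed constant factor for bubblesort

    def quicksort_units(n):
        return 2.0 * n * math.log(n, 2)  # assumed constant factor for quicksort

    FAST_UNITS_PER_SEC = 1e9  # hypothetical fast machine
    SLOW_UNITS_PER_SEC = 1e7  # hypothetical slow machine, 100x slower

    # Double n until quicksort on the slow machine beats bubblesort on the fast one.
    n = 2
    while bubblesort_units(n) / FAST_UNITS_PER_SEC <= quicksort_units(n) / SLOW_UNITS_PER_SEC:
        n *= 2

    print("crossover is somewhere below n = %d" % n)
    print("at n = %d: bubblesort on fast hardware: %.3f s, quicksort on slow hardware: %.3f s"
          % (n, bubblesort_units(n) / FAST_UNITS_PER_SEC,
             quicksort_units(n) / SLOW_UNITS_PER_SEC))

Past that crossover point, the better algorithm on the slower machine wins by an ever-growing margin.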

TimB
A: 

The efficiency of an algorithm doesn't depend on the system specification. The efficiency is described by its order of growth (big-O, or "Ordo", notation), which gives you the relation between the processing effort and the size of the input.

Czimi
A: 

Certainly yes. With a faster CPU the execution time will be reduced.
Similarly, with more memory, the time taken to swap data (if applicable) will definitely be reduced.

Nrj
A: 

By your question, do you mean to ask why the efficiency of an algorithm is described only in terms of the input size?

Algorithms are usually described using the Big O Notation. This notation describes the asymptotic behavior of an algorithm; it describes the behavior when the input data is very very large.

So for example, we have two algorithms for sorting.

  1. Algo#1 with O(n)
  2. Algo#2 with O(n^2)

And let's take two PCs:

  1. PC1
  2. PC2 100x faster than PC1

And we have two setups:

  1. PC1 running Algo#1
  2. PC2 running Algo#2

When n is very, very large (billions?), setup 1 (PC1 running Algo#1) will still beat setup 2 (PC2 running Algo#2) :)
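
To put rough numbers on it (my own toy model: unit constant factors, PC1 doing 1 operation per time unit and PC2 doing 100):

    # Toy model: PC1 does 1 operation per time unit, PC2 does 100 (100x faster).
    # Algo#1 is O(n) and Algo#2 is O(n^2), both with constant factor 1.
    for n in (10, 100, 1000, 10**6, 10**9):
        setup1 = n              # PC1 running Algo#1
        setup2 = n * n / 100.0  # PC2 running Algo#2
        if setup1 == setup2:
            winner = "tie (this is the crossover point)"
        elif setup1 < setup2:
            winner = "setup 1 wins"
        else:
            winner = "setup 2 wins"
        print("n = %d: setup 1 takes %g, setup 2 takes %g -> %s" % (n, setup1, setup2, winner))

Under these assumed constants, setup 2 only wins below n = 100; past the crossover, the 100x faster hardware never catches up.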

moogs
A: 

Be aware of the particularities of the language in which you implement your algorithm.

For instance, in the Java world, as illustrated in this article, a faster computer does not always mean a faster runtime:

The same Java program can actually run a lot faster on a single-CPU machine than on a multi-processor/multi-core machine!

VonC
So, according to that article, can we say that the JVM does not really depend on the system specs?!
israkir
I think it does, but not all system specs matter equally to a JVM. A multi-core CPU is still not fully exploited by a JVM.
VonC
A: 

Does it make any difference to run a sorting algorithm on a Core 2 Duo 2.6 GHz, 4 GB RAM computer or on a Pentium II, 256 MB RAM computer?

In some cases, absolutely! If your data set does not fit into memory, you will need to use a disk-based sorting algorithm such as merge sort. Quoting from Wikipedia:

When the size of the array to be sorted approaches or exceeds the available primary memory, so that (much slower) disk or swap space must be employed, the memory usage pattern of a sorting algorithm becomes important, and an algorithm that might have been fairly efficient when the array fit easily in RAM may become impractical. In this scenario, the total number of comparisons becomes (relatively) less important, and the number of times sections of memory must be copied or swapped to and from the disk can dominate the performance characteristics of an algorithm. Thus, the number of passes and the localization of comparisons can be more important than the raw number of comparisons, since comparisons of nearby elements to one another happen at system bus speed (or, with caching, even at CPU speed), which, compared to disk speed, is virtually instantaneous.

For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.
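
To make the idea concrete, here is a minimal external merge sort sketch (the chunk size is an arbitrary placeholder and error handling is omitted): sort chunks that fit in memory, spill each sorted chunk to disk, then do a k-way merge of the chunk files.

    import heapq
    import os
    import tempfile

    def external_sort(input_path, output_path, max_lines_in_memory=1000000):
        """Sort a large text file line by line without holding it all in RAM."""
        chunk_paths = []

        # Phase 1: read chunks that fit in memory, sort each, spill to disk.
        with open(input_path) as src:
            while True:
                chunk = [line for _, line in zip(range(max_lines_in_memory), src)]
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(text=True)
                with os.fdopen(fd, "w") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files back into one output file.
        files = [open(p) for p in chunk_paths]
        try:
            with open(output_path, "w") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
            for p in chunk_paths:
                os.remove(p)

The comparison count is about the same as an in-memory sort, but each chunk is read and written sequentially, which is exactly the access pattern the quoted text says dominates once the disk is involved.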

Don Neufeld
+4  A: 

I don't like the answers provided by Brian Bondy and Czimi...

Perhaps this is because I started in a different era, when 32K was considered a lot of memory and most "personal computers" had 8K bytes, and because I now work in scientific computing, where the largest data sets are processed on some of the world's largest systems with thousands of processing nodes and seemingly unbelievable quantities of storage. Therefore I don't overlook certain other elements of the question.

The size of the data set in question makes a fantastic difference. Almost all the answers on this question so far ignore this and work only for very small values of N. The other people who have answered have all presumed "it all fits in memory," or something close to that.

For large data sets other factors come into play, and "large" depends on what resources you have available for solving your problem. Modern systems offer off-line storage (e.g. DVDs), networked storage (e.g. NFS), on-line storage (e.g. serial ATA), and two levels of memory storage: system main memory and on-chip cache. How these are leveraged matters, and the larger the data set, the more they matter. You may or may not need to design access to these into your "algorithm", but if you do, it really matters!

As you increase scale beyond some particular point - the limit of a single CPU and its local memory is about right - these other factors become an increasingly large part of the overhead of the workload. When I was at Digital, we did some of the first real commercial work on multi-CPU systems, and I remember running a benchmark that showed that, using a single CPU as one "unit" of CPU workload capability, a second CPU (in a tightly coupled system) would give you a total of about 1.8. That is, the second CPU added about 0.8. For three, the increase dropped to about 0.6, and with four it dropped a lot more, to about 0.2, for a grand total of about 2.6 for a four-CPU arrangement, though we had some trouble keeping good numbers with four CPUs due to other effects (the measurement effort became a large fraction of the additional resource). ...The bottom line was that multi-CPUs weren't necessarily all they were cracked up to be - four times the CPU does NOT give you four times the processing power, even though in theory you get four times the flops. ...We repeated the work on the Alpha chip, the first multi-core in history, and the results held up pretty well. Surely there could have been optimizations to improve the fraction each additional CPU gave, and surely there has been a lot of work since then to split computing threads more smartly, but you'll never get it all the way to 100% of each new one, in part because they all slow down some (extra overhead) to coordinate.

Small interjection - we had a saying about this work: "Relegate all the Important Stuff to the Compiler!" RISC, get it? This was because the compiler itself had to organize the workload so competing threads didn't step on one another!

Ultimately, processing really massive data sets requires a really smart strategy for moving data in and out of farther-afield storage through to local memory, and division of labor within the algorithm is absolutely vital. In work I was doing with Roberto Mechoso at UCLA on Global Circulation Modeling, they had a data-broker design that is illustrative of the attempts people make to do a great job. Frankly, the result wasn't as good as it could have been, but the design ideas that went into it are worth study. ...Presuming you consider this part of your "algorithm" - and not just the bit-twiddling part - the algorithm's management of resources is one of the most vital aspects of reasonable, if not optimal, resource utilization in substantial computing.

...I hope this helps answer your inquiry.

Richard T
It is good to hear about your interesting experience. Thanks, Richard T ;)
israkir
I don't like your answer either ;) It's too long.
Brian R. Bondy
A: 

Yes. Remember, we have jumped about three orders of magnitude in clock speed since the early 80s (1 MHz, 10 MHz, 100 MHz, 1000 MHz). But that's only the difference between n=10 and n=10,000 in terms of data set sizes. I can purchase a terabyte hard drive... four to five orders of magnitude more than my old 20 megabyte drive.

There's a lot more data floating around out there than there is compute power. So while you might wonder how useful big-O is at sizes like n=50 or n=500... when n=1,000,000 you want the growth in n to be as slow as you can get it. Anything superlinear is just rough on your compute power; non-polynomial is even worse. This extends all the way from the top of the system to the bottom. So efficiency is king as soon as you deal with real-world data set sizes.

Let me give you an example.

I did a junior level database design. By the end I had maybe 5 tables with maybe 20-40 pre-defined categories in them. Added rows, 10, 20 rows. No big deal. I was a whiz. I did it in PHP, not Perl. I was all that and a bag o' chips.

Move to now, a few years later. I'm doing a hobby project in data mining the stock market. I harvest data off a financial site every day - 6,100 stocks, with about 10 columns for each stock. Thirty-thousand-plus rows per week. My initial design was "normalized", with static data and dynamic data in different tables. As I played around with my queries, learning about things, if I did a bad join I'd literally crash my server and make it unavailable. So I denormalized. My next phase is tagging and starting actual mining. I don't plan to start making serious predictions until Christmas-time; roughly 11 x 30K = 330K rows to mine and analyze. Algorithm efficiency will matter if I want to get my data processed in a timely fashion. It wouldn't matter if my CPU were 10 times as fast: with an N^2 algorithm, a constant-factor speedup like that doesn't buy much. :-)
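
To put the 330K-row figure in perspective, here is a back-of-the-envelope count of abstract "operations" (my own arithmetic, not a benchmark):

    import math

    n = 330000  # roughly the number of rows mentioned above

    n_log_n = n * math.log(n, 2)
    n_squared = float(n) ** 2

    print("n        : %.1e" % n)
    print("n log2 n : %.1e" % n_log_n)    # e.g. a decent sort or index pass
    print("n^2      : %.1e" % n_squared)  # e.g. a naive pairwise comparison
    print("n^2 is about %.0f times more work than n log2 n" % (n_squared / n_log_n))

A 10x faster CPU doesn't make a dent in a gap like that.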

Paul Nathan
+2  A: 

One thing not raised so far is that algorithms are often described in terms of speed, e.g. O(n), O(n log(n)), etc., but they also have characteristics in terms of resource usage, where improved speed, say O(n) versus O(n log(n)), comes at the cost of much greater memory usage. In modern computers, as resources become exhausted they are typically replaced with larger, slower resources, e.g. swapping memory to disk, where the slower resource is orders of magnitude slower. Thus when we graph the running time of our algorithm against input size and expect a straight line, an n log n curve, etc., we often see spikes for large values of n as memory gets exhausted. In this case the difference between 1 GB and 2 GB of RAM can be huge, so in practical terms the answer to your question is yes: system specification is very important, and selection of algorithms requires knowledge of the system specification and the size of the input data.

For example, I develop surface modelling and analysis software, and I know that my programs work well on a 32-bit XP box for TIN models of 4 million points. The performance difference between 3.5 million and 4 million points is minor. At 4.5 million points the performance degradation is so severe that the software is unusable.
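
A simple way to see that kind of cliff for yourself is to time the same operation for growing n and watch the per-element cost (the workload and sizes below are placeholders; the interesting jump happens once the working set stops fitting in physical RAM):

    import time

    # Toy benchmark: sort a list of n integers (an O(n log n) job) for growing n
    # and report the cost per element. On a memory-constrained machine the
    # per-element cost jumps sharply once the data no longer fits in RAM and
    # the OS starts swapping.
    n = 1000000
    while n <= 64000000:  # raise or lower this limit to suit your machine
        data = list(range(n, 0, -1))
        start = time.perf_counter()
        data.sort()
        elapsed = time.perf_counter() - start
        print("n = %d: %.2f s total, %.0f ns per element" % (n, elapsed, 1e9 * elapsed / n))
        del data
        n *= 2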

Shane MacLaughlin
A: 

But I want to know what the real relationship between algorithms and system specifications is...

I am confused. I see a lot of people here writing a lot of things; however, when I read the quote above as a question, here's all I can say about it:

  • A better system (= faster CPU, more RAM) runs faster, a worse one (= slower CPU, less RAM) runs slower. The same algorithm (no matter how good or bad it is) will most likely run faster on a better system and slower on the worse one.

  • A faster algorithm runs faster than a slower one. It will run faster on the slower system and it will run faster on the faster system.

So what exactly was your question again? Is your question "Do we really need a fast algorithm if the system is already that fast? Won't a slow one do as well?" Yes, maybe. But in that case I would ask two questions:

  1. Why select a slow algorithm just because the system is fast? That way your code will only run at decent speed on a very fast system. If you choose a fast algorithm, your code might run at decent speed even on a much worse system.

  2. Why intentionally accept worse performance? Even though a bad algorithm might run within five seconds, which you may consider fast enough on the fast machine, a good one might run in 100 milliseconds. So why make your program take 5 seconds for a task it could perform in 100 milliseconds?

Actually, it's point number (2) that really bugs me quite often. People frequently say, "Hey, don't over-optimize, it won't really matter. This code is only a small part of a big system." Yes, if you look at this code in isolation, that is true. But there is a saying, "Many a mickle makes a muckle." Of course you should optimize the most processor-intensive parts first. However, if a system consists of 100 modules and each of them uses only one percent of the CPU time, optimizing one of them to be twice as fast will only give an overall processing-time improvement of 0.5%, which is close to nothing; that's why most people refrain from doing it. But what people overlook is that optimizing all of them to be twice as fast gives a processing-time improvement of 50% (in other words, the app will run twice as fast as a whole).
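
The arithmetic behind that, as a quick sketch (100 hypothetical modules, each responsible for 1% of the run time):

    modules = [1.0] * 100        # 100 modules, each taking 1 time unit (1% of the total)
    total_before = sum(modules)  # 100 units

    # Optimize a single module to be twice as fast: it saves 0.5 units out of 100.
    one_optimized = total_before - 0.5
    print("one module 2x faster:  %.1f%% overall improvement"
          % (100 * (1 - one_optimized / total_before)))

    # Optimize every module to be twice as fast: the whole app runs in half the time.
    all_optimized = sum(t / 2 for t in modules)
    print("all modules 2x faster: %.1f%% overall improvement"
          % (100 * (1 - all_optimized / total_before)))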

So unless there is a good reason not to, why not always use the best algorithm known to solve a problem? "Best" means the one that shows good performance and a good CPU-time/memory-usage ratio (it's useless to take the fastest one if it needs more memory than a normal customer PC can even hold).

Mecki