views:

1405

answers:

15

I have heard a few developers recently say that they are simply polling stuff (db, files etc) to determine when something has changed and then run a task, such as an import.

I'm really against this idea and feel that utilising available technology such as Remoting, WCF etc would be far better than polling.

However, I'd like to identify the reasons why other people prefer one approach over the other and more importantly how I can convince others that polling is wrong in this day and age.

Thanks.

+18  A: 

Polling is not "wrong" as such.

A lot depends on how it is implemented and for what purpose. If you really care about immedatly notification of a change, it is very efficient. Your code sits in tight loop, constantly polling (asking) a resource whether it has changed / updated. This means you are notified as soon as you can be that something is different. But, your code is not doing anything else and there is overhead in terms of many many calls to the object in question.

If you are less concerned with immediate notification you can increase the interval between polls, and this can also work well, but picking the correct interval can be difficult. Too long and you might miss critical changes, too short and you are back to the problems of the first method.

Alternatives, such as interrupts or messages etc. can provide a better compromise in these situations. You are notified of a change as soon as is practically possible, but this delay is not something you control, it depends on the componant tself being timely about passing on changes in state.

What is "wrong" with polling?

  • It can be resource hogging.
  • It can be limiting (especially if you have many things you want to know about / poll).
  • It can be overkill.

But...

  • It is not wrong inherantly.
  • It can be very effective.
  • It is very simple.
xan
"This means you are notified as soon as you can be that something is different." Wouldn't that kind of depend on the polling interval and how close you happen to be to when whatever happens?
Svish
There are certain situations where polling can be more efficient than messages. It depends on number and frequency of "events". As a somewhat contrived example, consider a thermometer that measures some temperature 100 times a second and send an update. If you use separate message for each update, you need to handle 100 messages per second. But suppose you only care to measure the temperature once every 5 seconds. In such a case polling once each 5 seconds will be more efficient than handling 500 messages.
Gregory Mostizky
+6  A: 

Polling is easy to do, very easy, its as easy as any procedural code. Not polling means you enter the world of Asynchronous programming, which isn't as brain-dead easy, and might even become challenging at times.

And as with everything in any system, the path of less resistance is normally more commonly taken, so there will always be programmers using polling, even great programmers, because sometimes there is no need to complicate things with asynchronous patterns.

I for one always thrive to avoid polling, but sometimes I do polling anyways, especially when the actual gains of asynchronous handling aren't that great, such as when acting against some small local data (of course you get a bit faster, but users won't notice the difference in a case like this). So there is room for both methodologies IMHO.

Robert Gould
+1 for common sense: "sometimes there is no need to complicate things with asynchronous patterns."
JeffK
+1  A: 

This is not answering your question. But realistically, especially in this "day and age" where processor cycles are cheap, and bandwidth is large, polling is actually a pretty good solution for some tasks.

The benefits are:

  • Cheap
  • Reliable
  • Testable
  • Flexible
Ali A
I'm glad you qualified that with "some tasks" because the idea that we have endless processing power, space, or bandwidth is inherently floored. Eventually we will hit the limits of how fast processors can be, how much you can store in a 3.5in HDD, and how much data you can squeeze down an internet connection. This kind of thinking can't scale.
+2  A: 

If you are polling for changes to a file, then I agree that you should use the filesystem notifications that are available for when this happens, which are available in most operating systems now.

In a database you could trigger on update/insert and then call your external code to do something. However it might just be that you don't have a requirement for instant actions. For instance you might only need to get data from Database A to Database B on a different network within 15 minutes. Database B might not be accessible from Database A, so you end up doing the polling from, or as a standalone program running near, Database B.

Also, Polling is a very simple thing to program. It is often a first step implementation done when time constraints are short, and because it works well enough, it remains.

JeeBee
+16  A: 

There are two reasons why polling could be considered bad by principle.

  1. It is a waste of resources. It is very likely that you will check for a change while no change has occurred. The CPU cycles/bandwidth spend on this action does not result in a change and thus could have been better spend on something else.

  2. Polling is done on a certain interval. This means that you won’t know that a change has occurred until the next time that the interval has passed.

It would be better to be notified of changes. This way you’re not polling for changes that haven’t occurred and you’ll know of a change as soon as you receive the notification.

hopla
Note that sometimes polling saves resources. It sets an *upper* limit on how often you do the heavy lifting, namely the polling interval. Not often relevant, but when it is relevant it might be important.
Steve Jessop
... of course you can achieve the same thing if you're being notified, by holding off the work for a duration and rolling up multiple notifications in that time. But it's lots more code.
Steve Jessop
Of course, most things that provide a notification of change are themselves polling in order to detect that change. At some point, in many situations, the polling comes first.
Jeff Yates
+1  A: 

The thing about polling is that it works! Its reliable and simple to implement.

The costs of pooling can be high -- if you are scanning a database for changes every minute when there are only two changes a day you are consuming a lot of resources for a very small result.

However the problem with any notification technoligy is that they are much more complex to implement and not only can they be unreliable but (and this is a big BUT) you cannot easily tell when they are not working.

So if you do drop polling for some other technoligy make sure it is usable by average programmers and is ultra reliable.

James Anderson
+1  A: 

Its simple - polling is bad - inefficient, waste of resources, etc. There is always some form of connectivity in place that is monitoring for an event of some sort anyway, even if 'polling' is not chosen.

So why go the extra mile and put additional polling in place.

Callbacks are the best option - just need to worry about tie the callback in with your current process. Underlying, there is polling going on to see that the connection is still in place anyhow.

If you keep phoning/ringing your girlfriend and shes never answers, then why keep calling? Just leave a message, and wait until she 'calls back' ;)

But what if there's an emergency and you HAVE to find out something from you girlfriend asap? What if she doesn't check her answerphone very often? In this situation you would be relying on her to get back to you, and you have no control over that. I take your point, just giving a counter-example.
xan
You could put controls in place to ensure the remote object is always in a valid state, or report problems that cant be sorted itself. Or continuing the analogy, have the answerphone continously beep so the gfriend notices it straight away, or have it send a txt to warn others she may be in trouble.
HAdes
Great analogy there.
+1  A: 

I agree that avoiding polling is a good policy. However, In reference to Robert's post, I would say that the simplicity of polling can make it a better approach in instances where the issues mentioned here are not such a big problem, as the asynchronous approach is often considerably less readable and harder to maintain, not to mention the bugs that can creep in to its implementation.

kronoz
+3  A: 

I use polling occasionally for certain situations (eg, in a game, I would poll the keyboard state every frame), but never in a loop that ONLY does polling, rather I would do polling as a check (has resource X changed? If yes, do something, otherwise process something else and check again later). Generally speaking though, I avoid polling in favor of asynchronous notifications.

The reasons being that I do not spend resources (cpu time, whatever) waiting for something to happen (especially if those resources could speed up that thing happening in the first place). The cases where I use polling, I don't sit idle waiting, I use the resources elsewhere, so its a non-issue (for me, at least).

Dan
+1  A: 

As with everything, it depends. A large high-transaction system I work on currently uses a notification with SQL (A DLL loaded within SQL Server that is called by an extended SP from triggers on certain tables. The DLL then notifies other apps that there is work to do).

However we're moving away from this because we can practically guarantee that there will be work to do continuously. Therefore in order to reduce the complexity and actually speed things up a bit, the apps will process their work and immediately poll the DB again for new work. Should there be none it'll try again after a small interval.

This seems to work quicker and is much simpler. However, another part of the application which is much lower volume does not benefit from a speed increase using this method - unless the polling interval is very small, which leads to performance problems. So we're leaving it as is for this part. Therefore it's a good thing when it's appropriate, but everybody's needs are different.

Valerion
+8  A: 

Examples of things which use polling in this day and age:

  • Email clients poll for new messages (even with IMAP).
  • RSS readers poll for changes to feeds.
  • Search engines poll for changes to the pages they index.
  • StackOverflow users poll for new questions, by hitting 'refresh' ;-)
  • Bittorrent clients poll the tracker (and each other, I think, with DHT) for changes in the swarm.
  • Spinlocks on multi-core systems can be the most efficient synchronisation between cores, in cases where the delay is too short for there to be time to schedule another thread on this core, before the other core does whatever we're waiting for.

Sometimes there simply isn't any way to get asynchronous notifications: for example to replace RSS with a push system, the server would have to know about everyone who reads the feed and have a way of contacting them. This is a mailing list - precisely one of the things RSS was designed to avoid. Hence the fact that most of my examples are network apps, where this is most likely to be an issue.

Other times, polling is cheap enough to work even where there is async notification.

For a local file, notification of changes is likely to be the better option in principle. For example, you might (might) prevent the disk spinning down if you're forever poking it, although then again the OS might cache. And if you're polling every second on a file which only changes once an hour, you might be needlessly occupying 0.001% (or whatever) of your machine's processing power. This sounds tiny, but what happens when there are 100,000 files you need to poll?

In practice, though, the overhead is likely to be negligible whichever you do, making it hard to get excited about changing code that currently works. Best thing is to watch out for specific problems that polling causes on the system you want to change - if you find any then raise those rather than trying to make a general argument against all polling. If you don't find any, then you can't fix what isn't broken...

Steve Jessop
+5  A: 

I think people should realize that in most cases, at some level there is polling being done, even in event or interrupt driven situations, but you're isolated from the actual code doing the polling. Really, this is the most desirable situation ... isolate yourself from the implementaion, and just deal with the event. Even if you must implement the polling yourself, write the code so that it's isolated, and the results are dealt with independently of the implementation.

Nerdfest
This is not always accurate. In the case where an objects properties change, the object itself can setup a notification that occurs when the property is set. In that case polling wouldn't necessarily occur, though it certainly could in the case the object actually checks for a difference between old and new values.
A: 

I see many answers here, but I think the simplest answer is the answer it self:

Because is (usually) much more simple to code a polling loop than to make the infrastructure for callbacks.

Then, you get simpler code which if it turns out to be a bottleneck later can be easily understood and redesigned/refactored into something else.

Ismael C
+3  A: 

Client polling doesn't scale as well as server notifications. Imagine thousands of clients asking the server "any new data?" every 5 seconds. Now imagine the server keeping a list of clients to notify of new data. Server notification scales better.

Slidell4life
Excelent point there, I find in modern times that programmers are sloppy with processor, space, and bandwidth issues, because they have this funny idea that those things will infinitely improve, but eventually we will reach the point where it is impossible to improve those things anymore. Asynchronous notification isn't the devil people make it out to be, anyone programming a GUI on a modern OS deals with asynchronous notifications all the time.
A: 

Here is a good summary of relative merits of push and pull: https://stpeter.im/index.php/2007/12/14/push-and-pull-in-application-architectures/

I wish I could summarize it further into this answer but some things are best left unabridged.

bagheera