I am fetching current data from another company's web feed. It is a simple fetch of an XML file over HTTP. They haven't provided me with much documentation: just a URL.

Because I need to know as soon as possible when the data changes on their site, I need to poll frequently, which isn't a satisfactory solution for either side.

I was about to recommend that they set up some sort of server push, presumably a long-lived HTTP connection over which the server sends updates asynchronously. I am not very familiar with any common protocols for this. It occurred to me that they may already offer this, and I have been too ignorant to realise.

Is there a common web-based protocol for server pushes over HTTP? If there is, is there a quick way to check whether they support it, before I make myself look foolish by asking for something that is already available?

(Bonus points for a platform-independent, Python-based solution, but I will take what I can get.)

A:

I suggest you read this Wikipedia article on the subject. What you want is certainly possible; however, it may not be supported by all browsers.

That said, I generally recommend against push technologies on the web, as holding connections open saps a server's resources much faster than the request/response paradigm does.

Perhaps there's another way? Polling frequently to see whether the file has changed involves a fairly small payload, so why is it unsatisfactory for both sides?

Unless you can get the other company to change some of its practices (perhaps to FTP the new file to you, or to call a web service to let your company know that the file has changed), you may be stuck with polling.
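If the other company agreed to call a web service when the file changes, the receiving end can be small. Below is a minimal sketch using only Python 3's standard `http.server`, assuming a hypothetical agreed path `/feed-changed` and a POST whose body carries nothing you need; the actual path, method and payload would have to be negotiated with them.

```python
import http.server
import threading


class ChangeNotificationHandler(http.server.BaseHTTPRequestHandler):
    """Accepts POST /feed-changed (hypothetical path) and signals the fetcher."""

    # Shared event: set when a notification arrives, cleared by the consumer.
    changed = threading.Event()

    def do_POST(self):
        if self.path != "/feed-changed":
            self.send_error(404)
            return
        # Drain the request body; its format is whatever the provider agrees to send.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.changed.set()
        # 204 No Content: acknowledge without a response body.
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet
```

To use it, run `http.server.HTTPServer(("", 8080), ChangeNotificationHandler).serve_forever()` in a background thread, and have the main loop call `ChangeNotificationHandler.changed.wait()`, then `.clear()`, then fetch the XML once. You only download when told to, instead of on every poll.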

Randolpho
Thank you for your answer. I have several clarifications and comments. (1) There are no browsers involved. (2) It is hard to speak on behalf of another company, but the demand for this interface is low, and the updates are rare enough that push is likely to be cheaper than being frequently polled by me. (3) That said, I am surprised that my polling has a noticeable effect on their system, but their technical people seem concerned by it. I am not approaching a denial-of-service level, and I suspect their systems are simply poorly set up, making each HTTP GET quite expensive.
Oddthinking
(4) I may be able to get the company to change its practice (e.g. calling back a web service), but I wanted to be sure that they weren't already offering what I wanted.
Oddthinking
(5) The linked document was very useful. HTTP Server Push describes what I expected. I have just checked the MIME type, and it is NOT `multipart/x-mixed-replace`.
Oddthinking
So your explanation seems to have refined my question to this: how can I simply tell whether a server "terminates a connection after response data has been served to a client"? Both Python's `urllib2.urlopen()` and the command-line `wget` simply terminate normally after fetching the XML. Does that prove it isn't supported?
Oddthinking
A:

What you want is HTTP Streaming; read this page. "Comet" is the name commonly given to this technology. One implementation is the Ajax Push Engine (APE); the page I just linked lists several others.
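On the client side, consuming an HTTP stream from Python mostly means not waiting for the response to finish. A minimal sketch using Python 3's `urllib.request` (the successor to the `urllib2` mentioned in the question), assuming a hypothetical server that keeps the response open and sends one update per line; real Comet servers each frame their messages differently.

```python
import urllib.request


def stream_lines(url, timeout=60):
    """Yield each line of a long-lived HTTP response as soon as it arrives.

    Assumes a line-delimited stream (hypothetical framing). The timeout applies
    to each socket read, so a stream that stays silent longer than this raises
    a timeout error rather than hanging forever.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Iterating the response calls readline(), which returns as soon as a
        # full line has been received instead of waiting for the whole body.
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\r\n")
```

Usage would be `for update in stream_lines(FEED_URL): handle(update)`, where `FEED_URL` and `handle` are placeholders. Against an ordinary server the loop simply ends once the body has been sent, which is the "terminates normally" behaviour described in the comments.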

Now, I don't think it's possible to test automatically whether a server supports a push technology: as of now there are no standards for this, and the protocols used vary from one implementation to another.
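What you can do is look at how a single response is framed, which is roughly what checking the MIME type in the comments amounts to. The sketch below is a heuristic only, and the category strings are made up for illustration: a response with a `Content-Length` is finite, `multipart/x-mixed-replace` signals classic HTTP server push, and anything else is inconclusive. It cannot prove that a server lacks push support, for the reason given above.

```python
def describe_response_framing(headers):
    """Rough, heuristic guess at whether a response is one-shot or a push stream.

    `headers` is any mapping with a .get() method, e.g. the `.headers` of a
    urllib.request response. The returned labels are illustrative only.
    """
    content_type = (headers.get("Content-Type") or "").lower()
    if content_type.startswith("multipart/x-mixed-replace"):
        # The classic "HTTP server push" content type.
        return "server push (multipart/x-mixed-replace)"
    if headers.get("Content-Length") is not None:
        # A declared length means the body ends after that many bytes.
        return "finite response (Content-Length set)"
    if (headers.get("Transfer-Encoding") or "").lower() == "chunked":
        # Chunked bodies can be open-ended streams or just unsized replies.
        return "chunked: possibly a stream, possibly not"
    # No length and not chunked: the body ends when the server closes the socket.
    return "unknown: body ends when the connection closes"
```

For example, `describe_response_framing(urllib.request.urlopen(url).headers)`, with `url` being the feed's address. Note that `urlopen` on a streaming URL returns once the headers arrive, so this check does not block on the body.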

Alternatively, you can use periodic refresh ("polling"). The advantages of this technique are that you don't need additional software on the server, and that it can be done without the cooperation of the server you are polling (Comet is infeasible if the server you are querying won't install it).
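A basic polling loop that needs nothing from the server can at least avoid reprocessing unchanged data by hashing each download. A minimal sketch, assuming a hypothetical `fetch` callable that returns the XML as bytes (for instance a wrapper around `urllib.request.urlopen(url).read()`):

```python
import hashlib
import time


def poll_for_changes(fetch, interval=60.0, max_polls=None, sleep=time.sleep):
    """Call fetch() repeatedly and yield the body whenever its content changes.

    fetch: zero-argument callable returning the feed as bytes (hypothetical).
    interval: seconds to wait between polls.
    max_polls: stop after this many fetches (None means poll forever).
    sleep: injectable so the loop can be exercised without real waiting.
    """
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        body = fetch()
        polls += 1
        # Keep a hash of the last body rather than the whole previous copy.
        digest = hashlib.sha256(body).hexdigest()
        if digest != last_digest:
            last_digest = digest
            yield body
        # Don't sleep after the final poll.
        if max_polls is None or polls < max_polls:
            sleep(interval)
```

The first poll always counts as a change. This still downloads the full file on every poll, so it saves your processing but does nothing for the provider's load.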

For more information and tricks to reduce bandwidth usage on polling, see this page. Some of these will require some effort from the server you are polling.

NullUserException
I knew Meteor, but APE seems really great :) Thanks for the link, man :)
Maskime
How could I tell if HTTP Streaming is already supported by the other web-server?
Oddthinking
@Oddthinking See amended answer.
NullUserException
I guess I have to accept a quick test is not possible. Thanks for the clarification.
Oddthinking
A:

I'm not aware of any method to test whether a web server supports a push technology.
You should ask that company whether a Comet approach could be adopted to avoid polling.

For a Python-based Comet solution, have a look here.

systempuntoout
A:

To avoid unnecessary downloads, I would check the `ETag` and `Last-Modified` headers, as described here:

http://diveintopython.org/http_web_services/etags.html
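In Python 3 terms, a conditional GET sends back the validators from the previous response; if nothing changed, a server that supports this answers `304 Not Modified` with no body, and `urllib.request` surfaces that as an `HTTPError`. A minimal sketch, assuming the server actually emits `ETag` and/or `Last-Modified` (the comments below suggest this one doesn't):

```python
import urllib.error
import urllib.request


def conditional_fetch(url, etag=None, last_modified=None, timeout=10):
    """Fetch url only if it changed since the given validators.

    Returns (body, etag, last_modified). body is None when the server replies
    304 Not Modified; the validators are then returned unchanged.
    """
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:
            err.close()
            return None, etag, last_modified
        raise
```

Call it once with no validators, store the returned `etag` and `last_modified`, and pass them in on every later poll. Each unchanged poll then costs only headers.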

Xavier Combelle
Thanks for this suggestion. It would be an improvement on the naive polling I am currently doing. Unfortunately, they support neither `Last-Modified` nor `If-Modified-Since`. I will include this in the list of options when I approach them, as it may be an easier sell than a full Comet solution.
Oddthinking