I am building an ASP.NET web application that will be deployed to a 4-node web farm.

My web application's farm is located in California.

Instead of a database for back-end data, I plan to use a set of web services served from a data center in New York.

I have a page /show-web-service-result.aspx that works like this:

1) User requests page /show-web-service-result.aspx?s=foo

2) Page's codebehind queries a web service that is hosted by the third party in New York.

3) When web service returns, the returned data is formatted and displayed to user in page response.

Does this architecture have potential scalability problems? Suppose I am getting hundreds of unique hits per second, e.g.

/show-web-service-result.aspx?s=foo1

/show-web-service-result.aspx?s=foo2

/show-web-service-result.aspx?s=foo3

etc...

Is it typical for web servers in a farm to be using web services for data instead of database? Any personal experience?

What change should I make to the architecture to improve scalability?

A: 

It's fine, but there are some scalability issues, primarily the number of calls you are allowed to make to the external web service per second. Some web services (Yahoo Shopping, for example) limit how often you can call them and will lock out your account if you call too often. If you have a large farm and lots of traffic, you might have to throttle your requests.
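To make the throttling idea concrete, here is a minimal token-bucket sketch. It is written in Python rather than ASP.NET purely for illustration, and the class name, the `rate` and `capacity` parameters, and the injectable `clock` are all hypothetical, not part of any real service's API.

```python
import time


class TokenBucket:
    """Allow roughly `rate` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum number of stored tokens
        self.clock = clock               # injectable so the logic can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def try_acquire(self):
        """Return True if a call may go out now, False if it should wait."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to the time passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Note that a bucket like this lives in one process. On a 4-node farm, each node would need its own share of the provider's limit, or the counter would have to live in some shared store; which one fits depends on the provider's terms.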

Also, it's typical in these situations to use an interstitial page that forks off a worker thread to do the web service call and redirects to the results page when the call returns. (Think of a travel site: when you do a search, you get an interstitial page while they call out to an external source for the flight data, and then you get redirected to a results page when the call completes.) This may be unnecessary if your web service call returns quickly.
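The interstitial pattern can be sketched roughly as follows, in Python for illustration only. The names `start_search`, `poll`, and `jobs` are hypothetical, and the in-memory dictionary is an assumption that only works on a single server.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Worker threads that perform the slow web service calls in the background.
executor = ThreadPoolExecutor(max_workers=8)
jobs = {}  # job id -> Future (in-memory, so valid for one process only)


def start_search(query, call_service):
    """First request: start the slow call and return a job id immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = executor.submit(call_service, query)
    return job_id


def poll(job_id):
    """Interstitial page: report whether to keep waiting or show results."""
    future = jobs[job_id]
    if not future.done():
        return ("waiting", None)
    # result() re-raises any exception from the call, so the caller
    # still has to decide what an error page looks like.
    return ("ready", future.result())
```

On a farm, the poll request may land on a different node than the one that started the job, so the job state would need sticky sessions or a shared store.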

JP Alioto
+4  A: 

I don't see a problem with this approach, we use it quite a bit where I work. However, here are some things to consider:

Is your page rendering going to be blocked while waiting for the web service to respond? What if the response never comes, i.e. the service is down?

For the first problem I would look into using AJAX to update the page after you get a response back from the web service. You'll also want to consider how to handle the no response or timeout condition.

Finally, you should really think about how you could cache the web service data locally. For example if you are calling a stock quoting service then unless you have a real-time feed, there is no reason to call the web service with every request you get. Store the data locally for a period of time and return that until it becomes stale.
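A minimal sketch of that kind of time-limited local cache, in Python for illustration. `TtlCache`, `fetch`, and `ttl_seconds` are made-up names, and `fetch` stands in for whatever function actually calls the web service.

```python
import time


class TtlCache:
    """Return a cached value until it is older than `ttl_seconds`."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch        # function that calls the web service
        self.ttl = ttl_seconds    # how long a value counts as fresh
        self.clock = clock        # injectable so the logic can be tested
        self.entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]         # still fresh: no remote call
        value = self.fetch(key)   # missing or stale: go to the service
        self.entries[key] = (now + self.ttl, value)
        return value
```

With hundreds of unique `s=` values per second, the hit rate depends entirely on how often the same value repeats within the TTL, so it is worth measuring before relying on it.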

John Asbeck
In terms of performance, caching the data locally is a great suggestion.
Programming Hero
A: 

I recommend you be certain to use WCF, and not the legacy ASMX web services technology as the client. Use "Add Service Reference" instead of "Add Web Reference".

John Saunders
+3  A: 

You most definitely have a scalability problem: the third-party web service. Unless you have a service-level agreement with that service (agreeing on the number of requests that you can submit per second), chances are real that you will overload it with your anticipated load. Having four nodes yourself doesn't help you then.

So you should a) come up with an agreement with the third party, and b) test what the actual load is that they can take.

In addition, you need to make sure that your framework can use parallel connections for accessing the remote service. If you have a round-trip time of 20ms from California to New York (which would be fairly good), you cannot make more than 50 sequential requests per second over a single TCP connection. Likewise, starting a new TCP connection for every request will also kill performance, so you want pooling on these parallel connections.
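The arithmetic behind that limit, and a rough way to size a connection pool, can be written out as below. This is a back-of-the-envelope sketch: the function names are hypothetical, and it assumes each connection carries one request at a time with no pipelining.

```python
import math


def max_requests_per_second(round_trip_ms):
    """Sequential requests one connection can complete in a second."""
    return 1000.0 / round_trip_ms


def connections_needed(target_per_second, round_trip_ms):
    """Parallel connections needed to sustain a target request rate.

    Little's law: requests in flight = arrival rate * time per request.
    """
    return math.ceil(target_per_second * round_trip_ms / 1000)
```

With the 20ms figure from above, one connection tops out at 50 requests per second. Taking 300 requests per second as an assumed stand-in for "hundreds of hits per second", that comes to about 6 connections kept busy in parallel, and more if the service itself takes time to answer.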

Martin v. Löwis
You say there is a limit of 50 requests over a single TCP connection. Is this an OS-specific limit? I will be using Windows Server 2008. Also, could you suggest how to use pooling on these connections - any idea how this would be done with a .NET application?
frankadelic
Please try following my maths. *If* you have a round-trip time of 20ms, *then* you can only expect to make 50 requests per second, because 50*20ms == 1000ms == 1s. Wrt. WCF support: it seems WCF itself has little to offer. However people have built on top of it, see e.g. http://weblogs.asp.net/pglavich/archive/2007/05/07/wcf-client-channel-pool-improved-client-performance.aspx
Martin v. Löwis
Ok I see. So this is not a global maximum, it just applies to a particular thread. So if 5 web users request my page which calls web services in the back-end, EACH of those back-end connections would have the max 50 web service requests per second (assuming the 20ms round trip). Right?
frankadelic
Unless the WS is handling security for you (which it shouldn't be), it would make more sense to multiplex the clients over a smaller, fixed number of WS connections. That is, don't hold a single connection to the WS server for each client - if you have a non-negligible number of users, you'll get overloaded pretty quickly (not to mention malicious DoS attacks). However, the problem Martin is referring to is that requests over a single connection are synchronous and sequential. If you have ANY action that takes 20ms, you can't do it more than 50 times in a second, regardless of the number of users.
AviD
AviD - yes, if my requests are sequential and asynchronous, I would have the 50 request limit per second. However, these backend requests are made in parallel. Each client page request runs in its own thread (right?), so in the backend, an outbound WS request would be done in parallel with all the others. AFAIK, the outbound WS connections will not block each other...
frankadelic
whoops I meant sequential and synchronous
frankadelic
+1  A: 

The trendy answer is REST. The response to any GET request can be HTTP-cached (with lots of options on how that is configured), and it can be cached by the internet itself (your ISP, essentially).

ifatree
+1  A: 

Your project has an architecture that reflects the direction that Microsoft and many others in the SOA world want to take us. That said, many people try to avoid the type of real-time risk introduced by the web service.

Your system will have a huge dependency on the web service working in an efficient manner. If it doesn't work, or is slow, people will just see that your page isn't working properly.

At the very least, I would get a web stress tool and performance test your web service up to the traffic levels you expect at peak, and likely beyond. When does it break (if ever)? When does it start to slow down? These are good metrics to know.

Other options to look at: perhaps you can get daily batches of data from the web service to a local database and hit the database for your web site. Then, if for some reason the web service is down or slow, you could use the most recently obtained data (if this is feasible for your data).

Overall, it should be doable, but you want to understand and measure the risks, and explore any potential options to minimize those risks.

alchemical
+2  A: 

You may have scalability problems but most of these can be carefully engineered around.

I recommend you use ASP.NET's asynchronous tasks so that the web service is queued up, the thread is released while the request waits for the web service to respond, and then another thread picks up when the web service is done to finish off the request.

MSDN Magazine - Wicked Code - Asynchronous Pages in ASP.NET 2.0
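The idea of releasing the thread while the request waits can be shown with Python's `asyncio`. This is an analogy, not the ASP.NET API: `call_web_service` is a hypothetical stand-in that simulates a 50ms remote call with `asyncio.sleep`, and the other names are made up for the demo.

```python
import asyncio
import time


async def call_web_service(query, latency_seconds=0.05):
    """Stand-in for the remote call; the sleep simulates network wait."""
    await asyncio.sleep(latency_seconds)
    return f"result for {query}"


async def handle_page_request(query):
    """One page request: waits on the service without holding a thread."""
    data = await call_web_service(query)
    return f"<p>{data}</p>"


async def serve_many(queries):
    """Run all page requests concurrently on a single thread."""
    return await asyncio.gather(*(handle_page_request(q) for q in queries))


def run_demo(count=50):
    """Serve `count` simulated requests and report the elapsed time."""
    queries = [f"foo{i}" for i in range(count)]
    started = time.monotonic()
    pages = asyncio.run(serve_many(queries))
    return pages, time.monotonic() - started
```

Handled one after another, 50 such requests would take about 2.5 seconds. Because every wait overlaps, the batch should finish in roughly the time of a single call, which is the effect the asynchronous page model is after.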

Local caching is an absolute must. The fewer times you have to go from California to New York, the better. You might want to look into Microsoft's Velocity (although that's still in CTP) or NCache, or another distributed cache, so that your 4 web servers don't each have to fetch and cache the same data from the web service: once one server gets it, it should be available to all.

Other things that can go wrong that you should engineer around:

  • The web service is down (obviously) and data falls out of cache, and you can't get it back. Try to make it so that the data is not actually dropped from cache until you're sure you have an update available. Then the only risk is if the service is down and your application pool is reset, so don't reset it as a first-line troubleshooting maneuver!
  • There are two different timeouts on web requests, a connect and an overall timeout. Make sure both are set extremely low and you handle both of them timing out. If the service's DNS goes down, this can look like quite a different failure.
  • Watch perfmon for ASP.NET Queued Requests. This number will rise rapidly if the service goes down and you're not covering it properly.
  • Research and adjust ASP.NET performance registry settings so you have a highly optimized ASP.NET thread pool. I don't remember the specifics, but I seem to remember that there are limits on IO Completion Ports and something else of that nature that are absurdly low for the powerful hardware I'm assuming you have on hand.
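The first point in the list above (don't drop cached data until an update is actually available) could look something like this sketch, in Python for illustration. `StaleOnErrorCache` and its parameters are hypothetical names, and `fetch` is assumed to be a function that calls the service with short connect and overall timeouts and raises an exception on failure.

```python
import time


class StaleOnErrorCache:
    """Refresh entries after `ttl_seconds`, but keep the old value
    if the refresh fails, so an outage does not empty the cache."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch        # assumed to raise on timeout or error
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the logic can be tested
        self.entries = {}         # key -> (fetched_at, value)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]       # fresh enough: no remote call
        try:
            value = self.fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]   # service failed: serve the stale copy
            raise                 # nothing cached: surface the failure
        self.entries[key] = (now, value)
        return value
```

While the service is down, this sketch retries on every request for a stale key. A real version would probably back off for a while after a failure so the queued-requests counter does not climb.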
David
A: 

One other issue you need to consider, depending on the type of application and/or data you're pulling down: security.

Specifically, I'm referring to authentication and authorization, both of your end users and of the web application itself. Where are these things handled? All in the web app? By the WS? Or maybe the front-end app is authenticating the users and flowing the user's identity to the back-end WS, allowing that to verify that the user is allowed? How do you verify this? Since many other responders here mention a local data cache on the front-end app (an EXCELLENT idea, BTW), this gets even MORE complicated: do you cache data that is allowed to userA, but not to userB? If so, how do you verify that userB cannot access data from the cache? What if the authorization is checked by the WS, how do you cache the permissions then?

On the other hand, how are you verifying that only your web app is allowed to access the WS (and an attacker doesn't directly access your WS data over the Internet, for instance)? For that matter, how do you ensure that your web app contacts the CORRECT WS server, and not a bogus one? And of course I assume that all the connection to the WS is only over TLS/SSL... (but of course also programmatically verify the cert applies to the accessed server...)

In short, it's complicated, and there are many elements to consider here... but it is NOT insurmountable.

(as far as input validation goes, that's actually NOT an issue, since this should be done by BOTH the front end app AND the back end WS...)


Another aspect here, as mentioned by @Martin, is the need for an SLA with whatever provider/hosting service you have for the NY WS, not just for performance, but also to cover availability. I.e., what happens if the server is inaccessible, how quickly they commit to getting it back up, what happens if it's down for extended periods of time, etc. That's the only way to legitimately transfer the risk of your availability being controlled by an externality.

AviD