Hey there,

I'm working on an XML service at the moment, which aggregates 20+ XML feeds from other sites' services. At first it was just:

GetherDataAndCreateXML();

But obviously fetching 20+ other XML files, editing them and serving the result takes time, so I decided to cache it for about 10 minutes and added a final.xml file with a DateTime attribute to check whether it's out of date. So it became something like:

var de = DateTime.Parse(x.Element("root").Attribute("DateTime").Value).AddMinutes(10.0d);

if (de >= DateTime.Now)
 return finalXML();
else
{
 RefreshFinalXml();
 return finalXML();
}

The problem now is that the first request after those 10 minutes takes far too long, because it waits for my very slow RefreshFinalXml() function. So I did this:

if (de >= DateTime.Now)
 return finalXML();
else
{
 ThreadStart start = RefreshFinalXml;
 var thr = new Thread(start);
 thr.IsBackground = true;
 thr.Start();

 return finalXML();
}

This way, even at the 11th minute I simply return the old final.xml, while another thread refreshes the XML in the background. So after roughly the 13th minute, users get fresh data without any delay. But there is still a problem: it creates a new thread for every single request between the 10th and 13th minutes (while the first RefreshFinalXml is still running in the background), and obviously I can't let that happen, right? Since I don't know much about locking files and detecting whether a file is locked, I added a small attribute, "Updating", to my final XML:

if (de >= DateTime.Now)
 return finalXML();
else
{
 if (final.Element("root").Attribute("Updating").Value != "True")
  {
   final.Element("root").SetAttributeValue("Updating", "True");
   final.Save(Path);

   ThreadStart start = RefreshFinalXml; 
   //I change Updating Attribute back to False at the end of this function , right before saving Final Xml
   var thr = new Thread(start);
   thr.IsBackground = true;
   thr.Start();
  }
 return finalXML();
}

So:
0-10 minutes = return from cache
10~13 minutes = return from cache while just one thread is refreshing final.xml
13+ minutes = return from cache

It works and seems decent at the moment, but the question/problem is: I'm extremely inexperienced with this kind of thing (XML services, threading, locks etc.), so I'm not really sure it will work flawlessly under tougher conditions. For example, will my custom locking create problems under heavy traffic? Should I switch to a lock file?

So I'm looking for any advice or corrections about this process, and what the "best practice" would be.

Thanks in advance
Full Code : http://pastebin.com/UH94S8t6

Also, apologies for my English; it's not my native language, and it gets even worse when I'm extremely sleepless/tired, as I am at the moment.

EDIT: Oh, I'm really sorry, but somehow I forgot to mention a crucial thing: this is all running on ASP.NET MVC 2. I think I could have done a little better if it weren't a web application, but I think that changes many things, right?

+1  A: 

You've got a couple of options here.

Approach #1

First, you can use .NET's asynchronous APIs for fetching the data. Assuming you're using HttpWebRequest, take a look at BeginGetResponse and EndGetResponse, as well as the BeginRead and EndRead methods on the Stream you get back from the response.

Example

var request = WebRequest.Create("http://someurl.com");
// BeginGetResponse takes the callback plus a state object (unused here, so null)
request.BeginGetResponse(delegate (IAsyncResult ar)
{
    Stream responseStream = request.EndGetResponse(ar).GetResponseStream();
    // use async methods on the stream to process the data -- omitted for brevity
}, null);

Approach #2

Another approach is to use the thread pool to do your work, rather than creating and managing your own threads. This will effectively cap the number of threads you're running, as well as removing the performance hit you'd normally get when you create a new thread.

Now, you're right about not wanting to repeatedly fire updates while you wait for a refresh that is already running to finish.

Example #2

Your code might look something like this:

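// A dictionary serves as the set of XML objects currently being refreshed.
// It has to be a static field so that every request sees the same instance.
static readonly Dictionary<TheXMLObjectType, object> Updating =
    new Dictionary<TheXMLObjectType, object>();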

...

if (de >= DateTime.Now)
{
    return finalXML();
}
else
{
    // Lock the updating dictionary to prevent other threads from
    // updating it before we're done.
    lock (Updating)
    {
        // If the xml is already in the updating dictionary, it's being
        // updated elsewhere, so we don't need to do anything.
        // On the other hand, if it's not already being updated we need
        // to queue RefreshFinalXml, and set the updating flag
        if (!Updating.ContainsKey(xml))
        {
            // Use the thread pool for the work, rather than managing our own
            ThreadPool.QueueUserWorkItem(delegate (Object o)
            {
                RefreshFinalXml();
                lock(Updating)
                {
                    Updating.Remove(xml);
                }
            });

            // Set the xml in the updating dictionary
            Updating[xml] = null;
        }
    }
    return finalXML();
}
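If the dictionary feels heavier than needed for a single cached document, the same "only one refresh at a time" guard can be sketched with one integer flag and Interlocked.CompareExchange. This is only a minimal sketch under assumptions: RefreshGuard and TryStartRefresh are hypothetical names, not part of the question's code, and it swaps the thread pool for a dedicated background thread like the one in the question.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lets at most one refresh run at any moment.
public static class RefreshGuard
{
    // 0 = idle, 1 = a refresh is in progress.
    private static int _refreshing;

    // Starts `refresh` on a background thread unless one is already running.
    // Returns true only for the caller that actually started the refresh.
    public static bool TryStartRefresh(Action refresh)
    {
        // Atomically flip 0 -> 1. Exactly one concurrent caller sees 0 come back.
        if (Interlocked.CompareExchange(ref _refreshing, 1, 0) != 0)
            return false;

        var worker = new Thread(() =>
        {
            try
            {
                refresh();
            }
            catch (Exception)
            {
                // Assumption: a failed refresh just means the old cache keeps
                // being served. Real code would log the exception here.
            }
            finally
            {
                // Always clear the flag so a later request can retry.
                Interlocked.Exchange(ref _refreshing, 0);
            }
        });
        worker.IsBackground = true;
        worker.Start();
        return true;
    }
}
```

In the else branch above, the whole lock block would then shrink to a single call such as RefreshGuard.TryStartRefresh(RefreshFinalXml); followed by return finalXML();. Because the flag lives in memory, it is reset whenever the process restarts, so a crash mid-refresh cannot leave it stuck the way an "Updating" attribute saved to disk could.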

Hopefully that's enough for you to work off of.

ShZ
@ShZ: using the ThreadPool for potentially long-running tasks is not recommended in an ASP.NET environment, since ASP.NET itself uses ThreadPool threads to serve incoming requests. This is one of the cases when you should opt for creating your own thread.
Fredrik Mörk
Actually, I hadn't yet mentioned that I'm working on ASP.NET when ShZ posted that, so that may be why he didn't consider it. Still, I was trying it just now by putting "Updating" in a global, but maybe that's not best practice for this situation? I'm sorry about not mentioning ASP.NET, ShZ. Thanks for your post anyway.
Tiax
That's what I get for being a Django guy!
ShZ
A: 

I would go for a different method, assuming the following:

  • Your service is always running
  • You can afford/are allowed to fetch the XML files even when there are no requests to your service at the moment.
  • The XML files you fetch are the same files for all your requests (that is, those 20 files are all you need for every response).
  • The resulting XML file is not too big to keep in memory all the time

1

First of all I would not store the resulting XML in a file on disk but rather in a static variable.

2

Second, I would create a timer set to 10 minutes that updates the cache even if there are no calls to your service. That way you always have fairly recent data ready and cached, even if your service has not been called for a while. It also removes the need to track whether a refresh is already in progress.

3

Third, I would consider using threading/async calls to fetch all 20 XML files in parallel. This is only useful if you want to reduce the refresh time. It could allow you to reduce the refresh interval from 10 minutes to maybe 1-2 minutes, if that improves your service.
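As a rough illustration of point 3, the downloads could be started together and then awaited as a group. This is a sketch only, assuming .NET 4 (for System.Threading.Tasks) and that each source is a plain URL; ParallelFetch and DownloadAll are made-up names.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

// Hypothetical helper: downloads every URL concurrently.
public static class ParallelFetch
{
    // Returns the response bodies in the same order as `urls`.
    public static string[] DownloadAll(string[] urls)
    {
        // Start one task per URL; each task uses its own WebClient,
        // because a WebClient instance cannot run concurrent requests.
        Task<string>[] tasks = urls
            .Select(url => Task.Factory.StartNew(() =>
            {
                using (var client = new WebClient())
                {
                    return client.DownloadString(url);
                }
            }))
            .ToArray();

        // Block until all downloads finish. If any of them failed,
        // this throws an AggregateException wrapping the failures.
        Task.WaitAll(tasks);

        return tasks.Select(t => t.Result).ToArray();
    }
}
```

Note that these tasks run on thread pool threads while the downloads are in flight, so in a web application this is best called from the background refresh rather than from a request.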

I would recommend 1 and 2, but 3 is more optional.
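Points 1 and 2 together might look roughly like the following. It is a minimal sketch, not a finished implementation: XmlCache, Start and Current are hypothetical names, and `build` stands in for whatever function gathers the 20 sources and produces the final XML as a string.

```csharp
using System;
using System.Threading;

// Hypothetical in-memory cache that is rebuilt on a fixed interval.
public static class XmlCache
{
    // The latest XML. volatile so readers always see the newest reference.
    private static volatile string _current = "";

    // Held in a static field so the timer is not garbage collected.
    private static Timer _timer;

    // Requests read this; it never blocks on a refresh.
    public static string Current
    {
        get { return _current; }
    }

    // Builds the cache once, then rebuilds it every `interval`.
    public static void Start(Func<string> build, TimeSpan interval)
    {
        _current = build(); // first fill happens synchronously

        _timer = new Timer(_ =>
        {
            try
            {
                // Swap in the new document only once it is complete.
                _current = build();
            }
            catch (Exception)
            {
                // Assumption: on failure, keep serving the previous XML.
                // Real code would log the exception here.
            }
        }, null, interval, interval);
    }
}
```

It would be started once at application startup, for example XmlCache.Start(BuildFinalXml, TimeSpan.FromMinutes(10)); where BuildFinalXml is again a placeholder name. Two caveats: if a rebuild can take longer than the interval, timer callbacks may overlap, and in a hosted web application the process can be recycled, in which case the cache is simply rebuilt on the next start.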

Albin Sunnanbo
Hey Albin, I just edited my original post to add that all of this is supposed to run on an MVC 2 website. So although you're right about timers and keeping data in memory, as far as I know I can't really do that (or at least it's not preferable) in a web application. I'm really sorry about that. Thanks for your post anyway.
Tiax
@Tiax, you can create your own backend caching service that you call with WCF, but that complicates both development and deployment. I guess it's not worth the trouble.
Albin Sunnanbo