views:

63

answers:

3

Let's assume that Stackoverflow offers web services where you can retrieve all the questions asked by a specific user. A request to get all questions from user A could return the following JSON output:

[
    {
        "question": "What is rest?",
        "date_created": "20/02/2010",
        "votes": 1
    },
    {
        "question": "Which database to use for ...",
        "date_created": "20/07/2009",
        "votes": 5
    }
]

If I want to manipulate and present the data in any way I want, would it be wise to dump it into a local database? At some point, I will also want to retrieve all the answers for each question and store them in the local database as well.

The workflow that I'm thinking is:

  1. User logs in.
  2. A web service retrieves all questions asked by the logged-in user and dumps them into a local database.
  3. When the user wants all answers for a specific question, another web service retrieves them and dumps them into the local database.
  4. After the user logs out, delete all of that user's questions and answers from the local database.
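The workflow above could be sketched like this; `fetch_questions` is a made-up stand-in for the real web service call, and the schema is only illustrative:

```python
import sqlite3

def fetch_questions(user):
    # Hypothetical stand-in for the real web service call (returns sample data).
    return [
        {"question": "What is rest?", "date_created": "20/02/2010", "votes": 1},
        {"question": "Which database to use for ...", "date_created": "20/07/2009", "votes": 5},
    ]

def on_login(db, user):
    # Step 2: dump the user's questions into the local database.
    db.executemany(
        "INSERT INTO questions (username, question, date_created, votes) VALUES (?, ?, ?, ?)",
        [(user, q["question"], q["date_created"], q["votes"]) for q in fetch_questions(user)],
    )

def on_logout(db, user):
    # Step 4: delete everything belonging to that user.
    db.execute("DELETE FROM answers WHERE username = ?", (user,))
    db.execute("DELETE FROM questions WHERE username = ?", (user,))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE questions (username TEXT, question TEXT, date_created TEXT, votes INTEGER)")
db.execute("CREATE TABLE answers (username TEXT, question TEXT, answer TEXT)")

on_login(db, "A")
print(db.execute("SELECT COUNT(*) FROM questions").fetchone()[0])  # 2
on_logout(db, "A")
print(db.execute("SELECT COUNT(*) FROM questions").fetchone()[0])  # 0
```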
A: 

I don't see why this would be unwise, as long as the database is isolated, you're taking precautions, and what you're doing doesn't open some other DB up to a SQL injection attack...

Especially since you're just taking the data and putting it into a DB to manipulate.

However, it may be overkill. It would seem to me you could do the same thing with in-memory DataSets and save additional trips to the DB, but if this works for you I don't see a problem with it.
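To illustrate the in-memory alternative: since the service already returns parsed JSON, you can often sort and filter it with plain language constructs and skip the database round trip entirely (the sample data below is made up):

```python
# The parsed JSON response, held in memory instead of a database (sample data).
questions = [
    {"question": "What is rest?", "date_created": "20/02/2010", "votes": 1},
    {"question": "Which database to use for ...", "date_created": "20/07/2009", "votes": 5},
]

# Present the data sorted by votes, highest first -- no DB needed.
by_votes = sorted(questions, key=lambda q: q["votes"], reverse=True)
print([q["votes"] for q in by_votes])  # [5, 1]

# Filtering is just as easy.
popular = [q for q in questions if q["votes"] >= 5]
print(len(popular))  # 1
```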

David Stratton
+1  A: 

I wouldn't do it like this. If a user has asked/answered 5,000 questions, it will make the initial login take forever. If you want to cache, cache per request. It will make writing the web service driver easier too.

Wrap each web service call in a local function of your own. Before actually making the web service call, check the database to see if you have made this call already. If you have, check whether the cached result has expired. If it has expired, or was never stored, make the service call and save the result to the db.

Edit:

Some pseudocode (the function names are made up):

string get_question(questionId)
{
   SQL = "SELECT data FROM cache
           WHERE service = 'StackOverflow'
             AND procedure = 'get_question'
             AND username = ?
             AND parameters = ?
             AND updated > DATEADD(h, ?, GETDATE())";

   // check whether a fresh copy exists in the cache; -2 means a 2-hour timeout
   question = db(SQL, currentUser(), questionId, -2);

   // if found, return the question from the cache
   if (question != NULL && question != "")
   {
     return question;
   }

   // otherwise call the web service to get the data
   question = WebServiceCall('get_question', questionId);

   // store to the cache, deleting any stale row first
   db("DELETE FROM cache WHERE service = 'StackOverflow' AND procedure = 'get_question' AND username = ? AND parameters = ?",
      currentUser(), questionId);
   db("INSERT INTO cache (service, procedure, parameters, username, data, updated) VALUES ('StackOverflow', 'get_question', ?, ?, ?, GETDATE())",
      questionId, currentUser(), question);

   return question;
}
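A runnable sketch of the same pattern, using SQLite and a Unix-timestamp expiry in place of `DATEADD`; the web service call is stubbed out and the table layout is an assumption:

```python
import json
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE cache (
    service TEXT, procedure TEXT, parameters TEXT,
    username TEXT, data TEXT, updated REAL)""")

def web_service_call(procedure, question_id):
    # Stand-in for the real web service request.
    return json.dumps({"question_id": question_id, "question": "What is rest?"})

def get_question(username, question_id, timeout_hours=2):
    # Check for a fresh cached copy first.
    row = db.execute(
        """SELECT data FROM cache
            WHERE service = 'StackOverflow' AND procedure = 'get_question'
              AND username = ? AND parameters = ?
              AND updated > ?""",
        (username, str(question_id), time.time() - timeout_hours * 3600),
    ).fetchone()
    if row:                      # fresh cache hit
        return row[0]

    # Cache miss or expired: call the service.
    data = web_service_call("get_question", question_id)

    # Replace any stale row, then store the fresh result.
    db.execute(
        "DELETE FROM cache WHERE service = 'StackOverflow' AND procedure = 'get_question' "
        "AND username = ? AND parameters = ?",
        (username, str(question_id)),
    )
    db.execute(
        "INSERT INTO cache (service, procedure, parameters, username, data, updated) "
        "VALUES ('StackOverflow', 'get_question', ?, ?, ?, ?)",
        (str(question_id), username, data, time.time()),
    )
    return data
```

The second call for the same question within the timeout is served from the `cache` table without touching the service.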
Byron Whitlock
Which technology should I use to cache per request? I could also have the case where hundreds of users are logging in at the same time.
Thierry Lam
+1  A: 

If you implement a smart algorithm, your idea can be useful for performance, I think. The point is to determine how much data you should take from the service and save to the database. Taking all the data and saving it to the db when the user logs in is a bad idea, but you could, for example, save half of it first, and take and save the other half only when it is actually needed.
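That partial-fetch idea can be sketched as page-at-a-time lazy loading; the page size and the `fetch_page` service function below are made up for illustration:

```python
PAGE_SIZE = 50

def fetch_page(user, page):
    # Hypothetical stand-in for a paginated web-service call.
    start = page * PAGE_SIZE
    return [{"question": f"q{i}"} for i in range(start, start + PAGE_SIZE)]

class LazyQuestions:
    """Fetch and cache one page at a time, instead of everything at login."""
    def __init__(self, user):
        self.user = user
        self.pages = {}          # local cache: page number -> list of questions

    def page(self, n):
        if n not in self.pages:  # only hit the service the first time
            self.pages[n] = fetch_page(self.user, n)
        return self.pages[n]

qs = LazyQuestions("A")
first = qs.page(0)        # fetched from the service
first_again = qs.page(0)  # served from the local cache
print(len(first), first is first_again)  # 50 True
```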

erasmus