I've searched the site and haven't found a question/answer that quite answers my question. The closest one I found was: Syncing objects between two disparate systems best approach.

To begin: because there are no RSS feeds available, I'm screen scraping a webpage. It does a fetch, goes through the page to scrape out all of the information I'm interested in, and dumps that information into a sqlite database so that I can query it at my leisure without repeatedly fetching from the website.

However, I'm also storing various metadata about the data in the sqlite db, such as: have I looked at the data, is the data new or old, and a bookmark into a chunk of data (think of it as a collection of unrelated data, where the bookmark is just a pointer to where I am in processing/reading that data).

So my current problem is figuring out how to update the local sqlite database with new and/or changed data from the website in a manner that is effective and straightforward.

Here's my current idea:

  1. Download the page itself
  2. Create a temporary table for the parsed data to go into
  3. Do a comparison between the official and the temporary table and copy updates and/or new information to the official table

This process seems kind of complicated, because I would have to figure out how to determine whether the data in the temporary table is new, updated, or unchanged. So I am wondering if there is a better approach, or if anyone has any suggestions on how to architect/structure such a system?

Edit 1: I'm not sure where to put the additional information, in a comment or as an edit, so I'm going to add it here.

This expands a bit on the metadata with regard to bookmarking. The data source can create new data or add to the current data, so one reason I was thinking of the temporary table idea was so that I would be able to determine whether a data source that has been "bookmarked" has any new data or not.

A: 

Is it really important to determine whether the data in the temporary table is new, updated or unchanged? Do you really need to keep a history of the changes?

NO: don't use the temporary table. Just mark your old records as old (with a timestamp), don't do updates, and simply insert your new data.
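A minimal sketch of this insert-only approach, assuming Python's standard `sqlite3` module. The `items` table, its columns, and the function names are made up for illustration. Every fetch is inserted as-is with a timestamp, and anything older than the latest timestamp counts as "old":

```python
import sqlite3
import time


def store_snapshot(conn, rows, fetched_at=None):
    """Insert every scraped (source_id, content) pair, stamped with the fetch time.

    Nothing is updated or deleted; older rows simply keep their older timestamp.
    """
    fetched_at = time.time() if fetched_at is None else fetched_at
    # Hypothetical schema: one row per scraped item per fetch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " source_id TEXT NOT NULL,"
        " content TEXT NOT NULL,"
        " fetched_at REAL NOT NULL)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO items (source_id, content, fetched_at) VALUES (?, ?, ?)",
            [(source_id, content, fetched_at) for source_id, content in rows],
        )
    return fetched_at


def latest_rows(conn):
    """Return the rows from the most recent fetch, ordered by source_id."""
    return conn.execute(
        "SELECT source_id, content FROM items"
        " WHERE fetched_at = (SELECT MAX(fetched_at) FROM items)"
        " ORDER BY source_id"
    ).fetchall()
```

The trade-off is that the table grows with every fetch, so you may want to prune old timestamps from time to time.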

YES: your idea seems correct to me, but it all depends on how much data you need to process each time; I don't think it is feasible with a large amount of data.
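If you do go the temporary-table route, the comparison can be left to SQL. Here is a rough sketch, again assuming Python's `sqlite3`. The `items`/`staging` tables, the `seen` flag, and `merge_staging` are hypothetical names, and "changed" is assumed to mean "same `source_id`, different `content`":

```python
import sqlite3


def merge_staging(conn, scraped):
    """Compare scraped (source_id, content) pairs with the main table.

    Returns (new_ids, changed_ids) and applies both to `items`.
    """
    # Hypothetical main table; `seen` stands in for the per-row metadata.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " source_id TEXT PRIMARY KEY,"
        " content TEXT NOT NULL,"
        " seen INTEGER NOT NULL DEFAULT 0)"
    )
    # Rebuild the temporary table for this fetch.
    conn.execute("DROP TABLE IF EXISTS temp.staging")
    conn.execute(
        "CREATE TEMP TABLE staging ("
        " source_id TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO staging VALUES (?, ?)", scraped)

    # New: present in staging, absent from items.
    new_ids = [row[0] for row in conn.execute(
        "SELECT s.source_id FROM staging s"
        " LEFT JOIN items i ON i.source_id = s.source_id"
        " WHERE i.source_id IS NULL ORDER BY s.source_id")]
    # Changed: present in both, with different content.
    changed_ids = [row[0] for row in conn.execute(
        "SELECT s.source_id FROM staging s"
        " JOIN items i ON i.source_id = s.source_id"
        " WHERE i.content <> s.content ORDER BY s.source_id")]

    with conn:  # apply both steps in one transaction
        # Changed rows get the new content and are flagged unseen again.
        conn.execute(
            "UPDATE items SET seen = 0, content = ("
            "  SELECT s.content FROM staging s"
            "  WHERE s.source_id = items.source_id)"
            " WHERE EXISTS ("
            "  SELECT 1 FROM staging s"
            "  WHERE s.source_id = items.source_id"
            "  AND s.content <> items.content)")
        # New rows are inserted with the default metadata.
        conn.execute(
            "INSERT INTO items (source_id, content)"
            " SELECT s.source_id, s.content FROM staging s"
            " WHERE s.source_id NOT IN (SELECT source_id FROM items)")
    return new_ids, changed_ids
```

Unchanged rows are never touched, so whatever metadata you keep on them survives the sync.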

systempuntoout
I've added a bit more information to the question. To get to the point: I don't really care whether the data in the temporary table is new/updated/unchanged; I want to be able to determine whether the data that a "bookmark" points to has been updated or not.
Pharaun
In the end I decided to take the approach you outlined under "NO". Basically, I go look at my "bookmark" and do manual updates there, then for everything else I just dump it straight into the table with an updated timestamp, etc.
Pharaun