I've searched through the site and haven't found a question/answer that quite answer my question, the closest one I found was: Syncing objects between two disparate systems best approach.
Anyway to begun, because there is no RSS feeds available, I'm screen scrapping a webpage, hence it does a fetch then it goes through the webpage to scrap out all of the information that I'm interested in and dumps that information into a sqlite database so that I can query the information at my leisure without doing repeat fetching from the website.
However I'm also storing various metadata on the data itself that is stored in the sqlite db, such as: have I looked at the data, is the data new/old, bookmark to a chunk of data (Think of it as a collection of unrelated data, and the bookmark is just a pointer to where I am in processing/reading of the said data).
So right now my current problem is trying to figure out how to update the local sqlite database with new data and/or changed data from the website in a manner that is effective and straightforward.
Here's my current idea:
- Download the page itself
- Create a temporary table for the parsed data to go into
- Do a comparison between the official and the temporary table and copy updates and/or new information to the official table
This process seems kind of complicated because I would have to figure out how to determine if the data in the temporary table is new, updated, or unchanged. So I am wondering if there isn't a better approach or if anyone has any suggestion on how to architecture/structure such system?
Edit 1: I'm not sure where to put the additional information, in an comment or as an edit, so I'm going to add it here.
This expands a bit on the metadata in regards of bookmarking, basically the data source can create new data/addition to the current data, so one reason why I was thinking of doing the temporary table idea was so that I would be able to determine if an data source that has been "bookmarked" has any new data or not.