I'm scraping a website (responsibly: I throttle my requests and have permission), and I'm going to be gathering statistics on 300,000 users.
I plan to store this data in a SQL database and to re-scrape it once a week. My question is: how often should I insert into the database as results come in from the scraper?
Is it best practice to wait until all results are in (keeping them all in memory) and insert them in one go when the scraping is finished? Or is it better to do an insert for every single result as it arrives (they come in at a decent rate)? Or something in between?
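For reference, the "something in between" option I have in mind looks roughly like the sketch below. It assumes SQLite through Python's built-in sqlite3 module. The table name user_stats, its two columns, the function names, and the batch size of 1000 are all placeholders I made up for illustration, not my real schema or a recommended setting.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, int]  # (user_id, score): hypothetical columns


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group an incoming stream of rows into lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def store_results(conn: sqlite3.Connection,
                  results: Iterable[Row],
                  batch_size: int = 1000) -> int:
    """Insert results in batches, one transaction per batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_stats (user_id INTEGER, score INTEGER)"
    )
    total = 0
    for batch in batched(results, batch_size):
        # `with conn` commits if the block succeeds and rolls back on error
        with conn:
            conn.executemany(
                "INSERT INTO user_stats (user_id, score) VALUES (?, ?)", batch
            )
        total += len(batch)
    return total
```

With this approach only one batch is held in memory at a time, and if the scraper crashes I would lose at most the current uncommitted batch rather than the whole run.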
If someone could point me in the right direction on how often/when I should be doing this I would appreciate it.
Also, would the answer change if I were storing these results in a flat file instead of a database?
Thank you for your time!