I need to build a content-gathering program that will simply read numbers on specified web pages and save that data for later analysis. I don't need it to follow links or search for related data, just gather the numbers from a fixed set of websites whose content changes daily.
I have very little programming experience, and I am hoping this will be a good learning project. Speed is not a huge issue; I estimate the crawler would have to load at most 4,000 pages in a day.
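To make the question concrete, here's a rough sketch in Python of the kind of thing I mean. The sample HTML and the regex are just placeholders, and a real version would fetch live pages instead of a hard-coded string:

```python
import re
from urllib.request import urlopen  # would be used to fetch live pages

def extract_numbers(html):
    # Pull integers and decimals out of the page text. This is very rough:
    # it also matches numbers inside markup, so a real version would likely
    # use an HTML parser to isolate the content first.
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", html)]

# For a live page it would be something like (hypothetical URL):
# html = urlopen("http://example.com/prices.html").read().decode("utf-8")
sample = "<p>Price: 19.95</p><p>Count: 4000</p>"
print(extract_numbers(sample))  # [19.95, 4000.0]
```

The extracted numbers could then be appended to a CSV file each day for the later analysis.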
Thanks.
Edit: Is there any way to test ahead of time whether the websites I'm gathering data from are protected against crawlers?
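From what I've read, a site's robots.txt file is where it declares its crawler rules, so something like the check below might answer that question. The rules and URLs here are made up for illustration; a real check would fetch the site's actual robots.txt with `set_url()` and `read()`:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Normally you would fetch the live file:
#   rp.set_url("http://example.com/robots.txt"); rp.read()
# Here we parse a sample robots.txt inline to show the logic.
rp.parse("""
User-agent: *
Disallow: /private/
""".splitlines())

# can_fetch() reports whether a given user agent may crawl a given URL.
print(rp.can_fetch("*", "http://example.com/data.html"))  # True
print(rp.can_fetch("*", "http://example.com/private/x"))  # False
```

My understanding is that robots.txt is only advisory, so a site could still block crawlers by other means (rate limiting, user-agent filtering), but this at least shows what the site asks for.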