I'm writing some code that calculates statistics about word usage.
Does anyone know where I can find a database of raw news articles covering a range of topics over (say) the last year? Preferably they would be in plain text or XML. Scraping content from random websites isn't a good option.
I know that going forward I could probably archive them myself. However, I need to kick-start the process with a bunch of existing articles... the more the merrier.
Any other ideas for corpus datasets that are readily available in an easy-to-parse form would also be appreciated.
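
To illustrate the kind of processing an easy-to-parse corpus would feed into, here is a minimal sketch of counting word frequencies across articles already loaded as plain-text strings. The function name `word_counts`, the sample articles, and the tokenization rule are assumptions made for illustration only, not the actual code in question.

```python
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase word occurrences across an iterable of article strings."""
    counts = Counter()
    for text in texts:
        # Assumed tokenization: runs of ASCII letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical sample data standing in for real news articles.
articles = ["The market rose today.", "The market fell."]
counts = word_counts(articles)
print(counts.most_common(2))
```

With plain text, each article can be passed in as a string directly; with XML, the article body would first need to be extracted from whatever element the corpus uses.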