I am looking for large datasets of web pages for information retrieval and text processing research.
The pages must be in English. They can be backups of news websites or any other sites with textual content, as long as the data is no more than 1 GB in size.
Do you know of any good datasets?