Amazon Web Services makes public data sets available for free to users of its EC2 cloud services. I do not know anything more than that.
They have fantastic-sounding stuff. Even if you do not want to use EC2, you can still use the list as a guide to the sites and organizations that make the data available.
(taken directly from their page)
BIOLOGY
Annotated Human Genome Data provided by ENSEMBL
An annotated form of the Human Genome, perfect for biological research, which was released as of December 10, 2008. The first snapshot, called the main Ensembl data, includes human and approximately 40 other species (see www.ensembl.org for a list) as well as comparative genomics data (approximately 550GB). The second snapshot, called the Ensembl Biomart, is a denormalized, query-optimized database that facilitates complex queries of one or more datasets (approximately 172GB).
Main Ensembl (Linux/UNIX): snap-c78360ae
Ensembl BioMart (Linux/UNIX): snap-c48360ad
GenBank provided by the National Center for Biotechnology Information
An annotated collection of all publicly available DNA sequences including more than 85.7B bases and 82.8M sequence records (approximately 250GB)
Linux/UNIX: snap-b04ba2d9 (updated 02/15/2009)
UniGene provided by the National Center for Biotechnology Information
A set of transcript sequences of well-characterized genes and hundreds of thousands of expressed sequence tags (EST), last updated as of December 9, 2008. (approximately 10 GB)
Linux/UNIX: snap-5ad83b33
Windows: snap-60d83b09
CHEMISTRY
A 3D Version of the PubChem Library provided by Rajarshi Guha at Indiana University
A 3D (single conformer) version of PubChem, a public database of chemical structures, in SD format (approximately 70 GB)
Linux/UNIX: snap-a8dd3dc1
Windows: snap-40dd3d29
UGI Virtual Conformer Library provided by Rajarshi Guha at Indiana University
80GB of data in SD format on conformers for 500,000 molecules that can be used for virtual screening (approximately 85 GB)
Linux/UNIX: snap-59d33330
Windows: snap-48ce2r21
PubChem Library provided by the National Center for Biotechnology Information
A data set of information on the biological activities of small molecules (approximately 230 GB)
Linux/UNIX: snap-e6df3c8f
Windows: snap-63d83b0a
ECONOMICS
Various US Census Databases provided by The US Census Bureau
United States demographic data from the 1980 (approximately 2 GB), 1990 (approximately 50 GB), and 2000 US Censuses (approximately 200GB), summary information about Business and Industry (approximately 15 GB), and 2003-2006 Economic Household Profile Data (approximately 220 GB)
2000 US Census (Linux/UNIX): snap-92d333fb
2000 US Census (Windows): snap-36ce2e5f
1990 US Census (Linux/UNIX): snap-33f8185a
1990 US Census (Windows): snap-8cf818e5
1980 US Census (Linux/UNIX): snap-9df717f4
1980 US Census (Windows): snap-b6f818df
2003-2006 Economic Data (Linux/UNIX): snap-0bdf3f62
2003-2006 Economic Data (Windows): snap-4edd3d27
Business and Industry Summary Data (Linux/UNIX): snap-5cf81835
Business and Industry Summary Data (Windows): snap-8af818e3
Various Labor Statistics Databases provided by The Bureau of Labor Statistics
Statistics on Inflation & Prices, Employment, Unemployment, Pay & Benefits, Spending & Time Use, Productivity, Workplace Injuries, International Comparisons, Employment Projections, and Regional Resources (approximately 15 GB)
Linux/UNIX: snap-30f81859
Windows: snap-8df818e4
Various Transportation Databases provided by The Bureau of Transportation Statistics
Data and statistics from the US Department of Transportation on Aviation, Maritime, Highway, Transit, Rail, Pipeline, Bike/Pedestrian and other modes of transportation (approximately 15 GB)
Linux/UNIX: snap-e1608d88
Windows: snap-37668b5e
ENCYCLOPEDIC
DBpedia Knowledge Base provided by DBpedia.
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. The DBpedia knowledge base currently describes more than 2.6 million things, including at least 213,000 persons, 328,000 places, 57,000 music albums, 36,000 films, and 20,000 companies. The knowledge base consists of 274 million pieces of information (RDF triples). It features labels and short abstracts for these things in 30 different languages; 609,000 links to images and 3,150,000 links to external web pages; 4,878,100 external links into other RDF datasets; 415,000 Wikipedia categories; and 75,000 YAGO categories (approximately 67GB).
Semantic extraction by DBpedia with contributions from the DBpedia Community, using data from Wikipedia.org. Snapshots prepared by the infochimps.org team using community curated metadata. Released under the GNU Free Documentation License.
Linux/UNIX: snap-37b75e5e
Windows: snap-09b75e60
Freebase Data Dump provided by Freebase.com.
A data dump of all the current facts and assertions in the Freebase system. Freebase is an open database of the world’s information, covering millions of topics in hundreds of categories. Drawing from large open data sets like Wikipedia, MusicBrainz, and the SEC archives, it contains structured information on many popular topics, including movies, music, people and locations – all reconciled and freely available. This information is supplemented by the efforts of a passionate global community of users who are working together to add structured information on everything from philosophy to European railway stations to the chemical properties of common food ingredients. For more answers, check the Freebase FAQ (approximately 26GB).
Data aggregated, processed and reconciled by freebase.com using data from Wikipedia.org, the freebase community, and many other open data sets. Snapshots prepared by the infochimps.org team using community curated metadata. Released under Creative Commons Attribution (CC-BY) license and the Freebase Terms of Service and Licensing Policy.
Linux/UNIX: snap-a8957cc1
Windows: snap-ab957cc2
Wikipedia Extraction (WEX) provided by Freebase.com.
The Freebase Wikipedia Extraction (WEX) is a processed dump of the English-language Wikipedia. The wiki markup for each article is transformed into machine-readable XML, and common relational features such as templates, infoboxes, categories, article sections, and redirects are extracted in tabular form. Freebase WEX is provided as a set of database tables in TSV format for PostgreSQL, along with tables providing mappings between Wikipedia articles and Freebase topics, and corresponding Freebase Types (approximately 66GB).
Semantic extraction by freebase.com, using data from Wikipedia.org. Snapshots prepared by the infochimps.org team using community curated metadata. Released under the GNU Free Documentation License.
Linux/UNIX: snap-a0957cc9
Windows: snap-a6957ccf
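If you copy snapshot IDs out of a list like this one, it may be worth checking them before you try to use them. The sketch below is not from the AWS page. It assumes each ID is "snap-" followed by eight hexadecimal characters, which is the shape of nearly every ID above, and the function name parse_snapshot_line is made up for illustration.

```python
import re

# Assumption: IDs in this list are "snap-" plus eight lowercase hex characters.
SNAPSHOT_ID = re.compile(r"snap-[0-9a-f]{8}")


def parse_snapshot_line(line):
    """Split 'Label: snap-xxxxxxxx (optional note)' into (label, id, looks_valid)."""
    label, _, rest = line.partition(": ")
    parts = rest.split()
    token = parts[0] if parts else ""  # ignore trailing notes such as "(updated ...)"
    return label.strip(), token, bool(SNAPSHOT_ID.fullmatch(token))


if __name__ == "__main__":
    for entry in (
        "Main Ensembl (Linux/UNIX): snap-c78360ae",
        "Linux/UNIX: snap-b04ba2d9 (updated 02/15/2009)",
        "Windows: snap-48ce2r21",
    ):
        print(parse_snapshot_line(entry))
```

Under that assumption, the Windows ID listed for the UGI Virtual Conformer Library (snap-48ce2r21) is flagged, because "r" is not a hexadecimal digit. That ID is probably a typo somewhere along the way, so check it against the AWS page before relying on it.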