I (and co-hackers) are building a sort of trivia game inspired by this blog post: http://messymatters.com/calibration. The idea is to give confidence intervals and learn how to be calibrated (when you're "90% sure" you should be right 90% of the time).
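
To make "calibrated" concrete, here is a minimal sketch of how a player's hit rate could be scored. It assumes each answered question is stored as a (low, high, truth) tuple; the name `hit_rate` and the sample numbers are made up for illustration and are not part of the game itself.

```python
def hit_rate(answers):
    """Return the fraction of intervals that contain the true value.

    `answers` is a list of (low, high, truth) tuples. This layout is an
    assumption made for illustration only.
    """
    if not answers:
        raise ValueError("need at least one answered question")
    hits = sum(1 for low, high, truth in answers if low <= truth <= high)
    return hits / len(answers)


# Hypothetical 90% intervals from one player (made-up numbers).
sample = [
    (1930, 1950, 1939),  # truth inside the interval: hit
    (100, 200, 250),     # truth above the interval: miss
    (5, 15, 10),         # hit
    (1000, 5000, 4000),  # hit
]
rate = hit_rate(sample)
```

Over many questions, a well-calibrated player's hit rate on 90% intervals should approach 0.9; a noticeably lower rate suggests overconfidence.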

We're thus looking for, ideally, thousands of questions with unambiguous numerical answers. Also, they shouldn't be too boring. There are a lot of random statistics out there (e.g., enclosed water area in different countries) that would make the game mind-numbing. Things like release dates of classic movies are more interesting (to most people).

Other interesting ones we've found include Olympic records, median incomes for different professions, dates of famous inventions, and celebrity ages. Scraping things like the above, by the way, was my reason for asking this question: http://stackoverflow.com/questions/2611418/scrape-html-tables
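
For context, this is roughly what pulling rows out of an HTML table can look like using only Python's standard library. It is a sketch, not the approach from the linked question: the `TableParser` class and the sample markup are hypothetical, and real pages (nested tables, `rowspan`/`colspan`) would need more care.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder markup, not taken from any real page.
html = """
<table>
  <tr><th>Item</th><th>Value</th></tr>
  <tr><td>Example A</td><td>12</td></tr>
  <tr><td>Example B</td><td>34</td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
parser.close()
rows = parser.rows
```

Each entry in `rows` is one table row as a list of strings, so the numeric column still has to be converted (and sanity-checked) before it becomes a question.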

So, if you know of other sources of interesting numerical facts (in a parsable form) I'm eager for pointers to them. Thanks!

+3  A: 

All the stats U'll ever need...


There are several "open" databases available online.

http://unstats.un.org/unsd/databases.htm

Just pull your data from them, and you are up!!

NOTE: You might want to cache each Question once you pull it, for future re-use (different user).
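
A minimal sketch of that kind of cache, assuming each question can be stored as JSON on disk under a filesystem-safe key. The function name `cached_fetch`, the `fetch` callback, and the `question_cache` directory are all hypothetical, not part of any of the databases above.

```python
import json
from pathlib import Path


def cached_fetch(key, fetch, cache_dir="question_cache"):
    """Return the cached data for `key`, calling `fetch()` only on a miss.

    `key` must be safe to use as a file name (an assumption of this sketch),
    and whatever `fetch()` returns must be JSON-serializable.
    """
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        # Cache hit: reuse what was pulled earlier, e.g. for another user.
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

The remote source is then hit at most once per key, and later players get the stored copy.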

GoodLUCK!!

CVS @ 2600Hertz

CVS-2600Hertz
Great stuff; thanks so much! If there are particular stats you think would be especially interesting, let me know. I've got some things like infant mortality, unemployment rate, and number of bordering countries so far.
dreeves
I would upvote this if it wasn't written SMS style.
Pascal Thivent
+1  A: 

Wikipedia has a number of numbers that show up repeatedly (often in a sidebar). For instance, many if not most TV show pages have a link to a list of episodes, and the link has an episode count.

BCS
Got that one; thank you!
dreeves
+1  A: 

The questions in this game are perfect for what we have in mind:

http://en.wikipedia.org/wiki/Wits_and_Wagers

I wonder how the creators of Wits & Wagers collected those questions...

dreeves
+2  A: 

Box Office Mojo is a great one for how much famous movies have grossed. I think people find that interesting.

Dan Tao
+2  A: 

You can try knocking at the front door:

Pioneer Grants: Pioneer Grants are available for startups and other developers building innovative applications with the Wolfram|Alpha API.

(http://products.wolframalpha.com/api/pricing.html)

belisarius
Curiously enough, I found that a fellow stackoverflower already won an Alpha Grant. // "Wrote a Google Wave robot that scrapes and retrieves Wolfram Alpha queries into active wave. Received Pioneer Grant for efforts. Currently fixing glitching and listening to feedback!" // Mark Fayngersh
belisarius
+5  A: 

Video game category

vgchartz.com has various charts for video game titles and hardware performance.

Sample queries:

There's enough data for questions like:

  • How many hardware/title X were sold in Year Y/first week of sales?
  • Title X outsells Title Y (in their respective first N weeks of sales) by how much/what ratio?

Popular music category

billboard.com is all you need.

Wikipedia links

In addition to sales figures, you can also ask queries about chart positions, e.g.:

  • In Category Y of Chart Z, where does song X place/how many songs does artist X have?

Making the most out of your data

You can make unambiguous numeric Q/A out of most lists. Take, for example, a list like the TIME.com All Time 100 Novels.

Some generic questions that can be asked are:

  • How many were written in a given time period?
    • Decade, year, in the presidency of George Bush, before 9/11, etc.
  • What's the gap in rank between Title X and Title Y?
    • Pairwise queries like this really make the most of your data!
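
Both question types above can be sketched in a few lines. This assumes the list is already available as an ordered list of unique titles (rank 1 first) and as (title, year) pairs; the function names and the placeholder titles are hypothetical.

```python
from itertools import combinations


def rank_gap_questions(ranked_titles):
    """Yield a (question, answer) pair for every pair of titles.

    `ranked_titles` is ordered from rank 1 downward and is assumed to
    contain no duplicates.
    """
    ranks = {title: i for i, title in enumerate(ranked_titles, start=1)}
    for a, b in combinations(ranked_titles, 2):
        question = f"What's the gap in rank between {a} and {b}?"
        yield question, abs(ranks[a] - ranks[b])


def count_in_period(items, start_year, end_year):
    """Count (title, year) items whose year is in [start_year, end_year]."""
    return sum(1 for _, year in items if start_year <= year <= end_year)


# Placeholder data, not taken from any real list.
titles = ["Title A", "Title B", "Title C", "Title D"]
questions = list(rank_gap_questions(titles))
```

A list of n titles yields n(n-1)/2 pairwise questions, so a single Top 100 list gives 4,950 of them.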

You can do this with any given Top 100 list:


History category

historyorb.com is just one example. The URLs and HTML are very scrape-friendly.

There are many similar sites, e.g. brainyhistory.com.

You can also use these dates to "cross" with the other data (e.g. the Top 100 Novels example above).


Movie category

The Internet Movie Database is of course... the internet movie database!

polygenelubricants
Wow, these are some great ideas! Thanks!
dreeves
+2  A: 

Well, if you'd like to make questions like "what's the population of country X?" or "how high is the highest mountain in Europe?", then this could be your choice:

http://www.dbis.informatik.uni-goettingen.de/Mondial/

The MONDIAL database has been compiled from geographical Web data sources listed below:

  • CIA World Factbook,
  • a predecessor of Global Statistics which has been collected by Johan van der Heijden.
  • additional textual sources for coordinates,
  • the International Atlas by Kümmerly & Frey, Rand McNally, and Westermann,
  • and some geographical data of the Karlsruhe TERRA database.
vartec
+2  A: 

Sports trivia would lend itself pretty well to this, as you can come up with a ton of questions that 1) have unambiguous numerical answers and 2) some people actually care about. I know a downloadable database for baseball statistics is out there, and I'd be surprised if you couldn't find similar databases for other major (and not-so-major) sports as well. You'll still have to pick and choose, as there's such a thing as too much minutiae even for die-hard sports fans ("How many strikeouts did [obscure pitcher] compile in 1923?"), but it should be a rich environment to mine.

BlairHippo
+1  A: 

World Facts (Crime, Economy, Food etc...)

http://www.nationmaster.com/facts.php

Did you know? (Facts | Fast Facts | Animals | History | Lists | News | Phobias)

http://didyouknow.org/

JeremySpouken
+1  A: 

Cricket statistics. Popular with millions of people around the world, and all accessible from the incredible database at http://www.cricinfo.com. Highly recommended.

Also, the CIA World Factbook (https://www.cia.gov/library/publications/the-world-factbook/) has all sorts of useful numerical facts about countries and the like.

Mark Mayo
+1  A: 

WolframAlpha might be a good place to look for numerical data in all sorts of categories.

Steve Haigh