When it comes to classifying music by genre, I've found Wikipedia to have more interesting genre information than most other data sources.

I seem to remember a database that collected this sort of information from Wikipedia and made it more easily accessible, but I couldn't turn anything up on Google today.

If I were to attempt to retrieve this data, what are my options? Is there anything like what I described, or do I need to go a-screen-scraping?

+9  A: 

You should look into Freebase (see, for example, their musical artists table). If you do choose Wikipedia, then you should probably download a database dump.

Example comparing genre listings of Freebase and Wikipedia for the band Radiohead:

  • Freebase: alternative rock, art rock, electronic music, progressive rock, electronica, and experimental rock.
  • Wikipedia: alternative rock, electronic, and experimental rock.

Edit: More importantly, I've included a working example using mjt, a JavaScript framework designed for Freebase. Copy and paste this into a file, open it in your browser, enter an artist's name, and see which genres Freebase has for them.

Less importantly, I've changed my examples and the default artist to Radiohead. =)

<html>
<head>
  <script type="text/javascript" src="http://mjtemplate.org/dist/mjt-0.6/mjt.js"></script>
</head>
<body onload="mjt.run()">
<pre mjt.script="">
var name = mjt.urlquery.name ? mjt.urlquery.name : 'Radiohead';
</pre>
<div mjt.task="q">
mjt.freebase.MqlRead([{
  type: '/music/artist',
  name: {
    value:name,
    lang:{name:{value:'English'}}
  },
  genre: [{
    name: {
      value:null,
      lang:{name:{value:'English'}}
    }
  }]
}])
</div>

<form method="get" action="">
<input type="text" name="name" value="$name" />
<input type="submit" value="search" />
</form>

<table mjt.for="topic in q.result">
  <tr mjt.for="(var rowi = 0; rowi &lt; topic.genre.length; rowi++)">
    <td><pre mjt.script="">var gname = topic.genre[rowi].name;</pre>$gname.value</td>
  </tr>
</table>
</body></html>

You're most likely using another language, but hopefully you can easily translate the above query.
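To make that translation easier, here is a minimal sketch in plain JavaScript (no mjt) that expresses the same MQL query as an ordinary nested object, plus a helper that flattens a result into a list of genre names. The result shape is assumed from how the template above reads `q.result` (each topic has a `genre` array whose items carry `name.value`). `buildGenreQuery`, `genreNames`, and the sample result are hypothetical, and actually sending the query to Freebase is left to whatever HTTP client you use.

```javascript
// Hypothetical helper: the same MQL query the mjt page sends, as a plain object.
// Because MQL is JSON, JSON.stringify gives you text you can send from any language.
function buildGenreQuery(artistName) {
  return [{
    type: '/music/artist',
    name: { value: artistName, lang: { name: { value: 'English' } } },
    genre: [{
      // null asks the server to fill in the value
      name: { value: null, lang: { name: { value: 'English' } } }
    }]
  }];
}

// Hypothetical helper: collect genre names from a result shaped like q.result
// in the page above (an array of topics, each with a genre array).
function genreNames(result) {
  const names = [];
  for (const topic of result) {
    for (const g of topic.genre || []) {
      if (g && g.name && g.name.value) names.push(g.name.value);
    }
  }
  return names;
}

// Hand-written sample result for illustration only, not real Freebase output.
const sample = [{
  genre: [
    { name: { value: 'alternative rock' } },
    { name: { value: 'art rock' } }
  ]
}];

console.log(JSON.stringify(buildGenreQuery('Radiohead')));
console.log(genreNames(sample)); // → ['alternative rock', 'art rock']
```

Keeping the query as data rather than a string means you only have to change the artist name, and the flattening step is the same whatever language ends up parsing the response.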

A. Rex
+1 for Freebase, probably the best source there is, aside from the label/artist themselves.
Unkwntech
Freebase does look pretty good. I'll investigate further. Thanks!
Kenny
There are supposedly good APIs for interacting with Freebase, available in most languages. Unfortunately, I've only ever interacted with it through mjt, a JavaScript framework: http://stackoverflow.com/questions/33484/can-i-export-translations-of-place-names-from-freebase-com
A. Rex
Very nice. I needed to gather info on sports from Wikipedia, and this looks much better.
DavGarcia
+5  A: 

MusicBrainz (http://musicbrainz.org/) may be what you want, instead of Wikipedia. It is a project to make a freely-licensed, high-quality collection of music metadata (name of composer, title of album, title of track, name of the trombonist on that track, etc.). They have developed an awesome database, a detailed database schema, comprehensive style guidelines for making metadata accurate and consistent, application software that can insert metadata into tags in music data files, and an API by which you can use the data. All of it is freely available and collaboratively edited.
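As a starting point with that API, here is a sketch that builds an artist-search URL for the MusicBrainz web service. It assumes the version 2 endpoint (`/ws/2/artist/`) with a Lucene-style `query` parameter and `fmt=json`; the function name is hypothetical, and you should check the current MusicBrainz API documentation (including its rate-limit and User-Agent rules) before relying on it.

```javascript
// Hypothetical helper: build a MusicBrainz WS/2 artist-search URL.
// Assumes the /ws/2/artist/ endpoint, a Lucene-style `query` parameter,
// and `fmt=json` for JSON output.
function musicBrainzArtistSearchUrl(artistName, limit = 5) {
  const url = new URL('https://musicbrainz.org/ws/2/artist/');
  // Escape quotes and backslashes, then quote the name so that
  // multi-word artist names are searched as a phrase.
  const escaped = artistName.replace(/(["\\])/g, '\\$1');
  url.searchParams.set('query', `artist:"${escaped}"`);
  url.searchParams.set('fmt', 'json');
  url.searchParams.set('limit', String(limit));
  // URLSearchParams takes care of percent-encoding the values.
  return url.toString();
}

console.log(musicBrainzArtistSearchUrl('Radiohead'));
```

Letting `URL`/`URLSearchParams` do the encoding avoids hand-escaping the colon and quotes in the search expression.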

The one weak area of MusicBrainz's metadata is musical genre. This is because it's such an intractable problem: one person's "funk" is another person's "pop".

+2  A: 

I found what I was thinking of when I posted my question. Infochimps keeps collections of infoboxes from Wikipedia, such as this one for musical artists. It's not really what I want, though, because it's only available as a download.

While I was looking, I found out how to access articles in XML format with the raw, unrendered wiki markup. Apparently this is easier on the Wikipedia servers, but I'm unsure whether it would be easier to parse.
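For what it's worth, pulling the genre field out of the raw markup may not be too bad. Here is a minimal sketch, assuming the wikitext has already been extracted (for example from the `<text>` element of the XML export) and that the infobox puts each field on its own line, with genres written as `[[wiki links]]`. The helper name and the sample wikitext are hypothetical; real infoboxes vary (nested templates, references, comments), so expect to adjust it.

```javascript
// Hypothetical helper: extract genre names from infobox wikitext.
// Assumes one "| genre = ..." field per line and genres given as [[links]].
function infoboxGenres(wikitext) {
  const line = wikitext
    .split('\n')
    .find((l) => /^\s*\|\s*genre\s*=/i.test(l));
  if (!line) return [];
  // Everything after the first "=" is the field value.
  const value = line.slice(line.indexOf('=') + 1);
  const genres = [];
  // Matches [[target]] or [[target|display text]].
  const linkRe = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;
  let m;
  while ((m = linkRe.exec(value)) !== null) {
    // Prefer the displayed text over the link target when both are present.
    genres.push((m[2] || m[1]).trim());
  }
  return genres;
}

// Hand-written sample, not copied from the live article.
const sampleWikitext = [
  '{{Infobox musical artist',
  '| Name = Radiohead',
  '| Genre = [[Alternative rock]], [[Electronic music|electronic]], [[experimental rock]]',
  '}}'
].join('\n');

console.log(infoboxGenres(sampleWikitext));
// → ['Alternative rock', 'electronic', 'experimental rock']
```

Because it only looks for link syntax, the same function also copes with genres wrapped in a list template such as `{{hlist|[[Art rock]]|[[Post-rock]]}}`.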

Kenny