A: 

Your first instinct, to set up an association where a song has many annotations, would definitely work. Two potential approaches to storing the start and end indexes of an annotation:

  1. Store the start and end line that the lyric occurred on (count the line breaks in your lyric file)

or

  2. Store the start and end word boundary (or just the space) that delimits the annotated text. This would at least let you correct most typos without breaking the annotation index.
Mike Buckbee
+2  A: 
  1. Tokenize your lyrics, so that you can identify a word in the lyrics by, e.g., a line number and a word number. Another option is to use character positions for your annotations. In either case, as always, take care with the character encoding of the lyrics.
  2. After that, never modify the stored lyrics again. Don't store them as HTML; use XML or plain text instead.
  3. Don't put annotations inside the lyrics. Use a model in which each annotation carries a position in the lyrics, i.e. stand-off annotation.

Stand-off annotation will allow you to add more features over time, such as letting many users annotate the same lyrics. Generating the HTML you store as a blob is easy to do from stand-off annotations.
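To make the idea concrete, here is a minimal sketch of stand-off annotation over tokenized lyrics. The `Annotation` class, its field names, the `tokenize` and `annotated_text` helpers, and the authors "alice" and "bob" are all made up for illustration; whitespace splitting is the simplest possible tokenizer and a real one would likely treat punctuation more carefully.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: it points into the lyrics instead of living inside them."""
    line: int        # 0-based line number in the lyrics
    start_word: int  # 0-based index of the first annotated word on that line
    end_word: int    # 0-based index of the last annotated word (inclusive)
    note: str
    author: str      # hypothetical field, to show several users annotating one text


def tokenize(lyrics: str) -> list[list[str]]:
    """Split lyrics into lines, and each line into whitespace-separated words."""
    return [line.split() for line in lyrics.splitlines()]


def annotated_text(tokens: list[list[str]], ann: Annotation) -> str:
    """Return the words an annotation refers to."""
    return " ".join(tokens[ann.line][ann.start_word:ann.end_word + 1])


lyrics = "Well it's that grain grippa from Houston, Tex\nThat bar sippa, that bar no plex"
tokens = tokenize(lyrics)

# Two users annotate the same phrase; the stored lyrics are never modified.
annotations = [
    Annotation(0, 3, 4, "Wood grain steering wheel", author="alice"),
    Annotation(0, 3, 4, "A second reading of the same phrase", author="bob"),
]

for ann in annotations:
    print(f"{annotated_text(tokens, ann)!r} ({ann.author}): {ann.note}")
```

Because the annotations only hold positions, any number of them can target the same words, and the display HTML can be regenerated from the plain text plus the annotation list whenever needed.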

You might be interested in the (XML) data models of annotation tools that are quite well known among linguists, e.g. MMAX2 and Callisto. These are easily convertible to database models.

lbp
A: 

For linking annotations to lyrics, there are several approaches:

  1. As proposed above, link annotations to exact places in the lyrics (e.g. line numbers, words, characters).

  2. Build a dictionary that maps phrases/words to annotations. Just before displaying a page, search the dictionary and insert the matching annotations. If speed or specificity is a concern, each dictionary entry can be tagged with the songs it applies to. If you want your annotations to be robust to small changes in the lyrics, then use a longest common subsequence metric when matching an annotated phrase against the lyrics.

  3. Combine #1 and #2
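As a rough sketch of the matching step in approach 2, the following computes a character-level longest common subsequence and uses it to find the line an annotated phrase most likely belongs to. The function names, the normalization (LCS length divided by phrase length), and the 0.8 threshold are assumptions made for this example, not a recommended tuning.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_matching_line(phrase: str, lines: list[str], threshold: float = 0.8):
    """Return the index of the line that best contains `phrase`, or None.

    The score is LCS length / phrase length, so 1.0 means every character of
    the phrase appears, in order, somewhere in the line.
    """
    if not phrase:
        return None
    best_index, best_score = None, 0.0
    for i, line in enumerate(lines):
        score = lcs_length(phrase.lower(), line.lower()) / len(phrase)
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= threshold else None


# The lyrics were edited ("grippa" became "gripper"), but the dictionary
# entry for "grain grippa" still finds its line.
lines = [
    "Well it's that grain gripper from Houston, Tex",
    "That bar sippa, that bar no plex",
]
print(best_matching_line("grain grippa", lines))
```

A subsequence match is deliberately loose, so short phrases can score well against unrelated lines; tagging dictionary entries by song, as suggested above, keeps the candidate set small.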

Alfa07
+6  A: 

What about presenting the lyrics like this (with thanks to the People's Champ)?

Well it's that [grain grippa][1] from Houston, Tex
That bar sippa, that bar no plex
I'm straight up outta that [Swishahouse][2]
Where G. Dash write all the checks
So [check the neck, check the wrist][3]
I'm balla status from head to toe

[1]Referring to the wood grain steering wheel common to luxury cars
[2]Swisha House is the record label Paul Wall records for
[3]"Look at my watch and necklace because they are expensive"

Just an idea, I was inspired by the markup used to add comments on this site.

So, for the database, create Lyric, LyricLine and Annotation tables. Annotations have LyricLineIds, StartChar and EndChar values and a Meaning or Description field. LyricLines are the text of each line, related to the Lyric entity by LyricIds. Lyrics store song info, language info, whatever.
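A minimal sketch of that schema, using Python's built-in `sqlite3` module with an in-memory database. The table names and the `LyricLineId`, `LyricId`, `StartChar`, `EndChar` and `Meaning` columns come from the description above; the primary keys, `Title`, `Language`, `LineNumber`, the half-open character range, and the restriction of each annotation to a single line are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Lyric (
    LyricId  INTEGER PRIMARY KEY,
    Title    TEXT NOT NULL,          -- assumed column for song info
    Language TEXT
);
CREATE TABLE LyricLine (
    LyricLineId INTEGER PRIMARY KEY,
    LyricId     INTEGER NOT NULL REFERENCES Lyric(LyricId),
    LineNumber  INTEGER NOT NULL,    -- assumed column to keep lines in order
    Text        TEXT NOT NULL
);
CREATE TABLE Annotation (
    AnnotationId INTEGER PRIMARY KEY,
    LyricLineId  INTEGER NOT NULL REFERENCES LyricLine(LyricLineId),
    StartChar    INTEGER NOT NULL,   -- 0-based, inclusive
    EndChar      INTEGER NOT NULL,   -- 0-based, exclusive
    Meaning      TEXT NOT NULL
);
""")

line = "Well it's that grain grippa from Houston, Tex"
phrase = "grain grippa"
start = line.index(phrase)

conn.execute("INSERT INTO Lyric (LyricId, Title, Language) VALUES (1, 'Example song', 'en')")
conn.execute(
    "INSERT INTO LyricLine (LyricLineId, LyricId, LineNumber, Text) VALUES (1, 1, 1, ?)",
    (line,),
)
conn.execute(
    "INSERT INTO Annotation (LyricLineId, StartChar, EndChar, Meaning) VALUES (1, ?, ?, ?)",
    (start, start + len(phrase),
     "Referring to the wood grain steering wheel common to luxury cars"),
)

# SQLite's substr() is 1-based, hence the + 1 on the 0-based StartChar.
rows = conn.execute("""
    SELECT substr(l.Text, a.StartChar + 1, a.EndChar - a.StartChar), a.Meaning
    FROM Annotation a
    JOIN LyricLine l ON l.LyricLineId = a.LyricLineId
""").fetchall()
print(rows)
```

Storing character offsets per line means an edit to one line only invalidates the annotations on that line, not those on the rest of the song.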

This format should be pretty easy to generate from the database and has the benefit of being more "human readable" than XML and editable in place, so you can test it much more easily before you have to develop a whole UI.
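Generating the bracketed format from stored character offsets could look roughly like this. The `render_line` function and its `(start, end, ref_number)` tuple layout are invented for the sketch, and it assumes the spans on a line do not overlap.

```python
def render_line(text: str, spans: list[tuple[int, int, int]]) -> str:
    """Wrap each (start, end, ref_number) span of `text` as [phrase][n].

    `start` is inclusive and `end` is exclusive; spans must not overlap.
    """
    out = []
    cursor = 0
    for start, end, ref in sorted(spans):
        out.append(text[cursor:start])              # untouched text before the span
        out.append(f"[{text[start:end]}][{ref}]")   # the annotated phrase
        cursor = end
    out.append(text[cursor:])                       # remainder of the line
    return "".join(out)


line = "Well it's that grain grippa from Houston, Tex"
start = line.index("grain grippa")
rendered = render_line(line, [(start, start + len("grain grippa"), 1)])
print(rendered)
# Well it's that [grain grippa][1] from Houston, Tex
```

The footnote lines (`[1]...`) would then be emitted after the lyrics, one per annotation, in reference-number order.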

I have this question favorited, and look forward to watching the site progress. Interesting work!

Chris McCall