tags:

views:

136

answers:

2

Hi. Newbie here, so bear with me...

Would it be possible to find patterns of records?

I need to find repeating SEQUENCES of songs in the output of a radio programming software.

The station manager wants to know if certain songs 'next' to certain other songs repeat over time, ie if the DJs are getting lazy and repeating certain sequences.

  1. the table is ordered by a timestamp (datetime-aired)
  2. 'next' is defined as:

a) contiguous (song x has a rec-no of n, song y a rec-no of n+1) or

b) separated by one ad (song x has a rec-no of n, the ad has a rec-no of n+1, song y a rec-no of n+2)

Can't seem to find a way to get the info I need. Any ideas?

+1  A: 

If it's one song after another, assuming a table named tblSongs with a 'sequence' & 'name' column. You might want to try something like

select top N first.name, second.name, count(*)
from tblSongs as first 
     inner join tblSongs as second
         on second.sequence=first.sequence + 1
group by first.name, second.name
order by count(*) desc

If song sequence X,Y is counted the same as Y,X then

select top N first.name, second.name, count(*)
from tblSongs as first 
     inner join tblSongs as second
         on second.sequence=first.sequence + 1
         or second.sequence=first.sequence - 1
group by first.name, second.name
order by count(*) desc

If you are looking for any pattern of 2 song sequences, then

select first.name, second.name, abs(second.sequence - first.sequence) as spacing_count
from tblSongs as first 
     inner join tblSongs as second
         on second.sequence=first.sequence + 1
         or second.sequence=first.sequence - 1

Then do some statistical analysis on the spacing_count (which is beyond me).

I believe those will get you started.

John MacIntyre
Thanks, will try this ASAP!
staqUUR
Well, hole in one! Thank you John, I believe you gave me the basis of what I want to do. Now for the laborious manual cross-check (sigh)
staqUUR
A: 

I am not sure if you can do a direct database query to find that, but if you can dump the ResultSet to a file sorted by timestamp, it should be easy to figure it out.

Think of it this way. Assume that a song is a character and the timestamp sorted list of songs is a String. E.g. you have a songlist represented by the following where A B and C are unique songs:

ABCACBABC                 (Ignore ads for now)

Now you can break this into subsequences of adjacent two characters (the are called bigrams).

The bigrams you get are:

AB, BC, CA, AC, CB, BA, BC

Now you can clearly see that the bigram BC is repeated. To do this programmatically, you can throw every bigram into a HashMap (or Hashtable/HashSet) and query every new bigram against the HashMap to see if it contains it. If the map already contains it, then it is a repeat. If not, then it is a new one, so put it into the map.

At the end of this exercise, you will know which combinations are repeated.

Now coming to the case where ads are present, lets call all ad as X. Consider the sequence

ABCXABCXXABCABC

Wherever you have more than one ad successively, (replace that with a hard line break (you are dividing the string into multiple strings). You will get

ABCXABC

<XX - is replaced by a new line>

ABCAB

Now replace all single ads by empty strings. You will get

ABCABC
ABCAB

Now process each String independently to identify repetitions.

Hope it works.

hashable
Thank you for your prompt answer. The queries by Mr. MacIntyre fulfil my needs for now. BUT... I foresee even weirder queries in the future, and they will probably not solve as easily as this one. So I'll keep your idea in mind,as it is very flexible. Thanks to you both for helping out. BTW, I'm using SQLite: 'top N' -> 'limit N' at the end of the query.
staqUUR