views:

542

answers:

3

I have a database with one table, like so:

UserID (int), MovieID (int), Rating (real)

The userIDs and movieIDs are large numbers, but my database only has a sample of the many possible values (4000 unique users, and 3000 unique movies)

I am going to do a matrix SVD (singular value decomposition) on it, so I want to return this database as an ordered array. Basically, I want to return each user in order, and for each user, return each movie in order, and then return the rating for that user, movie pair, or null if that user did not rate that particular movie. example:

USERID | MOVIEID | RATING
-------------------------
99835   8847874    4
99835   8994385    3
99835   9001934    null
99835   3235524    2
           .
           .
           .
109834  8847874    null
109834  8994385    1
109834  9001934    null

etc

This way, I can simply read these results into a two dimensional array, suitable for my SVD algorithm. (Any other suggestions for getting a database of info into a simple two dimensional array of floats would be appreciated)

It is important that this be returned in order so that when I get my two dimensional array back, I will be able to re-map the values to the respective users and movies to do my analysis.

+6  A: 
SELECT m.UserID, m.MovieID, r.Rating
    FROM (SELECT a.userid, b.movieid
              FROM (SELECT DISTINCT UserID FROM Ratings) AS a,
                   (SELECT DISTINCT MovieID FROM Ratings) AS b
         ) AS m LEFT OUTER JOIN Ratings AS r
         ON (m.MovieID = r.MovieID AND m.UserID = r.UserID)
    ORDER BY m.UserID, m.MovieID;

Now tested and it seems to work!

The concept is to create the cartesian product of the list of UserID values in the Ratings table with the list of MovieID values in the Ratings table (ouch!), and then do an outer join of that complete matrix with the Ratings table (again) to collect the ratings values.

This is NOT efficient.

It might be effective.

You might do better though to just run the plain simple select of the data, and arrange to populate the arrays as the data arrives. If you have many thousands of users and movies, you are going to be returning many millions of rows, but most of them are going to have nulls. You should treat the incoming data as a description of a sparse matrix, and first set the matrix in the program to all zeroes (or other default value), and then read the stream from the database and set just the rows that were actually present.

That query is the basically trivial:

SELECT UserID, MovieID, Rating
    FROM Ratings
    ORDER BY UserID, MovieID;
Jonathan Leffler
Thanks, tvanfosson, for the edit in the ON clause.
Jonathan Leffler
A: 

If I understand your question correctly, you have all the data in your table, and you just want to extract it in the right order. Is that correct? If so, it should just be a mattter of:

select userid, movieid, rating
from ratings
order by userid, movieid
John Fouhy
That won't work if what the user wants is the same size 'array' for each user - he stated that not all userid/movieid pairs exist in the database. Your answer gives variable number of records per user.
paxdiablo
+1  A: 

Sometimes the best thing to do is refactor the table/normalize your data (if that is an option).

Normalize the data structure:

Users Table: (all distinct users)
UserId, FirstName, LastName

Movies Table: (all distinct movies)
MovieId, Name

UserMovieRatings: (ratings that users have given to movies)
UserId, MovieId, Rating

You can do a Cartesian join if you want every combination of users and movies and then use the UserMovieRatings table as needed.

It's probably best to do the refactoring now before you system gets any more complicated. Take this time upfront and I'm positive any queries you need to make will come naturally...hope that helps...

Sample Query:

select UserId, FirstName, LastName, MoveId, Name, cast(null as int) as Rating
into #FinalResults
from Users
cross join Movies

--may not be entirely correct....but you get the idea
update #FinalResults
set Rating = UMR.Rating
from #FinalResults FR
inner join UserMovieRatings UMR
on FR.UserId = UMR.UserId and FR.MovieId = UMR.MovieId

Amir