ansaurus

Question

Answer 1

A:

Normalization reduces redundancy and it is what you'll do if you have large amount of data which seems to be your case. You need not denormalize. Let there be an M-to-M table between members and messages.

You can archive the old data as your M-to-M data increases. I don't even see any conflicts because your cron job runs daily for this task and accounts only for the data for the current day. So you can archive M-to-M table data every week.

I believe there will be maintenance issue if you denormalize by adding additional coloumns to members table. I don't recommend the same. Archiving of old data can save you from trouble.

this. __curious_geek 2009-12-03 11:03:40

With archiving old data, you mean deleting records for users that have received all 40 messages?

stef 2009-12-03 14:23:40

Shifting the data into another database which is not accessed so frequently so that your main application works well with M-2-M table having only required data and still retain the old data in archived table.

this. __curious_geek 2009-12-03 16:14:39

Answer 2

+1 A:

You could also append the id of the sent messages to a varchar-field in the members-table. Despite of good manners, this would make it easily possible to use one statement to get a message that has not been sent yet for a specific member.

Just like this (if you surround the ids with '-')

SELECT message.id
FROM member, message
WHERE member.id = 2321
AND member.sentmessages NOT LIKE '%-' && id && '-%'

Mulmoth 2009-12-03 11:04:23

Answer 3

+4 A:

I know that doesn't answer your original question, but wouldn't it be way faster if you selected all the messages that weren't yet sent to a user and then select one of those randomly?

See this pseudo-mysql here:

SELECT 
    CONCAT_WS(',', messages.ids) unsent_messages, 
    user.id user
FROM
    messages,
    user
WHERE
    messages.id NOT IN (
     SELECT 
      id 
     FROM 
      sent_messages 
     WHERE 
      user.id = sent_messages.user
    )
GROUP BY ids

André Hoffmann 2009-12-03 11:07:09

I agree, I've elaborated on this a bit below

Steve De Caux 2009-12-03 11:14:01

Could you clarify what you are doing in the first select statement with CONCAT_WS ?

stef 2009-12-04 09:04:25

Well I pretty much concatenate all the ids of messages that are unsent together using a comma separator. This allows to return a list of ids for every user in one SELECT.

André Hoffmann 2009-12-06 14:02:12

Answer 4

A:

You can achieve the effect of sending random messages by preallocating the random string in your m-m table and a pointer to the offset of the last message sent.

In more detail, create a table MemberMessages with columns
memberId,messageIdList char(80) or varchar ,
lastMessage int,
primary key is memberId.

Pseudo-code for the cron job then looks like this...

ONE. Select next message for a member. If no row exists in MemberMessages for this member, go to step TWO. The sql to select next message looks like

select substr(messageIdList, 2*lastMessage + 1, 2) as nextMessageId  
from MemberMessages  
where member_id = ?

send the message identified by nextMessageId

then update lastMessage incrementing by 1, unless you have reached 39 in which case reset it to zero.

update MemberMessages  
set lastMessage = MOD(lastMessage + 1, 40)  
where member_id = ?

TWO. Create a random list of messageIds as a String of couplets like 2117390740... This is your random list of message IDs as an 80 char String. Insert a row to MemberMessages for your member_id setting message_id_list to your 80 char String and set last_message to 1.

Send the message identified by the first couplet from the list to the member.

Steve De Caux 2009-12-03 11:10:08

Answer 5

A:

You could store only available (unsent) messages. This implies extra maintenance when you add or remove members or message types (nothing that can't be automated with foreign keys and triggers) but simplifies delivery: pick a random line from each user, send the message and remove the line. Also, your database will get smaller as messages get sent ;-)

Álvaro G. Vicario 2009-12-03 11:20:31

Answer 6

+1 A:

1.2 M rows @ 8 bytes (+ overhead) per row is not a lot. It's so small I wouldn't even bet it needs indexing (but of course you should do it).

erikkallen 2009-12-03 12:08:22

Answer 7

A:

You can create a kind of queue / heap.

ReceivedMessages

UserId
MessageId

then:

Pick up a member and select message to send:

SELECT * FROM Messages WHERE MessageId NOT IN (SELECT MessageId FROM ReceivedMessages WHERE UserId = @UserId) LIMIT 1

then insert MessageId and UserId to ReceivedMessages

and do send logic here

I hope that helps.

pablox 2009-12-03 12:17:27

Answer 8

A:

There are potential easier ways to do this, depending on how random you want "random" to be.

Consider that at the beginning of the day you shuffle an array A, [0..39] which describes the order of the messages to be sent to users today.

Also, consider that you have at most 40 Cron jobs, which are used to send messages to the users. Given the Nth cron job, and ID the selected user ID, numeric, you can choose M, the index of the message to send:

M = (A[N] + ID) % 40.

This way, a given ID would not receive the same message twice in the same day (because A[N] would be different), and two randomly selected users have a 1/40 chance of receiving the same message. If you want more "randomness" you can potentially use multiple arrays.

Vladski 2009-12-03 12:37:39

ansaurus

tags:

views:

answers:

Is this a case for denormalisation?

related questions