ansaurus

Question

Is it a good idea to incorporate a Reference Count into every database table?

Answer 1

+4 A:

You can do just the following:

DELETE
FROM    photos
WHERE   id NOT IN
        (
        SELECT  photo_id
        FROM    photos_users_like
        )
        AND id NOT IN (
        SELECT  photo_id
        FROM    photos_users_made
        )
        AND id NOT IN (
        SELECT  photo_id
        FROM    photos_users_recommended
        )

If photo_id is indexed in all three tables, MySQL optimizes each NOT IN so that the predicate evaluates to FALSE as soon as the engine finds a single matching row in the corresponding table, so there is no need for reference counts.
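
A minimal sketch of the query above, not part of the original answer, run against an in-memory SQLite database through Python's sqlite3 module so it can be tried without a MySQL server. The table names come from the query; the user_id columns, the index names, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# Assumed minimal schema: table names follow the query above; the user_id
# columns, index names, and sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY);
    CREATE TABLE photos_users_like        (photo_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
    CREATE TABLE photos_users_made        (photo_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
    CREATE TABLE photos_users_recommended (photo_id INTEGER NOT NULL, user_id INTEGER NOT NULL);

    -- Index every photo_id so each NOT IN check can use an index lookup.
    CREATE INDEX ix_like        ON photos_users_like (photo_id);
    CREATE INDEX ix_made        ON photos_users_made (photo_id);
    CREATE INDEX ix_recommended ON photos_users_recommended (photo_id);

    INSERT INTO photos (id) VALUES (1), (2), (3), (4);
    INSERT INTO photos_users_like        VALUES (1, 10);
    INSERT INTO photos_users_made        VALUES (2, 10);
    INSERT INTO photos_users_recommended VALUES (3, 11);
""")

# Photo 4 is the only row that no relationship table references.
deleted = conn.execute("""
    DELETE
    FROM    photos
    WHERE   id NOT IN (SELECT photo_id FROM photos_users_like)
            AND id NOT IN (SELECT photo_id FROM photos_users_made)
            AND id NOT IN (SELECT photo_id FROM photos_users_recommended)
""").rowcount
conn.commit()

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM photos ORDER BY id")]
print(deleted, remaining_ids)
```

How well the NOT IN predicates are optimized depends on the engine; the sketch only demonstrates the result of the statement, not MySQL's execution plan.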

Quassnoi 2009-12-30 15:14:31

Looks promising! Is that standard SQL, or specific to a particular database?

openfrog 2009-12-30 15:18:02

Yes, this is standard `SQL`, optimized fairly well by all major engines. See here: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/ for `MySQL` and browse the previous articles for other engines.

Quassnoi 2009-12-30 15:20:03

Answer 2

+3 A:

This problem is often solved using either

An ORM layer that supports orphan-deletion, or
A database trigger that deletes orphans

Using a reference count could be expensive, depending on how often the data is modified, and will likely make your database code obscure and hard to follow.
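
As a rough illustration of the trigger option, here is a sketch using SQLite through Python's sqlite3 module. The tables photos and photo_likes and the trigger name delete_orphan_photo are hypothetical, and trigger syntax differs between database engines, so treat this as one possible shape rather than a recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, invented for this sketch.
    CREATE TABLE photos (id INTEGER PRIMARY KEY);
    CREATE TABLE photo_likes (photo_id INTEGER NOT NULL, user_id INTEGER NOT NULL);

    -- Hypothetical trigger: after a like row is removed, delete the photo
    -- if no other like row still points at it.
    CREATE TRIGGER delete_orphan_photo
    AFTER DELETE ON photo_likes
    BEGIN
        DELETE FROM photos
        WHERE id = OLD.photo_id
          AND NOT EXISTS (SELECT 1 FROM photo_likes WHERE photo_id = OLD.photo_id);
    END;

    INSERT INTO photos (id) VALUES (1);
    INSERT INTO photo_likes VALUES (1, 10), (1, 11);
""")

# Removing the first like leaves one reference, so the photo stays.
conn.execute("DELETE FROM photo_likes WHERE user_id = 10")
count_after_first = conn.execute("SELECT COUNT(*) FROM photos").fetchone()[0]

# Removing the last like fires the trigger and deletes the orphaned photo.
conn.execute("DELETE FROM photo_likes WHERE user_id = 11")
count_after_second = conn.execute("SELECT COUNT(*) FROM photos").fetchone()[0]
conn.commit()

print(count_after_first, count_after_second)
```

With several relationship tables, each one would need a similar trigger whose NOT EXISTS checks cover all of the referencing tables.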

skaffman 2009-12-30 15:14:52

Answer 3

A:

I think this is what cascading deletes are for. I would not be in favor of doing this. Reference counts are a fine idea for non-garbage collected languages, but in this case I think that SQL has had it covered for a long time.
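
For readers who have not used cascading deletes, a small sketch in SQLite via Python's sqlite3 module follows. The photos and photo_likes tables are hypothetical. Note that a cascade runs from the referenced row to the rows that reference it: deleting a photo removes its likes, not the other way around.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical tables, invented for this sketch.
    CREATE TABLE photos (id INTEGER PRIMARY KEY);
    CREATE TABLE photo_likes (
        photo_id INTEGER NOT NULL REFERENCES photos(id) ON DELETE CASCADE,
        user_id  INTEGER NOT NULL
    );
    INSERT INTO photos (id) VALUES (1), (2);
    INSERT INTO photo_likes VALUES (1, 10), (1, 11), (2, 10);
""")

# Deleting photo 1 also deletes both like rows that reference it.
conn.execute("DELETE FROM photos WHERE id = 1")
conn.commit()

remaining_likes = conn.execute("SELECT photo_id, user_id FROM photo_likes").fetchall()
print(remaining_likes)
```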

This is why it's good to have a database administrator on hand. Noobs tend to get themselves into trouble.

duffymo 2009-12-30 15:15:20

Answer 4

A:

To find photos which no one owns, you would do something like:

select P.PhotoId
  from Photos as P
  where not exists (select ownerId from PhotoOwners as PO where PO.PhotoId = P.PhotoId)

And this can be extended to an arbitrary number of non-existence checks.
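
A sketch of that extension with two non-existence checks, run in SQLite through Python's sqlite3 module. Photos and PhotoOwners come from the query above; the PhotoLikes table, its UserId column, and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Photos (PhotoId INTEGER PRIMARY KEY);
    CREATE TABLE PhotoOwners (PhotoId INTEGER NOT NULL, OwnerId INTEGER NOT NULL);
    -- PhotoLikes is a hypothetical second relationship table.
    CREATE TABLE PhotoLikes (PhotoId INTEGER NOT NULL, UserId INTEGER NOT NULL);

    INSERT INTO Photos (PhotoId) VALUES (1), (2), (3);
    INSERT INTO PhotoOwners VALUES (1, 100);
    INSERT INTO PhotoLikes  VALUES (2, 200);
""")

# One NOT EXISTS clause per relationship table; photo 3 has no owner and no like.
unreferenced = [row[0] for row in conn.execute("""
    SELECT P.PhotoId
      FROM Photos AS P
     WHERE NOT EXISTS (SELECT 1 FROM PhotoOwners AS PO WHERE PO.PhotoId = P.PhotoId)
       AND NOT EXISTS (SELECT 1 FROM PhotoLikes  AS PL WHERE PL.PhotoId = P.PhotoId)
     ORDER BY P.PhotoId
""")]
print(unreferenced)
```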

To prevent a referenced photo from being deleted, you can use foreign keys.

Summary: RDBMSs are designed to solve these types of problems without each application building its own reference counting or other (error-prone) mechanisms.

Richard 2009-12-30 15:16:08

Answer 5

A:

Bored users are unlikely to take the trouble to reverse their previous upvote. Why not count a more directly useful metric, like pageviews in the last month?

Richard Inglis 2009-12-30 15:18:28

Answer 6

+1 A:

You don't need to keep reference counts, and I advise against it. Going by your description, you could use the following db structure:

create table users ( 
    id    int             not null auto_increment
,   name  varchar(64)     not null 
    -- ...more columns...
,   primary key (id)
);

-- user_id_owner is nullable: a photo may have no owner
create table photos ( 
    id             int          not null auto_increment
,   url            varchar(255) not null 
,   user_id_owner  int
,   primary key (id)
,   foreign key (user_id_owner) references users(id)
);

create table user_likes_photo (
    user_id  int not null
,   photo_id int not null
,   primary key(user_id, photo_id)
,   foreign key (user_id)  references users(id)
,   foreign key (photo_id) references photos(id)
);

create table user_recommends_photo (
    user_id_recommending  int not null
,   photo_id              int not null
,   user_id_recommended   int not null
,   primary key(user_id_recommending, photo_id, user_id_recommended)
,   foreign key (user_id_recommending)  references users(id)
,   foreign key (user_id_recommended)   references users(id)
,   foreign key (photo_id) references photos(id)
);

This way, you keep track of all relationships.

To remove unreferenced photos, you would run:

delete from photos
where user_id_owner is null 
and id not in (
    select photo_id
    from   user_likes_photo
)
and id not in (
    select photo_id
    from   user_recommends_photo
)

Roland Bouman 2009-12-30 15:22:21

+1. Foreign keys do what the OP is looking for. They can be configured either to make a deletion fail if it would violate a foreign key constraint, or to cascade the deletion to dependent tables (e.g., you could have it automatically delete your likes and recommends rows).

Frank Farmer 2009-12-30 16:06:16

Answer 7

+1 A:

reference-counting should not be necessary; most databases will track relational-integrity for you

for example, in ms-sql if you executed

delete from photos where photoid = 1234

but that photo was referenced somewhere, the database will raise an error and refuse to delete the photo; in .NET code this manifests as a SqlException

if you need to know in advance if a photo can be deleted, the EXISTS queries above will work; if you don't need to know in advance, then just try to delete it and let the database tell you that you cannot
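
a minimal sketch of the "just try to delete it" approach, using SQLite through Python's sqlite3 module instead of ms-sql and .NET; the photo_likes table, the userid column, and the helper name try_delete_photo are hypothetical, and in SQLite the refusal surfaces as sqlite3.IntegrityError

```python
import sqlite3

def try_delete_photo(conn, photo_id):
    """Hypothetical helper: True if the DELETE ran, False if a foreign key blocked it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("DELETE FROM photos WHERE photoid = ?", (photo_id,))
        return True
    except sqlite3.IntegrityError:
        # The database refused because another row still references this photo.
        return False

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE photos (photoid INTEGER PRIMARY KEY);
    -- Hypothetical referencing table.
    CREATE TABLE photo_likes (
        photoid INTEGER NOT NULL REFERENCES photos(photoid),
        userid  INTEGER NOT NULL
    );
    INSERT INTO photos (photoid) VALUES (1234), (5678);
    INSERT INTO photo_likes VALUES (1234, 1);
""")

referenced_deleted = try_delete_photo(conn, 1234)    # photo 1234 is still liked
unreferenced_deleted = try_delete_photo(conn, 5678)  # nothing references photo 5678
photos_left = [row[0] for row in conn.execute("SELECT photoid FROM photos")]
print(referenced_deleted, unreferenced_deleted, photos_left)
```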

reference-counting is a lot of extra work, and probably unnecessary; take advantage of the capabilities of the database in this regard

Steven A. Lowe 2009-12-30 16:01:23

Answer 8

A:

"take advantage of the capabilities of the database in this regard"

Amen to that.

And also amen to its logical conclusion that "if another DBMS has MORE capabilities in this regard, then switch to that DBMS immediately".

Erwin Smout 2010-01-01 21:03:03
