Say I have two tables, user and comment. They have table definitions that look like this:
CREATE TABLE `user` (
    `id` INTEGER NOT NULL AUTO_INCREMENT,
    `username` VARCHAR(255) NOT NULL,
    `deleted` TINYINT(1) NOT NULL DEFAULT 0,
    PRIMARY KEY (`id`),
    UNIQUE KEY (`username`)
) ENGINE=InnoDB;

CREATE TABLE `comment` (
    `id` INTEGER NOT NULL AUTO_INCREMENT,
    `user_id` INTEGER NOT NULL,
    `comment` TEXT,
    `deleted` TINYINT(1) NOT NULL DEFAULT 0,
    PRIMARY KEY (`id`),
    CONSTRAINT `fk_comment_user_id` FOREIGN KEY (`user_id`)
        REFERENCES `user` (`id`)
        ON DELETE CASCADE
        ON UPDATE CASCADE
) ENGINE=InnoDB;
This is great for enforcing data integrity and all that, but I want to be able to "delete" a user and keep all of that user's comments (for reference's sake).
To this end, I've added a deleted column to each table so that I can SET deleted = 1 on a record. By listing only records with deleted = 0 by default, I can hide away all the deleted records until I need them.
So far so good.
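Concretely, the soft delete and the default listing look roughly like this against the tables above (the id value 1 is just a placeholder, not a real record):

```sql
-- Soft-delete a user instead of removing the row (id 1 is illustrative).
UPDATE `user` SET `deleted` = 1 WHERE `id` = 1;

-- Default listing: only rows that have not been soft-deleted.
SELECT `id`, `username` FROM `user` WHERE `deleted` = 0;

-- The user's comments stay in place: no row was actually deleted,
-- so ON DELETE CASCADE never fires.
SELECT `id`, `comment` FROM `comment` WHERE `user_id` = 1;
```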
The problem comes when:
- A user signs up with a username (say, "Sam"),
- I soft-delete that user (for unrelated reasons), and
- Someone else comes along to sign up as Sam, and suddenly we've violated the UNIQUE constraint on user.
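A minimal reproduction of those three steps, assuming the table definitions above and an otherwise empty user table:

```sql
-- 1. A user signs up as Sam.
INSERT INTO `user` (`username`) VALUES ('Sam');

-- 2. That user is soft-deleted; the row, and its username, remain.
UPDATE `user` SET `deleted` = 1 WHERE `username` = 'Sam';

-- 3. Someone else tries to sign up as Sam.
--    This fails with a duplicate-entry error (MySQL error 1062),
--    because the UNIQUE KEY on username ignores the deleted flag.
INSERT INTO `user` (`username`) VALUES ('Sam');
```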
I want users to be able to edit their own usernames, so I shouldn't make username the primary key, and even if I did, we'd still have the same problem when deleting users.
Any thoughts?
Edit for clarification: Added following RedFilter's answer and comments below.
I'm concerned with the case where the "deleted" users and comments are not visible to the public, but are visible only to administrators, or are kept for the purpose of calculating statistics.
This question is a thought experiment, with the user and comment tables just being examples. Still, username wasn't the best one to use; RedFilter makes valid points about user identity, particularly when the records are presented in a public context.
Regarding "Why isn't username the primary key?": this is just an example, but if I apply this to a real problem I'll need to work within the constraints of an existing system that assumes the existence of a surrogate primary key.