views:

170

answers:

6

Like most apps, mine has a "users" table that describes the entities that can log in to it. It has info like their alias, their email address, their salted password hashes, and all the usual candidates.

However, as my app has grown, I've needed more and more special case "flags" that I've generally just stuck in the users table. Stuff like whether their most recent monthly email has been transmitted yet, whether they've dismissed the tutorial popup, how many times they clicked the "I am awesome" button, etc.

I am beginning to have quite a few of these fields, and the majority of these flags I don't need for the majority of the webpages that I handle.

Is there anything wrong with keeping all of these flags in the users table? Is there somewhere better to put them? Would creating other tables with a 1:1 relationship with the users table provide additional overhead to retrieving the data when I do need it?

Also, I use Hibernate as my ORM, and I worry that creating a bunch of extra tables for this information means that I'd also have to dirty up my User domain object. Advice?

+2  A: 

I'd say a better design was something like this:

create table users (
    id integer primary key,
    user varchar(32) unique
)

create table flags (
   id integer,
   flagname varchar(32),
   flagval char(1)
)

witha primary key of id + flagname. flags entries then look like:

1, 'administrator', 'Y',
1, 'editor', 'Y',
2, 'editor' 'Y'

and so on. I'd create a view to access the joined tables.

anon
I have seen this pattern go terribly wrong though. You forfeit a lot of information by having a field (flagname) describe the type of record stored in a table.
Daren Thomas
Yes, if it was me I would add a third table for flag names and make both key fields in flags be surrogates. But I was trying to keep things simple.
anon
+1  A: 

If you really only need this information on a few pages, why not have a table & relation for each flag? Existence of a record in that table sets the bit, selecting null is an unset bit.

The count of awesome clicks can then also be done by adding a record for each click (this solves the race-problem of updating the count in a record on the user table):

select count(*) from AwesomeClicks where userid = 1234

Use a unique constraint on the userid field for bit-only information (real flags as opposed to the count in the above example).

select userid from DismissedTutorialPopup where userid = 1234

This will result in either 1234 (flag is set) or null (flag is not set).

Also, by adding a CreateDate field, you can store when the flag was set etc.

Daren Thomas
+1  A: 

Some people don't seem to like this pattern for a number of reasons but I've developed a method for doing binary comparisons on base64 strings so I can handle a virtually unlimited number of flags inside a single varchar field. (6 per character technically)

I admit one frustration with this technique is that it's next to impossible to read them from inside the database. But it works for me. My flags defined in my application like so:

public class Flags
{
   public const string Flag1 = "1";
   public const string Flag2 = "2";
   public const string Flag3 = "4";
   public const string Flag4 = "8";
   public const string Flag5 = "g";
   public const string Flag6 = "w";
   public const string Flag7 = "10";
   // ... etc ...
}
Spencer Ruport
how to calculate aggregate function on this kind of flag? for example: select count(*) from table1 where flag4=1 and flag2=0
Anwar Chandra
A: 

**is there anything wrong with keeping all of these flags in the users table?**

Hi I am not sure which Db you are using currently but if you are using SQL server make sure the row size wont is 8060 bytes. (max row size 8060 ).

MAX row size

SQLserver 2005 - 8060 bytes MYSQL - 8052 bytes Oracle 9i - 255000 bytes.

Kthevar
FWIW, max row size for MySQL's MyISAM is 64KB. The 8052 limit is for InnoDB, and variable-sized columns (VARCHAR, TEXT, etc.) don't count toward this limit.
Bill Karwin
+3  A: 

There are several common solutions:

  • EAV

    Store one flag per row in a child table, with a reference to the user row, the name of the flag, and the value. Disadvantages: Can't guarantee a row exists for each flag. Need to define another lookup table for flag names. Reconstituting a User record with all its flags is a very costly query (requires a join per flag).

  • Bit field

    Store one flag per bit in a single long binary column. Use bitmasking in application code to interpret the flags. Disadvantages: Artificial limit on number of flags. Hard to drop a flag when it becomes obsolete. Harder to change flag values, search for specific flag values, or aggregate based on flag values without resorting to confusing bitwise operators.

  • Normalized design

    Store one BIT column per flag, all in the Users table. Most "correct" design from the perspective of relational theory and normalization. Disadvantages: Adding a flag requires ALTER TABLE ADD COLUMN. Also, you might exceed the number of columns or row size supported by your brand of RDBMS.

Bill Karwin
+1  A: 

It's interesting to see how the crappiest answer of all is the only one that got an upvote.

The question did not include sufficient information to actually give a sensible answer.

For one, it failed to say whether the question was about some logical design of a database, or some physical design of a database.

If the question was about logical design, then the answer is rather simple : NEVER include a boolean in your logical designs. The relational model already has a way of representing yes/no information (by virtue of the closed-world assumption) : namely as the presence of some tuple in some relation.

If the question was about physical design, then any sensible answer must necessarily depend on other information such as update frequency, querying frequency, volumes of data queried, etc. etc. None of those were provided, making the question unanswerable.

EDIT

"The relational model prescribes just one such type, BOOLEAN (the most fundamental type of all)." -- C. J. Date, SQL and Relational Theory (2009)."

That reply was of course bound to appear.

But does that quote really say that a type boolean should be available FOR INCLUSION IN SOME RELATION TYPE ? Or does that quote (or better, the larger piece of text that it appears in) only say that the existence of type boolean is inevitable because otherwise the system has no way of returning a result for any invocation of the equality operator, and that it is really the existence of the equality operator that is "prescribed" ?)

IOW, should type boolean be available for inclusion in relation types or should type boolean be available because otherwise there wouldn't be a single DML language we could define to operate on the database ?

Date is also on record saying (slightly paraphrased) that "if there are N ways of representing information, with N > 1, then there are also >1 sets of operators to learn, and >1 ways for the developer to make mistakes", and >1 sets of operators for the DBMS developer to implement, and >1 ways for the DBMS developer to make mistakes".

EDIT EDIT

"Date says "a relational attribute can be of any type whatsoever." He does not say an attribute can be of any type except boolean"

You have read Date very well.

One other thing that Date definitely does not say is that an attribute cannot be relation-typed. Quite the contrary. Yet, there is a broad consensus, and I know for a fact that even Date shares that consensus, that it is probably a bad idea to have base relvars that include an attribute that is relation-typed.

Likewise, nowhere does Date say that it is a GOOD idea to include boolean attributes in base relation types. He is absolutely silent on that particular issue. The opinion I expressed was mine. I don't think I gave the impression I was expressing somebody else's opinion in what I wrote originally.

Representing "the truth (or falseness) of any given proposition" can be done by including/omitting a tuple in the relation value of a certain relvar (at least logically !). Now, being able to include/exclude some given tuple from the value of some given relvar is most certainly fundamental. Given that, there is no need what so ever to be able to represent the truth (or falseness) of any given proposition (logically !) by using an attribute of type boolean. And what else would you use an attribute of type boolean for, but to say explicitly that some proprosition is either true or false ?

"The relational model prescribes just one such type, BOOLEAN (the most fundamental type of all)." -- C. J. Date, *SQL and Relational Theory* (2009).
Bill Karwin
"The set of system defined scalar type is required to include type BOOLEAN -- the most fundamental type of all -- but a real system will surely support others as well (INTEGER, CHAR, and so on)." Ibid.
Bill Karwin
Date says "a relational attribute can be of any type whatsoever." He does *not* say an attribute can be of any type except boolean.
Bill Karwin