
Hi all,

MY PLATFORM:

PHP & MySQL

MY SITUATION:

I have a situation where I need to store a user's yes/no selection in one of the columns of a table. My options are:

  1. Declare the column as char(1) and store the value as 'y' or 'n'
  2. Declare the column as tinyint(1) and store the value as 1 or 0

Either way, the column may also be indexed for use within the application.

MY QUESTIONS:

So I'd like to know which of the two types above:

  1. Leads to faster queries when that column is accessed (for simplicity, please leave out mixing in other queries or accessing other columns).

  2. Is the more efficient way to store and access the data, and why?

  3. How does access speed differ when the column is indexed versus when it is not?

My understanding is that since char(1) and tinyint(1) each take up only 1 byte of space, storage will not be an issue here. What remains, then, is access speed. As far as I know, numeric indexing is faster and more efficient than anything else, but I think this case is a tough one to decide. I'd definitely like to hear your experience on this one.

Thank you in advance.

+6  A: 

Using tinyint is more standard practice, and will allow you to more easily check the value of the field.

// Using tinyint 0 and 1, you can do this:
if($row['admin']) {
    // user is admin
}

// Using char y and n, you will have to do this:
if($row['admin'] == 'y') {
    // user is admin
}
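A minimal sketch of why the shorter check works, assuming the rows are already fetched into PHP arrays. Depending on the driver and its settings, a TINYINT value may arrive as the string '1'/'0' or as the integer 1/0; both '0' and 0 are falsy in PHP, so the plain truthiness test works either way. The 'y'/'n' variant really does need the explicit comparison, because 'n' is a non-empty string and therefore truthy. The row keys and function names here are hypothetical.

```php
<?php
// Rows as they might arrive from a driver that returns column values as
// strings. Keys and values are illustrative.
$rows = [
    ['name' => 'alice', 'admin_int' => '1', 'admin_char' => 'y'],
    ['name' => 'bob',   'admin_int' => '0', 'admin_char' => 'n'],
];

function isAdminInt(array $row): bool
{
    // '0' (and the integer 0) are falsy in PHP, so a plain cast is enough.
    return (bool) $row['admin_int'];
}

function isAdminChar(array $row): bool
{
    // 'n' is a non-empty string other than '0', so it is truthy;
    // the explicit comparison with 'y' cannot be skipped.
    return $row['admin_char'] === 'y';
}

foreach ($rows as $row) {
    printf(
        "%s: tinyint=%s char=%s naive char check=%s\n",
        $row['name'],
        var_export(isAdminInt($row), true),
        var_export(isAdminChar($row), true),
        var_export((bool) $row['admin_char'], true)
    );
}
```

For bob, the naive char check reports true, which is exactly the bug the explicit == 'y' comparison avoids.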

I'm not an expert in the inner workings of MySQL, but intuitively, retrieving and sorting integer fields feels faster than character fields (I just get the feeling that 'a' > 'z' is more work than 0 > 1), and 0s and 1s feel much more familiar from a computing perspective, where they are the standard on/off flags. So integer storage seems better, it feels nicer, and it is easier to use in code logic. 0/1 is the clear winner for me.

You may also note that, to an extent, this is MySQL's official position as well. From the documentation:

BOOL, BOOLEAN: These types are synonyms for TINYINT(1). A value of zero is considered false. Nonzero values are considered true.

If MySQL goes so far as to equate TINYINT(1) with BOOLEAN, it seems like the way to go.

Matchu
+1 A nice way to perform the check when using PHP.
Devner
Perhaps it's a good thing to have that sort of check? Let me explain, with an IDE in mind: require_once("./Permissions.php"); ... if ($row['permissions'] === Permissions::ADMIN) { // user is admin } Not only is this good for readability, but referencing a value through a class constant also catches typos (a misspelled constant name fails loudly instead of silently comparing against the wrong value), and a predictive IDE will help you code quickly. This example gives you multi-level permissions, but I think readability and maintainability are key to developing large-scale projects, so I'm all for that.
Gary Paluk
@Gary Thanks for your comment, but I can't tell whether you are advocating for or against using 0 and 1. Your programming practice seems different from mine, so please bear with me; it may take me a little longer to understand what you mean.
Devner
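One way to read Gary's suggestion, as a hypothetical sketch (the Permissions class, its constants and fromDb() are illustrative names, not from any library; PHP 8 is assumed for the int|string type). The catch with === is that it only matches an integer constant if the fetched value is also an integer, so with a driver that returns strings the value needs a cast first:

```php
<?php
// Hypothetical permissions class; names and values are illustrative.
final class Permissions
{
    public const USER  = 0;
    public const ADMIN = 1;

    // Normalize a raw database value (often a string such as '1') to int,
    // so that a strict === comparison with the constants can succeed.
    public static function fromDb(int|string $raw): int
    {
        return (int) $raw;
    }
}

$row = ['permissions' => '1']; // as a string-returning driver might deliver it

var_dump($row['permissions'] === Permissions::ADMIN);                     // bool(false)
var_dump(Permissions::fromDb($row['permissions']) === Permissions::ADMIN); // bool(true)
```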
+10  A: 

I think you should create the column as ENUM('n','y'). MySQL stores this type compactly, and it will also help you store only allowed values in the field.

You can also make it more human-friendly, ENUM('no','yes'), without affecting performance, because the strings 'no' and 'yes' are stored only once, in the ENUM definition. MySQL stores only the index of the value in each row.

Also note this about sorting by an ENUM column:

ENUM values are sorted according to the order in which the enumeration members were listed in the column specification. (In other words, ENUM values are sorted according to their index numbers.) For example, 'a' sorts before 'b' for ENUM('a', 'b'), but 'b' sorts before 'a' for ENUM('b', 'a').
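The sorting rule quoted above can be emulated in plain PHP to see the effect. This is only a sketch of the ordering MySQL documents, not MySQL code, and sortLikeEnum() is a hypothetical helper: members are numbered from 1 in definition order, and values sort by that number.

```php
<?php
// Emulates the documented ENUM ordering: values sort by the 1-based
// position of their member in the column definition, not alphabetically.
function sortLikeEnum(array $values, array $members): array
{
    $index = [];
    foreach ($members as $i => $member) {
        $index[$member] = $i + 1; // ENUM members are numbered from 1
    }
    usort($values, fn(string $a, string $b): int => $index[$a] <=> $index[$b]);
    return $values;
}

$data = ['b', 'a', 'b', 'a'];
echo implode(',', sortLikeEnum($data, ['a', 'b'])), "\n"; // a,a,b,b
echo implode(',', sortLikeEnum($data, ['b', 'a'])), "\n"; // b,b,a,a
```

So for ENUM('n','y'), 'n' rows sort before 'y' rows; for ENUM('y','n') it is the other way around.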

Ivan Nevostruev
Way back when, I had the same question as the OP, and I benchmarked it and found enum to be the quickest and most efficient of the three options. Just make sure you don't use enum('0', '1') like I did; you'll end up wondering why UPDATE X SET Y = 0; doesn't work (you need single quotes).
Langdon
+1 for Langdon. That's a very useful point; I never knew about it until now. So if we use enum('0', '1'), our query must be UPDATE X SET Y = '0'; is that correct? @Ivan Does ENUM('n','y') take the same space as ENUM('no','yes')?
Devner
@Devner Yes, space usage is the same, because you can't add any values other than '', 'no' and 'yes'. MySQL stores only the index of the value in each row, not the string. The strings 'no' and 'yes' are stored only once, in the table definition.
Ivan Nevostruev
@Devner: All enum values have numerical indexes, beginning with 1 (0 is a special value that indicates the empty string). You can use these indexes to query and set values, but as the manual says: "For these reasons, it is not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing." [ http://dev.mysql.com/doc/refman/5.1/en/enum.html ] (Do not confuse these numerical indexes with real column indexes; there is just no better word to differentiate between them.)
Jan Fabry
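A small PHP emulation of the assignment rules discussed in these comments, under stated assumptions: a number is treated as a 1-based member index; a string that matches a member is stored as that member; a numeric string that matches nothing falls back to being an index; and anything invalid becomes '' (index 0). That last rule is the non-strict behavior; in strict SQL mode MySQL rejects invalid values instead. enumResolve() is a hypothetical helper, not a MySQL API. With enum('0', '1'), it shows why SET Y = 0 does not store '0':

```php
<?php
// Hypothetical emulation of how a value assigned to an ENUM column is resolved
// (non-strict mode assumed; strict SQL mode rejects invalid values instead).
function enumResolve(array $members, int|string $value): string
{
    if (is_string($value)) {
        if (in_array($value, $members, true)) {
            return $value; // exact match with a member string
        }
        if (preg_match('/^\d+$/', $value) !== 1) {
            return ''; // invalid string: stored as the '' error value (index 0)
        }
        $value = (int) $value; // unmatched numeric string: treated as an index
    }
    // Numbers are 1-based member indexes; 0 or out of range gives ''.
    return $members[$value - 1] ?? '';
}

$flags = ['0', '1']; // ENUM('0', '1'), the definition the manual warns about
var_dump(enumResolve($flags, 0));   // string(0) ""
var_dump(enumResolve($flags, 1));   // string(1) "0"
var_dump(enumResolve($flags, '0')); // string(1) "0"
```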
+4  A: 

To know it for sure, you should benchmark it. Or know that it probably will not matter that much in the grander view of the whole project.
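If you do benchmark it, a minimal timing helper along these lines is enough to start with. It is a sketch: ratePerSecond() is a hypothetical name, and the closure shown is only a placeholder workload. In a real comparison, each closure would run the same query against a table using one of the candidate column types, and you would repeat the runs, because results vary from run to run. hrtime() requires PHP 7.3 or later.

```php
<?php
// Minimal timing helper: runs $fn $iterations times and returns the rate
// in iterations per second. hrtime(true) gives monotonic nanoseconds.
function ratePerSecond(callable $fn, int $iterations = 1000): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $elapsedNs = hrtime(true) - $start;
    // Guard against a zero interval on very fast workloads.
    return $iterations / max($elapsedNs, 1) * 1e9;
}

// Placeholder workload; replace with a closure that runs the real query.
$rate = ratePerSecond(fn() => (bool) '1', 10000);
printf("%.0f iterations/s\n", $rate);
```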

Char columns have encodings and collations, and comparing them could involve unnecessary switches between encodings, so my guess is that an int will be faster. For the same reason, I think that updating an index on an int column is also faster. But again, it won't matter much.

CHAR can take up more than one byte, depending on the character set and table options you choose. Some characters can take three bytes to encode, so MySQL sometimes reserves that space, even if you only use y and n.
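To put numbers on the worst case: MySQL's storage-requirements documentation gives CHAR(M) as M × w bytes, where w is the maximum number of bytes per character in the column's character set. The quick PHP calculation below covers CHAR(1) for three character sets only (the lookup table is deliberately incomplete), and the actual on-disk footprint also depends on the storage engine and row format.

```php
<?php
// Maximum bytes per character for a few MySQL character sets.
const MAX_BYTES_PER_CHAR = [
    'latin1'  => 1,
    'utf8'    => 3, // a.k.a. utf8mb3
    'utf8mb4' => 4,
];

// Worst-case size of CHAR(M): M × w bytes, w = max bytes per character.
function charColumnMaxBytes(int $m, string $charset): int
{
    if ($m < 0 || $m > 255) {
        throw new InvalidArgumentException('CHAR length must be between 0 and 255');
    }
    $w = MAX_BYTES_PER_CHAR[$charset]
        ?? throw new InvalidArgumentException("Unknown character set: $charset");
    return $m * $w;
}

foreach (array_keys(MAX_BYTES_PER_CHAR) as $charset) {
    printf("CHAR(1) in %s: up to %d byte(s)\n", $charset, charColumnMaxBytes(1, $charset));
}
```

A TINYINT, by contrast, is always 1 byte regardless of character set.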

Jan Fabry
+1 for "But again, it won't matter much." I'm thinking the same thing. The difference is likely negligible.
Justin Johnson
@Jan What you say makes sense to me. So if I use enum('n', 'y'), do the encoding switches and comparison overhead still apply? How would it differ between InnoDB and MyISAM?
Devner
@Devner: Yes, since enum columns are defined with an encoding and a collation, I assume this can have a performance impact. I don't know about differences between InnoDB and MyISAM, just a note that describes an InnoDB option that can affect char storage [ http://dev.mysql.com/doc/refman/5.1/en/data-size.html ]
Jan Fabry
+2  A: 

They're both going to be so close that it doesn't matter. If you feel you have to ask this question on SO, you're over-optimizing. Use whichever one makes the most logical sense.

Dave Markle
@Dave Thanks for the comment.
Devner
+1  A: 

If you specify the types BOOL or BOOLEAN as a column type when creating a table in MySQL, it creates the column type as TINYINT(1). Presumably this is the faster of the two.

Documentation

Also:

We intend to implement full boolean type handling, in accordance with standard SQL, in a future MySQL release.

R. Bemrose
+1 Thanks for the comment and reference.
Devner
+1  A: 

While my hunch is that an index on a TINYINT would be faster than an index on a CHAR(1) due to the fact that there is no string-handling overhead (collation, whitespace, etc), I don't have any facts to back this up. My guess is that there isn't a significant performance difference that is worth worrying about.

However, because you're using PHP, storing as a TINYINT makes much more sense. Using the 1/0 values is equivalent to using true and false, even when they are returned as strings to PHP, and can be handled as such. You can simply write if ($record['field']) as a boolean check on your results, instead of converting between 'y' and 'n' all the time.
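A sketch of the conversion code each choice implies, assuming PHP 8 (match and union types); the function names are hypothetical. The 'y'/'n' pair needs its own mapping plus a decision about unexpected values, while TINYINT(1) only needs a cast in each direction:

```php
<?php
// 'y'/'n' storage: each read and write needs an explicit mapping.
function ynToBool(string $value): bool
{
    return match ($value) {
        'y' => true,
        'n' => false,
        default => throw new UnexpectedValueException("Unexpected flag: $value"),
    };
}

function boolToYn(bool $flag): string
{
    return $flag ? 'y' : 'n';
}

// TINYINT(1) storage: a cast in each direction is enough
// ('1'/'0' strings or 1/0 integers both work on the way in).
function tinyintToBool(int|string $value): bool
{
    return (bool) (int) $value;
}

function boolToTinyint(bool $flag): int
{
    return (int) $flag;
}
```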

zombat
+1 @Zombat That makes sense. I think using numbers would really ease up the processing with PHP code within the app.
Devner
A: 
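TINYINT    1 byte
CHAR(M)    M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set

Is there any difference?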

streetparade
+3  A: 
                       Rate insert tinyint(1) insert char(1) insert enum('y', 'n')
insert tinyint(1)     207/s                --            -1%                  -20%
insert char(1)        210/s                1%             --                  -19%
insert enum('y', 'n') 259/s               25%            23%                    --

                       Rate insert char(1) insert tinyint(1) insert enum('y', 'n')
insert char(1)        221/s             --               -1%                  -13%
insert tinyint(1)     222/s             1%                --                  -13%
insert enum('y', 'n') 254/s            15%               14%                    --

                       Rate insert tinyint(1) insert char(1) insert enum('y', 'n')
insert tinyint(1)     234/s                --            -3%                   -5%
insert char(1)        242/s                3%             --                   -2%
insert enum('y', 'n') 248/s                6%             2%                    --

                       Rate insert enum('y', 'n') insert tinyint(1) insert char(1)
insert enum('y', 'n') 189/s                    --               -6%           -19%
insert tinyint(1)     201/s                    7%                --           -14%
insert char(1)        234/s                   24%               16%             --

                       Rate insert char(1) insert enum('y', 'n') insert tinyint(1)
insert char(1)        204/s             --                   -4%               -8%
insert enum('y', 'n') 213/s             4%                    --               -4%
insert tinyint(1)     222/s             9%                    4%                --

It seems that, for the most part, enum('y', 'n') is faster to insert into.
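To read these tables: they look like Perl Benchmark cmpthese output, where each cell compares the row's rate with the column's rate. Assuming the usual cmpthese formula, cell = (row rate / column rate - 1) × 100, the first insert run can be recomputed in PHP as a check. The displayed rates are already rounded, so recomputing other runs this way can be off by a point (for example, 221/222 in the second run rounds to 0% rather than the -1% shown).

```php
<?php
// Rates (per second) from the first insert run above.
$rates = [
    'tinyint(1)'     => 207,
    'char(1)'        => 210,
    "enum('y', 'n')" => 259,
];

// Assumed cmpthese convention: cell(row, col) = (rate_row / rate_col - 1) * 100.
function compareRates(array $rates): array
{
    $matrix = [];
    foreach ($rates as $row => $rowRate) {
        foreach ($rates as $col => $colRate) {
            $matrix[$row][$col] = ($row === $col)
                ? null // the diagonal is shown as "--"
                : (int) round(($rowRate / $colRate - 1) * 100);
        }
    }
    return $matrix;
}

$matrix = compareRates($rates);
foreach ($matrix as $row => $cells) {
    $shown = array_map(fn(?int $p): string => $p === null ? '--' : "$p%", $cells);
    printf("%-15s %s\n", $row, implode('  ', $shown));
}
```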

                       Rate select char(1) select tinyint(1) select enum('y', 'n')
select char(1)        188/s             --               -7%                   -8%
select tinyint(1)     203/s             8%                --                   -1%
select enum('y', 'n') 204/s             9%                1%                    --

                       Rate select char(1) select tinyint(1) select enum('y', 'n')
select char(1)        178/s             --              -25%                  -27%
select tinyint(1)     236/s            33%                --                   -3%
select enum('y', 'n') 244/s            37%                3%                    --

                       Rate select char(1) select tinyint(1) select enum('y', 'n')
select char(1)        183/s             --              -16%                  -21%
select tinyint(1)     219/s            20%                --                   -6%
select enum('y', 'n') 233/s            27%                6%                    --

                       Rate select tinyint(1) select char(1) select enum('y', 'n')
select tinyint(1)     217/s                --            -1%                   -4%
select char(1)        221/s                1%             --                   -2%
select enum('y', 'n') 226/s                4%             2%                    --

                       Rate select char(1) select tinyint(1) select enum('y', 'n')
select char(1)        179/s             --              -14%                  -20%
select tinyint(1)     208/s            17%                --                   -7%
select enum('y', 'n') 224/s            25%                7%                    --

Enum also seems to be the fastest for selects. The code can be found here

gms8994
+1 @gms8994 Thank you very much for the stats; they give more insight into the speed. Could you let us know whether there are any other tools that produce results like the above? Thanks again.
Devner
@Devner There are none that I know of. I wrote this one specifically for this question, but you can check the GitHub page linked in the answer.
gms8994