I'm learning about the usage of datatypes for databases.

For example:

  • Which is better for email: varchar(100), char(100), or tinyint (joking)?
  • Which is better for a username: int, bigint, or varchar? Please explain. Some of my friends say that int, bigint, or another numeric datatype is better (Facebook does it), so that u=123400023 refers to user 123400023, rather than user=thenameoftheuser, since numbers take less time to fetch.
  • Which is better for phone numbers? Posts (like in blogs or announcements)? Or maybe dates (I use datetime for that)? Maybe someone has done research that they would like to share.
  • Product price (I use decimal(11,2); I don't know about you guys)?
  • Or anything else that you have in mind, like, "I use serial datatype for blablabla".

Why do I mention InnoDB specifically?

Unless you are using the InnoDB table types (see Chapter 11, "Advanced MySQL," for more information), CHAR columns are faster to access than VARCHAR.

InnoDB has some differences that I don't know about. I read that here.

+8  A: 

Brief Summary:

(just my opinions)

  1. for email address - VARCHAR(255)
  2. for username - VARCHAR(100) or VARCHAR(255)
  3. for id_username - use INT (unless you plan on over 2 billion users in your system)
  4. phone numbers - INT or VARCHAR or maybe CHAR (depends on if you want to store formatting)
  5. posts - TEXT
  6. dates - DATE or DATETIME (definitely include times for things like posts or emails)
  7. money - DECIMAL(11,2)
  8. misc - see below

As for choosing InnoDB because VARCHAR is supposed to be faster there, I wouldn't worry about that, or about speed in general. Use InnoDB because you need transactions and/or you want foreign key (FK) constraints for data integrity. Also, InnoDB uses row-level locking, whereas MyISAM only uses table-level locking, so InnoDB handles high levels of concurrency better than MyISAM. Use MyISAM for full-text indexes and for somewhat less overhead.

More importantly for speed than the engine type: put indexes on the columns that you need to search on quickly. Always put indexes on your ID/PK columns, such as the id_username that I mentioned.
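To make the indexing advice a bit more concrete, here is a rough sketch. It is not MySQL: it uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere, and the `users` table, the `idx_users_username` index, and the `plan` helper are names I made up for illustration. In MySQL you would inspect the query with EXPLAIN instead.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id_username INTEGER PRIMARY KEY, username TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [(f"user{i}",) for i in range(1000)],
)


def plan(conn, sql, params=()):
    """Return the human-readable query plan for a statement (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


query = "SELECT id_username FROM users WHERE username = ?"

# Without an index on username, the lookup has to scan the whole table.
before = plan(conn, query, ("user500",))

# With an index, the same lookup becomes an index search.
conn.execute("CREATE INDEX idx_users_username ON users (username)")
after = plan(conn, query, ("user500",))

print(before)
print(after)
```

The exact plan wording depends on the SQLite version, but the first plan should be a table scan and the second should mention the index.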

More details:

Here's a bunch of questions about MySQL datatypes and database design (warning, more than you asked for):

And a couple questions on when to use the InnoDB engine:

I just use tinyint for almost everything (seriously).

Edit - How to store "posts:"

Below are some links with more details, but here's the short version. For storing "posts," you need room for a long text string. CHAR max length is 255, so that's not an option, and of course CHAR would waste unused characters versus VARCHAR, which is variable length CHAR.

Prior to MySQL 5.0.3, VARCHAR max length was 255, so you'd be left with TEXT. However, in newer versions of MySQL, you can use VARCHAR or TEXT. The choice comes down to preference, but there are a couple of differences. The max length of both VARCHAR and TEXT is now 65,535, but you can set your own max on VARCHAR. Let's say you think your posts will only need to be 2000 characters max: you can set VARCHAR(2000). If you ever run into the limit, you can ALTER your table later and bump it to VARCHAR(3000). On the other hand, TEXT actually stores its data in a BLOB (1). I've heard that there may be performance differences between VARCHAR and TEXT, but I haven't seen any proof, so you may want to look into that more. You can always change that minor detail in the future.

More importantly, searching this "post" column using a Full-Text Index instead of LIKE would be much faster (2). However, you have to use the MyISAM engine to use full-text index because InnoDB doesn't support it. In a MySQL database, you can have a heterogeneous mix of engines for each table, so you would just need to make your "posts" table use MyISAM. However, if you absolutely need "posts" to use InnoDB (for transactions), then set up a trigger to update the MyISAM copy of your "posts" table and use the MyISAM copy for all your full-text searches.

See bottom for some useful quotes.

(3) "Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions.

Before MySQL 5.0.3, if you need a data type for which trailing spaces are not removed, consider using a BLOB or TEXT type.

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.

Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values."

Lastly, here's a great post about the pros and cons of VARCHAR versus TEXT. It also speaks to the performance issue:

JohnB
how about the post ? 1 for = "thelongpost" ? , 2= "the2ndlongpost" :).
Adam Ramadhan
Sorry Adam, I thought I had included another link that answered your question. Well, please see my edit for storing "posts."
JohnB
Shoot, I forgot to mention that InnoDB doesn't support full-text index. You have to use MyISAM. Please re-read my section on that.
JohnB
Yup, I'm updating too, please see :)
Adam Ramadhan
I mean my last edit was off-topic, and I have now moved it :)
Adam Ramadhan
id should not be INT, but INT UNSIGNED. Monetary data should not be DECIMAL(11,2), but DECIMAL(11,2) UNSIGNED. I'd recommend using Sphinx to index fulltext data over MySQL FULLTEXT and MyISAM.
Isotopp
+2  A: 

There are multiple angles to approach your question.

From a design POV it is always best to choose the datatype which best expresses the quantity you want to model. That is, get the data domain and data size right so that illegal data cannot be stored in the database in the first place. But that is not where MySQL is strong, especially not with the default sql_mode (http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html). If it works for you, try the TRADITIONAL sql_mode, which is a shorthand for many desirable flags.

From a performance POV, the question is entirely different. For example, regarding the storage of email bodies, you might want to read http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ and then think about that.

Removing redundancies and having short keys can be a big win. For example, in a project that I have seen, a log table was storing HTTP User-Agent information. By simply replacing each user agent string in the log table with the numeric id of a user agent string in a lookup table, data set size was reduced considerably (by more than 60%). By parsing the user agent further and then storing a bunch of ids (operating system, browser type, version index), data set size was reduced to 1% of the original size.
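A minimal sketch of that lookup-table idea, under loud assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names (`user_agents`, `request_log`), the column names, and both helper functions are hypothetical, not taken from the project mentioned above.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_agents (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE request_log (
        id            INTEGER PRIMARY KEY,
        path          TEXT NOT NULL,
        user_agent_id INTEGER NOT NULL REFERENCES user_agents(id)
    );
""")


def user_agent_id(conn, name):
    """Return the id for a user agent string, inserting it on first sight."""
    conn.execute("INSERT OR IGNORE INTO user_agents (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT id FROM user_agents WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def log_request(conn, path, agent):
    """Store one log row that references the agent by its short numeric id."""
    conn.execute(
        "INSERT INTO request_log (path, user_agent_id) VALUES (?, ?)",
        (path, user_agent_id(conn, agent)),
    )


agent = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
for path in ("/", "/about", "/contact"):
    log_request(conn, path, agent)

stored_agents = conn.execute("SELECT COUNT(*) FROM user_agents").fetchone()[0]
logged_rows = conn.execute("SELECT COUNT(*) FROM request_log").fetchone()[0]
print(stored_agents, logged_rows)  # 1 3
```

The long string is stored once, and every log row carries only a small integer, which is where the size reduction comes from.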

Finally, there are a number of rules that can help you spot errors in schema design.

For example, anything that has id in the name and is not an unsigned integer type is probably a bug (especially in the context of InnoDB).

For example, anything that has price or cost in the name and is not unsigned is a potential source of fraud (fraudster creates article with negative price, and buys that).

For example, anything that works on monetary data and is not using the DECIMAL data type of the appropriate size is probably doing math wrong (DECIMAL is doing BCD, decimal paper math with correct precision and rounding, DOUBLE and FLOAT do not).
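To illustrate the difference in arithmetic, here is a small sketch using Python's decimal module as a stand-in for a DECIMAL column and Python's float as a stand-in for DOUBLE. The amounts and the ROUND_HALF_UP rounding mode are assumptions picked for the example, not something MySQL-specific.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point (what DOUBLE and FLOAT use) cannot represent 0.10 exactly,
# so adding it three times does not give exactly 0.30.
float_total = sum([0.10] * 3)

# Decimal arithmetic keeps exact base-10 digits, like paper math.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True

# Rounding a computed amount to two decimal places, as you would before
# storing it in a column with two digits after the decimal point.
price = Decimal("19.995").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(price)  # 20.00
```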

Isotopp