views:

125

answers:

10

Take the following create table statement:

create table fruit
{
  count int,
  name varchar(32),
  size float
}

Instead of those specific data types, why not have "string", "number", "boolean" or better yet, not having to specify any data types at all.

What are the technical reasons for having such specific data types? (as opposed to generic or no data type)

+3  A: 

It sets a strategy for sorting and indexing, as well as enforce data integrity.

Imagine this.

MyNumberField as generic: "1234", 13, 35, "1234afgas"

Why are some of those strings and why is there letters in "1234afgas"?

With the type constraints those wouldn't be allowed.

Daniel A. White
No impact to indexing, yes to sorting. Biggest reason is data integrity.
OMG Ponies
+3  A: 

because there is a different in size and storage

tinyint = 1 byte

smallint = 2 bytes

int = 4 bytes

bigint = 8 bytes

so if you know you only need to store up to a certain range there is no need to use bigint and incur overhead of storing extra bytes per row

same holds for strings (char, varchar etc etc)

also built in constraints...can't store the letter A in an int...data will be clean..

SQLMenace
+4  A: 

Imagine 20 millions rows in a table, with an int column where all the numbers are 1 through 10.

If you used a tinyint for that, it would take 1 byte. If you used a regular int, it would take 4 bytes. That's four times the amount of disk space, 60 MBs more disk space.

Theoretically, you could design a database engine to "smart config" a table, but imagine our theoretical table where all of a sudden the database decides it need to allocate more bytes for the data in the column. The whole table would need to be re-paged, and the performance would slow to a crawl for potentially hours while the engine restructured the table. There are so many edge cases and ways to get it wrong, that it would be more work to stay on top of automatic configuration than to just design your application properly in the first place.

womp
Gigs or Megs? should be 58 megs no?
SQLMenace
Calculator, why do you mock me! Thanks for the spot ;)
womp
@SQLMenace, yes you are correct: 3B*20,000,000 = 60,000,000B = 60,000KB = 60MB (58,593.75KiB, 57.22MiB)
mynameiscoffey
+2  A: 

Not only are you telling the database system how you are going to use the data: string, boolean, number. You are also telling the database which internal representation to use. This is important for space, indexing, and performance reasons.

rayd09
+1  A: 

Aside to what's already been said, there are databases that do not require data types, such as SQLite (http://www.sqlite.org/).

gmagana
Interesting, and why doesn't this database require data types?
really? what about this? http://www.sqlite.org/datatype3.html
SQLMenace
@SQLMenace: Did you read the page you link to? It says this: *Any column in an SQLite version 3 database, except an INTEGER PRIMARY KEY column, may be used to store a value of any storage class.*
gmagana
@roygbiv: See the link @SQLMenace gave, it does a good explanation of the whole idea. Basically, it's so the DB can be more flexible. Though, as database designer, the idea of different data types in the same column makes me cringe a little, SQLite *does* support it. I can see how that's good for prototyping, but not so good for a real app for the reasons other people in this thread already stated.
gmagana
Infact, in SQLite you can run this command successfully: *CREATE TABLE mytable (field1, field2, field3)* <-- No data type definitions at all.
gmagana
SQLite is supposed to be either for embedded use and/or to replace things that would otherwise be stored in a text file. Anyone who considers using it to replace a 'real' database use case is nuts.
Tyler McHenry
Hmmm nobody's saying it's fit for 'real' use. The talk was about data types...
gmagana
A: 

There are databases out there that do not type, the one that comes to mind is IBM's Universe DB (aka Pick). With that Db all fields are of the string type and you define how they are used via a "dictionary".

Having used both strongly typed DB's and Universe extensively, I'm partial to the strongly typed ones froma programming standpoint.

Bill
A: 

The same basis of question could be asked of any type anywhere. Why have types in classes? It is a limitation and expectation of data. You expect to get x type so you can deal with x type. You don't want to deal with the infinite possibility and do lots of type checking every time you deal with a piece of data.

The types whether primitive or created type are there to define the structure that is being held. It is saying that N is a type X and you can do all the things that type X can do.

You are saying, for instance, I am dealing with an integer than can be in a certain range of numbers -X to X vs a big integer which can be in a larger range of numbers -Z to Z. (as a specific example). Usage expectations will fall in those ranges.

You also, as others have mentioned, defining how to store the information at a lower level. Event saying you have an integer is somewhat of an abstraction away from the machine.

Arthur Thomas
+2  A: 

To add to what everyone else has posted there is also a huge issue with data integrity. Imagine you stored the value "1" into the database, should this be treated as TRUE, a numeric value of 1, a string "1"...

if two columns have a value of "1", does col1 + col2 equal numeric 2 or string "11"?

mynameiscoffey
A: 

Aside from storage, a certain datatype is also a type of constraint If you know for instance a certain account number will hold exactly 8 chars, defining that in the type is the most logical and performant thing you can do. (nchar(8) for example)

You are setting the domain (or a part of it, it can be further refined by other constraints) immediately in the field's type that way.

Peter
A: 

One of the primary functions of a database is to be able to perform operations on huge amounts of data efficiently. Being very specific about data types increases the number of things that the database engine can assume about the data it's storing. Therefore, it has to perform fewer calculations and runs faster, and it can avoid allocating storage that it won't need which makes the database smaller and therefore faster still.

One of the other primary functions of a database is to ensure data integrity. The more exactly you specify what sort of data should be stored in a field, the less likely you are to accidentally store the wrong data there. It's analogous to why your C compiler is so picky about the code you write: you should much prefer to deal with compile-time errors than run-time errors.

Tyler McHenry