views:

701

answers:

17

Just out of curiosity and lack of a definite answer...

I was just looking at some data that is sent from a 3rd party to us, and their "Gender" field is a [0,1] denoting either a Female (0) or a Male (1).

Is there any better reason to use [0,1] over ["F","M"]?

Does it depend on the scenario and the intuitiveness between a field and its value?

Data access speeds and/or size constraints?

If the moon is full?


What are the differences in using [0,1] versus ["Y","N"] versus ["T","F"]?

+2  A: 

It really doesn't matter.

JonH
thats what im thinking, but there HAS to be good reasons to use 0 over N and/or 1 over Y.....
CheeseConQueso
Well, there doesn't have to be a good reason, sometimes people do things like this without any particular reasoning behind it, you should see the state of our legacy DB2 database!
TabbyCool
yeah i know what you mean... sometimes non-programmers think too deep about something and decide to make gender a [0,1] field instead of going on instinct and using ["F","M"]
CheeseConQueso
+2  A: 

If you are really, really, really worried about size constraints, [0,1] will save you a few bits.

dhorn
Not if they are sending the data in some kind of text format like CSV.
JohnFx
Perhaps yes, perhaps no. If you store [0,1] in an integer field (not as uncommon as you might think), then you'll actually use more space than a char(1) field would take for ['M', 'F'].
Steven M. Cherry
Probably not if you index the field. Even "MALE" and "FEMALE" should take the same space as 0 and 1.
cherouvim
As I understand it, very few (if any) databases store bits as bits in the data store. I believe it's for speed that it's actually quicker to store a 'bit' field in a whole byte (so effectively an 8-bit int which only has two values 0 or 1) rather than trying to compress multiple bit fields into a byte. So the statement that using a bit field [0,1] will save you anything isn't strictly true and, even where implemented as a bit, could actually cost you in performance.
Lazarus
Oracle doesn't even have a "bit" datatype. It's "number" with a width of 1.
Cylon Cat
Interesting stuff, I generally work with MSSQL, where I am relatively certain they store a bit as a bit.
dhorn
+20  A: 

Obfuscation

It's better to use 0 and 1 instead of F and M if you want to obfuscate your data to make it difficult for other programmers to understand it.

Otherwise, no, there's no advantage.

Gender Bender #1

Male pigs, like some other male animals, are castrated if they're going to be used for food and not breeding, because it improves the meat quality.

The application originally just tracked males and females. But now it must track three sexes: male, female, and barrow (a castrated pig). It would be a pain to change that if somebody had decided to use 0 and 1 in a bit field for the sex.

Gender Bender #2

Internationalization. If your application might find its way into other languages, consider using a "code table" to assign the values: GENDER_MALE, GENDER_FEMALE, and GENDER_NEUTER. This allows your application to look up the value that should be displayed to the user, while making the assocation in the database unquestionably clear (e.g., GENDER_FEMALE + EN + DOG = bitch; GENDER_MALE + FR + HUMAN = homme).

This will depend on your business requirements, problem domain, and time constraints. The more flexible of a solution you envision the more time you will have to spend.

Truly, the days of having to pinch every bit for all its worth are over; be explicit.

Kyralessa
+1 for delicious sarcasm.
JohnFx
thats what i was thinking.... why the eff would you go down 0,1 road for a gender field?
CheeseConQueso
I can see this resulting in a form with a gender field offering 0 and 1 as options. Choose wisely.
statenjason
+1 to that statenjason
CheeseConQueso
... with a useless amount of translation so that the form shows Male/Female labels while hiding the truth
CheeseConQueso
Naahh you can use 3 for castrated, and 4 , 5 etc for different things that may come in future ;)
OscarRyz
@Kyra - what about true/false vs. yes/no vs. 0/1 ?
CheeseConQueso
What about it? I don't think anyone would suggest that there's *no* case where a bit field is appropriate. I usually use the "screamly obvious" test: If it's not screamingly obvious how this code/database structure works, it's not finished yet.
Kyralessa
for barrows you can just use a three value bool: true,false, FILE_NOT_FOUND
Radu094
@Kyra - is any of those 3 better than the other?
CheeseConQueso
+2  A: 

Well, comparing Ints is a little bit easier than comparing Strings; when comparing strings you have to take uppercase and lowercase into account.

Esteban Araya
yeah that makes sense, but if there are only 2 options, you probably wouldnt bother using less than or greater than comparisons... or even care about uppercase/lowercase since the 2 options are 'hard-coded' and so will be any constraints on queries for them
CheeseConQueso
+3  A: 

Not unless you want to get down to the level of bits for each entry. I mean, you could fit 0 or 1 into one bit, while a character takes up 8 bits. For the most part, this isn't worth it.

I think "M" or "F" is clearer, because it provides more semantic information.

Patrick Niedzielski
+3  A: 

I'd create a user defined type in sql, or an enum class in c#/vb and store the 0,1 in the database for the previously stated reasons of size and speed.

Bill Forney
+1. Mapping integers to Enums is more natural than using "magic strings".
Heinzi
+1  A: 

The short answer is no since a single character takes the same storage space as an integer.

The long answer is it depends on how your application is written. I wrote an app once that had a gender field in the database with 0 or 1 because in the application layer I had an enum that mapped Gender.Female and Gender.Male to the 0 and 1 values, respectively.

Mark Ewer
Single character: 8 or 16 bits, or more, depending on unicode. Integer: 32 bits in most environments. 0 or 1 stored in a bit/boolean field: 1 bit in some databases, 8 bits in others. The only way you can say for sure that a number takes the same storage space as a single character is if the number is stored as a single character, and not an integer.
Joel Mueller
Agreed. Perhaps I should not have stated my response in an absolute. What I really meant was that in most database systems a single char and a short int are similar in size or can be made to take the same storage space.
Mark Ewer
+3  A: 

A major advantage is that a [0, 1, ...] column can be extended naturally if your assumptions change. Maybe the "binary" column isn't as binary as you think it is. (I worked on a database schema that had to be rewritten when same-sex marriage laws were passed, because assumptions were made about many things that were later invalidated.)

John Feminella
true.... it also allows for "custom" flags (ex: 3 = "unknown", 4 = "maybe", etc...) but it might also conflict with those normalization rules depending on how far you go into the extension of that field's possible values
CheeseConQueso
+2  A: 

The differences in performance are going to be trivial. Go with the more intuitive one for humans M/F.

JohnFx
Yeah, "Madame"/"Fellow" has the advantage of being obvious :)
Rafał Dowgird
Of course this pattern does introduce some cultural coupling in your data structure. You could also use "H/D" (Hombres, Damas)
JohnFx
A: 

Really depends on database.

  • SQL Server uses a bit
  • MySQL uses a tinyint
  • Storing T/F will be at least a char(1)
Jarrett Meyer
+7  A: 

Aren't we overlooking the obvious use -> foreign keys? I know the original question implied a bit field, but if it is truly numerical, might the 0,1 gender column refer to a gender table?

ZombieSheep
i didn't imply a bit field, but that is a good point.... just wondering (worrying) what could be added to the gender table
CheeseConQueso
We use a 4 state gender table... Male, Female, Unspecified and Unknown. I don't like the 4 state gender, but it is mandate upon us, so we have to use it. ;)
ZombieSheep
Some people would consider use of binary gender to be discriminatory. Best to use a foreign key. See, e.g., http://synecdochic.dreamwidth.org/366609.html
Alex Feinman
Here's a PDF from the US government that lists 8 possibilites for gender: http://www.aphis.usda.gov/vs/nahss/docs/surveillance_standards_appendixE_gender_codes.pdf
Mark Ransom
i want to meet a multiple gender person
CheeseConQueso
+1  A: 

Well, in SQL Server it would definitely matter. You should use a bit column type in this case (1/0 or True/False -- however you want to say it). That's just 1 bit of storage compared to 1 byte for a char(1).

Steve Wortham
+2  A: 

Curiously noboby has mentioned languages.
M/F is okay in English but what about other languages?

Then again you could always argue that another table should be used for lists.
Although here we are creating a complicated solution.

A bit (or boolean) field should only ever be used when there are definitively only 2 choices.

My two cents.

ThatBloke
+1 for mentioning other languages
Serhii
+1 for for the same reason
CheeseConQueso
when there are 2 choices, what would you use? 0/1, T/F, Y/N?
CheeseConQueso
For 2 choices T/F, Y/N are only relevant when the field is boolean.You cannot say sex = TRUE, doesn't make sense.You should always make code readability a priority.So to answer the question I would tend towards 0/1 and then use an enum or constants in the code.
ThatBloke
+1  A: 

For flags in a record, I prefer "Y"/"N" or "T"/"F" to 1/0.

If you want to phrase the flag as a question, use Y/N makes it clear that "Y" agrees answers the question positively, and "N" means a negative answer e.g.

SHOULD_SPECIAL_DISCOUNT_APPLY - Y or N

If you want to phrase the flag as a positive statement, T/F is clearer. T - means the statement is true, F means the statement is false:

SPECIAL_DISCOUNT_APPLIES - T or F

0 or 1 doesn't have a straightforward mapping to True or False - it depends which way around it's meant. You can't guarantee that '1' means True/Yes and '0' means False/No - it isn't this always that way around in electronics and in software it depends how consistent the programmer has been and how well named the fields are...

cartoonfox
i think 0 and 1 have a straightforward mapping to False/No and True/Yes.. but I do agree with naming the field in such a way where using Y/N or T/F are more intuitive
CheeseConQueso
In electronics 0 is sometimes True and 1 is False - and then in the same chip - another subsystem uses 0 to represent False and 1 as True - they call it "mixed logic". Just because **you** can come up with something straightforward doesn't mean that other people will - or that you're both agreed when you work on the same code....
cartoonfox
thats interesting... didn't know that... thanks
CheeseConQueso
Actually it's not really that way. Many times, when then say negative logic, the components have indications in the ports. For example, if you have a READ signal, and it works on negative logic, it would read "negated READ" (sorry, can't type it accordingly) on the pin, so that the signal that "enters" the pin is negated. So, if you want to READ, you should send READ = 1 (Vcc). In this case, "negated READ" = 0 (GND), so you should send 0 to the "negated READ".
Spidey
A: 

After reading all these and doing a little research, I've come to the conclusion that:

  • [0,1] field is useful because its international and can be expanded upon to include more clauses when linked to a static table of definitions.

  • ["Y","N"] and ["T","F"] are probably recognized worldwide, but are related to the English language.

  • ["M","F"] gender type fields are English-based as well and restrict usage when considering someone who doesn't want to mention their gender or whose gender is indeterminable (hermaphrodite)

CheeseConQueso
+1  A: 

Why not use an enum? That allows you to

  1. ensure that you must always use the correct type, potentially reducing errors
  2. allow the database to potentially optimize the number of bits used
  3. have human readable output for free
Steven Schlansker
+1  A: 

Since sex is not actually binary - there is a continuous range of 'intersex' conditions between male and female, as well as beings with no sex at all - it's best to use a floating-point type. 0 for female (being the default, at least in mammals), 1 for male, with intermediate values for intermediate conditions, and NaN for those with no value.

But remember, this will never be fully applicable, because there is no type for the human heart. Although complex is often a good approximation.

Tom Anderson