views:

341

answers:

6

Presuming that the spaces are not important in a field's data, is it good practice to trim off the spaces when inserting, updating or selecting data from the table ?

I imagine different databases implement handling of spaces differently, so to avoid that headache, I'm thinking I should disallow leading and trailing spaces in any field data.

What do you think?

A: 

For typical data enty it's not worth the overhead. Is there some reason you think you are going to get lots of extra blank lines? If you are then it might be a good idea to trim to keep DB size down but otherwise no.

Dennis Baker
One reason I want to trim: When creating a user account (and other "objects"), I don't want a user indaverdently typing in a trailing space, and then not being able to login. Apart from being a "annoyance" (if you can call it that), it would generate unnecessary customer support calls. I want to apply this "theory" to all "objects" that have a "name" (virtuall all objects have a name) in a software I'm developing.
Liao
Correction to comment : user would type "john" but while creating the user account, he had accidentally typed in "john ". Of course, I could warn at time of user account creation, but this is something easy for John to forget since the whitespace isn't a "visible" character.
Liao
You stated in your first sentence "Presuming that the spaces are not important in a field's data...". If the field is something that is going to be used for user authentication then quite clearly the spaces ARE important in the fields data. Your comment suggests an entirely different scenario than your question does.Further generally username fields usually follow validation rules which go way beyond simple trimming.
Dennis Baker
@Dennis - i'm designing my own security framework, and the usernames will come directly from a user, not from any other source; so i get to control this aspect with regard to user names; and yes, there will be other checks for the contents of the username.
Liao
@Dennis - the spaces are important for passwords i suppose, but not for usernames imo
Liao
+2  A: 

If leading and trailing spaces are unimportant, then I'd trim them off before inserting or updating. There should then be no unnecessary spaces on a select.

This brings some advantages. Less space required in a row means that potentially more rows can exist in a data page which leads to faster data retrieval (less to retrieve). Also, you are not constantly trimming data on SELECTs. (Uses the DRY [don't repeat yourself] principle here)

Colin Mackay
+2  A: 

I would say it's a good practice in most scenarios. If you can confidently say that data is worthless, and the cost of removing it is minimal, then remove it.

Donnie DeBoer
+2  A: 

I think it is a good practice. There are few things more soul crushing than spending an hour, a day, or any amount of time, chasing down a bug that ultimately was caused by a user typing an extra space. That extra space can cause reports to go subtly wrong, or can cause an exception somewhere in your program, and unless you have put brackets around every print statement in your logs and error messages, you might not realize that it is there. Even if you religiously trim spaces before using data you've pulled from the db, do future users of your data a favor and trim before putting it in.

Peter Recore
Is there some "guaranteed" way to trim the spaces at the database - so that even if someone runs an insert from outside of my "code", say by running a direct SQL insert in the database IDE, the database can trim the whitespace. A trigger perhaps is possible?
Liao
A: 

I would trim them (unless you are actually using the whitespace data), simply because it is easy to do, and spaces are particularly hard to spot if they do cause problem in your code.

Nico Burns
A: 

Trailing spaces are particularly problematic, specifically in regards ANSI_NULLS behaviour.

For instance, colname = '1' can return true where colname like '1' returns false

Thus, given trailing spaces in varchar columns are ambiguous, truncation is most likely preferable, particularly because there is no real information in such data and it creates ambiguity in the behaviour of SQL Server.

For example, look at the discussion at this question:

http://stackoverflow.com/questions/1143313/why-would-sqlserver-select-statement-select-rows-which-match-and-rows-which-match/1143324#1143324

polyglot