ansaurus

Question

Why is ORM considered good but "select *" considered bad?

Answer 1

+1 A:

SELECT * is not bad. Did you ask whoever considered it to be bad "why?".

Ali A 2008-11-15 00:25:42

I'll ask myself: Is select * bad? Yes, if you only need columns A and B, don't pay the cost of fetching C and D columns also. Fetch only what you need.

Corey Trager 2008-11-15 00:36:23

Yes, but it's not universally "bad". It is "bad" in some cases! What if you later want C and D, should you not get them and cache them? No case is cut and dry, and so we certainly cannot say "Select * is bad". We can say "Select * is sometimes bad"

Ali A 2008-11-15 01:01:19

The worst consequence of fetching C and D is that whoever comes after you has no idea what you intended to do with them. Put another way, here's also a YAGNI principle that this violates.

le dorfier 2008-11-15 01:26:39

select * is bad in stored procedures because it prevents the server from being able to pre-compile an execution plan.

Gordon Bell 2008-11-15 05:03:25

Answer 2

+2 A:

Even ORMs need to avoid SELECT * to be effective, by using lazy loading etc.

And yes, SELECT * is generally a bad idea if you aren't consuming all the data.

So, do you have different kinds of MyThing objects, one for each column combo? – Corey Trager (Nov 15 at 0:37)

No, I have read-only digest objects (which only contain important information) for things like lookups and massive collections and convert these to fully hydrated objects on demand. – Cade Roux (Nov 15 at 1:22)

Cade Roux 2008-11-15 00:28:39

So, do you have different kinds of MyThing objects, one for each column combo?

Corey Trager 2008-11-15 00:37:02

No, I have read-only digest objects (which only contain important information) for things like lookups and massive collections and convert these to fully hydrated objects on demand.

Cade Roux 2008-11-15 01:22:18

+1 for you response comment Cade, this bit about digest objects should be in the answer. :)

BobbyShaftoe 2008-12-14 06:00:24

Answer 3

+3 A:

I am not sure why you would want a partially hydrated object. Given a class of Customer with properties of Name, Address, Id. I would want them all to create a fully populated Customer object.

The list hanging off of Customers called Orders can be lazily loaded when accessed though most ORMs. And NHibernate anyway allows you to do projections into other objects. So if you had say a simply customer list where you displayed the ID and Name, you can create an object of type CustomerListDisplay and project your HQL query into that object set and only obtain the columns you need from the database.

Friends don't let friends premature optimize. Fully hydrate your object, lazy load it's associations. And then profile your application looking for problems and optimize the problem areas.

NotMyself 2008-11-15 00:29:27

A partially hydrated object in those cases where you are only interested in a subset of the columns, and you don't want to pay the performance penalty of fetching all the columns.

Corey Trager 2008-11-15 00:35:08

so then look into projections...

NotMyself 2008-11-15 00:37:10

often the 'performance hit' of fetching all columns is insignificant. The performance hit of repeated round trips to the DB is very significant. Lazy loading will only help in some scenarios, its not a magic bullet.

gbjbaanb 2008-11-15 01:03:44

@gbjbaanb : it depends how deep the relationship heirarchy goes. It might not just be columns in a single table, but a sequence of joins to other tables.

Mitch Wheat 2008-11-15 01:11:23

Yes, but when you start talking "relationship hierarchy", you've moved past "fully hydrate your object" and into "lazy load its associations", unless you're talking specifically about deep table-per-subclass inheritance schemes, but most ORMs I've seen seem to discourage use of inheritance anyhow...

Dave Sherohman 2008-11-15 02:00:23

Answer 4

+1 A:

ORMs in general do not rely on SELECT *, but rely on better methods to find columns like defined data map files (Hibernate, variants of Hibernate, and Apache iBATIS do this). Something a bit more automatic could be set up by querying the database schema to get a list of columns and their data types for a table. How the data gets populated is specific to the particular ORM you are using, and it should be well-documented there.

It is never a good idea to select data that you do not use at all, as it can create a needless code dependency that can be obnoxious to maintain later. For dealing with data internal to the class, things are a bit more complicated.

A short rule would be to always fetch all the data that the class stores by default. In most cases, a small amount of overhead won't make a huge difference, so your main goal is to reduce maintenance overhead. Later, when you performance profiling of the code, and have reason to believe that it may benefit from adjusting the behavior, that is the time to do it.

If I saw an ORM make SELECT * statements, either visibly or under its covers, then I would look elsewhere to fulfill my database integration needs.

Adam K. Johnson 2008-11-15 00:37:56

I don't find that your answer addresses the question. Even so ORM may employ methods to have knowledge of the data requested from the database, how else are you to conclusively populate your ORM object model without a SELECT statement whether it is a prepared statement or something else?

Florin 2008-11-15 00:44:31

How the data is loaded into the class is an implementation-specific detail of the ORM. In iBATIS for example, it has XML files that contain a SQL statements or procedures, column data types, and then column-to-class mappings to specify which column matches which specific variable in the class.

Adam K. Johnson 2008-11-15 00:50:42

@Florin, I understand you. @doktaru, I don't understand you. Forget the "select *" part of the question. Instead, explain how MyThing behaves when sometimes its fully hydrated and sometimes partially.

Corey Trager 2008-11-15 00:58:18

For me the essential difference is whether the class drives the interface, or the table drives the interface. If the class wants columns the table doesn't yet have, or then the table probably needs adjustment (or a join or ..) The class shouldn't ever deal with columns it doesn't require or know.

le dorfier 2008-11-15 01:10:29

Answer 5

+3 A:

There are two separate issues to consider.

To begin, it is quite common when using an ORM for the table and the object to have quite different "shapes", this is one reason why many ORM tools support quite complex mappings.

A good example is when a table is partially denormalised, with columns containing redundant information (often, this is done to improve query or reporting performance). When this occurs, it is more efficient for the ORM to request just the columns it requires, than to have all the extra columns brought back and ignored.

The question of why "Select *" is evil is separate, and the answer falls into two halves.

When executing "select *" the database server has no obligation to return the columns in any particular order, and in fact could reasonably return the columns in a different order every time, though almost no databases do this.

Problem is, when a typical developer observes that the columns returned seem to be in a consistent order, the assumption is made that the columns will always be in that order, and then you have code making unwarranted assumptions, just waiting to fail. Worse, that failure may not be fatal, but may simply involve, say, using Year of Birth in place of Account Balance.

The other issue with "Select *" revolves around table ownership - in many large companies, the DBA controls the schema, and makes changes as required by major systems. If your tool is executing "select *" then you only get the current columns - if the DBA has removed a redundant column that you need, you get no error, and your code may blunder ahead causing all sorts of damage. By explicitly requesting the fields you require, you ensure that your system will break rather than process the wrong information.

Bevan 2008-11-15 00:38:10

Bevan - What does that mean concretely, that an "object" has different "shapes"? Like, if the language is Java/C#/C++, is there just one "class MyThing" but sometimes its GetSomeStringAttr() returns null?

Corey Trager 2008-11-15 00:51:10

To me it sounds like an Object only finally has a sharply defined definition with respect to a certain context; and different contexts in the application may need different shapes (sets of properties and methods) for different contexts.

le dorfier 2008-11-15 01:37:56

Bevan 2008-11-16 05:42:19

Answer 6

+38 A:

In my limited experience, things are as you describe--it's a messy situation and the usual cop-out "it depends" answer applies.

A good example would be the online store that I work for. It has a Brand object, and on the main page of the Web site, all of the brands that the store sells are listed on the left side. To display this menu of brands, all the site needs is the integer BrandId and the string BrandName. But the Brand object contains a whole boatload of other properties, most notably a Description property that can contain a substantially large amount of text about the Brand. No two ways about it, loading all of that extra information about the brand just to spit out its name in an unordered list is (1) measurably and significantly slow, usually because of the large text fields and (2) pretty inefficient when it comes to memory usage, building up large strings and not even looking at them before throwing them away.

One option provided by many ORMs is to lazy load a property. So we could have a Brand object returned to us, but that time-consuming and memory-wasting Description field is not until we try to invoke its get accessor. At that point, the proxy object will intercept our call and suck down the description from the database just in time. This is sometimes good enough but has burned me enough times that I personally don't recommend it:

It's easy to forget that the property is lazy-loaded, introducing a SELECT N+1 problem just by writing a foreach loop. Who knows what happens when LINQ gets involved.
What if the just-in-time database call fails because the transport got flummoxed or the network went out? I can almost guarantee that any code that is doing something as innocuous as string desc = brand.Description was not expecting that simple call to toss a DataAccessException. Now you've just crashed in a nasty and unexpected way. (Yes, I've watched my app go down hard because of just that. Learned the hard way!)

So what I've ended up doing is that in scenarios that require performance or are prone to database deadlocks, I create a separate interface that the Web site or any other program can call to get access to specific chunks of data that have had their query plans carefully examined. The architecture ends up looking kind of like this (forgive the ASCII art):

Web Site:         Controller Classes
                     |
                     |---------------------------------+
                     |                                 |
App Server:       IDocumentService               IOrderService, IInventoryService, etc
                  (Arrays, DataSets)             (Regular OO objects, like Brand)
                     |                                 |
                     |                                 |
                     |                                 |
Data Layer:       (Raw ADO.NET returning arrays, ("Full cream" ORM like NHibernate)
                   DataSets, simple classes)

I used to think that this was cheating, subverting the OO object model. But in a practical sense, as long as you do this shortcut for displaying data, I think it's all right. The updates/inserts and what have you still go through the fully-hydrated, ORM-filled domain model, and that's something that happens far less frequently (in most of my cases) than displaying particular subsets of the data. ORMs like NHibernate will let you do projections, but by that point I just don't see the point of the ORM. This will probably be a stored procedure anyway, writing the ADO.NET takes two seconds.

This is just my two cents. I look forward to reading some of the other responses.

Nicholas Piasecki 2008-11-15 00:39:51

one of the high quality answers.. +1

petr k. 2008-11-15 01:01:26

yeah, no doubt. +1 just for the dang ASCII art architecture diagram, lol. Very nice indeed.

chadmyers 2008-11-15 01:05:21

+1. Very good points raised.

Mitch Wheat 2008-11-15 01:06:55

I'd vote this answer up twice if I could!

Mitch Wheat 2008-11-15 01:07:57

I think collections of codes, short-descriptions, etc. are perfectly defensible as legitimate classes (subclassing a collection) on their own merits, and there's a conceptually legitimate reason for calling them out.

le dorfier 2008-11-15 01:33:26

good and solid practical advice. *LOVE* the ASCII art.

Booji Boy 2008-11-15 15:29:03

Answer 7

A:

Select * is bad for a couple of reasons:

Some columns might be costly to retrieve, so if you don't need them you're wasting time
If you write code with select * and you're expecting x number of columns in a given order, and the table changes, you could potentially break the code. If you explicitly ask for the columns you need then the code is less fragile.

Cameron MacFarland 2008-11-15 00:42:54

Sorry, the question was poorly worded. Read Nicholas's answer who understood my question exactly in the way I intended it. How would you respond to him?

Corey Trager 2008-11-15 00:59:06

Answer 8

+6 A:

People use ORM's for greater development productivity, not for runtime performance optimization. It depends on the project whether it's more important to maximize development efficiency or runtime efficiency.

In practice, one could use the ORM for greatest productivity, and then profile the application to identify bottlenecks once you're finished. Replace ORM code with custom SQL queries only where you get the greatest bang for the buck.

SELECT * isn't bad if you typically need all the columns in a table. We can't generalize that the wildcard is always good or always bad.

edit: Re: doofledorfer's comment... Personally, I always name the columns in a query explicitly; I never use the wildcard in production code (though I use it when doing ad hoc queries). The original question is about ORMs -- in fact it's not uncommon that ORM frameworks issue a SELECT * uniformly, to populate all the fields in the corresponding object model.

Executing a SELECT * query may not necessarily indicate that you need all those columns, and it doesn't necessarily mean that you are neglectful about your code. It could be that the ORM framework is generating SQL queries to make sure all the fields are available in case you need them.

Bill Karwin 2008-11-15 00:46:38

You completely understood the intent of my question. Thanks.

Corey Trager 2008-11-15 01:00:17

I think using "SELECT *" indicates your decision that you do, in fact, need all the columns in the table. If that's not true, then it's misleading. It also suggests the possibility that you didn't think about it very hard (unless you leave a comment asserting.)

le dorfier 2008-11-15 01:42:21

Answer 9

+2 A:

Linq to Sql, or any implementation of IQueryable, uses a syntax which ultimately puts you in control of the selected data. The definition of a query is also the definition of its result set.

This neatly avoids the select * issue by removing data shape responsibilities from the ORM.

For example, to select all columns:

from c in data.Customers
select c

To select a subset:

from c in data.Customers
select new
{
  c.FirstName,
  c.LastName,
  c.Email
}

To select a combination:

from c in data.Customers
join o in data.Orders on c.CustomerId equals o.CustomerId
select new
{
  Name = c.FirstName + " " + c.LastName,
  Email = c.Email,
  Date = o.DateSubmitted
}

Bryan Watts 2008-11-15 00:54:53

But.... what the heck is the object you have created? Are they both "Customer" domain objects, even though they have different properties?

Corey Trager 2008-11-15 01:01:22

The new object is an anonymous type, generated by the compiler with the defined schema. It's as if you had defined it yourself, except you don't have to manage the class artifact. See http://msdn.microsoft.com/en-us/library/bb397696.aspx for more information.

Bryan Watts 2008-11-15 01:21:44

The object represents a projection, which by definition is not a domain object and cannot be confused with one. If you need a full customer, select that, otherwise select something else. "Sparsely populated objects" seem like an anti-pattern which compromises the domain for technical concerns.

Bryan Watts 2008-11-15 01:25:58

And you end up with technical concerns trumping domain concerns, and the application is being designed from technology back to the users rather than properly in the other direction. But maybe the programmers are a little happier.

le dorfier 2008-11-15 01:28:22

Those are the growing pains of paradigm shifts. It takes strong developers to flex the IQueryable abstraction. The point is that query definitions can be part of the domain, not just a "please implement this within your application" hook point.

Bryan Watts 2008-11-15 01:43:23

Answer 10

A:

SELECT * is a strong indication you don't have design control over the scope of your application and its modules. One of the major difficulties in cleaning up someone else's work is when there is stuff in there that is for no purpose, but no indication what is needed and used, and what isn't.

Every piece of data and code in your application should be there for a purpose, and the purpose should be specified, or easily detected.

We all know, and despise, programmers who don't worry too much about why things work, they just like to try stuff until the expected things happen and close it up for the next guy. SELECT * is a really good way to do that.

le dorfier 2008-11-15 01:15:56

Answer 11

+2 A:

The case you describe is a great example of how ORM is not a panacea. Databases offer flexible, needs-based access to their data primarily through SQL. As a developer, I can easily and simply get all the data (SELECT *) or some of the data (SELECT COL1, COL2) as needed. My mechanism for doing this will be easily understood by any other developer taking over the project.

In order to get the same flexibility from ORM, a lot more work has to be done (either by you or the ORM developers) just to get you back to the place under the hood where you're either getting all or some of the columns from the database as needed (see the excellent answers above to get a sense of some of the problems). And all this extra stuff is just more stuff that can fail, making an ORM system intrinsically less reliable than straight SQL calls.

This is not to say that you shouldn't use ORM (my standard disclaimer is that all design choices have costs and benefits, and the choice of one or the other just depends) - knock yourself out if it works for you. I will say that I truly don't understand the popularity of ORM, given the amount of extra un-fun work it seems to create for its users. I'll stick with using SELECT * when (wait for it) I need to get every column from a table.

MusiGenesis 2008-11-15 01:36:02

Answer 12

A:

If you feel the need to encapsulate everything within an object, but need something with a small subset of what is contained within a table - define your own class. Write straight sql (within or without the ORM - most allow straight sql to circumvent limitations) and populate your object with the results.

However, I'd just use the ORMs representation of a table in most situations unless profiling told me not to.

Josh Smeaton 2008-11-15 15:14:53

ansaurus

tags:

views:

answers:

Why is ORM considered good but "select *" considered bad?

related questions