views:

321

answers:

2

I'm using MS SQL Server.

When I define the database schema I define a (non-materialized) view, which includes many fields, for example as follows (where "Topic" is the name of a table, and the view is a self-join on the Topic table):

CREATE VIEW View_Topic_Ancestor AS
SELECT
    Subordinate.Id AS Subordinate_Id,
    Subordinate.Folder_Id AS Subordinate_Folder_Id,
    Subordinate.topicTitle AS Subordinate_topicTitle,
    Subordinate.topicXhtml AS Subordinate_topicXhtml,
    Subordinate.crossLinked AS Subordinate_crossLinked,
    Superior.Id AS Superior_Id,
    Superior.topicTitle AS Superior_topicTitle,
    Superior.topicXhtml AS Superior_topicXhtml,
    Superior.crossLinked AS Superior_crossLinked
FROM Topic AS Subordinate LEFT OUTER JOIN Topic AS Superior
ON Superior.Folder_Id = Subordinate.Folder_Id
AND 
Superior.LeftValue = (SELECT MAX(Ancestor.LeftValue)
    FROM Topic AS Ancestor
    WHERE Subordinate.LeftValue > Ancestor.LeftValue
    AND Subordinate.LeftValue < Ancestor.RightValue
    AND Subordinate.Folder_Id = Ancestor.Folder_Id)

Later (at run-time) I use this view in a select statement, like this:

SELECT
    T.Id AS Shared_Id,
    V.Superior_Id,
    V.Superior_topicTitle,
    V.Subordinate_Id,
    V.Subordinate_Folder_Id,
    V.Subordinate_topicXhtml
FROM Topic AS T, View_Topic_Ancestor AS V
WHERE Folder_Id='e2eb2b68-738d-49ad-9787-a1e655b7973f'
AND T.crossLinked = V.Subordinate_Id

This SELECT statement doesn't reference (doesn't select) many of the fields which are in the view: for example, it selects the Subordinate_topicXhtml field but it doesn't select the Superior_topicXhtml field.

My questions are:

1) Do the fields which are defined in the view, but which are not referenced in the run-time selection from the view, have much effect on performance? Assume if you will that the Superior_topicXhtml field contains a lot of data (is a very long string).

2) How can I verify the answer to this myself? Is testing (measuring ellapsed time with a stop-watch) the only way, or is it possible to obtain an answer based on theory? I am using "Microsoft Server SQL Management Studio" for Microsoft SQL Server 2008, with SQL Express. I see how to obtain (but haven't learned how to interpret) the "estimated execution plan" for this query, but this shows only what indexes and loops are happening, not whether data is being retrieved from unreferenced fields.

+3  A: 

The Query optimizer creates a query plan by combining the sql which defines the view with the sql you construct to "select" from the view into a single sql query. If, in this combined, optimized sql, any specific attribute (column) is not necessary, it will not be used.. specifically, if the final select does not require it, it will not be sent over the wire. Also, if it is not needed by any Where clause, or Order By or anything else within the optimized sql, and all the attributes that ARE needed happen to be in an index, then that query can read only the index, and not touch the table at all... even thought he view references a column not present in the index... You can see this by looking at the query execution plan for the query.

Charles Bretana
I think you're right: I didn't imagine that values of unreferenced fields would come over wire. What I didn't know was whether they'd be read into memory within the SQL server.
ChrisW
and the answer is.. No they're not. Even intermediate result sets, generated for joins, will skip those unneeded columns. The optimizer will ignore them... But remember, even if one column which exists only in the table itself is needed for some reason, the whole row must be "read" from disk...
Charles Bretana
When you say "the whole row must be read from disk", does that include fields of type "nvarchar(max)"? I had guessed that such field values aren't stored inline (within the same page as) fixed-length rows/records, and aren't necessarily and implicitly retrieved when the rest of the row is retrieved?
ChrisW
Yes, if the query processer has to read anything from the table, it reads the entire row. This is because the smallest piece of data it can read is a single 8kbyte "page" of disk space, and rows cannot span across pages.
Charles Bretana
Chris, sorry, I missed yr commnt about not storing varChar(max) fields inline... If that is so, then that is exception... I didn't know that...
Charles Bretana
A: 

When you have the query's Execution plan displayed in the Management Studio, you can use the mouse to hover over nodes in the execution plan.

When you hover, a large tool-tip is displayed: which includes an "Output List" of the fields retrieved by that node, and an "Estimated Number of Rows" fields which shows how many instances of those fields are read.

ChrisW