Generally speaking, the SQL queries that I write return unformatted data, and I leave it to the presentation layer, a web page or a Windows app, to format the data as required. Other people that I work with, including my boss, insist that it is more efficient to have the database do it. I'm not sure that I buy that, and I believe that even if there were a measurable performance gain from having the database do it, there are more compelling reasons to generally avoid this.

For example, I place my queries in a Data Access layer with the intent of reusing the queries whenever possible. Given this, I maintain that the queries are more likely to be reusable if the data remains in its native type rather than being converted to a string with formatting functions applied, for example, formatting a date column to a DD-MMM-YYYY format for display. Sure, if the SQL was returning the dates as formatted strings, you could reverse the process to revert the value back to a date data type, but this seems awkward, for lack of a better word. Furthermore, when it comes to formatting other data, for example, a machine serial number made up of a prefix, base and suffix with separating dashes and leading zeros removed in each sub field, you risk not being able to correctly revert back to the original serial number when going in the other direction. Maybe this is a bad example, but I hope you see the direction I am going with this...
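To make the point concrete, here is a minimal Python sketch of presentation-side formatting. The function names and the serial number field layout are hypothetical, not taken from our system. It shows the date staying a native value until display time, and two different raw serial numbers collapsing to the same display string once leading zeros are stripped, which is why the formatted value cannot be reliably reversed:

```python
from datetime import date

# English month abbreviations, spelled out so the result does not depend on
# the process locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_display_date(value: date) -> str:
    """Format a native date as DD-MMM-YYYY, for display only."""
    return f"{value.day:02d}-{MONTHS[value.month - 1]}-{value.year:04d}"


def format_serial(prefix: str, base: str, suffix: str) -> str:
    """Join the sub fields with dashes, stripping leading zeros from each.

    Hypothetical layout: the real field widths are not known here.
    """
    parts = (part.lstrip("0") or "0" for part in (prefix, base, suffix))
    return "-".join(parts)


if __name__ == "__main__":
    print(format_display_date(date(2009, 5, 29)))
    # Two different raw serial numbers produce the same display string.
    print(format_serial("0012", "000345", "07"))
    print(format_serial("12", "0345", "7"))
```

Because the query still returns the date and the three raw sub fields, any other caller can reuse it and apply its own formatting.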

To take things a step further, I see people write VERY complex SQLs because they are essentially writing what I would call presentation logic into a SQL instead of returning simple data and then applying this presentation logic in the presentation layer. In my mind, this results in very complex, difficult to maintain and more brittle SQL that is less adaptable to change.

Take the following real-life example of what I found in our system and tell me what you think. The rationale I was given for this approach was that it made it very simple for the web app to render the page, as it used the following 1-line snippet of classic ADO logic in a Classic ASP web app to process the rows returned:

  oRS.GetString ( , , "</td>" & vbCrLf & "<td style=""font-size:x-small"" nowrap>" ,"</td>" & vbCrLf & "</tr>" & vbCrLf & "<tr>" & vbCrLf & _
  "<td style=""font-size:x-small"" nowrap>" ,"&nbsp;" ) & "</td>" & vbCrLf & "</tr>" & vbCrLf & _

Here's the SQL itself. While I appreciate the author's ability to write complex SQL, I feel like this is a maintenance nightmare. Am I nuts? The SQL returns a list of programs that are currently running against our database and the status of each:

Because the SQL did not display with CR/LFs when I pasted here, I decided to put the SQL on an otherwise empty personal Google site. Please feel free to comment. Thanks.

By the way, this SQL was actually constructed using VBScript nested WITHIN a classic ASP page, not by calling a stored procedure, so you have the additional complexity of embedded concatenations and quoted markup, if you know what I mean, not to mention the lack of formatting. The first thing I did when I was asked to help debug the SQL was to add a debug.print of the SQL output and throw it through a SQL formatter that I just found. Some of the formatting was lost in pasting at the following link:

Edit(Andomar): copied inline: (external link removed, thanks-Chad)

SELECT 
Substring(Datename("dw",start_datetime),1,3) 
+ ', '
+ Cast(start_datetime AS VARCHAR) "Start Time (UTC/GMT)"
,program_name "Program Name"
,run_sequence "Run Sequence"
,CASE 
WHEN batchno = 0
THEN Char(160)
WHEN batchno = NULL
THEN Char(160)
ELSE Cast(batchno AS VARCHAR)
END "Batch #" /* ,Replace(Replace(detail_log ,'K:\' ,'file://servernamehere/DiskVolK/') ,'\' ,'/') "log"*/ /* */
,Cast('<a href="GOIS_ViewLog.asp?Program_Name=' AS VARCHAR(99))
+ Cast(program_name AS VARCHAR)
+ Cast('&Run_Sequence=' AS VARCHAR)
+ Cast(run_sequence AS VARCHAR)
+ Cast('&Page=1' AS VARCHAR)
+ ''
+ Cast('">'
+ CASE 
WHEN end_datetime >= start_datetime
THEN CASE 
WHEN end_datetime <> 'Jan 1 1900 2:00 PM'
THEN CASE 
WHEN (success_code = 10
OR success_code = 0)
AND exit_code = 10
THEN CASE 
WHEN errorcount = 0
THEN 'Completed Successfully'
ELSE 'Completed with Errors'
END
WHEN success_code = 100
AND exit_code = 10
THEN 'Completed with Errors'
ELSE CASE 
WHEN program_name <> 'FileDepCheck'
THEN 'Failed'
ELSE 'File not found'
END
END
ELSE CASE 
WHEN success_code = 10
AND exit_code = 0
THEN 'Failed; Entries for Input File Missing'
ELSE 'Aborted'
END
END
ELSE CASE 
WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240)
OR ((SELECT 
Count(* )
FROM 
MASTER.dbo.sysprocesses a(nolock)
INNER JOIN gcsdwdb.dbo.update_log b(nolock)
ON a.program_name = b.program_name
WHERE a.program_name = update_log.program_name
AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0))
THEN 'Processing...'
ELSE 'Aborted without end date'
END
END
+ '</a>' AS VARCHAR) "Status / Log"
,Cast('<a href="' AS VARCHAR)
+ Replace(Replace(detail_log,'K:\','file://servernamehere/DiskVolK/'),
'\','/')
+ Cast('" title="Click to view Detail log text file"' AS VARCHAR(99))
+ Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\DetailLog.bmp" border="0"></a>' AS VARCHAR(999))
+ Char(160)
+ Cast('<a href="' AS VARCHAR)
+ Replace(Replace(summary_log,'K:\','file://servernamehere/DiskVolK/'),
'\','/')
+ Cast('" title="Click to view Summary log text file"' AS VARCHAR(99))
+ Cast('style="font-family:comic sans ms; font-size:12; color:blue"><img src="images\SummaryLog.bmp" border="0"></a>' AS VARCHAR(999)) "Text Logs"
,errorcount "Error Count"
,warningcount "Warning Count"
,(totmsgcount
- errorcount
- warningcount) "Information Message Count"
,CASE 
WHEN end_datetime > start_datetime
THEN CASE 
WHEN Cast(Datepart("hh",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("hh",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' hr '
ELSE ' '
END
+ CASE 
WHEN Cast(Datepart("mi",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("mi",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' min '
ELSE ' '
END
+ CASE 
WHEN Cast(Datepart("ss",(end_datetime
- start_datetime)) AS INT) > 0
THEN Cast(Datepart("ss",(end_datetime
- start_datetime)) AS VARCHAR)
+ ' sec '
ELSE ' '
END
ELSE CASE 
WHEN end_datetime = start_datetime
THEN '< 1 sec'
ELSE CASE 
WHEN ((Cast(Datediff(mi,start_datetime,Getdate()) AS INT) <= 240)
OR ((SELECT 
Count(* )
FROM 
MASTER.dbo.sysprocesses a(nolock)
INNER JOIN gcsdwdb.dbo.update_log b(nolock)
ON a.program_name = b.program_name
WHERE a.program_name = update_log.program_name
AND (Abs(Datediff(n,b.start_datetime,a.login_time))) < 1) > 0))
THEN 'Running '
+ Cast(Datediff(mi,start_datetime,Getdate()) AS VARCHAR)
+ ' min'
ELSE '&nbsp;'
END
END
END "Elapsed Time" /* ,end_datetime "End Time (UTC/GMT)" ,datepart("hh" ,
(end_datetime - start_datetime)) "Hr" ,datepart("mi" ,(end_datetime - start_datetime)) "Mins" ,datepart("ss" ,(end_datetime - start_datetime)) "Sec" ,datepart("ms" ,(end_datetime - start_datetime)) "mSecs" ,datepart("dw" ,start_datetime) "dp" ,case when datepart("dw" ,start_datetime) = 6 then ' Fri' when datepart("dw" ,start_datetime) = 5 then ' Thu' else '1' end */
,totalrows "Total Rows"
,inserted "Rows Inserted"
,updated "Rows Updated" /* ,success_code "succ" ,exit_code "exit" */
FROM 
update_log
WHERE start_datetime >= '5/29/2009 16:15'
ORDER BY start_datetime DESC
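
For comparison, the "Elapsed Time" column above could be built outside the database. This is only a rough Python sketch, assuming the query returned start_datetime and end_datetime as native values. The function name is hypothetical, the spacing only approximates what the SQL emits, and the "Running n min" branch for unfinished programs is deliberately left out:

```python
from datetime import datetime


def format_elapsed(start: datetime, end: datetime) -> str:
    """Render the duration of a finished run, e.g. '1 hr 2 min 3 sec'."""
    if end < start:
        # Unfinished or aborted runs are a separate branch in the SQL;
        # this sketch simply leaves the cell empty for them.
        return ""
    total_seconds = int((end - start).total_seconds())
    # Unlike Datepart("hh", ...) on a datetime difference, the hours here
    # are not wrapped at 24.
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hr")
    if minutes:
        parts.append(f"{minutes} min")
    if seconds:
        parts.append(f"{seconds} sec")
    # Nothing to show means the run took less than a second.
    return " ".join(parts) or "< 1 sec"


if __name__ == "__main__":
    started = datetime(2009, 5, 29, 16, 15, 0)
    print(format_elapsed(started, datetime(2009, 5, 29, 17, 17, 3)))
    print(format_elapsed(started, started))
```

With that in place, the query would only need to select start_datetime and end_datetime.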
+14  A: 

The answer is obviously "just retrieve output". Formatting on the SQL server has the following problems:

  • it increases the network traffic from the SQL server
  • SQL has very poor string handling functionality
  • SQL servers are not optimised to perform string manipulation
  • you are using server CPU cycles which could better be used for query processing
  • it may make life difficult (or impossible) for the query optimiser
  • you have to write many more queries to support different formatting
  • you may have to write different queries to support formatting on different browsers
  • you can't re-use queries for different purposes

I'm sure there are many more.

anon
+6  A: 

SQL should not be formatting, period. It's a relational algebra for extracting (when using SELECT) data from the database.

Getting the DBMS to format the data for you is the wrong thing to do, and that should be left to your own code (outside the DBMS). The DBMS is generally under enough load as it is without having to do your presentation work for you. It's also optimized for data retrieval, not presentation.

I know DBAs that would call for my immediate execution if I tried to do something like that :-)

paxdiablo
+4  A: 

The concept of formatting output in SQL does sort of break the whole concept of separation of presentation and data. Not only that, but there are a number of situations that might arise:

  • What if you need to localise your date formats? The UK uses a different date format to the US, for example - are you going to internationalize all the way back to your data layer?

  • What if the rules of formatting change? For example, some text needs to be formatted in a different way to comply with some new corporate policy. Again, you would need to go all the way back to the data layer.

  • If we take a web context, how do you decide on escaping values? Different forms of escaping might be desired if you are outputting to a web page, or to JSON, or elsewhere...

Not only that, but SQL string manipulation functions are not typically very zippy.
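
As a rough illustration of the points above, here is a minimal Python sketch in which the same raw values are formatted differently per consumer. The function names, the locale table and the sample program name are all made up for the example:

```python
import html
import json
from datetime import date

# Hypothetical per-locale date patterns; a real application would more
# likely use a localisation library than a hand-written table.
DATE_PATTERNS = {"en_GB": "%d/%m/%Y", "en_US": "%m/%d/%Y"}


def format_date(value: date, locale_code: str) -> str:
    """Format a native date for the given locale (numeric fields only)."""
    return value.strftime(DATE_PATTERNS[locale_code])


def as_html_cell(value: str) -> str:
    """Escape a raw value for use inside an HTML table cell."""
    return f"<td>{html.escape(value)}</td>"


def as_json(value: str) -> str:
    """Serialise the same raw value as JSON, which has its own escaping."""
    return json.dumps({"program_name": value})


if __name__ == "__main__":
    raw_name = 'Load "A" & <B>'
    print(as_html_cell(raw_name))
    print(as_json(raw_name))
    print(format_date(date(2009, 5, 29), "en_GB"))
    print(format_date(date(2009, 5, 29), "en_US"))
```

None of these choices can be made sensibly inside the query, because the query does not know who is asking.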

Kazar
+1  A: 

I think there's a place for some kinds of transforms on the way out of SQL, and it depends on the calling program's expectations.

For instance, if a datetime is appropriate, it should be returned natively. On the other hand, if you are only returning a year in a datetime field (or a quarter, like 1/1, 4/1, 7/1, 10/1), and the client is expected to parse out the information, put it in a separate column (like year = 2008 or quarter = '2008Q1'). The same goes for some translations from code to description (dropping the code column and only emitting the description). There are reasonable cases where concatenation and string building are appropriate.
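
A small sketch of the year/quarter idea, using Python's built-in sqlite3 module so that it runs anywhere. Note that this is SQLite syntax rather than SQL Server syntax, and the table and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each row marks the first day of a reporting period.
conn.execute("CREATE TABLE sales (period_start TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [("2008-01-01",), ("2008-10-01",)],
)

# Return the native value plus derived year and quarter columns, so the
# client never has to parse them out of the date itself.
rows = conn.execute(
    """
    SELECT period_start,
           CAST(strftime('%Y', period_start) AS INTEGER) AS year,
           strftime('%Y', period_start) || 'Q' ||
               ((CAST(strftime('%m', period_start) AS INTEGER) + 2) / 3)
               AS quarter
    FROM sales
    ORDER BY period_start
    """
).fetchall()

for row in rows:
    print(row)
```

The derived columns carry meaning rather than markup, which is the distinction being drawn here.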

Your particular example is a place where it's inappropriate. While on the surface it looks like looser coupling (only change the SP in the database), it can actually create stronger coupling by forcing additional SPs to be written for different usages, instead of multiple UIs being able to use the same SP. And then multiple SPs might need to be changed in sync as the system evolves.

Cade Roux
A: 

Most people I know disagree with me here, but I kinda like this approach. So I'll list some advantages:

  • SQL is very powerful: how many lines of C# would this query take?
  • SQL is very easy to update. I imagine this code is in a stored procedure, which you can change with a simple ALTER PROC. This can greatly reduce the time to roll in fixes.
  • SQL is fast; I've seen cases where introducing an ORM layer slowed down the application to a crawl.
  • SQL is easy to debug, and errors are easy to reproduce. Just run the query. Testing your fix is a question of running the new query.
  • SQL like this is not that hard to maintain when it's properly formatted. There is not much SQL I can't understand in 5-10 minutes, but a multi-layered C# solution can take a very long time, especially if you have to figure out which layer's abstraction is breaking.

I'm sure other people will list the disadvantages of the SQL approach.

Andomar
+2  A: 

I'm the developer responsible for the reporting engine of my company's product. In simple terms the engine works by building an XML document of the data to go into a report from the database, and then transforming the XML any which way to build a web-page, or a PDF or a Word document based on user requirement.

When I started five years ago I had the database formatting the output, although I'm pleased to say nothing I wrote was as horrific as the question's example. Over time I've moved the other way and now the XML holds only the raw data, and this is tidied up during the presentation.

Our software uses Traffic Lights as a quick at-a-glance status indicator, so we have a lot of char fields in the database storing 'R', 'A', 'G', 'U' to represent red, amber, green and unknown. I had several tricks such as SELECTs with embedded CASE statements to transform single character codes into their English counterparts:

SELECT CASE status WHEN 'R' THEN 'Red' WHEN 'G' THEN 'Green' ...etc...

Sorting can't be done on the native codes; users expect things to be in one of two orders: Red, Amber, Green or Green, Amber, Red. So I had corresponding sort columns as well:

SELECT
    CASE status WHEN 'R' THEN 'Red' WHEN 'G' THEN 'Green' WHEN 'A' THEN 'Amber' END as status,
    CASE status WHEN 'R' THEN 0 WHEN 'A' THEN 1 WHEN 'G' THEN 2 END as sort
FROM
    table
ORDER BY
    sort

That's just a brief example. I had other tricks for doing date formatting, assembly of names, etc.

This of course led to problems making the application multi-language, since English was baked into the database. I'd need to look up a customer locale and write lots of multi-language CASE expressions to support other languages. Not good. Dates were also a problem: Americans like their dates mm/dd and Europeans use dd/mm.

It also led to duplication problems. If someone added a fourth or fifth traffic light option, I had to modify all my SQL, even though the new status was already represented in code as a Java enum or something similar that I could look up once I'd read the single character from the database.

It became far, far easier in my case to just have the database return the raw data and for me to write a suite of Comparators and formatters to present the data in a document in the user's native language and encoding. If I was starting over again today that'd be what I'd do.
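
Something along these lines, sketched in Python rather than Java. The names, the German labels and the position of "unknown" in the sort order are illustrative assumptions, not the product's actual code:

```python
from enum import Enum


class TrafficLight(Enum):
    """Single-character status codes as stored in the database."""
    RED = "R"
    AMBER = "A"
    GREEN = "G"
    UNKNOWN = "U"


# Sort positions live in one place; putting UNKNOWN last is an assumption.
SORT_ORDER = {
    TrafficLight.RED: 0,
    TrafficLight.AMBER: 1,
    TrafficLight.GREEN: 2,
    TrafficLight.UNKNOWN: 3,
}

# Illustrative label tables, keyed by language code.
LABELS = {
    "en": {TrafficLight.RED: "Red", TrafficLight.AMBER: "Amber",
           TrafficLight.GREEN: "Green", TrafficLight.UNKNOWN: "Unknown"},
    "de": {TrafficLight.RED: "Rot", TrafficLight.AMBER: "Gelb",
           TrafficLight.GREEN: "Grün", TrafficLight.UNKNOWN: "Unbekannt"},
}


def label(code: str, language: str = "en") -> str:
    """Translate a raw database code into a label in the given language."""
    return LABELS[language][TrafficLight(code)]


def sort_statuses(codes, reverse=False):
    """Sort raw codes Red, Amber, Green (or the other way round)."""
    return sorted(codes, key=lambda c: SORT_ORDER[TrafficLight(c)],
                  reverse=reverse)


if __name__ == "__main__":
    print(sort_statuses(["G", "R", "A"]))
    print(sort_statuses(["G", "R", "A"], reverse=True))
    print(label("A", "de"))
```

Adding a new status then means touching the enum and the two tables, and no SQL at all.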

banjollity
Alternatively, just store the mappings in the database, e.g.: create table Color (Status_Code, Status, Sort); select Status from Table join Color using (Status_Code) order by Sort;
cheduardo
A: 

When considering whether to format your data on behalf of your presentation layer, consider that your "presentation layer" may be a web service or other program. You may start by doing the formatting on behalf of a piece of UI code, only to later need the same query to be used by a web service, which will have different requirements.

A favorite of mine was a set of stored procedures which all formatted date/times. In the local timezone. It didn't work quite so well when called by a web service from a different timezone. It worked even less well when the regional settings of the database server changed, changing the date/time format. Oh, and it didn't work at midnight, since it truncated the "00:00" at the end.

OTOH, it was very convenient for the UI.
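
A minimal Python sketch of the alternative, assuming the database hands back a native UTC timestamp and each caller states its own offset. The function name is hypothetical, and fixed offsets are used only to keep the example self-contained, so daylight saving time is ignored:

```python
from datetime import datetime, timedelta, timezone


def format_for_caller(utc_value: datetime, utc_offset_hours: int) -> str:
    """Format a UTC timestamp in the caller's time zone.

    The time of day is always printed, so midnight keeps its '00:00'.
    """
    caller_tz = timezone(timedelta(hours=utc_offset_hours))
    return utc_value.astimezone(caller_tz).strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    stored = datetime(2009, 5, 29, 0, 0, tzinfo=timezone.utc)
    print(format_for_caller(stored, 0))
    print(format_for_caller(stored, -5))
```

The output no longer depends on the regional settings of the database server, only on what the caller asks for.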

John Saunders