views:

265

answers:

4

I'm working on a query with a varchar column called ALCOHOL_OZ_PER_WK. Part of the query includes:

where e.ALCOHOL_OZ_PER_WK >= 14

and get the errors:

Arithmetic overflow error converting varchar to data type numeric.

and: Error converting data type varchar to numeric.

Looking into the values actually stored in the column, the largest look close to 100, but some of the entries are ranges:

9 - 12
1.5 - 2.5

I'd like to get the upper limit (or maybe the midpoint of the range) from rows with entries like this and have it be the value being compared to 14.

What would be the (or an) easy way to do this?

As always, thank you!

A: 

As you say you need to calculate numeric values, which you can then use in your query.

Probably the easiest way is to use some simple logic to calculate the average or upper limit using string functions, and string to numeric functions.

If all you want is the upper limit, just get the characters after the '-' and use that.

Bravax
+1  A: 

Not sure about easy ways.

A proper way is to store the numbers in two columns, ALCOHOL_OZ_PER_WK_MIN and ALCOHOL_OZ_PER_WK_MAX.

GSerg
A: 

"probably because some are ranges" - do you get that "range" is not a SQL Server Data type? You've got non-numeric data you're trying to convert into numeric data, and you've got a scalar value you're comparing to a non-scalar value.

This database has some issues.

John Saunders
Sorry, that was an attempt at some humor. As Damir Sudarevic says, this is the raw data. Most of the data in our database has been reformatted and optimized by the DBAs, but this particular column has not been addressed. If I wait for everything to be nicely formatted, I'll never graduate :-)
raoulcousins
+2  A: 

Your DB is obviously result of some survey, and it seems to contain the original survey data. The usual way is to run this through an ECCD (Extract, Clean, Conform, Deliver) process and store clean and standardized data into a separate database (maybe a warehouse) which can then be used for analytics and reporting.

If you have SSIS use data profiling task to get an idea of types of strings you have in there. The Column Pattern Profile reports a set of regular expressions on the string column, so you will get an idea of what's inside those strings. If you do not have SSIS, you can use eobjects DataCleaner to do the same.

If you can not spare a new database or at least a new table -- at minimum add a numeric column to this table and then extract numeric values form those strings into the new column. You may want to use "something else" (SSIS, Pentaho Kettle, Python, VB, C#) to do this -- in general T-SQL in not very good at string processing.

My guess is that this is not the only column that has garbage inside, so any analysis that you may run on this may be worthless.

And if you still think that the ranges are the only problem, this example may help:

First some data

DECLARE @myTable TABLE ( 
  AlUnits varchar(10)
  ) ;

INSERT  INTO @myTable
        (AlUnits )
VALUES  ( '10' )
,       ( '15' )
,       ( '20' )
,       ( '7 - 12' )
,       ( '3 - 5' )
;

The query splits records into two groups, numeric and not numeric -- assumed ranges.

;
WITH  is_num
        AS ( SELECT CAST(AlUnits AS decimal(6, 2)) AS Units_LO
                   ,CAST(AlUnits AS decimal(6, 2)) AS Units_HI
             FROM   @myTable
             WHERE  ISNUMERIC(AlUnits) = 1
           ),
      is_not_num
        AS ( SELECT CAST( RTRIM(LTRIM(LEFT(AlUnits,
                          CHARINDEX('-', AlUnits) - 1)))
                        AS decimal(6,2)) AS Units_LO
                   ,CAST(RTRIM(LTRIM(RIGHT(AlUnits,
                               LEN(AlUnits)
                               - CHARINDEX('-', AlUnits))))
                        AS decimal(6,2)) AS Units_HI
             FROM   @myTable
             WHERE  ISNUMERIC(AlUnits) = 0
           )
  SELECT  Units_LO
         ,Units_HI
         ,CAST(( Units_LO + Units_HI ) / 2.0 AS decimal(6, 2)) AS Units_Avg
  FROM    is_num
  UNION ALL
  SELECT  Units_LO
         ,Units_HI
         ,CAST(( Units_LO + Units_HI ) / 2.0 AS decimal(6, 2)) AS Units_Avg
  FROM    is_not_num ;

Returns:

Units_LO    Units_HI    Units_Avg 
----------- ----------- ----------
10.00       10.00       10.00     
15.00       15.00       15.00     
20.00       20.00       20.00     
7.00        12.00       9.50      
3.00        5.00        4.00      
Damir Sudarevic
Thanks! I ran DataCleaner on the column. From this it looks to me that the data always is either NULL, an integer or decimal or a x - y (with x and y integers or decimals) but I'm getting the error `Msg 537, Level 16, State 2, Line 1 Invalid length parameter passed to the LEFT or SUBSTRING function.`, so the data might be still be messier than I think. Luckily, at the moment, this is the only column that has garbage, everything else is in nicely formatted in a data mart.
raoulcousins