views:

29

answers:

1

Good Day Sir/Madame! ^_^

I'm quite a novice when it comes to sql, so please forgive my ignorance.

I have quite a large nvarchar which I wish to pass to the HashBytes function. I get the error:

"String or binary would be truncated. Cannot insert the value NULL into column 'colname', tbale 'table'; column does not allow nulls. UPDATE fails. The statement has been terminated."

Being ever resourceful, I discovered this was due to the HashBytes function having a maximum limit of 8000 bytes. Further searching showed me a 'solution' where my large varchar would be divided and hashed seperately and then later combined with this user defined function:

function [dbo].[udfLargeHashTable] (@algorithm nvarchar(4), @InputDataString varchar(MAX))
RETURNS varbinary(MAX)
AS
BEGIN
DECLARE
    @Index int,
    @InputDataLength int,
    @ReturnSum varbinary(max),
    @InputData varbinary(max)

SET @ReturnSum = 0
SET @Index = 1
SET @InputData = convert(binary,@InputDataString)
SET @InputDataLength = DATALENGTH(@InputData)

WHILE @Index <= @InputDataLength
BEGIN
    SET @ReturnSum = @ReturnSum + HASHBYTES(@algorithm, SUBSTRING(@InputData, @Index, 8000))
    SET @Index = @Index + 8000
END
RETURN @ReturnSum
END

which I call with:

set @ReportDefinitionHash=convert(int,dbo.[udfLargeHashTable]('SHA1',@ReportDefinitionForLookup))

Where @ReportDefinitionHash is int, and @ReportDefinitionForLookup is the varchar

Passing a simple char like 'test' produces a different int with my UDF than a normal call to HashBytes would produce.

Any advice on this issue?

Regards, Byron Cobb.

+1  A: 

Just use this function (taken from Hashing large data strings with a User Defined Function):

create function dbo.fn_hashbytesMAX
     (@string varchar(max)
     ,@Algo    varchar(10)
    )
    returns binary(20)
as
/************************************************************
*
*    Author:        Brandon Galderisi
*    Last modified: 15-SEP-2009 (by Denis)
*    Purpose:       uses the system function hashbytes as well
*                   as sys.fn_varbintohexstr to split an 
*                   nvarchar(max) string and hash in 8000 byte 
*                   chunks hashing each 8000 byte chunk,,
*                   getting the 40 byte output, streaming each 
*                   40 byte output into a string then hashing 
*                   that string.
*
*************************************************************/
begin
     declare    @concat       varchar(max)
               ,@NumHash      int
               ,@HASH         binary(20)
     set @NumHash = ceiling((datalength(@string)/2)/(4000.0))
    /* HashBytes only supports 8000 bytes so split the string if it is larger */
    if @NumHash>1
    begin
                                                        -- # * 4000 character strings
          ;with a as (select 1 as n union all select 1) -- 2 
               ,b as (select 1 as n from a ,a a1)       -- 4
               ,c as (select 1 as n from b ,b b1)       -- 16
               ,d as (select 1 as n from c ,c c1)       -- 256
               ,e as (select 1 as n from d ,d d1)       -- 65,536
               ,f as (select 1 as n from e ,e e1)       -- 4,294,967,296 = 17+ TRILLION characters
               ,factored as (select row_number() over (order by n) rn from f)
               ,factors as (select rn,(rn*4000)+1 factor from factored)

          select @concat = cast((
          select right(sys.fn_varbintohexstr
                         (
                         hashbytes(@Algo, substring(@string, factor - 4000, 4000))
                         )
                      , 40) + ''
          from Factors
          where rn <= @NumHash
          for xml path('')
          ) as nvarchar(max))


          set @HASH = dbo.fn_hashbytesMAX(@concat ,@Algo)
    end
     else
     begin
          set @HASH = convert(binary(20), hashbytes(@Algo, @string))
     end

return @HASH
end

And the results are as following:

select 
 hashbytes('sha1', N'test') --native function with nvarchar input
,hashbytes('sha1', 'test') --native function with varchar input 
,dbo.fn_hashbytesMAX('test', 'sha1') --Galderisi's function with assumes varchar input
,dbo.fnGetHash('sha1', 'test') --your function

Output:

0x87F8ED9157125FFC4DA9E06A7B8011AD80A53FE1  
0xA94A8FE5CCB19BA61C4C0873D391E987982FBBD3  
0xA94A8FE5CCB19BA61C4C0873D391E987982FBBD3   
0x00000000AE6DBA4E0F767D06A97038B0C24ED720662ED9F1
Denis Valeev