Here is a simple URL encoding function (it uses varchar as parameter) I found long time ago on some forum
create function urlencode(@str as varchar(4000))
returns varchar(4000)
as
begin
declare @hex char(16)
declare @c char(1)
set @hex='0123456789ABCDEF'
declare @ostr varchar(4000)
set @ostr=''
declare @l int
set @l = 1
while @l <= len(@str)
begin
set @c = substring(@str,@l,1)
if @c between '0' and '9'
or @c between 'A' and 'Z'
or @c between 'a' and 'z'
set @ostr = @ostr + @c
else
set @ostr = @ostr + '%' +
substring(@hex,(ascii(@c)/16)+1,1)
+substring(@hex,(ascii(@c)&15)+1,1)
set @l=@l+1
end
return @ostr
end
go
How will you handle unicode? Well, it's quite straightforward if you don't care about Hindu or Arabic symbols but do care about Central European languages. Just what you need is to use CAST(@nvarchar as varchar) function.
Lets check how this work with some Central European symbols. Run the following example in
declare @t1 nvarchar(256)
select @t1 = N'áâãäåæçèéêëìíîïðñòóôõöùúûüýÿāăąćĉċčĕėęěĝğġģĥħĩīĭįĵķļľńňŋōŏőŕřśŝşšťũūŭůűŵŷźžǻǽǿ'
select @t1
declare @t2 varchar(512)
select @t2 = cast(@t1 as varchar(512))
select @t2
And see what output we will get
áâãäåæçèéêëìíîïðñòóôõöùúûüýÿāăąćĉċčĕėęěĝğġģĥħĩīĭįĵķļľńňŋōŏőŕřśŝşšťũūŭůűŵŷźžǻǽǿ
aaaaa?ceeeeiiii?nooooouuuuyyaaacccceeeegggghhiiiijkllnn?ooorrsssstuuuuuwyzz???
So, most symbols converted perfectly, while several symbols will be question marks. If you care about such symbols (such as æ, ð, ŋ) you need to write an additional function that will replace them before conversion to something that you will find most appropriate for them (sometimes 2 symbols instead of one, for example æ => ae).
To replace you can use REPLACE() function, but you should understand that if you call it too many times, the performance will suffer. So if you have lot of character replacements, you can use the following algorithm
1) Create a temporary table (or table type variable) with 3 columns - position int identity(0,1) primary key clustered, original nchar(1) not null, converted varchar(2) null
2) Using loop and SUBSTRING() function split string into characters and insert each char to original column of this temporary table
3) Using one query with many WHEN THEN statements convert all symbols
update @temp_table
set converted = CASE original
WHEN N'æ' THEN 'ae'
WHEN N'ŋ' THEN 'n'
... and so on ...
ELSE CAST(original AS VARCHAR(2))
4) Using loop, concatenate results that you have in converted column into one varchar() variable.
When you converted nvarchar() to varchar(), call the urlencode() function I listed above.
I understand that this case will require a lot of WHEN/THEN, but it depends on what langauges you have currently. As you see, for most European symbols CAST to varchar gives perfect result.
If you will go with CLR function implementation (on C#), you will have to write a lot of switch/case statements too. So comparing both approaches, both will require same development efforts, but CLR solution will require additional administrative actions. For small strings CLR solution will work slowly (because SQL server requires some time to interop with CLR environment to do the call and then get the results back) while for big strings with lots of replacements C# maybe (never checked this!) could be faster because SQL is not the best language for string manipulations.