views:

427

answers:

2

I need to do some in-memory merging in C# of two sorted streams of strings coming from one or more SQL Server 2000 databases into a single sorted stream. These streams of data can be huge, so I don't want to pull both streams into memory. Instead, I need to keep one item at a time from each stream in memory and at each step, compare the current item from each stream, push the minimum onto the final stream, and pull the next item from the appropriate source stream. To do this correctly, though, the in-memory comparison has to match the collation of the database (consider the streams [A,B,C] and [A,B,C]: the correct merged sequence is [A,A,B,B,C,C], but if your in-memory comparison thinks C < B, your in-memory merge will yield A,A,B, at which point it will be looking at a B and a C, and will yield the C, resulting in an incorrectly sorted stream.)

So, my question is: is there any way to mimic any of the collations in SQL Server 2000 with a System.StringComparison enum in C# or vise-versa? The closest I've come is to use System.StringCompaison.Ordinal with the results of the database strings converted to VARBINARY with the standard VARBINARY ordering, which works, but I'd rather just add an "order by name collate X" clause to my SQL queries, where X is some collation that works exactly like the VARBINARY ordering, rather than converting all strings to VARBINARY as they leave the database and then back to strings as they come in memory.

A: 

Have a look at the StringComparer class. This provides for more robust character and string comparisons than you'll find with String.Compare. There are three sets of static instances (CurrentCulture, InvariantCulture, Ordinal) and case-insesitive versions of each. For more specialized cultures, you can use the StringComparer.Create() function to create a comparer tied to a particular culture.

Adam Robinson
A: 

With sql 2005 I know that the db engine does not make OS calls to do the sorting, the ordering rules are statically shipped with the db (may update with a service pack, but doesn't change with the OS). So I don't think you can safely say that a given set of application code can order the same way unless you have the same code as the db server, unless you use a binary collation.

But if you use a binary collation in the db and client code you should have no problem at all.

EDIT - any collation that ends in _BIN will give you binary sorting. The rest of the collation name will determine what code page is used for storing CHAR data, but will not affect the ordering. The _BIN means strictly binary sorting. See http://msdn.microsoft.com/en-us/library/ms143515(SQL.90).aspx

Are you sure _BIN applies for SQL Server 2000?
Matthew Flaschen
I've tried several of the BIN orderings, including Latin1_General_BIN and Latin1_General_BIN2, but they aren't the same as StringComparison.Ordinal. My test cases include generating all byte strings up to a certain length, converting them into strings with System.Text.Encoding.Unicode.GetString, sorting them with StringComparison.Ordinal, and comparing the results with what comes out of the database using "order by collate X". None of the _BIN or _BIN2 orderings have matched the in-memory sorted order yet.
Larry Sellers
@larry - how are you generating the strings in the db? Just like cast(0x..... as nvarchar(N)) ?
I'm inserting the strings into a table from C# using System.Data.SqlClient, straight from the sorted list I've generated and tested against StringComparison.Ordinal.
Larry Sellers
@Larry - can you confirm you are using Nchar/Nvarchar and not char/varchar? Probably you should be using StringComparison.OrdinalIgnoreCase?
I've tried both VARCHAR and NVARCHAR, but I can't get a collation that matches the VARBINARY ordering with either, and none of the StringComparison enums seem to match basic collations like Latin1_General_BIN
Larry Sellers