I noticed this particular line of code when I was profiling my application (which creates a boatload of database insertions by processing some raw data):

myStringBuilder.AppendLine(
    string.Join(
        BULK_SEPARATOR, new string[]{myGuid.ToString() ...

Bearing in mind that the resultant string is going to end up in a file that is loaded via the T-SQL command BULK INSERT, is there a way I can do this step faster? I know that getting a byte array is faster, but I can't just plug that into the file.

A: 

You don't say where you are getting the guid from. Also, I don't believe that getting the bytes is going to be any faster, as you would then have to do what the ToString method on the Guid class already does: iterate through the bytes and convert them to a string value.

Rather, I think a few general areas that this code can possibly be improved upon in terms of performance are (and assuming you are doing this in a loop):

  • Are you reusing the myStringBuilder instance on each new iteration of your loop? You should be setting the Length (not the Capacity) property to 0 and then rebuilding your string in it. This avoids warming up a new StringBuilder instance, and the memory for a larger string will already have been allocated.

  • Use calls to Append on myStringBuilder instead of calling String.Join. String.Join is going to preallocate a bunch of memory and then return a string instance, which you will then either allocate space for again (on the first iteration) or copy into already-allocated space. There is no reason to do this twice. Instead, iterate through the array you are creating (or unroll the loop, since it seems you have a fixed-size array) and call Append, passing in the guid and then BULK_SEPARATOR. Removing the trailing separator is easy, btw: if it is a single character, just decrement the Length property of the StringBuilder by one, provided you actually appended any guids.
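A minimal sketch of the second point, under some assumptions: the class and method names are made up for illustration, and the "|" separator is a placeholder, since the actual BULK_SEPARATOR value isn't shown in the question. It appends each guid and the separator straight into the builder, then trims the trailing separator by shrinking Length.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BulkLineWriter
{
    // Placeholder separator; the question's BULK_SEPARATOR value is not shown.
    public const string BulkSeparator = "|";

    // Appends one separator-delimited line of guids directly to the builder,
    // without creating an intermediate string via string.Join.
    public static void AppendGuidLine(StringBuilder sb, IReadOnlyList<Guid> guids)
    {
        for (int i = 0; i < guids.Count; i++)
        {
            sb.Append(guids[i].ToString());
            sb.Append(BulkSeparator);
        }

        // Remove the trailing separator, but only if anything was appended.
        if (guids.Count > 0)
        {
            sb.Length -= BulkSeparator.Length;
        }

        sb.AppendLine();
    }
}
```

To reuse the same builder across iterations, set `sb.Length = 0` before building the next batch rather than creating a new StringBuilder.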

casperOne
casperOne: My point is that getting the bytes is faster but useless, since the result isn't a string. SQL Server does know how to interpret the bytes, just not when they are given in a bulk insert file. As for your performance notes, they aren't relevant here: that code, though likely inefficient, is so fast compared to Guid.ToString() that it doesn't matter. And string.Join seems clearer to me, so I'll stick with that unless performance dictates otherwise. Also, the StringBuilder is created once and ends up holding one rather large string.
Brian
+2  A: 

The fastest and simplest way would be not to use BULK INSERT with a raw file at all. Instead, use the SqlBulkCopy class. That should speed this up significantly by sending the data directly over the pipe instead of using an intermediate file.

(You'll also be able to use the Guid directly without any string conversions, although I can't be 100% sure what SqlBulkCopy does internally with it.)
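As a rough sketch of what this could look like, assuming a destination table with a single uniqueidentifier column: the class name, the "Id" column name, and the method parameters below are all hypothetical, and on newer .NET versions System.Data.SqlClient comes from a NuGet package. The guids go into a DataTable as Guid values and are passed to SqlBulkCopy.WriteToServer, with no string conversion in user code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class GuidBulkLoader
{
    // Builds an in-memory table whose single column stores Guid values as Guids.
    // The column name "Id" is a placeholder for illustration.
    public static DataTable BuildTable(IEnumerable<Guid> ids)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(Guid));
        foreach (Guid id in ids)
        {
            table.Rows.Add(id);
        }
        return table;
    }

    // Sends the table to SQL Server over the connection, with no intermediate file.
    // connectionString and destinationTable are supplied by the caller.
    public static void Load(string connectionString, string destinationTable, DataTable table)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = destinationTable;
            bulk.ColumnMappings.Add("Id", "Id"); // source column -> destination column
            bulk.WriteToServer(table);
        }
    }
}
```

For large volumes, it may be worth filling and sending the table in batches rather than holding every row in memory at once.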

Aaronaught
Actually, I've heard that `bulk insert` is much faster.
Brian
@Brian: Where have you heard that? It's patently false. `SqlBulkCopy` **is** a bulk insert, it just goes directly over the SQL pipe, so there's only one data transfer instead of two.
Aaronaught
Now that I think of it, I'm actually paying this price twice. I've been ignoring the fact that during the bulk insert, SQL Server has to parse the Guid (and the rest of the string, of course) back out of the file.
Brian
That was my thought as well. Not only the cost of parsing but also the I/O cost of writing to a file and then reading from the same file. Let us know how it turns out...
Aaronaught
Tip: If you care about performance, `SqlBulkCopyOptions.UseInternalTransaction` is likely to be your friend. Anyhow, simple local testing suggests that performance is, at the very least, not suffering...which is sufficient justification to do this rather than the comparatively hacky File I/O + `Bulk Insert` method.
Brian
A: 

If time is critical, could you pre-create a sufficiently long list of GUIDs, already converted to strings, ahead of time, and then use it in your code? Either in C#, or possibly in SQL Server, depending on your requirements?

marc_s
That isn't really practical, in my case.
Brian