views:

366

answers:

9

I'm trying to speed up the following:

string s; //--> s is never null

if (s.Length != 0)
{
   <do something>
}

Problem is, it appears the .Length actually counts the characters in the string, and this is way more work than I need. Anybody have an idea on how to speed this up?

Or, is there a way to determine if s[0] exists, w/out checking the rest of the string?

+4  A: 

Accessing the Length property shouldn't do a count -- .NET strings store a count inside the object.

The SSCLI/Rotor source code contains an interesting comment which suggests that String.Length is (a) efficient and (b) magic:

// Gets the length of this string
//
/// This is a EE implemented function so that the JIT can recognise is specially
/// and eliminate checks on character fetchs in a loop like:
/// for(int I = 0; I < str.Length; i++) str[i]
/// The actually code generated for this will be one instruction and will be inlined.
//
public extern int Length {
    [MethodImplAttribute(MethodImplOptions.InternalCall)]
    get;
}
Tim Robinson
+6  A: 

String.IsNullOrEmpty is the preferred method for checking for null or zero length strings.

Internally, it will use Length. The Length property for a string should not be calculated on the fly though.

If you're absolutely certain that the string will never be null and you have some strong objection to String.IsNullOrEmpty, the most efficient code I can think of would be:

if(s.Length > 0)
{
    // Do Something
}

Or, possibly even better:

if(s != "")
{
    // Do Something
}
Justin Niessner
Fair enough, but it's overkill if the string is known not to be null.
Tim Robinson
Not only is it overkill, but it suggests you think the variable *might* be null.
Jon Skeet
then compare it with `String.Empty`
Ankit Jain
In that case `string.Length == 0` is the fastest way to determine if the string is empty.
Matt Greer
+1  A: 

Here is the function String.IsNullOrEmpty -

if (!String.IsNullOrEmpty(yourstring))
{
  // your code
}
Ankit Jain
Please explain.
bowenl2
@bowenl, why did you vote down?
Ankit Jain
Because you had initially posted something different which didn't compile at all. I retracted it now that you've fixed it to match what someone else posted before you.
bowenl2
Specifically this: if (![String.IsNullOrEmpty][1](yourstring))
bowenl2
that I was trying to put hyperlink.. but then realized that code block cannot. I should have seen preview. btw, thanks for your efforts.
Ankit Jain
+14  A: 

EDIT: Now that you've provided some more context:

  • Trying to reproduce this, I failed to find a bottleneck in string.Length at all. The only way of making it faster was to comment out both the test and the body of the if block - which isn't really fair. Just commenting out the condition slowed things down, i.e. unconditionally copying the reference was slower than checking the condition.

  • As has been pointed out, using the overload of string.Split which removes empty entries for you is the real killer optimization.

  • You can go further, by avoiding creating a new char array with just a space in every time. You're always going to pass the same thing effectively, so why not take advantage of that?

  • Empty arrays are effectively immutable. You can optimize the null/empty case by always returning the same thing.

The optimized code becomes:

private static readonly char[] Delimiters = " ".ToCharArray();
private static readonly string[] EmptyArray = new string[0];

public static string[] SplitOnMultiSpaces(string text)
{
    if (string.IsNullOrEmpty(text))
    {
        return EmptyArray;
    }

    return text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
}

String.Length absolutely does not count the letters in the string. The value is stored as a field - although I seem to remember that the top bit of that field is used to remember whether or not all characters are ASCII (or used to be, anyway) to enable other optimisations. So the property access may need to do a bitmask, but it'll still be O(1) and I'd expect the JIT to inline it, too. (It's implemented as an extern, but hopefully that wouldn't affect the JIT in this case - I suspect it's a common enough operation to potentially have special support.)

If you already know that the string isn't null, then your existing test of

if (s.Length != 0)

is the best way to go if you're looking for raw performance IMO. Personally in most cases I'd write:

if (s != "")

to make it clearer that we're not so much interested in the length as a value as whether or not this is the empty string. That will be slightly slower than the length test, but I believe it's clearer. As ever, I'd go for the clearest code until you have benchmark/profiling data to indicate that this really is a bottleneck. I know your question is explicitly about finding the most efficient test, but I thought I'd mention this anyway. Do you have evidence that this is a bottleneck?

EDIT: Just to give clearer reasons for my suggestion of not using string.IsNullOrEmpty: a call to that method suggests to me that the caller is explicitly trying to deal with the case where the variable is null, otherwise they wouldn't have mentioned it. If at this point of the code it counts as a bug if the variable is null, then you shouldn't be trying to handle it as a normal case.

In this situation, the Length check is actually better in one way than the inequality test I've suggested: it acts as an implicit assertion that the variable isn't null. If you have a bug and it is null, the test will throw an exception and the bug will be detected early. If you use the equality test it will treat null as being different to the empty string, so it will go into your "if" statement's body. If you use string.IsNullOrEmpty it will treat null as being the same as empty, so it won't go into the block.

Jon Skeet
I think the best thing about this is the last line.
msarchet
@msarchet: It's unfortunate that I edited the answer while you were writing that comment. I assume you mean "Do you have evidence that this *is* a bottleneck?"
Jon Skeet
@Jon Skeet, yes, though it kind of still applies
msarchet
A: 

As always with performace: benchmark.
Using C# 3.5 or before, you'll want to test yourString.Length vs String.IsNullOrEmpty(yourString)

using C# 4, do both of the above and add String.IsNullOrWhiteSpace(yourString)

Of course, if you know your string will never be empty, you could just attempt to access s[0] and handle the exception when it's not there. That's not normally good practice, but it may be closer to what you need (if s should always have a non-blank value).

AllenG
`IsNullOrWhitespace` will give the wrong answer though, given the requirements in the question.
Jon Skeet
A: 

Thanks for all the suggestions, but I still need some clarifications, if possible.

I'm using C# 4, and I've run a profiler against my code, and the biggest bottleneck is indeed the string.Length check. Now I know VB stored the length in a field, but I wasn't sure if C# is following more along the lines of C and the null-terminated string. If VB style, then yes, it should be able to return it instantly, but if C/null-terminated, then it will have to count.

Anyway, here's my function:

public string[] SplitOnMultiSpaces(this string text)
{
  if (string.IsNullOrEmpty(text)) return new string[0];

  var split = text.Split(' ');
  int length = split.Length;

  var data = new string[length];

  int index = 0;
  for (int i = 0; i<length; i++)
  {
    if (split[i].Length != 0)
    {
      data[index++] = split[i];
    }
  }

  return data;
}

Here, you can see why the string.Length I'm checking can never be null.

If I profile this, against 100,000 strings, it's taking 1.042 seconds to execute. If I comment out the string.Length IF statement, it takes 0.2 seconds to execute.

Now I may be wrong, but why else would string.Length increase execution time by 5-fold if it's not trying to count the string length?

(My real program is executing this function 100million+ times) P.S. -- What'd I really like is the most efficient algorithm for the following:

Turn the string:

"A  B       C  D   E  "

into array:

string["A","B","C","D","E"]
Eric
You should post this as an edit to your original question - people won't be realising your "answer" is in fact further question. Also, make sure your code is complete - it looks like you're missing the very line you're talking about. Finally: the variable `split` in what code you've posted should contain exactly the array you are looking for - why didn't you ask the question you meant in the first place?
Dan Puzey
@Eric: I'll try to reproduce your results...
Jon Skeet
Check out my answer now that you've posted what you're actually trying to do here...
drharris
@Eric @drharris: And mine, which is optimized one smidge further :) (EDIT: I say a smidge, but on my machine it saves about 8% of the running time, with the sample string given.)
Jon Skeet
A: 
        for (int i = 0; i < 100; i++)
        {
            System.Diagnostics.Stopwatch timer = new System.Diagnostics.Stopwatch();
            string s = "dsfasdfsdafasd";

            timer.Start();
            if (s.Length > 0)
            {
            }

            timer.Stop();
            System.Diagnostics.Debug.Write(String.Format("s.Length != 0 {0} ticks       ", timer.ElapsedTicks));

            timer.Reset();
            timer.Start();
            if (s == String.Empty)
            {
            }

            timer.Stop();
            System.Diagnostics.Debug.WriteLine(String.Format("s== String.Empty {0} ticks", timer.ElapsedTicks));
        }

Using the stopwatch the s.length != 0 takes less ticks then s == String.Empty

after I fix the code

Mike
+1  A: 

Based on your intent described in your answer, why don't you just try using this built-in option on Split:

s.Split(new[]{" "}, StringSplitOptions.RemoveEmptyEntries);
drharris
A: 

Just use String.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries) and it will do it all for you.

Star