Hey guys, I have to read a huge XML file which consists of over 3 million records and over 10 million nested elements.

Naturally I am using XmlTextReader, and I have got my parsing time down to about 40 seconds (from 90 seconds earlier) using multiple optimization tricks and tips.

But I want to save as much further processing time as I can, hence the question below.

Quite a few elements are of type xs:boolean, and the data provider always represents values as "true" or "false", never "1" or "0".

For such cases my earliest code was:

bool subtitled = false;
if (xmlTextReader.Value == "true")
{
    subtitled = true;
}

Which I further optimized to:

bool subtitled = false;
if (string.Equals(xmlTextReader.Value, "true", StringComparison.OrdinalIgnoreCase))
{
    subtitled = true;
}

I wanted to know: would the following be the fastest, given that the value is always either "true" or "false"?

bool subtitled = false;
if (xmlTextReader.Value.Length == 4)
{
    subtitled = true;
}
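For context, here is a minimal sketch of where such a check typically sits in a reader loop. It assumes a made-up document shape (the `titles`/`title`/`subtitled` element names are hypothetical, not the asker's schema) and uses `XmlReader.Create`, the factory Microsoft recommends over constructing `XmlTextReader` directly; `Value` is read the same way on both.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical input: element names are illustrative only.
const string xml =
    "<titles><title><subtitled>true</subtitled></title>" +
    "<title><subtitled>false</subtitled></title></titles>";

int subtitledCount = 0;
using (var reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "subtitled")
        {
            // Assumes the text content directly follows the start tag.
            reader.Read(); // move to the text node holding "true" or "false"

            // Length check from the question: only valid while the provider
            // sends exactly "true" (4 chars) or "false" (5 chars).
            bool subtitled = reader.Value.Length == 4;
            if (subtitled) subtitledCount++;
        }
    }
}

Console.WriteLine(subtitledCount); // 1
```

With an empty element such as `<subtitled/>` there is no text node to move to, so a real parser would need to handle that case separately.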
+6  A: 

Yes, it's faster, because you only compare exactly one value, namely the length of the string.

When you compare two strings with each other, you compare them character by character for as long as the characters match. So if you're looking for a match with the string "true", you're going to do 4 comparisons before the predicate evaluates to true.
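The cost model described here can be sketched as a simplified ordinal equality check. The hypothetical `OrdinalEquals` helper below is not the real .NET implementation (which is far more optimized), but like .NET's ordinal `string.Equals` it rejects strings of different lengths before inspecting any characters, then compares characters until the first mismatch.

```csharp
using System;

// Hypothetical helper: a simplified model of ordinal string equality.
// Returns the result plus how many character comparisons were made.
static (bool Equal, int CharComparisons) OrdinalEquals(string a, string b)
{
    // Different lengths can never be equal, so no characters are inspected.
    if (a.Length != b.Length)
        return (false, 0);

    int comparisons = 0;
    for (int i = 0; i < a.Length; i++)
    {
        comparisons++;
        if (a[i] != b[i])
            return (false, comparisons); // stop at the first mismatch
    }
    return (true, comparisons);
}

Console.WriteLine(OrdinalEquals("true", "true"));  // (True, 4)
Console.WriteLine(OrdinalEquals("false", "true")); // (False, 0)
Console.WriteLine(OrdinalEquals("tree", "true"));  // (False, 3)
```

So with only "true" and "false" as inputs, a "false" value is already rejected by the length check, and only a matching "true" pays for the four character comparisons.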

The only problem with this solution is that if someday the value changes from "true" to, let's say, "1", you're going to run into a problem here.

Giu
It's actually not a problem: you'd have the same failing comparisons, since "1" != "true" and "0" != "false". You can't change half of an interface implementation and expect the interface and all other implementations of that interface to change magically. See also Postel's Law: be conservative in what you send; be liberal in what you accept.
MSalters
@MSalters But the length of "1" and the length of "0" are the same, so you can't determine the value based on the length.
Rune FS
But is it faster than xmlTextReader.Value[0] == 't'? I just wanted to raise the question, because of course the right thing to do is to benchmark.
Rune FS
@RuneFS: that's beside the point. If the interface says to use "true" and "false", then it works. If the interface is changed to use "0" and "1", then String.Equals will work. But if the interface were changed to use only "false", with true being the implied default when there is no element in your XML, then your string comparison breaks. Ergo, you can't speculate whether your parsing algorithm will understand a future protocol version, and you must consider it incompatible by default.
MSalters
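The "1"/"0" point from the comments can be checked directly. A minimal sketch, assuming the four spellings XML Schema allows for xs:boolean ("true", "false", "1", "0"); `IsTrueByLength` and `IsTrueBySchema` are hypothetical helper names. `XmlConvert.ToBoolean` from System.Xml implements the xs:boolean rules and throws `FormatException` for anything else.

```csharp
using System;
using System.Xml;

// The length shortcut: "true" is the only accepted value with 4 characters.
static bool IsTrueByLength(string value) => value.Length == 4;

// XmlConvert.ToBoolean accepts the full xs:boolean lexical space
// ("true", "false", "1", "0") and throws FormatException otherwise.
static bool IsTrueBySchema(string value) => XmlConvert.ToBoolean(value);

foreach (var value in new[] { "true", "false", "1", "0" })
{
    Console.WriteLine($"{value,-5} length: {IsTrueByLength(value),-5} schema: {IsTrueBySchema(value)}");
}
```

Both checks agree on "true" and "false", but the length check reports "1" as false, which is exactly the failure mode raised above: "1" and "0" have the same length.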
+4  A: 

Comparing the length will be faster, but less readable. I wouldn't use it unless I had profiled the code and concluded that I need this optimization.

Alex Reitbort
A: 

Can't you just write a unit test? Run each scenario, say, 1000 times and compare the elapsed times.
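A rough timing harness along these lines might look like the sketch below. `Time` is a hypothetical helper; it uses `System.Diagnostics.Stopwatch`, which has much finer resolution than `DateTime.Now`. The strings are built with `new string(...)` so they are not interned literals (values read from an XML reader are typically fresh strings), which keeps `==` from succeeding through a reference-equality shortcut. The delegate call adds its own overhead, so treat the numbers as indicative only; results vary with runtime and hardware.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs isTrue over every value `repetitions` times and
// returns the elapsed milliseconds plus how many values were classified as true.
static (long ElapsedMs, int TrueCount) Time(string[] values, Func<string, bool> isTrue, int repetitions)
{
    // Warm-up pass so JIT compilation is not part of the measurement.
    foreach (var v in values) isTrue(v);

    int trueCount = 0;
    var stopwatch = Stopwatch.StartNew();
    for (int r = 0; r < repetitions; r++)
    {
        foreach (var v in values)
        {
            // Using the result keeps the work from being optimized away.
            if (isTrue(v)) trueCount++;
        }
    }
    stopwatch.Stop();
    return (stopwatch.ElapsedMilliseconds, trueCount);
}

// Sample data: alternating, freshly allocated "true"/"false" strings.
var inputs = new string[1000];
for (int i = 0; i < inputs.Length; i++)
    inputs[i] = new string((i % 2 == 0 ? "true" : "false").ToCharArray());

const int Repetitions = 1000;
var byEquals = Time(inputs, s => s == "true", Repetitions);
var byIgnoreCase = Time(inputs, s => string.Equals(s, "true", StringComparison.OrdinalIgnoreCase), Repetitions);
var byLength = Time(inputs, s => s.Length == 4, Repetitions);
var byFirstChar = Time(inputs, s => s[0] == 't', Repetitions);

Console.WriteLine($"==                {byEquals}");
Console.WriteLine($"OrdinalIgnoreCase {byIgnoreCase}");
Console.WriteLine($"Length == 4       {byLength}");
Console.WriteLine($"s[0] == 't'       {byFirstChar}");
```

Every variant should report the same true count (500 per pass); if one differs, it is measuring something other than the intended comparison.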

femseks
+1  A: 

Measuring the length would almost invariably be faster. That said, unless this is an experiment in micro-optimization, I'd just focus on making the code readable and conveying the proper semantics.

You might also try the following approach:

Boolean.TryParse(xmlTextReader.Value, out bool subtitled);

I know that has nothing to do with your question, but I figured I'd throw it out there anyway.
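`Boolean.TryParse` (also reachable as `bool.TryParse`) matches "True"/"False" case-insensitively and ignores surrounding whitespace, but it does not accept the xs:boolean forms "1" and "0". A small sketch of its behaviour on a few made-up sample strings:

```csharp
using System;

// Sample values; only the first three are accepted by bool.TryParse.
foreach (var value in new[] { "true", "False", " true ", "1", "yes" })
{
    // On failure TryParse returns false and sets the out value to false.
    bool parsed = bool.TryParse(value, out bool subtitled);
    Console.WriteLine($"\"{value}\" -> parsed: {parsed}, subtitled: {subtitled}");
}
```

If the provider ever switched to "1"/"0", `TryParse` would report a parse failure rather than silently returning a wrong value, which is easier to detect than a length check quietly misclassifying the input.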

Patrick
+3  A: 

What about comparing the first character to 't'?

It should (maybe :) be faster than comparing the whole string.
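The first-character idea works because "true" and "false" already differ at index 0. A minimal sketch, assuming (as the question states) the value is only ever "true" or "false"; `IsTrueByFirstChar` is a hypothetical helper name, and the length guard is there because indexing an empty string throws `IndexOutOfRangeException`.

```csharp
using System;

// Assumes the value is always exactly "true" or "false".
// One character is enough to tell them apart; the Length guard
// makes an empty value read as false instead of throwing.
static bool IsTrueByFirstChar(string value) => value.Length > 0 && value[0] == 't';

Console.WriteLine(IsTrueByFirstChar("true"));  // True
Console.WriteLine(IsTrueByFirstChar("false")); // False
Console.WriteLine(IsTrueByFirstChar(""));      // False
```

Like the length check, this trades robustness for speed: any other value starting with 't' would also read as true.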

Øyvind Skaar
A: 

String comparison and parsing are very slow in .NET, so I'd recommend avoiding intensive string parsing/comparison in .NET.

If you're forced to do it, use highly optimized unmanaged or unsafe code and use parallelism.

IMHO.

Dmitry Karpezo
Any links to support this claim?
FuleSnabel
-1: Out of context, or especially in this context, this statement is already questionable. But even giving you the benefit of the doubt, sorry, without any evidence, facts, or experience whatsoever, this is nothing but FUD.
Christian.K
Just write simple tests and see for yourself. I did. Write, for instance, string comparison, char access, or atoi() versus Convert.ToInt32() in C/C++ and C#, and you will see that native unmanaged code is hundreds of times more efficient.
Dmitry Karpezo
Using out-of-context micro-benchmarks, maybe yes. However, you give a pretty bold recommendation in your answer to use unsafe or unmanaged code. That is not necessarily bad as such, but it should only be used (if it is even possible: think portability, mobile, Silverlight, medium-trust environments, etc.) after really establishing that *that* particular part of the process is the bottleneck, compared to the other heavy lifting that goes on. Regarding the original question, I would assume the XML parsing takes the lion's share. Besides, the question was about string-to-string comparison anyway.
Christian.K
I completely agree with you: optimization is not necessarily low-level tuning and hacks such as using unsafe code instead of BCL routines. My point was about .NET string performance itself. Sorry for the misunderstanding.
Dmitry Karpezo
A: 

If you know it's either "true" or "false", the last snippet must be the fastest.

Anyway, you can also write:

bool subtitled = (xmlTextReader.Value.Length == 4);

That should be even faster.

tia