I've learnt that it's usually a "smell" when things fail near a power of two ...
Given
Over the last few weeks I've been subject to a sudden and significant performance deterioration
and
AddExistingFile is called 66,914 times
I'm wondering if the performance hit arrived at about the same time as the number of files exceeded 65,535 (2^16 - 1) ...
Other possibilities to consider ...
Are all 66,914 files in the same directory? If so, that's a lot of directory blocks to access ... try a hard drive defrag. In fact, it's even more directory blocks if they're distributed across a bunch of directories.
Are you storing all the files in the same list? Are you presetting the capacity of that list, or allowing it to "grow" naturally and slowly?
Are you scanning for files depth first or breadth first? Caching by the OS will favor the performance of depth first.
Update 14/7
Clarification of "Are you storing all the files in the same list?"
Naive code like this first example doesn't perform as well as it could, because the list has to reallocate its backing storage and copy the existing items each time it outgrows its current capacity.
    var myList = new List<int>();
    for (int i = 0; i < 10000; i++)
    {
        myList.Add(i);
    }
If you know the final size in advance, it's more efficient to initialize the list with that capacity and avoid the reallocation overhead:
    var myList = new List<int>(10000); // Capacity is 10000
    for (int i = 0; i < 10000; i++)
    {
        myList.Add(i);
    }
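If you want to see the difference for yourself, here is a minimal sketch that counts how often the backing array is replaced while the list fills up. The class and method names (CapacityDemo, CountReallocations) are made up for illustration, and the exact growth pattern is an implementation detail of List<T>, so the first number may vary between runtime versions. The only thing you can rely on is that a list created with enough capacity never reallocates.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Hypothetical helper: adds `count` items to a list created with
    // `initialCapacity` and counts how many times Capacity changes,
    // i.e. how many times the backing array was reallocated.
    public static int CountReallocations(int count, int initialCapacity)
    {
        var list = new List<int>(initialCapacity);
        int reallocations = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }

    public static void Main()
    {
        // Grown "naturally": several reallocations, each copying all items so far.
        Console.WriteLine(CountReallocations(10000, 0));
        // Preset capacity: no reallocations at all.
        Console.WriteLine(CountReallocations(10000, 10000));
    }
}
```

For 10,000 ints the cost is small either way. It starts to matter when the list is large or is filled inside a hot path.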
Update 15/7
Comment by OP:
These web apps are not programmatically probing files on my hard disk, at least not by my hand. If there is any recursive file scanning, its by VS 2008.
It's not Visual Studio that's doing the file scanning - it is your web application. This can clearly be seen in the first profiler trace you posted - the call to System.Web.Hosting.HostingEnvironment.Initialize() is taking 49 seconds, largely because of 66,914 calls to AddExistingFile(). In particular, the read of the property CreationTimeUTC is taking almost all the time.
This scanning won't be random - either your application's configuration is asking for it, or the files are in your web application's file tree. Find those files and you'll know the reason for your performance problems.
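One way to track the files down is to count files per directory under the application root and look at the biggest offenders. This is only a rough sketch: FileCountReport and CountFilesPerDirectory are names I've invented, the root path is whatever you pass in, and Directory.GetFiles will throw if it hits a folder it isn't allowed to read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileCountReport
{
    // Hypothetical helper: returns (directory, file count) pairs for every
    // directory under `root` that contains files, busiest directory first.
    public static List<KeyValuePair<string, int>> CountFilesPerDirectory(string root)
    {
        return Directory.GetFiles(root, "*", SearchOption.AllDirectories)
            .GroupBy(path => Path.GetDirectoryName(path))
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }

    public static void Main(string[] args)
    {
        // Assumption: the web application's root folder is passed as the
        // first argument; "." is just a placeholder default.
        string root = args.Length > 0 ? args[0] : ".";

        // Print the ten directories holding the most files.
        foreach (var pair in CountFilesPerDirectory(root).Take(10))
        {
            Console.WriteLine("{0,8}  {1}", pair.Value, pair.Key);
        }
    }
}
```

If one or two directories account for most of the 66,914 files, they are the first place to look.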