Here is the situation:
I am making a small program to parse server log files.
I tested it with a log file containing several thousand requests (between 10,000 and 20,000; I don't know exactly).
What I have to do is load the log text files into memory so that I can query them.
This loading step is what takes the most resources.
The methods that take the most CPU time are these (worst culprits first); a simplified sketch of the loop follows the list:
String.Split - splits the line into an array of field values
String.Contains - checks whether the user agent contains a specific agent string (to determine the browser ID)
String.ToLower - various purposes
StreamReader.ReadLine - reads the log file line by line
String.StartsWith - determines whether a line is a column definition line or a line with values
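For context, here is a minimal sketch of the kind of loop I'm running. The file name, the field layout, and the index of the user-agent field are assumptions made up for this sketch; the real code is in the linked files.

    Using reader As New System.IO.StreamReader("access.log")
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            If Not line.StartsWith("#") Then
                ' split the line into its field values
                Dim fields() As String = line.Split(" "c)
                ' normalize the user agent and test for a known browser;
                ' field index 9 is an assumption for this sketch
                Dim userAgent As String = fields(9).ToLower()
                Dim isFirefox As Boolean = userAgent.Contains("firefox")
                ' ... a LogEntry is built from fields here ...
            End If
            line = reader.ReadLine()
        End While
    End Using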
There were some others that I was able to replace. For example, the dictionary getter was also taking a lot of resources, which I had not expected, since it's a dictionary and its keys should be indexed. I replaced it with a multidimensional array and saved some CPU time; a simplified sketch of the swap is below.
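To give an idea of that swap, a minimal sketch. The names, values, and array bounds here are made up for illustration; the point is that my real keys are small, dense integers, which is what makes direct array indexing possible.

    ' Hypothetical IDs, just to make the sketch self-contained
    Dim browserId As Integer = 2
    Dim columnId As Integer = 5

    ' Before: a dictionary lookup in the per-line hot path
    Dim lookup As New Dictionary(Of Integer, String)()
    lookup(browserId) = "Firefox"
    Dim before As String = lookup(browserId)

    ' After: a multidimensional array indexed directly by the IDs
    ' (this only works because the IDs are small, dense integers)
    Dim table(10, 20) As String
    table(browserId, columnId) = "Firefox"
    Dim after As String = table(browserId, columnId)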
Now, I am running on a fast dual-core machine, and the total time it takes to load the file I mentioned is about 1 second.
Now this is really bad.
Imagine a site that gets tens of thousands of visits a day: each visit produces multiple requests, so that's easily millions of log lines, and at this rate it's going to take minutes to load the log file.
So what are my alternatives, if any? Because I think this is just a .NET limitation and I can't do much about it.
EDIT:
If some of you gurus want to look at the code and find the problem, here are my code files:
- http://freehosting1.net/temp/data.txt
- http://freehosting1.net/temp/logentry.txt
- http://freehosting1.net/temp/lists.txt
The function that takes the most resources is by far LogEntry.New. The function that loads all the data is called Data.Load.
Total number of LogEntry objects created: 50,000. Time taken: 0.9 to 1.0 seconds (measured as sketched below).
CPU: AMD Phenom II X2 545, 3 GHz.
Not multithreaded.
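For reference, this is roughly how I'm timing the load. The exact Data.Load call shape here is an assumption for the sketch; the real signature is in data.txt above.

    Dim sw As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew()
    Data.Load("access.log")   ' assumed call shape; the real one is in data.txt
    sw.Stop()
    Console.WriteLine("Loaded in {0} ms", sw.ElapsedMilliseconds)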