I need to count how many times each keyword occurs in a string, sorted by highest count first. What's the fastest algorithm available in .NET for this?
You could split the string into a collection of strings, one per word, and then run a LINQ query on the collection. I doubt it would be the fastest option, but it would probably be faster than a regex.
Dunno about fastest, but LINQ is probably the most understandable:
using System;
using System.Linq;
var myListOfKeywords = new[] { "struct", "public", ... };
// Split on the delimiters, group identical words, keep only the keywords, most frequent first
var keywordCount = from word in myProgramText.Split(new[] { " ", "(", ... }, StringSplitOptions.RemoveEmptyEntries)
                   group word by word into g
                   where myListOfKeywords.Contains(g.Key)
                   orderby g.Count() descending
                   select new { g.Key, Count = g.Count() };
foreach (var element in keywordCount)
    Console.WriteLine("Keyword: {0}, Count: {1}", element.Key, element.Count);
You can write this without LINQ, but the basic premise is the same: split the string into words and count the occurrences of each word of interest.
Simple algorithm: Split the string into an array of words, iterate over this array, and store the count of each word in a hash table. Sort by count when done.
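A minimal C# sketch of the hash-table approach, where `Dictionary<string, int>` plays the role of the hash table. It assumes words are separated by whitespace and a few punctuation characters (the separator set is an assumption) and that matching is case-sensitive; the `WordCounter` class and `CountWords` method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Hypothetical separator set: whitespace plus some common punctuation.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '(', ')' };

    // Counts every word in a single pass, then sorts by descending count.
    public static List<KeyValuePair<string, int>> CountWords(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (string word in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // TryGetValue leaves n at 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int n);
            counts[word] = n + 1;
        }

        var sorted = new List<KeyValuePair<string, int>>(counts);
        // Highest count first; ties broken by ordinal string order so the result is deterministic.
        sorted.Sort((a, b) =>
        {
            int byCount = b.Value.CompareTo(a.Value);
            return byCount != 0 ? byCount : string.CompareOrdinal(a.Key, b.Key);
        });
        return sorted;
    }
}
```

For example, `WordCounter.CountWords("the cat and the hat")` yields `the` (2) first, followed by `and`, `cat` and `hat` with 1 each. Counting is linear in the number of words; only the final sort depends on the number of distinct words.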
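EDIT: the code below groups the unique tokens with their counts:
string[] target = src.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// One pass: group identical tokens, project each group to its token and size, most frequent first
var results = target.GroupBy(t => t)
                    .Select(g => new { str = g.Key, count = g.Count() })
                    .OrderByDescending(x => x.count);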
This is finally starting to make more sense to me...
EDIT: the code below pairs each target substring with its count:
using System;
using System.Linq;
string src = "for each character in the string, take the rest of the " +
"string starting from that character " +
"as a substring; count it if it starts with the target string";
string[] target = {"string", "the", "in"};
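// For each target, count the positions in src where it starts (overlapping matches and matches inside longer words both count)
var results = target.Select(t => new { str = t,
                                       count = src.Select((c, i) => src.Substring(i))
                                                  .Count(sub => sub.StartsWith(t, StringComparison.Ordinal)) });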
The results are now:
[0] { str = "string", count = 4 }
[1] { str = "the", count = 4 }
[2] { str = "in", count = 6 }
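The Substring-based query above allocates a new string for every character position, so its work grows roughly with the square of the input length. A sketch of an alternative, assuming ordinal (case-sensitive, culture-independent) matching and that overlapping matches should count, as they do in the query: `String.IndexOf(string, int, StringComparison)` jumps straight to the next match without allocating. `SubstringCounter` and `CountOccurrences` are hypothetical names.

```csharp
using System;

public static class SubstringCounter
{
    // Counts (possibly overlapping) ordinal occurrences of target in text.
    public static int CountOccurrences(string text, string target)
    {
        if (string.IsNullOrEmpty(target)) return 0;
        int count = 0;
        int index = text.IndexOf(target, 0, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            // Advance by one character so overlapping matches are also found.
            index = text.IndexOf(target, index + 1, StringComparison.Ordinal);
        }
        return count;
    }
}
```

It can replace the inner query, e.g. `target.Select(t => new { str = t, count = SubstringCounter.CountOccurrences(src, t) })`, which gives the same counts for the sample strings.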
Original code below, using the same src and target as above:
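// Count the match positions for each target, then sort the counts (the matching strings are lost here)
var results = target.Select(t => src.Select((c, i) => src.Substring(i))
                                  .Count(sub => sub.StartsWith(t, StringComparison.Ordinal)))
                    .OrderByDescending(count => count);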
With grateful acknowledgement to this previous response.
Results from the debugger (extra logic is needed to pair each count with its matching string):
results {System.Linq.OrderedEnumerable<int,int>}
[0] 6
[1] 4
[2] 4
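The extra logic mentioned above can be as small as projecting each target into a (string, count) pair before sorting. A minimal sketch that keeps the original Substring-based counting (and therefore its quadratic cost), with ordinal comparison as an assumption; `RankedCounts` and `Rank` are hypothetical names.

```csharp
using System;
using System.Linq;

public static class RankedCounts
{
    // Pairs each target with its (possibly overlapping) occurrence count in src,
    // then orders the pairs from most to least frequent.
    public static (string Str, int Count)[] Rank(string src, string[] targets)
    {
        return targets
            .Select(t => (Str: t,
                          Count: src.Select((c, i) => src.Substring(i))
                                    .Count(sub => sub.StartsWith(t, StringComparison.Ordinal))))
            .OrderByDescending(pair => pair.Count)
            .ToArray();
    }
}
```

With the sample src and targets this yields ("in", 6), ("string", 4), ("the", 4). `OrderByDescending` is a stable sort, so targets with equal counts keep their original order.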