Hi, I'm using Delphi 7. I would like to count how many times each word is repeated in a large text (500 words). How could I do it?
I don't recall any built-in Delphi functions that directly do this. But a simple O(n*Log(n)) method would be to sort the words and then scan and count them.
If we are talking about the number of words in a text string, you could parse the string and identify the words. Add the words to a map, where the key is the word itself and the value is a number. That number is increased each time a word you find in the string already exists in the map.
map<string, int>
foreach word in string
    if word is in map
        map[word] = map[word] + 1
    else
        map[word] = 1
    end if
end for
Since I don't know Delphi that well, I have tried to provide you with a pseudocode example.
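Delphi 7 has no generic map type, so here is a rough sketch of how the pseudocode above might translate, assuming a sorted TStringList stands in for the map: the string is the key and the Objects slot holds the count cast to TObject. CountWord and the program name are made up for illustration, and the Integer/TObject casts assume a 32-bit compiler such as Delphi 7.

```pascal
program WordMapSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: records one occurrence of AWord in Counts.
// Counts must have Sorted = True so that Find can binary-search it.
// Each item's Objects slot holds that word's count, cast to TObject.
procedure CountWord(Counts: TStringList; const AWord: string);
var
  Idx: Integer;
begin
  if Counts.Find(AWord, Idx) then
    // Word already in the "map": increment its count.
    Counts.Objects[Idx] := TObject(Integer(Counts.Objects[Idx]) + 1)
  else
    // First occurrence: insert with a count of 1.
    Counts.AddObject(AWord, TObject(1));
end;

var
  Counts: TStringList;
  I: Integer;
begin
  Counts := TStringList.Create;
  try
    Counts.Sorted := True;
    CountWord(Counts, 'this');
    CountWord(Counts, 'that');
    CountWord(Counts, 'this');
    // Print each word followed by its count.
    for I := 0 to Counts.Count - 1 do
      WriteLn(Counts[I], ' ', Integer(Counts.Objects[I]));
  finally
    Counts.Free;
  end;
end.
```

With the default settings TStringList compares strings case-insensitively, so 'This' and 'this' land in the same entry; set CaseSensitive to True before adding words if you want them counted separately.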
From the FPC strutils library:
function WordCount(const S: string; const WordDelims: TSysCharSet): Integer;
var
  P, PE: PChar;
begin
  Result := 0;
  P := PChar(Pointer(S));
  PE := P + Length(S);
  while (P < PE) do
  begin
    while (P < PE) and (P^ in WordDelims) do
      Inc(P);
    if (P < PE) then
      Inc(Result);
    while (P < PE) and not (P^ in WordDelims) do
      Inc(P);
  end;
end;
WordCount(test, [',', '.', ' ', '!', '?', #10, #13]); would be a good first attempt. Note that it returns the total number of words, not a count per word. It's meant for rough magnitude estimates, since it doesn't, for example, take care of abbreviations.
Of course if you hand this in as homework, you'll probably be asked to explain its workings.
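Since WordCount only gives a total, one way to get at the individual words is to keep the same scanning loop but copy each word out as it is found. The following is only a sketch of that idea: SplitWords is a hypothetical helper, not part of strutils, and the resulting list would still have to be fed into whatever per-word counting you choose.

```pascal
program ExtractWordsSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: same delimiter scan as WordCount, but every word
// is appended to Words instead of only incrementing a counter.
procedure SplitWords(const S: string; const WordDelims: TSysCharSet;
  Words: TStrings);
var
  P, PE, Start: PChar;
  W: string;
begin
  P := PChar(Pointer(S));
  PE := P + Length(S);
  while P < PE do
  begin
    // Skip any run of delimiters.
    while (P < PE) and (P^ in WordDelims) do
      Inc(P);
    Start := P;
    // Advance to the end of the current word.
    while (P < PE) and not (P^ in WordDelims) do
      Inc(P);
    // Copy the word out, unless we only skipped trailing delimiters.
    if P > Start then
    begin
      SetString(W, Start, P - Start);
      Words.Add(W);
    end;
  end;
end;

var
  Words: TStringList;
  I: Integer;
begin
  Words := TStringList.Create;
  try
    SplitWords('Hello, world! Hello again.',
      [',', '.', ' ', '!', '?', #10, #13], Words);
    for I := 0 to Words.Count - 1 do
      WriteLn(Words[I]);
  finally
    Words.Free;
  end;
end.
```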
Here is a kind of brute-force way of doing it. It uses a string list and stores the count of each word in the list item's Objects slot, cast to TObject.
var
  i        : Integer;
  iCount   : Integer;
  idxFound : Integer;
  someText : string;
  s        : TStringList;
  oneWord  : string;
begin
  someText := 'this that theother and again this that theother this is not that';
  oneWord := '';
  s := TStringList.Create;
  try
    // Scan one position past the end so the last word is flushed too.
    for i := 1 to Length(someText) + 1 do begin
      if (i > Length(someText)) or (someText[i] = ' ') then begin
        // Skip empty words produced by leading or repeated spaces.
        if oneWord <> '' then begin
          idxFound := s.IndexOf(oneWord);
          if idxFound >= 0 then begin
            // The count lives in the Objects slot, cast to TObject.
            iCount := Integer(s.Objects[idxFound]);
            s.Objects[idxFound] := TObject(iCount + 1);
          end
          else begin
            s.AddObject(oneWord, TObject(1));
          end;
          oneWord := '';
        end;
      end
      else begin
        oneWord := oneWord + someText[i];
      end;
    end;
    // Put the results on the screen in a memo.
    Memo1.Clear;
    for i := 0 to s.Count - 1 do
      Memo1.Lines.Add(IntToStr(Integer(s.Objects[i])) + ' ' + s[i]);
  finally
    s.Free;
  end;
end;
A TStringList can also be used for the "list of words". Run through all of your words and add each one to the TStringList as a new item. When you're done, the list's Count is the TOTAL word count. To determine the unique words, sort the list, and in a loop check whether the current word is different from the previous one; if so, increment your unique word count.
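A minimal sketch of that sort-and-scan idea, extended slightly so that it also prints how often each word occurs. The sample words, the CountUnique helper and the variable names are made up for illustration; in real use the list would be filled from your own text.

```pascal
program SortAndScanSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: sorts Words, prints each distinct word with its
// number of occurrences, and returns the number of distinct words.
function CountUnique(Words: TStringList): Integer;
var
  I, RunLength: Integer;
begin
  Result := 0;
  RunLength := 0;
  Words.Sort;
  for I := 0 to Words.Count - 1 do
  begin
    Inc(RunLength);
    // A run of equal words ends at the last item or when the next differs.
    // AnsiSameText matches the case-insensitive order Sort uses by default.
    if (I = Words.Count - 1) or not AnsiSameText(Words[I], Words[I + 1]) then
    begin
      WriteLn(Words[I], ' ', RunLength);
      Inc(Result);
      RunLength := 0;
    end;
  end;
end;

const
  Sample: array[0..6] of string =
    ('this', 'that', 'this', 'is', 'not', 'that', 'this');

var
  Words: TStringList;
  I: Integer;
begin
  Words := TStringList.Create;
  try
    for I := Low(Sample) to High(Sample) do
      Words.Add(Sample[I]);
    WriteLn('Total words: ', Words.Count);
    WriteLn('Unique words: ', CountUnique(Words));
  finally
    Words.Free;
  end;
end.
```

The sort is what makes this O(n log n); the scan afterwards is a single pass over the list.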