Hi, I'm using Delphi 7. I would like to count how many times each word occurs in a large text (500 words). How could I do it?

A: 

I don't recall any built-in Delphi function that does this directly. But a simple O(n log n) method would be to sort the words and then scan the sorted list, counting each run of identical words.

Mark Wilkins
Hi Mark: could you please give me an example of the implementation?
@mo3ez: Sorry - I don't have the time available right now. However, Marco's word counting is a good start for the parsing into words. Put the individual words into an array and sort them. See the SO post http://stackoverflow.com/questions/41733/best-way-to-sort-an-array-in-delphi for a good example of sorting.
Mark Wilkins
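The sort-and-scan idea above could be sketched roughly as follows. This is only a minimal sketch under a few assumptions: it uses a TStringList instead of a plain array, it assumes the text has already been split into words, and CountSortedWords and the sample words are made-up names and data for illustration.

```pascal
program SortAndScanDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: sorts Words, then scans each run of equal neighbours
// and appends one 'word=count' line to Counts for every distinct word.
procedure CountSortedWords(Words: TStringList; Counts: TStrings);
var
  i, RunLength: Integer;
begin
  Words.Sort; // case-insensitive unless CaseSensitive is set to True
  i := 0;
  while i < Words.Count do
  begin
    RunLength := 1;
    // AnsiSameText matches the case-insensitive order that Sort uses by default.
    while (i + RunLength < Words.Count) and
          AnsiSameText(Words[i + RunLength], Words[i]) do
      Inc(RunLength);
    Counts.Add(Words[i] + '=' + IntToStr(RunLength));
    Inc(i, RunLength);
  end;
end;

const
  // Made-up sample input, already split into words.
  Sample: array[0..5] of string = ('this', 'that', 'this', 'is', 'that', 'this');
var
  Words, Counts: TStringList;
  i: Integer;
begin
  Words := TStringList.Create;
  Counts := TStringList.Create;
  try
    for i := Low(Sample) to High(Sample) do
      Words.Add(Sample[i]);
    CountSortedWords(Words, Counts);
    Write(Counts.Text); // one 'word=count' line per distinct word
  finally
    Counts.Free;
    Words.Free;
  end;
end.
```

The sort costs O(n log n) and the scan is a single pass, so for 500 words this should be more than fast enough. Afterwards Counts.Values['this'] gives the count for a single word as a string.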
A: 

If we are talking about counting the words in a text string, what you could do is parse the string and identify the words. Add the words to a map, where the key is the word itself and the value is a number. This number is increased each time a word you find in the string already exists in the map.

map<string, int>
foreach word in string
    if word is in map
        map[word] = map[word] + 1
    else
        map[word] = 1
    end if
end for

Since I don't know Delphi that well, I have tried to provide you with a pseudocode example.

TommyA
A: 

From the FPC strutils library:

function WordCount(const S: string; const WordDelims: TSysCharSet): Integer;

var
 P,PE : PChar;

begin
  Result:=0;
  P:=Pchar(pointer(S));
  PE:=P+Length(S);
  while (P<PE) do
    begin
    while (P<PE) and (P^ in WordDelims) do
      Inc(P);
    if (P<PE) then
      inc(Result);
    while (P<PE) and not (P^ in WordDelims) do
      inc(P);
    end;
end;

wordcount (test,[',','.',' ','!','?',#10,#13]); would be a good first attempt. It's meant for simple magnitude estimates, since, for example, it doesn't take care of abbreviated words.

Of course if you hand this in as homework, you'll probably be asked to explain its workings.

Marco van de Voort
While that example will count the number of words in a string, it won't count the number of occurrences each word has, as the author requested. It should be capable of this though given a few minor adjustments.
TommyA
Thank you Marco. What is the second parameter for? How can I list all the 'unique' words of a text, and then pass them one by one to the function in order to get the stats I need (occurrences per word)?
+2  A: 

Here is a kind of brute-force way of doing it. It uses a string list and stores the count of each word, cast to a TObject, in the Objects entry of that word's list item.

var
  i : integer;
  iCount : integer;
  idxFound : integer;
  someText : string;
  s : TStringList;
  oneWord : string;

begin
  someText := 'this that theother and again this that theother this is not that';
  oneWord := '';

  s := TStringList.Create;
  for i := 1 to length(someText) do begin
    if someText[i] = ' ' then begin
      // skip the empty "words" produced by consecutive spaces.
      if oneWord <> '' then begin
        idxFound := s.indexof(oneWord);
        if idxFound >= 0 then begin
          iCount := integer(s.objects[idxFound]);
          s.Objects[idxFound] := TObject(iCount + 1);
        end
        else begin
          s.AddObject(oneWord, TObject(1));
        end;
      end;
      oneWord := '';
    end
    else begin
      oneWord := oneWord + someText[i];
    end;
  end;

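  // count the last word, which has no trailing space.
  if oneWord <> '' then begin
    // look the word up again; idxFound still refers to the previous word.
    idxFound := s.indexof(oneWord);
    if idxFound >= 0 then begin
      iCount := integer(s.objects[idxFound]);
      s.Objects[idxFound] := TObject(iCount + 1);
    end
    else begin
      s.AddObject(oneWord, TObject(1));
    end;
  end;

  // put the results on the screen in a text box.
  memo1.Text := '';
  for i := 0 to s.Count - 1 do
    memo1.Lines.Add(intToStr(integer(s.Objects[i])) + ' ' + s[i]);

  s.Free;
end;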
Don Dickinson
If it is a sorted string list, then you can use "s.find" instead of "s.indexof". Also, you might want to set the case sensitivity of the string list as appropriate for your implementation.
Don Dickinson
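The sorted-list variant suggested in the comment could look something like the sketch below. Find does a binary search and only works on a sorted list, so Sorted is set before any word is added. AddOccurrence and OccurrencesOf are hypothetical helper names, the sample words are made up, and the Integer/TObject casts assume a 32-bit compiler such as Delphi 7.

```pascal
program SortedFindDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: records one more occurrence of Word in Counts.
// Counts must have Sorted = True, because Find relies on the sort order.
procedure AddOccurrence(Counts: TStringList; const Word: string);
var
  Idx: Integer;
begin
  if Word = '' then
    Exit;
  if Counts.Find(Word, Idx) then
    Counts.Objects[Idx] := TObject(Integer(Counts.Objects[Idx]) + 1)
  else
    Counts.AddObject(Word, TObject(1)); // a sorted list inserts it in order
end;

// Hypothetical helper: returns how often Word was recorded (0 if never).
function OccurrencesOf(Counts: TStringList; const Word: string): Integer;
var
  Idx: Integer;
begin
  if Counts.Find(Word, Idx) then
    Result := Integer(Counts.Objects[Idx])
  else
    Result := 0;
end;

const
  // Made-up sample input, already split into words.
  Sample: array[0..4] of string = ('this', 'that', 'This', 'is', 'this');
var
  Counts: TStringList;
  i: Integer;
begin
  Counts := TStringList.Create;
  try
    Counts.CaseSensitive := False; // the default: 'This' and 'this' are merged
    Counts.Sorted := True;
    for i := Low(Sample) to High(Sample) do
      AddOccurrence(Counts, Sample[i]);
    for i := 0 to Counts.Count - 1 do
      Writeln(Counts[i], ' ', Integer(Counts.Objects[i]));
  finally
    Counts.Free;
  end;
end.
```

Compared with IndexOf on an unsorted list, which searches linearly, Find keeps each lookup logarithmic in the number of distinct words.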
Thanks Don: this is exactly what I want to do! Is there any way to use a different delimiter (besides the space ' ')?
Sure, just change the line "if someText[i] = ' ' then begin" to check for whatever delimiter(s) you want.
Don Dickinson
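To illustrate the delimiter change, one option is to test each character against a set, much like the WordDelims parameter of the FPC WordCount function quoted in an earlier answer. This is a rough sketch, assuming Delphi 7 where Char is a single byte and SysUtils declares TSysCharSet; SplitWords and the particular delimiter set are only examples.

```pascal
program DelimiterDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: appends each word of Text to Words, treating every
// character in Delims as a separator. Runs of delimiters add no empty words.
procedure SplitWords(const Text: string; const Delims: TSysCharSet;
  Words: TStrings);
var
  i: Integer;
  Current: string;
begin
  Current := '';
  for i := 1 to Length(Text) do
    if Text[i] in Delims then
    begin
      if Current <> '' then
        Words.Add(Current);
      Current := '';
    end
    else
      Current := Current + Text[i];
  // The last word has no trailing delimiter, so add it here.
  if Current <> '' then
    Words.Add(Current);
end;

var
  Words: TStringList;
  i: Integer;
begin
  Words := TStringList.Create;
  try
    SplitWords('this, that. And this!',
      [' ', ',', '.', '!', '?', #9, #10, #13], Words);
    for i := 0 to Words.Count - 1 do
      Writeln(Words[i]);
  finally
    Words.Free;
  end;
end.
```

On Unicode versions of Delphi the "in" test on a Char produces a compiler warning, and CharInSet would be the usual replacement.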
A: 

A TStringList can also be used for the "list of words". Run through all of your words and add each and every one to the TStringList as a new item. When you're done, the list's Count gives you the TOTAL word count. To determine the number of unique words, sort the list and, in a loop, check whether the current word is different from the previous one; if so, increment your unique word count.

skamradt
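The total-versus-unique count described above might be sketched like this. CountUniqueWords is a hypothetical helper name and the sample words are made up; the comparison is case-insensitive to match the default sort order of TStringList.

```pascal
program UniqueWordsDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: sorts Words and counts the distinct entries by
// comparing each word with the one before it.
function CountUniqueWords(Words: TStringList): Integer;
var
  i: Integer;
begin
  Words.Sort;
  Result := 0;
  for i := 0 to Words.Count - 1 do
    // The first word is always new; later words are new when they differ
    // from their predecessor in the sorted list.
    if (i = 0) or not AnsiSameText(Words[i], Words[i - 1]) then
      Inc(Result);
end;

const
  // Made-up sample input, already split into words.
  Sample: array[0..5] of string = ('this', 'that', 'this', 'is', 'that', 'this');
var
  Words: TStringList;
  i: Integer;
begin
  Words := TStringList.Create;
  try
    for i := Low(Sample) to High(Sample) do
      Words.Add(Sample[i]);
    Writeln('Total words:  ', Words.Count);
    Writeln('Unique words: ', CountUniqueWords(Words));
  finally
    Words.Free;
  end;
end.
```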