views:

133

answers:

6

Hi folks:

I have a question about algorithm:

How to find all characters in a string whose appearance is greater than a specific number, say 2 for example efficiently?

Regards.

+2  A: 
s= #your string
h=Hash.new(0)

s.each_char {|c| h[c]+=1 }
h.select {|char,count| count>2}
Ken Bloom
A: 

I would sort the string, then just walk through it and keep a running tally for each letter. The last is just O(n) so it'll be as efficient as your sort.

Trevel
Sorting is redundant for this task.
Hun1Ahpu
sorting will be at least O(n*log n) which is not required.
Jack
@Jack: considering that most of the answers are modified counting sorts, and are O(n) just like counting sort is, it seems a bit bizarre to assert that sorting is at least O(n*log n). You're thinking of the bound on comparison sorts, but Trevel didn't say he'd use a comparison sort.
Steve Jessop
Well, he said "I would sort the string" which is not required.
Jack
+1  A: 
var word = "......";
var chars = word.GroupBy(w => w).Where(g => g.Count > 2).Select(g => new { character = g.Key, count = g.Count });
alejandrobog
+6  A: 

Counting sort will be extremely efficient for one-byte encodings, border case is two-byte encodings. For wider encodings it is not so efficient, but counting array may be replaced with hash table.

EDIT: By the way, that is too general solution, doing only counting phase and outputting results on the fly will be more than enough.

actual
Indeed a hash_table sounds like the right tool for the job, and +1 for mentioning encoding, too many forget about it!
Matthieu M.
+1  A: 

Couldn't resist to try this out.

  1. Keep an internal array of 256 elements for each (ASCII) character.
  2. Loop once over the input string.
  3. Increment the count for given character using the ordinal value of the character as a direct access into the internal array.

Delphi implementation

Type
  TCharCounter = class(TObject)
  private
    FCounts: array[0..255] of byte;
  public
    constructor Create(const Value: string);
    function Count(const AChar: Char): Integer;
  end;

{ TCharCounter }

constructor TCharCounter.Create(const Value: string);
var
  I: Integer;
begin
  inherited Create;
  for I := 1 to Length(Value) do
    Inc(FCounts[Ord(Value[I])]);
end;

function TCharCounter.Count(const AChar: Char): Integer;
begin
  Result := FCounts[Ord(AChar)];
end;
Lieven
A: 

the easier way is to use an array: occurrence[256], initialize them all with 0's

and for every char in string, occurrence[(int)char]++.

And then you just scan the occurrence to find the occurrence of characters satisfying your criterion.

sza