tags:

views:

130

answers:

7

A word is grouped if, for each letter in the word, all occurrences of that letter form exactly one consecutive sequence. In other words, no two equal letters are separated by one or more letters that are different.

Given a vector<string> return the number of grouped words.

For example :

{"ab", "aa", "aca", "ba", "bb"}

return 4.

Here, "aca" is not a grouped word.

My quick and dirty solution :

int howMany(vector <string> words) {
  int ans = 0;
  for (int i = 0; i < words.size(); i++) {
       bool grouped = true;
  for (int j = 0; j < words[i].size()-1; j++)
      if (words[i][j] != words[i][j+1])
         for (int k = j+1; k < words[i].size(); k++)
           if (words[i][j] == words[i][k])
              grouped = false;
           if (grouped) ans++;
       }
   return ans;
 }

I want a better algorithm for the same problem.

A: 

This might work in two loops per word:

1) Loop over the word counting the number of distinct symbols that appear. (This will require extra storage at most equal to the length of the string - probably some sort of hash.)

2) Loop over the word counting the number of times symbol n is different from symbol n+1.

If those two values aren't different by exactly one, the word is not grouped.

Bill Carey
My solution also uses two loops per word.
nthrgeek
and there should be one more distinct symbol than number of times n is different from symbol n+1. You can use a set to find the number of distinct symbols.
Justin Peel
That's quite true. The loops in your algorithm are nested, though, where these aren't (though there may be an implicit nested loop in the creation of the list of distinct symbols).
Bill Carey
This is O(n), while the asker's solution is O(n^2).
Anon.
@ Bill Carey : Yes,there may be an implicit nested loop in the creation of the list of distinct symbols.
nthrgeek
+1  A: 

Just considering one word, here is an O(n log n) destructive algorithm:

std::string::iterator unq_end = std::unique( word.begin(), word.end() );
std::sort( word.begin(), unq_end );
return std::unique( word.begin(), unq_end ) == unq_end;

Edit: The first call to unique reduces runs of consecutive letters to single letters. The call to sort groups identical letters together. The second call to unique checks whether sort formed any new groups of consecutive letters. If it did, then the word must not be grouped.

Advantage over the others posted is that it doesn't require storage — although that's not much of an advantage.

Here's a simple version of the alternative algo, also requiring only O(1) storage (and yes, also tested):

if ( word.empty() ) return true;
bitset<CHAR_MAX+1> symbols;
for ( string::const_iterator it = word.begin() + 1; it != word.end(); ++ it ) {
    if ( it[0] == it[-1] ) continue;
    if ( symbols[ it[0] ] ) return false;
    symbols[ it[-1] ] = true;
}
return ! symbols[ * word.rbegin() ];

Note that you would need minor modifications to work with characters outside ASCII. bitset comes from the header <bitset>.

Potatoswatter
That will not work.Did you checked ?
nthrgeek
Yes, I just did and it does work. What is the problem with it?
Potatoswatter
I considered word as vector<string>
nthrgeek
No, this is only for one word, so word is a string
Justin Peel
Lovely, I like your solution :)
nthrgeek
This is not exactly O(nlog n) however. You are sorting the letters of each word. This takes O(L log L). So worst case, if all words have length L, is O(nL log L).
IVlad
@IVlad: yes, my complexity statement is in the context of considering one word… since the words appear to be unrelated, this is the only interesting part.
Potatoswatter
umm Could you please explain ur first solution ?
nthrgeek
@nthrugeek: done.
Potatoswatter
Excellent ! Accepted :)
nthrgeek
+1  A: 

You could use a Set of some kind (preferable one with O(1) insertion and lookup times).

Each time you encounter a character that differs from the previous one, check if the set contains it. If it does, your match fails. If it doesn't, add it to the set and carry on.

Anon.
Yes that will work,my initial thoughts is like this.
nthrgeek
@Anon. I guess we had the same idea...
Benoît
I suppose a set with O(1) insertion and lookup would be also known as array (for char which has a really limited range) - or a bitset as in Potatoswatter's answer.
UncleBens
+2  A: 

Try the following :

bool isGrouped( string const& str )
{
  set<char> foundCharacters;
  char currentCharacter='\0';

  for( int i = 0 ; i < str.size() ; ++i )
  {
    char c = str[i];
    if( c != currentCharacter )
    {
      if( foundCharacters.insert(c).second )
      {
        currentCharacter = c;
      }
      else
      {
        return false;
      }
    }
  }
  return true;
}
Benoît
@ Benoît : Nice one !
nthrgeek
+1. Just my 2 cents: you're looking up each character twice (one in `find` and another one in `insert`). There's an overload of `insert` that can be used to avoid that (the overload that returns a `pair<iterator,bool>`
Manuel
@Manuel : thanks for the tip. I updated my answer to use insert efficiently.
Benoît
A: 

Here's a way with two loops per word, except one of the loops isn't up until the word length, but up until the alphabet size. Worst case is O(N*L*s), where N = number of words, L = length of words, s = alphabet size:

for each word wrd:
{
  for each character c in the alphabet:
  {
    for each letter i in wrd:
    {
      let poz = last position of character c in wrd. initially poz = -1
      if ( poz == -1 && c == wrd[i] )
         poz = i;
      else if ( c == wrd[i] && poz != i - 1 )
         // definitely not grouped, as it's separated by at least one letter from the prev sequence
    }
  }
  // grouped if the above else condition never executed
}

basically, checks if every letter in the alphabet either doesn't exist or it appears in only one substring of that letters.

IVlad
We can do better: O(N(L + s^2)) if we keep this information for each word: first[i] = first occurrence of character i in the current word, last[i] = last occurrence of character i in the current word. These can be found with one traversal of the word.Now, for each character c, check if there is a character c' != c such that first[c] < first[c'] < last[c]. If found, the word is not grouped.
IVlad
A: 
    public static Boolean isGrouped( String input )
    {
        char[] c = input.ToCharArray();
        int pointer = 0;
        while ( pointer < c.Length - 1 )
        {
            char current = c[pointer];
            char next = c[++ pointer];
            if (   next != current && 
                 ( next + 1 ) != current && 
                 ( next - 1 ) == current 
               ) return false; 
        }
        return true;
    }

(C# but the principal applies)

chris
A: 

Here is a multi-line, verbose, regexp to match failures:

    (?:         # Non capturing group of ...
      (\S)\1*   # One or more of any non space character (capured).
    )
    (?!         # Then a position without
      \1        # ... the captured character
    ).+         # ... at least once.
    \1          # Followed by the captured character.

Or smaller:

"(?:(\S)\1*)(?!\1).+\1"

I am just presuming that C++ has a regexp implementation that is up to it, it does work in Python and should work in Perl and Ruby too.

Paddy3118