ansaurus

Question

Answer 1

A:

This might work in two loops per word:

1) Loop over the word counting the number of distinct symbols that appear. (This will require extra storage at most equal to the length of the string - probably some sort of hash.)

2) Loop over the word counting the number of times symbol n is different from symbol n+1.

If those two values aren't different by exactly one, the word is not grouped.
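
The two passes above can be sketched in C++ roughly as follows. This is only an illustration: the function name `isGroupedByCounts` is made up here, `std::set` stands in for "some sort of hash", and treating the empty word as grouped is an assumption.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper name; a sketch of the two-pass idea, not the poster's code.
bool isGroupedByCounts(const std::string& word)
{
    if (word.empty()) return true;  // assumption: an empty word counts as grouped

    // Pass 1: number of distinct symbols.
    const std::set<char> distinct(word.begin(), word.end());

    // Pass 2: number of positions where symbol n differs from symbol n+1.
    std::size_t changes = 0;
    for (std::size_t n = 0; n + 1 < word.size(); ++n)
        if (word[n] != word[n + 1]) ++changes;

    // The word consists of changes + 1 runs; it is grouped exactly when
    // every distinct symbol owns a single run.
    return distinct.size() == changes + 1;
}
```

For "aabbbc" this counts 3 distinct symbols and 2 changes, so the word is grouped; for "aba" it counts 2 distinct symbols and 2 changes, so it is not.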

Bill Carey 2010-02-11 20:00:57

My solution also uses two loops per word.

nthrgeek 2010-02-11 20:03:00

and there should be one more distinct symbol than number of times n is different from symbol n+1. You can use a set to find the number of distinct symbols.

Justin Peel 2010-02-11 20:04:45

That's quite true. The loops in your algorithm are nested, though, where these aren't (though there may be an implicit nested loop in the creation of the list of distinct symbols).

Bill Carey 2010-02-11 20:07:19

This is O(n), while the asker's solution is O(n^2).

Anon. 2010-02-11 20:07:44

@Bill Carey: Yes, there may be an implicit nested loop in the creation of the list of distinct symbols.

nthrgeek 2010-02-11 20:12:07

Answer 2

+1 A:

Just considering one word, here is an O(n log n) destructive algorithm:

std::string::iterator unq_end = std::unique( word.begin(), word.end() );
std::sort( word.begin(), unq_end );
return std::unique( word.begin(), unq_end ) == unq_end;

Edit: The first call to unique reduces runs of consecutive letters to single letters. The call to sort groups identical letters together. The second call to unique checks whether sort formed any new groups of consecutive letters. If it did, then the word must not be grouped.

Advantage over the others posted is that it doesn't require storage — although that's not much of an advantage.
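
For anyone who wants to try the three-line version, it could be wrapped in a function along these lines. The name `isGroupedBySorting` is hypothetical, and passing the word by value is my choice so that the caller's string survives; note that the copy gives up the no-extra-storage property, so take a non-const reference instead if destroying the input is acceptable.

```cpp
#include <algorithm>
#include <string>

// Hypothetical wrapper name. Taking the word by value gives the function
// its own copy to destroy.
bool isGroupedBySorting(std::string word)
{
    // Collapse each run of equal letters to a single letter.
    std::string::iterator unq_end = std::unique(word.begin(), word.end());
    // Bring any letters that had several runs next to each other.
    std::sort(word.begin(), unq_end);
    // If unique() finds nothing to remove, no letter had more than one run.
    return std::unique(word.begin(), unq_end) == unq_end;
}
```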

Here's a simple version of the alternative algo, also requiring only O(1) storage (and yes, also tested):

if ( word.empty() ) return true;
bitset<CHAR_MAX+1> symbols;
for ( string::const_iterator it = word.begin() + 1; it != word.end(); ++ it ) {
    if ( it[0] == it[-1] ) continue;
    if ( symbols[ it[0] ] ) return false;
    symbols[ it[-1] ] = true;
}
return ! symbols[ * word.rbegin() ];

Note that you would need minor modifications to work with characters outside ASCII. bitset comes from the header <bitset>.
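
One possible form of those modifications, as a sketch only: where `char` is signed, a character above 127 becomes a negative index into the bitset, so the version below sizes the bitset with `UCHAR_MAX` and casts every character to `unsigned char` first. The function name `isGroupedBitset` and the variable name `closed` are made up for this example; it still assumes one byte per symbol.

```cpp
#include <bitset>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical name; same idea as the loop above, with unsigned indexing.
bool isGroupedBitset(const std::string& word)
{
    if (word.empty()) return true;

    std::bitset<UCHAR_MAX + 1> closed;  // symbols whose run has already ended
    for (std::size_t i = 1; i < word.size(); ++i) {
        const unsigned char cur = static_cast<unsigned char>(word[i]);
        const unsigned char prev = static_cast<unsigned char>(word[i - 1]);
        if (cur == prev) continue;      // still inside the same run
        if (closed[cur]) return false;  // this symbol's run ended earlier
        closed[prev] = true;            // the previous run has just ended
    }
    // Final check kept from the version above.
    return !closed[static_cast<unsigned char>(word[word.size() - 1])];
}
```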

Potatoswatter 2010-02-11 20:04:36

That will not work. Did you check it?

nthrgeek 2010-02-11 20:06:59

Yes, I just did and it does work. What is the problem with it?

Potatoswatter 2010-02-11 20:11:05

I considered word as vector<string>

nthrgeek 2010-02-11 20:15:14

No, this is only for one word, so word is a string

Justin Peel 2010-02-11 20:17:16

Lovely, I like your solution :)

nthrgeek 2010-02-11 20:20:21

This is not exactly O(nlog n) however. You are sorting the letters of each word. This takes O(L log L). So worst case, if all words have length L, is O(nL log L).

IVlad 2010-02-11 20:21:11

@IVlad: yes, my complexity statement is in the context of considering one word… since the words appear to be unrelated, this is the only interesting part.

Potatoswatter 2010-02-11 20:27:39

Umm, could you please explain your first solution?

nthrgeek 2010-02-11 20:27:59

@nthrgeek: done.

Potatoswatter 2010-02-11 20:30:44

Excellent ! Accepted :)

nthrgeek 2010-02-11 20:32:10

Answer 3

+1 A:

You could use a set of some kind (preferably one with O(1) insertion and lookup times).

Each time you encounter a character that differs from the previous one, check if the set contains it. If it does, your match fails. If it doesn't, add it to the set and carry on.
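
A minimal sketch of that idea, assuming a C++11 compiler so that `std::unordered_set` is available as the set with expected O(1) insertion and lookup. The function name `isGroupedHashSet` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical name; a sketch of the set-based check described above.
bool isGroupedHashSet(const std::string& word)
{
    std::unordered_set<char> seen;  // symbols whose run has already started
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (i > 0 && word[i] == word[i - 1]) continue;  // same run as before
        // insert() reports through .second whether the symbol was new,
        // so one call does both the lookup and the insertion.
        if (!seen.insert(word[i]).second) return false;
    }
    return true;
}
```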

Anon. 2010-02-11 20:05:47

Yes, that will work; my initial thoughts were along the same lines.

nthrgeek 2010-02-11 20:09:29

@Anon. I guess we had the same idea...

Benoît 2010-02-11 20:15:04

I suppose a set with O(1) insertion and lookup would be also known as array (for char which has a really limited range) - or a bitset as in Potatoswatter's answer.

UncleBens 2010-02-11 23:03:52

Answer 4

+2 A:

Try the following :

bool isGrouped( string const& str )
{
  set<char> foundCharacters;
  char currentCharacter='\0';

  for( string::size_type i = 0 ; i < str.size() ; ++i )
  {
    char c = str[i];
    if( c != currentCharacter )
    {
      if( foundCharacters.insert(c).second )
      {
        currentCharacter = c;
      }
      else
      {
        return false;
      }
    }
  }
  return true;
}

Benoît 2010-02-11 20:11:48

@ Benoît : Nice one !

nthrgeek 2010-02-11 20:17:55

+1. Just my 2 cents: you're looking up each character twice (once in `find` and again in `insert`). There's an overload of `insert` that can be used to avoid that (the overload that returns a `pair<iterator,bool>`).

Manuel 2010-02-11 20:23:15

@Manuel : thanks for the tip. I updated my answer to use insert efficiently.

Benoît 2010-02-11 20:43:58

Answer 5

A:

Here's a way with two loops per word, where one of the loops runs over the alphabet rather than over the word. Worst case is O(N*L*s), where N = number of words, L = length of words, s = alphabet size:

for each word wrd:
{
  for each character c in the alphabet:
  {
    let poz = last position of character c seen so far in wrd. initially poz = -1
    for each index i in wrd:
    {
      if ( c == wrd[i] )
      {
        if ( poz != -1 && poz != i - 1 )
           // definitely not grouped, as it's separated by at least one letter from the prev sequence
           mark wrd as not grouped
        poz = i;
      }
    }
  }
  // grouped if wrd was never marked as not grouped
}

Basically, this checks that every letter in the alphabet either doesn't appear in the word at all or appears in exactly one contiguous run.

IVlad 2010-02-11 20:16:12

We can do better: O(N(L + s^2)) if we keep this information for each word: first[i] = first occurrence of character i in the current word, last[i] = last occurrence of character i in the current word. These can be found with one traversal of the word. Now, for each character c, check if there is a character c' != c such that first[c] < first[c'] < last[c]. If one is found, the word is not grouped.
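
Here is a rough C++ sketch of the first/last idea for a single word. The function name `isGroupedFirstLast` is made up, and it assumes the alphabet is every possible byte value (s = 256 on typical platforms); a smaller alphabet would shrink the s^2 part.

```cpp
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical name; a sketch of the first[]/last[] check described above.
bool isGroupedFirstLast(const std::string& word)
{
    const std::size_t alphabet = UCHAR_MAX + 1;  // assumption: symbols are single bytes
    const long none = -1;                        // marks "symbol not present"
    std::vector<long> first(alphabet, none), last(alphabet, none);

    // One traversal records the first and last position of every symbol.
    for (std::size_t i = 0; i < word.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(word[i]);
        if (first[c] == none) first[c] = static_cast<long>(i);
        last[c] = static_cast<long>(i);
    }

    // O(s^2) check: does another symbol first appear strictly inside c's span?
    for (std::size_t c = 0; c < alphabet; ++c) {
        if (first[c] == none) continue;
        for (std::size_t d = 0; d < alphabet; ++d) {
            if (d != c && first[d] != none &&
                first[c] < first[d] && first[d] < last[c])
                return false;
        }
    }
    return true;
}
```

For "abab", first['a'] = 0, last['a'] = 2 and first['b'] = 1, so the condition fires and the word is reported as not grouped.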

IVlad 2010-02-11 20:26:56

Answer 6

A:

    public static Boolean isGrouped( String input )
    {
        char[] c = input.ToCharArray();
        int pointer = 0;
        while ( pointer < c.Length - 1 )
        {
            char current = c[pointer];
            char next = c[++ pointer];
            if (   next != current && 
                 ( next + 1 ) != current && 
                 ( next - 1 ) == current 
               ) return false; 
        }
        return true;
    }

(C#, but the principle applies)

chris 2010-02-11 20:24:59

Answer 7

A:

Here is a multi-line, verbose, regexp to match failures:

    (?:         # Non-capturing group of ...
      (\S)\1*   # ... one or more of the same non-space character (captured).
    )
    (?!         # Then a position not followed by
      \1        # ... the captured character,
    ).+         # ... then any characters, at least one,
    \1          # followed by the captured character again.

Or smaller:

"(?:(\S)\1*)(?!\1).+\1"

I am just presuming that C++ has a regexp implementation that is up to it; it does work in Python and should work in Perl and Ruby too.
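
On the C++ side, here is a sketch that assumes a C++11 standard library with `<regex>`, whose default ECMAScript grammar has backreferences and negative lookahead. The function name `isGroupedRegex` is made up. Because the pattern uses `\S`, whitespace characters are never treated as symbols, and backtracking regexes can be slow on long inputs.

```cpp
#include <regex>
#include <string>

// Hypothetical name; the word is grouped when the failure pattern finds no match.
bool isGroupedRegex(const std::string& word)
{
    // Compact form of the pattern above, in std::regex's default ECMAScript grammar.
    static const std::regex failure(R"((?:(\S)\1*)(?!\1).+\1)");
    return !std::regex_search(word, failure);
}
```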

Paddy3118 2010-02-14 13:06:56


Best way to detect grouped words
