views:

1220

answers:

9

What is the quickest way to find the first character which only appears once in a string?

+2  A: 

You can't know that the character is un-repeated until you've processed the whole string, so my suggestion would be this:

def first_non_repeated_character(string):
  chars = []
  repeated = []
  for character in string:
    if character in chars:
      chars.remove(character)
      repeated.append(character)
    else:
      if not character in repeated:
        chars.append(character)
  if len(chars):
    return chars[0]
  else:
    return False

Edit: originally posted code was bad, but this latest snippet is Certified To Work On Ryan's Computer™.

Ryan Prior
Erm... first_non_repeated_character('aaab') == 'a'
Mark Byers
Thank you. Thats what I had in mind. I somehow was thinking that there could be some other quicker solution.Can you tell me the order of this algorithm?
Harsha
This algorithm is O(n), and won't use much memory as few characters will be stored in either `chars` or `repeated` at any given time. That makes it decent-ish in my book. However, if you have a seriously long string to check, the algorithm could be massively parallelized with little extra code or worry.
Ryan Prior
I think your code works correctly now. A note about performace: testing if an element is in a list is O(m) where m is the length of the list. With a long unicode string with lots of different repeated characters, the `character in repeated` will become slow as `repeated` contains more and more elements.
Mark Byers
Can this be optimized to use only one array instead of the 2 arrays you have used, chars and repeated ?
Harsha
+9  A: 

It has to be at least O(n) because you don't know if a character will be repeated until you've read all characters.

So you can iterate over the characters and append each character to a list the first time you see it, and separately keep a count of how many times you've seen it (in fact the only values that matter for the count is "0", "1" or "more than 1").

When you reach the end of the string you just have to find the first character in the list that has a count of exactly one.


Example code in Python:

def first_non_repeated_character(s):
    counts = defaultdict(int)
    l = []
    for c in s:
        counts[c] += 1
        if counts[c] == 1:
            l.append(c)

    for c in l:
        if counts[c] == 1:
            return c

    return None

This runs in O(n).

Mark Byers
A: 

Why not make a recursive function? Pseudo-code:

First-Unrepeated(String s, int index) : char
    if index = length(s)
        Return special case: "all characters are equal"    
    else if s contains the character s[index] at another position than index
        Return First-Unrepeated(s, index+1)
    else return s[index]
Frederik Wordenskjold
Omg sorry for the bad formatting. Its written on my iPhone. Gonna fix it tomorrow.
Frederik Wordenskjold
+1  A: 

Why not use a heap based data structure such as a minimum priority queue. As you read each character from the string, add it to the queue with a priority based on the location in the string and the number of occurrences so far. You could modify the queue to add priorities on collision so that the priority of a character is the sum of the number appearances of that character. At the end of the loop, the first element in the queue will be the least frequent character in the string and if there are multiple characters with a count == 1, the first element was the first unique character added to the queue.

just_wes
A: 

in C, this is almost Shlemiel the Painter's Algorithm (not quite O(n!) but more than 0(n2)).

But will outperform "better" algorithms for reasonably sized strings because O is so small. This can also easily tell you the location of the first non-repeating string.

char FirstNonRepeatedChar(char * psz)
{
   for (int ii = 0; psz[ii] != 0; ++ii)
   {
      for (int jj = ii+1; ; ++jj)
      {
         // if we hit the end of string, then we found a non-repeat character.
         //
         if (psz[jj] == 0)
            return psz[ii]; // this character doesn't repeat

         // if we found a repeat character, we can stop looking.
         //
         if (psz[ii] == psz[jj])
            break; 
      }
   }

   return 0; // there were no non-repeating characters.
}

edit: this code is assuming you don't mean consecutive repeating characters.

John Knoeller
This algorithm does in fact operate in O(n^2). It'd be an interesting exercise to design an O(nlogn) algorithm for the problem.
Chris
Edit: @Mark Byers has an O(n) algorithm. So simple.
Chris
@Chris: Yes, but 0 is orders of magnitude larger for Mark Byers' algorithm. You can do a lot of re-scanning the same area of memory in the time it takes to do a single new.
John Knoeller
While it undoubtedly does use dynamic allocation in Python, there's no real need for it to do so -- the sizes of the structures involved are both bounded by the size of the character set (though that's not much help if it's UCS-4 Unicode).
Jerry Coffin
+1  A: 

Counter requires Python2.7 or Python3.1

>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     counts = Counter(s)
...     for c in s:
...         if counts[c]==1:
...             return c
...     return None
... 
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'
gnibbler
I think that your algorithm will fail with "aaaebbbcddd" and still say that 'c' is the first non repeated character.
Peter M
@Peter M, I'm curious why you would think that. It's easy enough to try and see that it works!
gnibbler
@gnibbler .. *slaps head* Its because I was thinking terms of C arrays and not Python Array/Lists or however you want to describe it. Oops. My Bad.
Peter M
+2  A: 

Here is another fun way to do it. Counter requires Python2.7 or Python3.1

>>> from collections import Counter
>>> def first_non_repeated_character(s):
...     return min((k for k,v in Counter(s).items() if v<2), key=s.index)
...
>>> first_non_repeated_character("aaabbbcddd")
'c'
>>> first_non_repeated_character("aaaebbbcddd")
'e'
gnibbler
A: 

how about using a suffix tree for this case... the first unrepeated character will be first character of longest suffix string with least depth in tree..

sumit
A: 

1> read each character from the String 2> put in a Map (If character not present put else break) 3> print it as repeated char

Can any one plz tell it's optimal or not if not why?

asHIS