ansaurus

Question

Interview Question: Finding Next and Previous characters in a given string?

Answer 1

+6 A:

It seems in the worst case, we need to go through the whole string. Just consider chars A = 100 and B = 200, C = 201 and the following strings of length N:

S1 = ABCBCBC...BC

S2 = BBCBCBC...BC

Olexiy 2009-08-31 14:33:05

Yes, But I want to find out a better algorithm.

raj 2009-08-31 14:40:24

As you can see, there is NO better algorithm in the worst case. If you want something better in average, see bobince's comment.

Olexiy 2009-08-31 14:57:13

Answer 2

A:

assuming you know how to move backwards in the string (and assuming the pointer they give you is actually guaranteed to be to the first byte of the current character if it's multi-byte). Just move back two bytes.

If the data at current -2 >127 then the previous character is the last two bytes. if the data at current -2 is <127 then the previous character is at current -1.

Similar for the next character. If the data at current + 1 is < 127 then it is the next character, otherwise it's the beginning of a multi-bte character.

If you can't move backwards then there's no way to do it short of reading through the whole string until you get to the current position. Use a stack to keep track of the last two bytes, when you hit the current address then all the data you need for the previous character is in the stack.

patros 2009-08-31 14:35:10

Fails when current-2 is a high byte and second byte of a two-byte sequence.

bobince 2009-08-31 14:41:15

Ahhh, true. Won't bother editing, as your solution looks good.

patros 2009-08-31 17:38:13

Answer 3

+1 A:

Scan backwards until you either hit two consecutive bytes less than 127 or you hit the beginning of the string. You can now count characters up to where you are and on to one after the current character.

stonemetal 2009-08-31 14:38:04

Answer 4

A:

Robert 2009-08-31 14:39:08

The byte at -2 offset can be >127 and yet not the start of a character but a second byte in a two-byte character.

bobince 2009-08-31 14:42:51

It is not correct.In previous case,if the byte > 127 it can be second byte of two byte character also.Same in the Next case

raj 2009-08-31 14:43:18

Answer 5

+11 A:

Hello Shift-JIS!

No, constant time is impossible as the worst case is, as Olexiy states, that almost the whole string is top-bit-set bytes and you need to backtrack right to the start to find out which is the first top-bit-set byte in the first two-byte sequence.

Hopefully this pathological case is rare and you can just step back a byte at a time until you meet any low byte, in which case you can be sure that the byte after it is the start of a character. You can then walk forwards again until you meet your original pointer.

bobince 2009-08-31 14:39:24

hughdbrown 2009-11-17 17:01:23

Answer 6

A:

Really you need to find three characters: current character, previous character, and next character.

CurrentChar is either at position P given by the pointer or at P-1. If position P is pointing at a byte that is greater than 127 then P is the CurrentChar. If P is less than 127 look at P-1. If P-1 is greater than 127 then CurrentChar is P-1 otherwise CurrentChar is at position P.

To get the PreviousChar, look at CurrentChar - 2 and if that is greater than 127 PreviousChar = CurrentChar -2 otherwise PreviousChar = CurrentChar -1

The NextChar can be gotten by looking at P. If P is greater than 127 then the next char is at P+2. If P is less than 127 NextChar is at position P+1.

G_A 2009-08-31 14:46:02

It is not correct, If the position P is pointing at a byte that is greater than 127, it can the second byte of two byte character also

raj 2009-08-31 14:48:49

Answer 7

+1 A:

Let's see... You're already pointing to a character which can be a single byte or double byte. In the latter case, you could be on the first or second byte of this character. So first you need to know if you're at the start of a character or not. To make matters worse, the second byte of a two-bytes character can be any value, thus there's a chance that all bytes are greater than 127! That makes it nasty to find the previous character. But first, determine if you're at the start of a new character or in the middle of one.

Determining character start: Go back one byte until you find a byte that doesn't have it's highest bit set. (Or the beginning of the string.) Based on the number of bytes with the highest bit set, you will know if you're at the beginning or not. An odd number of high bits set before the current byte means you have to go backwards one byte for the current character.
Determining previous character: Go back two bytes. If the highest bit is set, you found it. If not, go one byte forwards.
Determining next character: Check the highest bit of the current byte. If set, go two bytes forwards. Otherwise, just one.
Determine number of characters means you go to the start of the string and check the highest bit of every byte. If set, add one to your count and skip one byte. If not set, just add one to your count. Unfortunately, you will have to go through the whole string.

I do assume that you have some way to indicate the start and end of the string. If not, there's no indication of where it starts and stops. (Unless you use a null byte to indicate start/stop in which case you just can't skip bytes because you might skip such a null byte.)

Is there a faster way? Well, if you know start/end locations then you can calculate the number of bytes in this string. The number of characters would be the number of bytes in this string with their highest bits not set. So only cound the number of bytes lower than 128!

Workshop Alex 2009-08-31 15:33:17

Answer 8

A:

Assuming that arr[] is like a string and pointer to a byte is pointer to INDEX

#include<stdio.h>
#include<stdlib.h>
int find_valid(int);
int arr[]={128,12,128,19,128,127,128,12,32,145,12,223,54,76,23};
int main(){
    int index=1;
    while(index != 0){
    printf("\nEnter the index:");
    scanf("%d",&index);
    if(arr[index] < 128){  // it can be the first byte or the second byte of the Two byte
             if( arr[index -1] < 128 ) printf("\nThis is the first byte in itself"); 
             else // index-1 >= 128
                {
                   int count = find_valid(index-2);
                   if(count%2 == 0) printf("\n This is the second byte of the two bytes:");       
                   else printf("\nThis is the first byte in itself:");
                }
             }
    else{ // it can be the second or the first in the two bytes
           if ( arr[index - 1] < 128 ) printf("\n This is the First byte of the two bytes:");
           else{
                 int count = find_valid(index-2);
                 if(count%2 == 0) printf("\n This is the second byte of the two bytes:");
                 else printf("\nThis is the First byte of the two bytes:");
                }
          }         

    printf("\nHave u seen the result:");
    scanf("%d",&index);}
}

int find_valid(int i){
    int count=0;
    while( (arr[i] > 127) && (i>0) ) { ++count; --i;}
        return count;    
}

Prashant Golash 2010-05-28 12:12:40

To format code properly, select it and press the "101010" button.

Tyler McHenry 2010-05-28 12:16:33

ansaurus

tags:

views:

answers:

Interview Question: Finding Next and Previous characters in a given string?

related questions