views:

432

answers:

8

We have a language X, which has one byte and two byte characters. This language has following characteristics.

  1. Single byte character value will always be less than or equal to 127.
  2. In Two byte character, First byte will always be greater than 127 and second byte value can be anything.

The problem is, We are given an arbitrary length string and pointer pointing to some byte in the string and we have to find out what is the previous character and what is the next character.

One simple approach will be start from beginning of the string, checking the value of the byte and comparing the pointers until we reach the given pointer. But in the worst case that is, if the given pointer pointing to last byte in the given string, we have to loop through all the charaters.

I would like to know is there any better algorithm which will give the result in constant time irrespective of the length of the string?

+6  A: 

It seems in the worst case, we need to go through the whole string. Just consider chars A = 100 and B = 200, C = 201 and the following strings of length N:

S1 = ABCBCBC...BC

S2 = BBCBCBC...BC

Olexiy
Yes, But I want to find out a better algorithm.
raj
As you can see, there is NO better algorithm in the worst case. If you want something better in average, see bobince's comment.
Olexiy
A: 

assuming you know how to move backwards in the string (and assuming the pointer they give you is actually guaranteed to be to the first byte of the current character if it's multi-byte). Just move back two bytes.

If the data at current -2 >127 then the previous character is the last two bytes. if the data at current -2 is <127 then the previous character is at current -1.

Similar for the next character. If the data at current + 1 is < 127 then it is the next character, otherwise it's the beginning of a multi-bte character.

If you can't move backwards then there's no way to do it short of reading through the whole string until you get to the current position. Use a stack to keep track of the last two bytes, when you hit the current address then all the data you need for the previous character is in the stack.

patros
Fails when current-2 is a high byte and second byte of a two-byte sequence.
bobince
Ahhh, true. Won't bother editing, as your solution looks good.
patros
+1  A: 

Scan backwards until you either hit two consecutive bytes less than 127 or you hit the beginning of the string. You can now count characters up to where you are and on to one after the current character.

stonemetal
A: 
Robert
The byte at -2 offset can be >127 and yet not the start of a character but a second byte in a two-byte character.
bobince
It is not correct.In previous case,if the byte > 127 it can be second byte of two byte character also.Same in the Next case
raj
+11  A: 

Hello Shift-JIS!

No, constant time is impossible as the worst case is, as Olexiy states, that almost the whole string is top-bit-set bytes and you need to backtrack right to the start to find out which is the first top-bit-set byte in the first two-byte sequence.

Hopefully this pathological case is rare and you can just step back a byte at a time until you meet any low byte, in which case you can be sure that the byte after it is the start of a character. You can then walk forwards again until you meet your original pointer.

bobince
hughdbrown
A: 

Really you need to find three characters: current character, previous character, and next character.

CurrentChar is either at position P given by the pointer or at P-1. If position P is pointing at a byte that is greater than 127 then P is the CurrentChar. If P is less than 127 look at P-1. If P-1 is greater than 127 then CurrentChar is P-1 otherwise CurrentChar is at position P.

To get the PreviousChar, look at CurrentChar - 2 and if that is greater than 127 PreviousChar = CurrentChar -2 otherwise PreviousChar = CurrentChar -1

The NextChar can be gotten by looking at P. If P is greater than 127 then the next char is at P+2. If P is less than 127 NextChar is at position P+1.

G_A
It is not correct, If the position P is pointing at a byte that is greater than 127, it can the second byte of two byte character also
raj
+1  A: 

Let's see... You're already pointing to a character which can be a single byte or double byte. In the latter case, you could be on the first or second byte of this character. So first you need to know if you're at the start of a character or not. To make matters worse, the second byte of a two-bytes character can be any value, thus there's a chance that all bytes are greater than 127! That makes it nasty to find the previous character. But first, determine if you're at the start of a new character or in the middle of one.

  • Determining character start: Go back one byte until you find a byte that doesn't have it's highest bit set. (Or the beginning of the string.) Based on the number of bytes with the highest bit set, you will know if you're at the beginning or not. An odd number of high bits set before the current byte means you have to go backwards one byte for the current character.

  • Determining previous character: Go back two bytes. If the highest bit is set, you found it. If not, go one byte forwards.

  • Determining next character: Check the highest bit of the current byte. If set, go two bytes forwards. Otherwise, just one.

  • Determine number of characters means you go to the start of the string and check the highest bit of every byte. If set, add one to your count and skip one byte. If not set, just add one to your count. Unfortunately, you will have to go through the whole string.

I do assume that you have some way to indicate the start and end of the string. If not, there's no indication of where it starts and stops. (Unless you use a null byte to indicate start/stop in which case you just can't skip bytes because you might skip such a null byte.)

Is there a faster way? Well, if you know start/end locations then you can calculate the number of bytes in this string. The number of characters would be the number of bytes in this string with their highest bits not set. So only cound the number of bytes lower than 128!

Workshop Alex
A: 

Assuming that arr[] is like a string and pointer to a byte is pointer to INDEX

#include<stdio.h>
#include<stdlib.h>
int find_valid(int);
int arr[]={128,12,128,19,128,127,128,12,32,145,12,223,54,76,23};
int main(){
    int index=1;
    while(index != 0){
    printf("\nEnter the index:");
    scanf("%d",&index);
    if(arr[index] < 128){  // it can be the first byte or the second byte of the Two byte
             if( arr[index -1] < 128 ) printf("\nThis is the first byte in itself"); 
             else // index-1 >= 128
                {
                   int count = find_valid(index-2);
                   if(count%2 == 0) printf("\n This is the second byte of the two bytes:");       
                   else printf("\nThis is the first byte in itself:");
                }
             }
    else{ // it can be the second or the first in the two bytes
           if ( arr[index - 1] < 128 ) printf("\n This is the First byte of the two bytes:");
           else{
                 int count = find_valid(index-2);
                 if(count%2 == 0) printf("\n This is the second byte of the two bytes:");
                 else printf("\nThis is the First byte of the two bytes:");
                }
          }         

    printf("\nHave u seen the result:");
    scanf("%d",&index);}
}

int find_valid(int i){
    int count=0;
    while( (arr[i] > 127) && (i>0) ) { ++count; --i;}
        return count;    
}   
Prashant Golash
To format code properly, select it and press the "101010" button.
Tyler McHenry