tags:

views:

45

answers:

2

Using indexOf('ÿ')// char for Unicode 255 for example, doesn't work. Probably because indexOf() needs the Unicode to be less than 127, maybe?.

Anyway, I have a big string array from an Ajax binary. I need to find every occurrence of a byte sequence, let's say the sequence is: 255,251,178. Each occurrence's location (loc of the 255 in the sequence) should then be pushed into an array to create this: arr[0]=1st sequence occurrences location arr[1]=2nd sequence occurrences location etc. The search through the string should be able to start at a specified offset. My problem is I can't even get indexOf() to detect the char for unicode 255.

Any help would be appreciated, greatly.

Pat

+1  A: 

Without evaluating the merit of your algorithm, you'll be better defining unicode chars like '\u00FF'

Second, indexOf will find any (Unicode) char you want, no such thing as "will not find less than 127".

mdrg
Unicode "less than 127" : also known as **ASCII**
Stephen P
Accepted this answer as the solution. Unicode char's should be referenced as stated above. which I did not know. Put in a loop, it works like a charm. Thanks, mdrg.
cube
A: 

It works as expected for me (using Node.js):

> var a = '\u00FF';
> console.log(a);
ÿ
> var string="hello, "+a+"there!";
> console.log(string);
hello, ÿthere!
> console.log(string.indexOf(a));
7
> console.log(string.indexOf("ÿ"));
7
> 

On the other hand, you keep mentioning "bytes". Javascript strings are Unicode characters, typically stored in more than one byte each. And you say "Ajax binary" - what's that supposed to mean? Most/all Ajax results are going to be a text format, not binary. Javascript doesn't handle binary data very well.

Could you maybe post a little more detail about what you're trying to do?

Mark Bessey
Mark@: Bytes, is what I have in the binary data that I am working with. The `responseText` from the Ajax request comes down as text, but with the help of `overrideMimeType`.... and `ff[z] = scc(text.charCodeAt(z) I have a true binary data string. So in essence, it's bytes to me, although parsing with JavaScript, it is seen as Unicode. same difference right?, a value of 255 is a value of 255, Unicode or byte. So my end goal is to find every occurrence of byte/Unicode sequence ff,fb,b2 that resides in the string, and put the locations in an array as described in the question.
cube
@cube : 255 is 0xFF as a byte, but the Unicode character \u00FF is a TWO BYTE sequence `0xC3 0xBF` when encoded in UTF-8 or a two byte sequence `0x00 0xFF` in UTF-16 (or it could be `0xFF 0x00` depending on the byte order mark) Trying to treat Unicode as a byte stream is difficult - you have to know what the encoding was and run it through a proper decoder.
Stephen P
Okay, but if you're pulling the bytes out of the response and masking them down to 8 bits already, then why are you using indexOf? I still feel like I'm missing something. Maybe if you add some code to your original question illustrating the issue, someone can point out where your problem is.
Mark Bessey
If you want to use indexOf() on the responseString, you need to search for the 16-bit sign-extended version of the character, so instead of searching for 255, 251, and 178, you'd be searching for 0xFFFF, 0xFFFB, and )xFFB2 (I think - it depends on how the browser handles those characters).
Mark Bessey
@Stephen and Mark: I used indexOf() because that is the string method (so far that I see) that returns the actual address of the needles that I am looking for. Regexp is slower than indexOf(). So, I am simply trying to carry out a search for a pattern in the byte stream, and to have the script put all the addresses of the located patterns into an array.
cube
Stephen P
@Stephen: var scc= String.fromCharCode;
cube