ansaurus

Question

Find every occurrence of char codes 255, 251, 178, in that order, in a string or array.

Answer 1

+1 A:

Without evaluating the merit of your algorithm, you'd be better off defining Unicode chars with escapes like '\u00FF'.

Second, indexOf will find any (Unicode) char you want; there is no such thing as "will not find less than 127".

mdrg 2010-10-22 15:22:29

Unicode "less than 127" : also known as **ASCII**

Stephen P 2010-10-22 17:37:30

Accepted this answer as the solution. Unicode chars should be referenced as stated above, which I did not know. Put in a loop, it works like a charm. Thanks, mdrg.

cube 2010-10-22 21:11:53
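
For readers landing here, the "put in a loop" approach mentioned in the comment above might look roughly like the sketch below. The function name `findAll` and the sample string are made up for illustration; the original poster's actual code is not shown in this thread.

```javascript
// Collect the index of every occurrence of `needle` in `haystack`.
// `findAll` is a hypothetical helper name, not code from the thread.
function findAll(haystack, needle) {
  var positions = [];
  if (needle.length === 0) return positions; // avoid an endless loop on ""
  var i = haystack.indexOf(needle);
  while (i !== -1) {
    positions.push(i);
    i = haystack.indexOf(needle, i + 1); // resume just past the last hit
  }
  return positions;
}

var needle = '\u00FF\u00FB\u00B2'; // char codes 255, 251, 178
var sample = 'ab' + needle + 'cd' + needle;
console.log(findAll(sample, needle)); // indexes 2 and 7
```

Resuming at `i + 1` also reports overlapping matches; resuming at `i + needle.length` would skip them, which makes no difference for this particular three-character pattern.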

Answer 2

A:

It works as expected for me (using Node.js):

> var a = '\u00FF';
> console.log(a);
ÿ
> var string="hello, "+a+"there!";
> console.log(string);
hello, ÿthere!
> console.log(string.indexOf(a));
7
> console.log(string.indexOf("ÿ"));
7
>

On the other hand, you keep mentioning "bytes". JavaScript strings are Unicode characters, typically stored in more than one byte each. And you say "Ajax binary" - what's that supposed to mean? Most/all Ajax results are going to be a text format, not binary. JavaScript doesn't handle binary data very well.

Could you maybe post a little more detail about what you're trying to do?

Mark Bessey 2010-10-22 16:56:48

@Mark: Bytes are what I have in the binary data that I am working with. The `responseText` from the Ajax request comes down as text, but with the help of `overrideMimeType`.... and `ff[z] = scc(text.charCodeAt(z)` I have a true binary data string. So in essence, it's bytes to me, although when parsed with JavaScript it is seen as Unicode. Same difference, right? A value of 255 is a value of 255, Unicode or byte. So my end goal is to find every occurrence of the byte/Unicode sequence ff,fb,b2 that resides in the string, and put the locations in an array as described in the question.

cube 2010-10-22 17:28:26
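
The snippet in the comment above is cut off, so the following is only a guess at its general shape, not the poster's code. It assumes the response was fetched with the common `text/plain; charset=x-user-defined` override, under which bytes 0x80 to 0xFF typically arrive as code units U+F780 to U+F7FF, and it assumes a `& 0xFF` mask, which the truncated original does not show. `toByteString` is a hypothetical name.

```javascript
var scc = String.fromCharCode;

// Hypothetical helper: keep only the low 8 bits of every UTF-16 code unit,
// so each character of the result has a char code in the range 0..255.
// The `& 0xFF` mask is an assumption; the original snippet is truncated.
function toByteString(text) {
  var ff = [];
  for (var z = 0; z < text.length; z++) {
    ff[z] = scc(text.charCodeAt(z) & 0xFF);
  }
  return ff.join('');
}

// Simulated responseText: bytes FF FB B2 as x-user-defined would surface them.
var simulated = '\uF7FF\uF7FB\uF7B2';
console.log(toByteString(simulated) === '\u00FF\u00FB\u00B2'); // true
```

After this step, searching for `'\u00FF\u00FB\u00B2'` with `indexOf` operates on values that match the raw byte values one to one.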

@cube: 255 is 0xFF as a byte, but the Unicode character \u00FF is a TWO BYTE sequence `0xC3 0xBF` when encoded in UTF-8, or a two byte sequence `0x00 0xFF` in UTF-16 (or it could be `0xFF 0x00`, depending on the byte order). Trying to treat Unicode as a byte stream is difficult - you have to know what the encoding was and run it through a proper decoder.

Stephen P 2010-10-22 17:46:26
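
To make the distinction above concrete, here is a small sketch that prints the code unit and the encoded bytes for U+00FF. It assumes an environment where `TextEncoder` is available as a global (modern browsers and recent Node.js releases); the variable names are illustrative only.

```javascript
var ch = '\u00FF';

// Inside a JavaScript string the character is one 16-bit code unit.
var code = ch.charCodeAt(0);
console.log(code); // 255

// UTF-8 encoding of the same character: two bytes, C3 BF.
var utf8 = Array.from(new TextEncoder().encode(ch));
console.log(utf8.map(function (b) { return b.toString(16); })); // c3, bf

// UTF-16 bytes derived by hand from the code unit, in both byte orders.
var utf16be = [code >> 8, code & 0xFF]; // 00 FF
var utf16le = [code & 0xFF, code >> 8]; // FF 00
console.log(utf16be, utf16le);
```

So `charCodeAt` returning 255 says nothing about how many bytes the character occupied on the wire.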

Okay, but if you're pulling the bytes out of the response and masking them down to 8 bits already, then why are you using indexOf? I still feel like I'm missing something. Maybe if you add some code to your original question illustrating the issue, someone can point out where your problem is.

Mark Bessey 2010-10-22 18:05:23

If you want to use indexOf() on the responseString, you need to search for the 16-bit sign-extended version of the character, so instead of searching for 255, 251, and 178, you'd be searching for 0xFFFF, 0xFFFB, and 0xFFB2 (I think - it depends on how the browser handles those characters).

Mark Bessey 2010-10-22 18:08:03

@Stephen and Mark: I used indexOf() because that is the string method (as far as I can see) that returns the actual address of the needles that I am looking for. Regexp is slower than indexOf(). So, I am simply trying to carry out a search for a pattern in the byte stream, and to have the script put all the addresses of the located patterns into an array.

cube 2010-10-22 18:29:32
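
Since the question allows "a string or array", the same search can also be sketched directly over an array of byte values, which sidesteps the character-encoding issue entirely. This is a plain nested-loop scan, not anything posted in the thread; `findSequence` and the sample data are hypothetical.

```javascript
// Return the start index of every place where `pattern` occurs in `bytes`.
// Works on plain arrays and typed arrays such as Uint8Array.
function findSequence(bytes, pattern) {
  var positions = [];
  if (pattern.length === 0) return positions;
  for (var i = 0; i + pattern.length <= bytes.length; i++) {
    var j = 0;
    while (j < pattern.length && bytes[i + j] === pattern[j]) j++;
    if (j === pattern.length) positions.push(i); // full pattern matched at i
  }
  return positions;
}

var data = new Uint8Array([0x00, 0xFF, 0xFB, 0xB2, 0x10, 0xFF, 0xFB, 0xFF, 0xFB, 0xB2]);
console.log(findSequence(data, [0xFF, 0xFB, 0xB2])); // indexes 1 and 7
```

The partial match starting at index 5 (FF FB followed by FF) is rejected, and the scan picks up the real match that begins at index 7.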

Stephen P 2010-10-22 18:51:23

@Stephen: var scc= String.fromCharCode;

cube 2010-10-22 19:02:15
