I see a string in this code:

data[:2] == '\xff\xfe'

I don't know what '\xff\xfe' is, so I tried to escape it, but without success:

import cgi
print cgi.escape('\xff\xfe')#print \xff\xfe

How can I get the escaped form?

Thanks.

+8  A: 

'\xFF' means the byte with the hex value FF. '\xff\xfe' is the byte sequence used as the byte-order mark for little-endian UTF-16: http://en.wikipedia.org/wiki/Byte%5Forder%5Fmark

You could also represent it as two separate characters, but that probably won't tell you anything useful.
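To make this concrete, here is a minimal sketch that looks at the two bytes individually and compares them with the BOM constant from the standard `codecs` module. It assumes a bytes literal (`b'...'`) so that it behaves the same on Python 2.6+ and Python 3; the variable names are only for illustration.

```python
import codecs

data = b'\xff\xfe'  # the two bytes from the question

# bytearray yields integers on both Python 2 and Python 3
byte_values = list(bytearray(data))
hex_values = ['0x%02x' % b for b in byte_values]

# codecs.BOM_UTF16_LE is the little-endian UTF-16 byte-order mark
is_utf16_le_bom = (data == codecs.BOM_UTF16_LE)

print(byte_values)
print(hex_values)
print(is_utf16_le_bom)
```

So the string is just the byte 255 followed by the byte 254, and it matches `codecs.BOM_UTF16_LE`.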

MatrixFrog
+1  A: 
>>> print '\xff\xfe'.encode('string-escape')
\xff\xfe
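The 'string-escape' codec exists only in Python 2. If the same escaped form is wanted on Python 3 as well, one possible sketch is to build it by hand from the byte values. The helper name `escape_bytes` is hypothetical, not part of any library.

```python
def escape_bytes(data):
    """Return a \\xNN escape for every byte in data (hypothetical helper)."""
    # bytearray yields integers on both Python 2 and Python 3
    return ''.join('\\x%02x' % b for b in bytearray(data))

escaped = escape_bytes(b'\xff\xfe')
print(escaped)
```

Unlike 'string-escape', this sketch escapes every byte, including printable ones, so it is meant for inspecting data rather than for round-tripping it.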
Ignacio Vazquez-Abrams
A: 

You cannot escape or encode an invalid string.

You should understand that you are working with strings and not byte streams, and there are some characters you cannot accept in them, the first being 0x00, and also your example, which happens to be a BOM sequence.

So if you need to include characters that are not valid in strings (Unicode or ASCII), you will have to stop using strings for this.

Take a look at PEP-0358

Sorin Sbarnea
It would be a very good idea if you explained what your definition of "invalid string" is and, in particular, what is "invalid" about "\x00" or "\xff\xfe". Have you noted that the OP appears to be using Python 2.x and not 3.x, and so PEP-0358 has little relevance?
John Machin
Example: you cannot store 0x00 inside a C string because it is the string terminator. In the case of Unicode there are several other codes that you are not allowed to store in a string.
Sorin Sbarnea
Have you noticed that the OP is using Python, not C? I ask again: What is invalid about "\xff\xfe"?
John Machin
Usually Python uses C strings because it is implemented in C. Now regarding the value range: if you use ASCII you are allowed only 0..127 (ANSI is 0..255). Also, if you use Unicode you are allowed a wider range of values, but it happens that the two values specified are not accepted. Why? Because if you use ANSI instead of ASCII you'll discover that you may get different results from decode when the OS codepage is different. Take a look at MatrixFrog's answer to see the meaning of 0xFFFE (it can be used only at the beginning of the file).
Sorin Sbarnea
When Python is implemented in C, it doesn't "use C strings". It uses C to implement Python strings, which have quite different semantics; in particular "\x00" is quite legal. Your ASCII/ANSI stuff is irrelevant. MatrixFrog doesn't mention 0xFFFE, he mentions '\xff\xfe', which is NOT the same thing as 0xFFFE, is a LEGAL Python string, and is POSSIBLY interpretable as a BOM (that depends on an agreement that the file is encoded in UTF-16; the OP has NOT supplied that info). U+FEFF not at the start of a UTF-16 file is a zero-width no-break space (quite legal).
John Machin
+1  A: 

What is the connection between "i don't know what '\xff\xfe' is" and "so i want to escape it"? What is the purpose of "escaping" it?

It would help enormously if you gave a little more context than data[:2] == '\xff\xfe' (say a few lines before and after) ... however, it looks like it is testing whether the first two bytes of data could possibly represent a UTF-16 little-endian byte order mark. In that case you could do something like:

UTF16_LE_BOM = "\xff\xfe"

# much later
if data[:2] == UTF16_LE_BOM:
    do_something()
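As a rough illustration of what `do_something()` might be, here is a sketch that strips the mark and decodes the remainder as little-endian UTF-16. That the data really is UTF-16 is an assumption, since the question doesn't say; the helper name `decode_utf16_le_if_bom` and the sample data are made up for the example. It uses the `codecs.BOM_UTF16_LE` constant from the standard library instead of a hand-written one.

```python
import codecs

def decode_utf16_le_if_bom(data):
    """Decode data as UTF-16-LE if it starts with that BOM, else return None.

    Hypothetical helper; assumes the BOM really signals UTF-16-LE content.
    """
    if data[:2] == codecs.BOM_UTF16_LE:
        # Skip the two BOM bytes before decoding the rest
        return data[2:].decode('utf-16-le')
    return None

# Made-up sample data: a BOM followed by "hi" in little-endian UTF-16
sample = codecs.BOM_UTF16_LE + u'hi'.encode('utf-16-le')

decoded = decode_utf16_le_if_bom(sample)
not_utf16 = decode_utf16_le_if_bom(b'plain text')

print(repr(decoded))
print(not_utf16)
```

Note that the UTF-32 little-endian BOM also begins with these two bytes, so a match on the first two bytes alone is only a hint about the encoding.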
John Machin