RFC 2616 includes this:
OCTET = <any 8-bit sequence of data>
CHAR = <any US-ASCII character (octets 0 - 127)>
UPALPHA = <any US-ASCII uppercase letter "A".."Z">
LOALPHA = <any US-ASCII lowercase letter "a".."z">
ALPHA = UPALPHA | LOALPHA
DIGIT = <any US-ASCII digit "0".."9">
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
<"> = <US-ASCII double-quote mark (34)>
And then pretty much everything else in the document is defined in terms of those entities (OCTET
, CHAR
, etc.). So you could look through the RFC to find out which parts of an HTTP request/response can include OCTET
s; all other parts must be ASCII. (I'd do it myself, but it'd take a long time)
For the request line specifically, the method name and HTTP version are going to be ASCII characters only, but it's possible that the URL itself could include non-ASCII characters. But if you look at RFC 2396, it says that.
A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.
Which I guess means that it'll consist of ASCII characters as well.