I noticed that Feedzirra uses this regex to get the ETag from response header:

/.*ETag:\s(.*)\r/

Personally I would have written this one:

/ETag:\s(.*)\n/

Here are my questions:

  1. Why does it put .* at the beginning even if it is unnecessary (\A is not specified)?
  2. Why does it use \r instead of \n? What is the difference?
A: 

The HTTP RFC mandates CRLF as the line break in HTTP messages. So \n would match an extra \r in a properly formatted message:

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]
    start-line      = Request-Line | Status-Line

That said, I would make it [\r\n] for the sake of robustness.
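
To make the difference concrete, here is a small Ruby sketch. The header string and the helper name `etag_capture` are made up for illustration. One caveat it shows: swapping in `[\r\n]` alone does not seem to be enough while the capture is a greedy `.*`, because on CRLF input the `.*` can still take the `\r` and leave the `\n` for the character class. A negated class is one way around that.

```ruby
# Returns capture group 1 of `pattern` applied to `header`, or nil if no match.
# (Hypothetical helper, used only for this comparison.)
def etag_capture(header, pattern)
  header[pattern, 1]
end

# Made-up response header with CRLF line endings, as HTTP requires.
header = "HTTP/1.1 200 OK\r\nETag: \"abc123\"\r\nContent-Type: text/html\r\n\r\n"

p etag_capture(header, /ETag:\s(.*)\n/)      # capture keeps the trailing \r
p etag_capture(header, /ETag:\s(.*)\r/)      # capture stops before the \r
p etag_capture(header, /ETag:\s(.*)[\r\n]/)  # greedy .* still swallows the \r
p etag_capture(header, /ETag:\s([^\r\n]*)/)  # stops at the first \r or \n
```

The last pattern also works on input that uses a bare \n, since the capture simply ends at whichever line-ending character comes first.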

Michael Krelin - hacker
Why would \n match \r? They're two different characters.
Rob Kennedy
No, Rob, of course it won't. I meant that the expression with `\n` instead of `\r` would match `\r` in `.*` part, because it comes before `\n`.
Michael Krelin - hacker
+1  A: 

I agree that the .* at the beginning is not needed.

\r and \n are different characters:

\r = line ending for old Macs
\n = line ending for *nix
\r\n = line ending for Windows

[\r\n] would probably be best.
Devin Ceartas
+2  A: 
  1. Completeness, I dare say. It's not part of a capture. There may be an implicit start-of-line anchor, though, depending on the language and implementation, in which case it may be necessary.
  2. The HTTP spec says that HTTP is to use "\r\n" as a line ending. In most programming languages, only "\n" is treated as a line ending. The \r makes sure that the \r is not swallowed inside the .* which would give erroneous whitespace at the end of the capture.
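
As a rough check of point 1 in Ruby specifically, where `match` is not implicitly anchored, the sketch below compares the two patterns on a made-up header. The helper name `compare_prefix` is hypothetical. In this setting the leading `.*` appears to change neither the capture nor where the match starts, because `.` does not cross a `\n` by default.

```ruby
# Hypothetical helper: matches `header` with and without the leading .*
# and returns both MatchData objects for comparison.
def compare_prefix(header)
  with_prefix    = header.match(/.*ETag:\s(.*)\r/)
  without_prefix = header.match(/ETag:\s(.*)\r/)
  [with_prefix, without_prefix]
end

# Made-up response header with CRLF line endings.
header = "HTTP/1.1 200 OK\r\nServer: example\r\nETag: \"abc123\"\r\n\r\n"

with_prefix, without_prefix = compare_prefix(header)
p with_prefix[1] == without_prefix[1]              # same captured value?
p with_prefix.begin(0) == without_prefix.begin(0)  # same match start?
```

In an API that implicitly anchors the pattern at the start of the input, the result could differ, which is the case the first point allows for.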
Matthew Scharley
A: 

Let’s take a look into the HTTP specification:

The ETag header field is defined as:

ETag = "ETag" ":" entity-tag

The entity-tag is defined as:

entity-tag = [ weak ] opaque-tag
weak       = "W/"
opaque-tag = quoted-string

And quoted-string is defined as:

quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )

So the ETag header field value may contain a line break. And the correct regular expression would be:

/ETag:\s+(?:W\/)?"(?:[ !#-\x7E\x80-\xFF]*|\r\n[\t ]|\\.)*"/
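
A hedged Ruby sketch of how a pattern along these lines could be tried out. It is not the expression above verbatim; it makes three assumptions, each also marked in the comments: the `\x80-\xFF` range is dropped (ASCII only), the backslash is excluded from the plain-text class so that escaped characters go through the `\\.` branch, and a capture group is added so the value can be extracted. The method name `extract_etag` is hypothetical.

```ruby
# Hypothetical helper: returns the entity-tag (including the optional W/
# prefix and the surrounding quotes), or nil if the header has no ETag.
def extract_etag(header)
  # Assumptions, differing from the pattern in the answer:
  #   - ASCII only: the \x80-\xFF range is left out
  #   - backslash is excluded from the plain-text class ([ !#-\[\]-~]),
  #     so a quoted-pair such as \" is consumed by the \\. branch
  #   - the whole entity-tag is wrapped in a capture group
  pattern = /ETag:\s+((?:W\/)?"(?:[ !#-\[\]-~]|\r\n[\t ]|\\.)*")/
  header[pattern, 1]
end

# Made-up header lines: strong tag, weak tag, folded value, escaped quote.
p extract_etag("ETag: \"abc123\"\r\n")
p extract_etag("ETag: W/\"abc123\"\r\n")
p extract_etag("ETag: \"abc\r\n def\"\r\n")
p extract_etag("ETag: \"a\\\"b\"\r\n")
```

Matching one character per iteration instead of `[...]*` inside the repeated group also avoids nesting one quantifier inside another.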
Gumbo