I'm looping through the lines of a series of HTTP header blocks returned by cURL, trying to detect where one block ends and the next begins. I know that an HTTP header block is terminated by an empty line, but what character represents this line break in PHP? I've tried \n, but it doesn't seem to work. I could certainly be doing something wrong.

What character is used to represent the line break used to terminate a header?

Here's my existing code:

$output = '';
$redirect = '';
$regs = '';
foreach ($curl_response as $line)
{ 
 if ($line != "\n")
 { # line is not a linebreak, so we're still processing a header block

  if (preg_match("(HTTP/[0-9]\.[0-9] [0-9]{3} .*)",$line))
  { # line is the status code
   # highlight the outputted line
   $output .= "<b style='background: yellow;'>$line</b>";
  }

  elseif (preg_match("/^Location: (.*)$/m",$line,$regs)) 
  { # the line is a location header, so grab the location being redirected to
   # highlight the outputted line
   $output .= "<b style='background: purple; color: white;'>$line</b>";
   $redirect = $regs[1];
  }

  else 
  { # some other header, record to output
   $output .= $line;
  }

 }

 else 
 { # we've reached a line break, so we're getting to a new block of redirects
  $output .= "\nreached line break\n";
  if ($redirect != '')
  { # if we recorded a redirect above, append it to output
   $output .= "\n\nRedirecting to $redirect\n\n";
   $redirect = '';
  }

 } 

}

echo $output;

Solved - It turns out that \r is what I should have been matching on. Very odd. I'm not sure whether this changes per site or whether it's something set in cURL. So far it's \r on all sites I've tried.

Edit 2: Doh. I think it's because, to get the headers into an array of lines, I exploded the response on \n. That strips only the \n from each \r\n line ending, so every line keeps a trailing \r and the blank line is now just \r.

$c = explode("\n",$content);
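To illustrate the effect described in Edit 2, here is a minimal sketch that splits on any of the three line terminators instead of on \n alone. The helper name split_header_lines is hypothetical and not part of the original code; it assumes the raw header text is already in a string such as $content.

```php
<?php
// Hypothetical helper (not from the original post): split raw header
// text into lines, accepting CRLF, bare CR, or bare LF as a line break.
function split_header_lines(string $content): array
{
    // CRLF is listed first so that "\r\n" counts as one break, not two.
    return preg_split('/\r\n|\r|\n/', $content);
}

$content = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";

// With explode() on "\n", the blank line still carries the "\r".
$exploded = explode("\n", $content);
var_dump($exploded[2] === "\r");

// With the helper, the blank line is a genuinely empty string.
$lines = split_header_lines($content);
var_dump($lines[2] === '');
```

With lines split this way, the loop can test for $line === '' rather than guessing which terminator was left behind.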
+3  A: 

You also need to check for "\r\n" and "\r", as those are also valid line breaks for the terminating empty line.

When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body. HTTP applications MUST accept CRLF, bare CR, and bare LF as being representative of a line break in text media received via HTTP.

-- HTTP/1.1: Protocol Parameters - 3.7.1 Canonicalization and Text Defaults

Chad Birch
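If the existing explode("\n", ...) call stays in place, one way to apply the advice above is to treat a line as blank when nothing remains after stripping trailing CR and LF characters. This is only a sketch; is_blank_header_line is a hypothetical name introduced for illustration.

```php
<?php
// Hypothetical helper: true when a line is empty or consists only of
// line-terminator characters ("\r", "\n", or "\r\n").
function is_blank_header_line(string $line): bool
{
    // rtrim() with an explicit character list removes only CR and LF,
    // so a line made of spaces is not treated as blank here.
    return rtrim($line, "\r\n") === '';
}

var_dump(is_blank_header_line("\r"));               // leftover after explode("\n", ...)
var_dump(is_blank_header_line("Location: /new\r")); // a real header line
```

In the question's loop, the test if ($line != "\n") could then become if (!is_blank_header_line($line)).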
+1  A: 

The headers terminate with a double line break with no space in between (i.e. an empty line). A line break can be either "\n" or "\r\n" (or, as I have just learned, in some cases "\r", but I don't think that's common).

Perhaps you could match it with a regular expression like

list($headers) = preg_split('/\r?\n\r?\n|\r\r/S', $httpresponse);
thomasrutter
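Building on the preg_split pattern above, the following sketch splits output that contains several header blocks (as cURL returns when it follows redirects) into one array of lines per response, then looks for a Location header in each. The function names split_header_blocks and find_redirect, and the example.com URL, are assumptions made for illustration.

```php
<?php
// Hypothetical helper: split raw header output into blocks, one array
// of lines per response.
function split_header_blocks(string $raw): array
{
    // A blank line is two consecutive line breaks (same pattern as above).
    $blocks = preg_split('/\r?\n\r?\n|\r\r/', $raw, -1, PREG_SPLIT_NO_EMPTY);
    $result = [];
    foreach ($blocks as $block) {
        // Within a block, accept CRLF, bare CR, or bare LF between lines.
        $lines = preg_split('/\r\n|\r|\n/', $block, -1, PREG_SPLIT_NO_EMPTY);
        if ($lines) {
            $result[] = $lines;
        }
    }
    return $result;
}

// Hypothetical helper: return the Location value in a block, or null.
function find_redirect(array $lines): ?string
{
    foreach ($lines as $line) {
        if (preg_match('/^Location:\s*(.*)$/i', $line, $m)) {
            return trim($m[1]);
        }
    }
    return null;
}

$raw = "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/b\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";

foreach (split_header_blocks($raw) as $lines) {
    $target = find_redirect($lines);
    echo $lines[0], $target !== null ? " -> redirecting to $target" : '', "\n";
}
```

Splitting into blocks first means the per-line loop no longer has to detect the blank line at all.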