I'm having the same problem as the poster of this question: http://stackoverflow.com/questions/1738227/httplib2-how-to-set-more-than-one-cookie

The cookie looks like this:

PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd

Note how it uses commas as well as semi-colons to separate cookies, but commas are also used in the cookie itself.

This is too complicated for me to write a regex that separates them properly. It would be very much appreciated if anyone wants to give it a shot!

A:

Have you tried cookielib / http.cookiejar?


If you interpret the cookie like this:

PHPSESSID=8527b5532b6018aec4159d81f69765bd;
path=/;
expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578;
expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; 
expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd

then only the semicolon is the true separator, and the comma only appears because an expiration date precedes it.

If you are not interested in the expiration dates, you can use a single regex to filter them out, e.g.

s/expires=[^,]+,[^,]+, //g

then split the whole string on `;` and parse each piece as a key=value pair.
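A minimal Python sketch of the same two steps, assuming the header string has already been pulled out of the response as in the question. `parse_joined_cookie` is a hypothetical helper name. Attributes such as `path` come out as ordinary pairs, and cookies that share a name would overwrite each other in the returned dict.

```python
import re

# Example input copied from the question (one comma-joined Set-Cookie value).
HEADER = ("PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
          "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
          "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
          "expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd")


def parse_joined_cookie(header):
    """Hypothetical helper: drop the expires attributes, then split on ';'."""
    # Same substitution as s/expires=[^,]+,[^,]+, //g:
    # removes "expires=Day, DD-Mon-YYYY HH:MM:SS GMT, " including the trailing ", ".
    cleaned = re.sub(r"expires=[^,]+,[^,]+, ", "", header)
    pairs = {}
    for part in cleaned.split(";"):
        part = part.strip()
        if "=" in part:
            key, _, value = part.partition("=")
            pairs[key] = value
    return pairs


print(parse_joined_cookie(HEADER))
```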

KennyTM
Thanks, however it's not only `expires` that uses commas. I didn't include the full cookie, which has something like this: `path=/, hash=0800fc577294c34e0b28ad2839435945;`. I haven't looked at many other cookie headers, but I assume others would use commas like this as well, so it's not as simple as filtering out certain keys.
Cookies
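One heuristic that also copes with `path=/, hash=...` is to treat a comma as a boundary between two joined Set-Cookie values only when it is followed by something that looks like `name=`. The comma inside an `expires` date is followed by the day of the month, so it is left alone, while `path=/, hash=...` is split into the end of one cookie and the start of a new `hash` cookie. This is a sketch under that assumption, not a real parser: a cookie value that itself contains `, x=` would be split in the wrong place. `split_joined_set_cookie` is a hypothetical name.

```python
import re
from typing import List

# Hypothetical heuristic: a comma ends one joined Set-Cookie value only if it is
# followed by optional whitespace and a "name=" token. The comma in
# "expires=Fri, 19-Feb-2010 ..." is followed by a date, so it is not a split point.
_BOUNDARY = re.compile(r",\s*(?=[^\s;,=]+=)")


def split_joined_set_cookie(header: str) -> List[str]:
    """Best-effort undo of the ', ' joining of several Set-Cookie values."""
    return [piece.strip() for piece in _BOUNDARY.split(header) if piece.strip()]


HEADER = ("PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/; "
          "expires=Fri, 19-Feb-2010 13:52:51 GMT, id=1578; "
          "expires=Mon, 22-Feb-2010 13:37:51 GMT, password=123456; "
          "expires=Mon, 22-Feb-2010 13:37:51 GMT, sid=8527b5532b6018aec4159d81f69765bd")

# Each piece is now a single Set-Cookie value that can be split on ';'.
for cookie in split_joined_set_cookie(HEADER):
    print(cookie)
```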
A:

Note how it uses commas as well as semi-colons to separate cookies, but commas are also used in the cookie itself.

As quoted, the ambiguous commas make the string unparseable with regex or any other tool. Where is that string coming from?

As a Set-Cookie: header value it would simply be completely invalid, and wouldn't work in any browser. Browsers would set PHPSESSID as a session cookie (since the expires date format is invalid with the extra comma), and ignore the rest. Multiple cookies have to be set with multiple Set-Cookie headers, not combined into one.
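For comparison, Python's standard `http.cookies.SimpleCookie` shows the unambiguous form a server is expected to send: `output()` emits one `Set-Cookie:` line per cookie, separated by CRLF. The names and values below are copied from the question; passing `expires` as a pre-formatted string is an assumption made for illustration.

```python
from http.cookies import SimpleCookie

# Build some of the question's cookies the way a server should send them.
cookies = SimpleCookie()
cookies["PHPSESSID"] = "8527b5532b6018aec4159d81f69765bd"
cookies["PHPSESSID"]["path"] = "/"
cookies["id"] = "1578"
cookies["id"]["expires"] = "Mon, 22-Feb-2010 13:37:51 GMT"
cookies["sid"] = "8527b5532b6018aec4159d81f69765bd"

# output() returns one "Set-Cookie: ..." line per cookie, joined with "\r\n",
# so the comma in the expires date never has to double as a separator.
headers = cookies.output()
print(headers)
```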

Edit: OK, what seems to be happening is that httplib2 handles the HTTP response data using the stdlib email package to parse the headers. In e-mail, the RFC822 family of standards requires that multiple headers with the same name (e.g. To: addresses) are equivalent to a single header with the values joined by commas.

However, HTTP responses are explicitly not an RFC822-family standard; it is totally inappropriate to handle them this way. It would appear that by using email to parse HTTP responses, httplib2 has made itself unable to handle any multiply-used header correctly, and the Set-Cookie header is very often used like that. For this reason I consider httplib2 fundamentally broken and would advise not using it.

bobince
That's exactly how httplib2 gives it to me from the Set-Cookie response header. I don't know how else to add cookies, as it has no cookielib support according to the docs I've read, and after looking at the code there's nothing regarding them. A tweak to the way headers are handled in httplib2 is probably the way to go, although I don't really know what change. Here is the pastebin: http://pastebin.ca/1802685
Cookies
Ah, OK. Looking at the code, it's a bug in `httplib2` (see edit). `urllib` allows you to get multiple headers without this broken comma-combining by using `getheaders('Header-Name')` instead of `getheader`. But no such facility appears to be available in `httplib2`.
bobince
How hard would it be to implement something such as that?
Cookies
Well, first you'd have to replace the use of `email.FeedParser` for header parsing (see line 1035 of `httplib2/__init__.py`) with a proper HTTP response parser that stores multiple headers. Then you'd have to come up with a new interface to access them.
bobince
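As a rough idea of that first step, a parser that keeps repeated headers apart can collect every value into a list keyed by the lower-cased header name. `parse_header_block` is a hypothetical stand-alone helper, not httplib2 code, and it ignores obsolete folded continuation lines; wiring it into httplib2 would still need the new access interface mentioned above.

```python
from collections import defaultdict
from typing import Dict, List


def parse_header_block(raw: bytes) -> Dict[str, List[str]]:
    """Hypothetical helper: parse response headers without joining repeats.

    Header names are lower-cased and every occurrence is appended to its own
    list, instead of being merged into one comma-joined string.
    Folded (multi-line) headers are not handled in this sketch.
    """
    headers: Dict[str, List[str]] = defaultdict(list)
    lines = raw.decode("iso-8859-1").split("\r\n")
    for line in lines[1:]:          # skip the status line
        if not line:                # blank line ends the header block
            break
        name, sep, value = line.partition(":")
        if sep:                     # ignore malformed lines without a colon
            headers[name.strip().lower()].append(value.strip())
    return dict(headers)


RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Set-Cookie: a=1; path=/\r\n"
       b"Set-Cookie: b=2; expires=Mon, 22-Feb-2010 13:37:51 GMT\r\n"
       b"Content-Length: 0\r\n"
       b"\r\n")
print(parse_header_block(RAW)["set-cookie"])
```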
Unless you have built a lot of work on top of `httplib2`, it would probably be easier to use a different HTTP library. What features do you need that are not satisfied by the stdlib `urllib` and `httplib`? (`httplib` makes the same mistake as `httplib2` when you use `HTTPResponse.getheader('Set-Cookie')`, but you can still get the proper headers from `HTTPResponse.msg.getheaders('Set-Cookie')`.)
bobince
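`msg.getheaders(name)` is the Python 2 `httplib` API. A sketch of the same distinction in Python 3's `http.client`: `getheader()` still joins repeated headers with `', '`, while `msg.get_all()` returns each `Set-Cookie` value separately. `FakeSocket` is a hypothetical stand-in so the response can be parsed from canned bytes without a network connection.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: HTTPResponse only calls makefile()."""

    def __init__(self, data: bytes):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Set-Cookie: PHPSESSID=8527b5532b6018aec4159d81f69765bd; path=/\r\n"
       b"Set-Cookie: id=1578; expires=Mon, 22-Feb-2010 13:37:51 GMT\r\n"
       b"Content-Length: 0\r\n"
       b"\r\n")

resp = http.client.HTTPResponse(FakeSocket(RAW))
resp.begin()  # parses the status line and headers

# getheader() merges repeated headers into one comma-joined string (ambiguous).
joined = resp.getheader("Set-Cookie")
# msg.get_all() keeps one entry per Set-Cookie header (unambiguous).
separate = resp.msg.get_all("Set-Cookie")
print(joined)
print(separate)
```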
SOCKS proxy tunneling is what I'm using it for.
Cookies
OK, I edited it to use `msg.getheaders` and now it's returning the cookie headers properly, thanks!
Cookies
A: 

----------snip---------

Cookies