I need to take a header like this:

 Authorization: Digest qop="chap",
     realm="testrealm@host.com",
     username="Foobear",
     response="6629fae49393a05397450978507c4ef1",
     cnonce="5ccc069c403ebaf9f0171e9517f40e41"

And parse it into this using Python:

{'protocol':'Digest',
  'qop':'chap',
  'realm':'testrealm@host.com',
  'username':'Foobear',
  'response':'6629fae49393a05397450978507c4ef1',
  'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}

Is there a library to do this, or something I could look at for inspiration?

I'm doing this on Google App Engine, and I'm not sure if the Pyparsing library is available, but maybe I could include it with my app if it is the best solution.

Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It's working, but very fragile.

Brilliant solution by Nadia below:

import re

reg = re.compile('(\w+)[=] ?"?(\w+)"?')

s = """Digest
realm="stackoverflow.com", username="kixx"
"""

print str(dict(reg.findall(s)))
+1  A: 

If those components will always be there, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''

import re

re_auth = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)",\s+
    username="(?P<username>[^"]+)",\s+
    response="(?P<response>[^"]+)",\s+
    cnonce="(?P<cnonce>[^"]+)"
    """, re.VERBOSE)

m = re_auth.match(test)
print m.groupdict()

produces:

{ 'username': 'Foobear', 
  'protocol': 'Digest', 
  'qop': 'chap', 
  'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
  'realm': 'testrealm@host.com', 
  'response': '6629fae49393a05397450978507c4ef1'
}
Ned Batchelder
This solution produces correct results as far as I've been able to see.
Kris Walker
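One caveat worth spelling out: because the pattern above fixes the order of the fields, a header that lists them in a different order simply fails to match, and calling groupdict() on the resulting None raises AttributeError. A minimal sketch of a guard, written for Python 3 (the shortened two-field pattern and the parse_fixed name are illustrative only, not part of the answer above):

```python
import re

# Shortened, illustrative version of the fixed-order pattern.
RE_AUTH = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)"
    """, re.VERBOSE)


def parse_fixed(header):
    """Return the named groups, or None when the header does not match."""
    m = RE_AUTH.match(header)
    return m.groupdict() if m else None


in_order = parse_fixed('Authorization: Digest qop="chap", realm="r"')
swapped = parse_fixed('Authorization: Digest realm="r", qop="chap"')

print(in_order)  # a dict with protocol, qop and realm
print(swapped)   # None, because realm comes before qop
```

So this approach is only safe when the client is known to send the fields in exactly this order.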
+1  A: 

I would recommend finding a proper library for parsing HTTP headers. Unfortunately, I can't recall any. :(

In the meantime, try the snippet below (it should mostly work):

input= """
 Authorization: Digest qop="chap",
  realm="testrealm@host.com",
  username="Foob,ear",
  response="6629fae49393a05397450978507c4ef1",
  cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""

field, sep, value = input.partition(":")
if field.endswith('Authorization'):
    protocol, sep, opts_str = value.strip().partition(" ")

    opts = {}
    for opt in opts_str.split(",\n"):
        # split on the first '=' only, so values containing '=' survive
        key, value = opt.strip().split('=', 1)
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value

    opts['protocol'] = protocol

    print opts
Piotr Czapla
A: 

If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on the newlines into an array called authentication_array and use regexps:

import re

pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
parsed_dict = {}

# authentication_array is the header split on newlines, one field per line
for name, line in zip(pattern_array, authentication_array):
    pattern = "(" + name + ")" + "=(\".*\")"  # build a matching pattern
    match = re.search(pattern, line)           # make the match
    if match:
        # note: group(2) still includes the surrounding double quotes
        parsed_dict[match.group(1)] = match.group(2)
Pinochle
+3  A: 

A little regex:

import re
reg=re.compile('(\w+)[:=] ?"?(\w+)"?')

>>> dict(reg.findall(headers))

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}
Nadia Alramli
Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead:

#! /usr/bin/env python
import re

def mymain():
    reg = re.compile('(\w+)[=] ?"?(\w+)"?')
    s = """Digest realm="fireworksproject.com", username="kristoffer" """
    print str(dict(reg.findall(s)))

if __name__ == '__main__':
    mymain()

I'm not getting the "Digest" protocol declaration, but I don't need it anyway. Essentially 3 lines of code... Brilliant!!!
Kris Walker
I think it would be more explicit to use a raw string or \\.
Bastien Léonard
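Note that the output shown in the answer has 'realm': 'testrealm': the value group is \w+, so a quoted value is cut at the first character that is not a letter, digit or underscore. A hedged variation, written for Python 3, that uses a raw string as suggested in the comment above and captures everything between the quotes (the PAIR and parse_pairs names and the sample header are made up for illustration; escaped quotes inside a value are still not handled):

```python
import re

# Raw string, so \w and \s reach the regex engine untouched.
# Group 2 captures a quoted value, group 3 an unquoted one.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|([^\s,"]+))')


def parse_pairs(s):
    """Collect key=value pairs; quoted values keep commas, '@', '.', etc."""
    return {key: quoted or bare for key, quoted, bare in PAIR.findall(s)}


sample = 'Digest realm="testrealm@host.com", username="Foob,ear", qop=auth'
pairs = parse_pairs(sample)
print(pairs)
```

As with the original regex, the leading "Digest" is skipped because it is not followed by an equals sign.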
A: 

Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like it's something you're trying to avoid.

It appears that getting pyparsing on google app engine is easy: http://stackoverflow.com/questions/1341137/how-do-i-get-pyparsing-set-up-on-the-google-app-engine

So I'd go with that, and then implement the full HTTP authentication/authorization header support from RFC 2617.

Jason R. Coombs
I decided to take this approach and tried to implement a fully-compliant parser for the Authorization header using the RFC spec. This task appears to be much more daunting than I had anticipated. Your choice of the simple regex, while not rigorously correct, is probably the best pragmatic solution. I'll report back here if I eventually get a fully-functional header parser.
Jason R. Coombs
Yeah, it would be nice to see something more rigorously correct.
Kris Walker
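For anyone who wants something a bit stricter without a parser library, here is a rough sketch (Python 3) that follows the auth-param rule from RFC 2617, token "=" ( token | quoted-string ), including backslash-escaped characters inside quoted strings. It is not a full implementation of the RFC: the parse_credentials name is invented here, the input is assumed to be the header value without the "Authorization:" prefix, and anything that does not fit the rule raises ValueError.

```python
import re

# token and quoted-string as defined by RFC 2616, section 2.2
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'
# one auth-param, followed by a comma or the end of the input
PARAM = re.compile(r"\s*(%s)\s*=\s*(%s|%s)\s*(?:,|$)" % (TOKEN, TOKEN, QUOTED))


def parse_credentials(value):
    """Split 'scheme k1=v1, k2="v2"' into (scheme, dict of params)."""
    scheme, _, rest = value.strip().partition(" ")
    rest = rest.strip()
    params = {}
    pos = 0
    while pos < len(rest):
        m = PARAM.match(rest, pos)
        if not m:
            raise ValueError("malformed auth-param at offset %d" % pos)
        key, raw = m.group(1), m.group(2)
        if raw.startswith('"'):
            # drop the surrounding quotes and undo backslash escapes
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[key] = raw
        pos = m.end()
    return scheme, params


scheme, params = parse_credentials(
    'Digest username="Foo \\"bar\\"", qop=auth, realm="testrealm@host.com"'
)
print(scheme, params)
```

Unlike findall, matching from a running position means junk between the pairs is reported instead of being silently skipped.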
+3  A: 

You can also use urllib2, as CherryPy does.

Here is the snippet:

input= """
 Authorization: Digest qop="chap",
  realm="testrealm@host.com",
  username="Foobear",
  response="6629fae49393a05397450978507c4ef1",
  cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2
field, sep, value = input.partition("Authorization: Digest ")
if value:
 items = urllib2.parse_http_list(value)
 opts = urllib2.parse_keqv_list(items)
 opts['protocol'] = 'Digest'
 print opts

It outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': 'testrealm@host.com', 'response': '6629fae49393a05397450978507c4ef1'}
Piotr Czapla
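On Python 3 there is no urllib2 module; the same two helpers live in urllib.request. A minimal sketch of the equivalent (the parse_authorization name and the sample value are illustrative, and the input is assumed to be the header value without the "Authorization:" prefix):

```python
from urllib.request import parse_http_list, parse_keqv_list


def parse_authorization(header_value):
    """Parse 'Digest k1="v1", k2="v2"' into a dict plus a 'protocol' key."""
    scheme, _, params = header_value.strip().partition(" ")
    # parse_http_list splits on commas outside quotes;
    # parse_keqv_list turns the key=value items into a dict and unquotes them
    opts = parse_keqv_list(parse_http_list(params))
    opts["protocol"] = scheme
    return opts


value = '''Digest qop="chap",
  realm="testrealm@host.com",
  username="Foob,ear",
  cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''
opts = parse_authorization(value)
print(opts)
```

The comma inside the quoted username is left alone, which is the main advantage over splitting the string by hand.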
+2  A: 

Here's my pyparsing attempt:

text = """Authorization: Digest qop="chap",
    realm="testrealm@host.com",     
    username="Foobear",     
    response="6629fae49393a05397450978507c4ef1",     
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """

from pyparsing import *

AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict

print authentry.parseString(text).dump()

which prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'],
 ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
 ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: testrealm@host.com
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear

I'm not familiar with the RFC, but I hope this gets you rolling.

Paul McGuire
This solution is the use of pyparsing that I was originally thinking of, and, as far as I can tell, it produces nice results.
Kris Walker
+1  A: 

The HTTP Digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that's a little smarter and more readable than the regex, you might try removing the "Authorization: Digest" part with str.split() and parsing the rest with parse_dict_header() from Werkzeug's http module. (Werkzeug can be installed on App Engine.)

Forest
Thanks a lot. I may replace that regex with this. It seems more robust.
Kris Walker