ansaurus

Question

Parse an HTTP request Authorization header with Python

Answer 1

+1 A:

If those components will always be there, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''

import re

re_auth = re.compile(r"""
    Authorization:\s*(?P<protocol>[^ ]+)\s+
    qop="(?P<qop>[^"]+)",\s+
    realm="(?P<realm>[^"]+)",\s+
    username="(?P<username>[^"]+)",\s+
    response="(?P<response>[^"]+)",\s+
    cnonce="(?P<cnonce>[^"]+)"
    """, re.VERBOSE)

m = re_auth.match(test)
print m.groupdict()

produces:

{ 'username': 'Foobear', 
  'protocol': 'Digest', 
  'qop': 'chap', 
  'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
  'realm': '[email protected]', 
  'response': '6629fae49393a05397450978507c4ef1'
}

Ned Batchelder 2009-08-28 21:36:41

This solution produces correct results as far as I've been able to see.

Kris Walker 2009-09-04 11:59:19

Answer 2

+1 A:

I would recommend finding a proper library for parsing HTTP headers; unfortunately, I can't recall any. :(

In the meantime, check the snippet below (it should mostly work):

input= """
 Authorization: Digest qop="chap",
  realm="[email protected]",
  username="Foob,ear",
  response="6629fae49393a05397450978507c4ef1",
  cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""

field, sep, value = input.partition(":")
if field.endswith('Authorization'):
    protocol, sep, opts_str = value.strip().partition(" ")

    opts = {}
    # Split on ",\n" rather than "," so a comma inside a quoted value survives
    for opt in opts_str.split(",\n"):
        # Split on the first "=" only, in case the value itself contains one
        key, value = opt.strip().split('=', 1)
        key = key.strip(" ")
        value = value.strip(' "')
        opts[key] = value

    opts['protocol'] = protocol

    print opts

Piotr Czapla 2009-08-28 21:38:11

Answer 3

A:

If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on the newlines into an array called authentication_array and use regexps:

import re

pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
parsed_dict = {}

# Pair each line with the key expected on it; zip stops at the shorter list
for key, line in zip(pattern_array, authentication_array):
    pattern = '(' + key + ')="([^"]*)"'   # build a matching pattern (value without quotes)
    match = re.search(pattern, line)      # make the match
    if match:
        parsed_dict[match.group(1)] = match.group(2)

Pinochle 2009-08-28 21:38:47

Answer 4

+3 A:

A little regex:

import re
reg=re.compile('(\w+)[:=] ?"?(\w+)"?')

>>> dict(reg.findall(headers))

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}

Nadia Alramli 2009-08-28 21:40:19

Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead:

#! /usr/bin/env python
import re

def mymain():
    reg = re.compile('(\w+)[=] ?"?(\w+)"?')
    s = """Digest realm="fireworksproject.com", username="kristoffer" """
    print str(dict(reg.findall(s)))

if __name__ == '__main__':
    mymain()

I'm not getting the "Digest" protocol declaration, but I don't need it anyway. Essentially 3 lines of code... Brilliant!!!

Kris Walker 2009-08-28 21:56:59

I think it would be more explicit to use a raw string or \\.

Bastien Léonard 2009-08-28 22:04:05
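The raw-string suggestion can be sketched as follows, in Python 3 syntax. The sample string is the one from the comment above; the second pattern, `quoted`, is an assumption added here for comparison and is not from the thread. It illustrates a side effect of `\w+`: the value stops at the first non-word character, which is why the answer's output shows the realm cut off at `testrealm`.

```python
import re

# A raw string keeps the backslashes literal, so \w reaches the regex engine unchanged.
loose = re.compile(r'(\w+)[:=] ?"?(\w+)"?')

# Hypothetical variant: capture everything between the double quotes instead of \w+.
quoted = re.compile(r'(\w+)="([^"]*)"')

s = 'Digest realm="fireworksproject.com", username="kristoffer"'

loose_result = dict(loose.findall(s))    # realm is truncated at the first "."
quoted_result = dict(quoted.findall(s))  # realm keeps the full quoted value

print(loose_result)
print(quoted_result)
```

The `quoted` variant only picks up values that are wrapped in double quotes, so unquoted parameters such as `nc=00000001` would need separate handling.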

Answer 5

A:

Your original concept of using PyParsing would be the best approach. What you've implicitly asked for is something that requires a grammar... that is, a regular expression or simple parsing routine is always going to be brittle, and that sounds like it's something you're trying to avoid.

It appears that getting pyparsing on google app engine is easy: http://stackoverflow.com/questions/1341137/how-do-i-get-pyparsing-set-up-on-the-google-app-engine

So I'd go with that, and then implement the full HTTP authentication/authorization header support from RFC 2617.

Jason R. Coombs 2009-08-28 21:42:40

I decided to take this approach and tried to implement a fully compliant parser for the Authorization header using the RFC spec. This task appears to be much more daunting than I had anticipated. Your choice of the simple regex, while not rigorously correct, is probably the best pragmatic solution. I'll report back here if I eventually get a fully functional header parser.

Jason R. Coombs 2009-08-29 16:27:01

Yeah, it would be nice to see something more rigorously correct.

Kris Walker 2009-09-04 12:01:49
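As a step in that direction, here is a minimal sketch (Python 3, standard library only) that follows the auth-param shape from RFC 2617: a token, an equals sign, then either a token or a quoted string with backslash escapes. The function name `parse_authorization` and the character class used for tokens are assumptions made for illustration; this is not a complete or validated implementation of the RFC, and it does not handle the Basic scheme, whose credentials are a single base64 blob rather than a parameter list.

```python
import re

# Approximation of the RFC 2616 "token" rule: visible ASCII without separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: text in double quotes, where a backslash escapes the next character.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# One auth-param followed by a comma or the end of the input.
PARAM = re.compile(r'\s*(%s)\s*=\s*(%s|%s)\s*(?:,|$)' % (TOKEN, QUOTED, TOKEN))


def parse_authorization(value):
    """Parse an Authorization header value into (scheme, params)."""
    scheme, _, rest = value.strip().partition(" ")
    params = {}
    pos = 0
    while pos < len(rest):
        m = PARAM.match(rest, pos)
        if m is None:
            raise ValueError("malformed auth-param at offset %d" % pos)
        key, raw = m.group(1), m.group(2)
        if raw.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes.
            raw = re.sub(r'\\(.)', r'\1', raw[1:-1])
        params[key.lower()] = raw
        pos = m.end()
    return scheme, params


scheme, params = parse_authorization(
    'Digest qop="chap",\n  username="Foob,ear", nc=00000001, opaque="x\\"y"'
)
print(scheme, params)
```

Unlike the findall-based regex, this version raises ValueError on input it cannot account for instead of silently skipping it, and it keeps commas and escaped quotes that appear inside quoted values.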

Answer 6

+3 A:

You can also use urllib2, as CherryPy does.

Here is the snippet:

input= """
 Authorization: Digest qop="chap",
  realm="[email protected]",
  username="Foobear",
  response="6629fae49393a05397450978507c4ef1",
  cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2
field, sep, value = input.partition("Authorization: Digest ")
if value:
    items = urllib2.parse_http_list(value)
    opts = urllib2.parse_keqv_list(items)
    opts['protocol'] = 'Digest'
    print opts

It outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': '[email protected]', 'response': '6629fae49393a05397450978507c4ef1'}

Piotr Czapla 2009-08-28 22:11:31
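For readers on Python 3, where urllib2 no longer exists, the same two helpers can be imported from urllib.request. The sketch below assumes that location; as far as I know these functions are not covered by the official documentation, so treat this as a convenience rather than a stable API. The shortened sample header and the variable name `header` are made up for illustration.

```python
from urllib.request import parse_http_list, parse_keqv_list

header = ('Authorization: Digest qop="chap", username="Foob,ear", '
          'response="6629fae49393a05397450978507c4ef1"')

field, sep, value = header.partition("Authorization: Digest ")
opts = {}
if sep:
    # parse_http_list splits on commas but leaves commas inside quotes alone;
    # parse_keqv_list turns each key=value item into a dict entry and strips the quotes.
    opts = parse_keqv_list(parse_http_list(value))
    opts['protocol'] = 'Digest'

print(opts)
```

Because the list splitting is quote-aware, a value such as "Foob,ear" comes through intact, which the ",\n" split in the earlier snippet only manages when every parameter sits on its own line.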

Answer 7

+2 A:

Here's my pyparsing attempt:

text = """Authorization: Digest qop="chap",
    realm="[email protected]",     
    username="Foobear",     
    response="6629fae49393a05397450978507c4ef1",     
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """

from pyparsing import *

AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict

print authentry.parseString(text).dump()

which prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', '[email protected]'],
 ['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
 ['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: [email protected]
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear

I'm not familiar with the RFC, but I hope this gets you rolling.

Paul McGuire 2009-09-04 09:40:06

This solution is the use of pyparsing that I was originally thinking of, and, as far as I can tell, it produces nice results.

Kris Walker 2009-09-04 12:00:35

Answer 8

+1 A:

The HTTP Digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you're still looking for a library that's a little smarter and more readable than the regex, you might try removing the "Authorization: Digest" part with str.split() and parsing the rest with parse_dict_header() from Werkzeug's http module. (Werkzeug can be installed on App Engine.)

Forest 2010-05-14 00:13:46

Thanks a lot. I may replace that regex with this. It seems more robust.

Kris Walker 2010-05-14 18:26:46
