What regular expression can I use (if any) to validate that a given string is a legal ssh rsa public key?

I only need to validate the actual key - I don't care about the key type that precedes it or the username comment after it.

Ideally, someone will also provide the python code to run the regex validation.

Thanks.

A:

Based on the references to "key type that precedes it" and "username comment after it", I assume you're talking about public keys stored in ssh2 keyfile format.

In that format, the key is stored in base64 format, so a simple check would be to verify that the string contains only valid base64 characters.
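Since the question asks for a regex, here is a minimal sketch of that character check in Python. It assumes standard base64 (RFC 4648 alphabet `A-Z a-z 0-9 + /`, with at most two trailing `=` pads), and `looks_like_base64` is a hypothetical helper name. It checks only the characters and padding, not the structure of the key.

```python
import re

# Standard base64: groups of 4 characters from [A-Za-z0-9+/],
# where the final group may be padded with one or two '='.
BASE64_RE = re.compile(r'(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?')

def looks_like_base64(key_string):
    """Return True if key_string is non-empty and made only of well-formed base64 groups."""
    # fullmatch (rather than ^...$) also rejects a trailing newline
    return bool(key_string) and BASE64_RE.fullmatch(key_string) is not None

print(looks_like_base64('AAAAB3NzaC1yc2EA'))  # True
print(looks_like_base64('not base64!'))       # False
```

`fullmatch` is used instead of `^...$` because `$` also matches just before a trailing newline.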

If you want to go a little further, you could note that the first few bytes of the encoded key specify the key type, and match on that. See this post, which says:

If you base64-decode the first bit of that text (AAAAB3NzaC1yc2EA) you'll find that it starts with bytes 00 00 00 07 (indicating that a 7-character string follows) and then the seven characters "ssh-rsa", which is the key type. DSA keys start with the slightly different string "AAAAB3NzaC1kc3MA", which decodes similarly to the string "ssh-dss".
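The quoted decoding can be reproduced with a few lines of Python. This is an illustrative sketch; `decode_prefix` is a hypothetical helper that reads the 4-byte big-endian length and then the key-type string it announces.

```python
import base64

def decode_prefix(prefix):
    """Decode a base64 key prefix into (length field, key-type bytes)."""
    raw = base64.b64decode(prefix)
    length = int.from_bytes(raw[:4], 'big')  # 4-byte big-endian length prefix
    return length, raw[4:4 + length]         # the key-type string that follows it

print(decode_prefix('AAAAB3NzaC1yc2EA'))  # (7, b'ssh-rsa')
print(decode_prefix('AAAAB3NzaC1kc3MA'))  # (7, b'ssh-dss')
```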

David Gelhar
The "type key comment" layout is the OpenSSH format. The SSH2 format is described in RFC 4716.
JimB
A:

A "good enough" check is to see if the key starts with the correct header.

The data portion of the keyfile should decode from base64. If it doesn't, `base64.b64decode(..., validate=True)` raises a `binascii.Error`.

Unpack the first 4 bytes as a big-endian unsigned int, which should be 7. This is the length of the string that follows (it can differ for other key types, but you're only concerned with ssh-rsa).

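    import base64
    import struct

    # OpenSSH layout: "<type> <base64 key> [comment]"
    with open('keyfile') as f:
        key_type, key_string = f.read().split()[:2]
    # validate=True makes invalid base64 raise binascii.Error
    data = base64.b64decode(key_string, validate=True)
    int_len = 4
    str_len = struct.unpack('>I', data[:int_len])[0]  # this should return 7
    print(data[int_len:int_len + str_len] == key_type.encode())  # True if the types match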

Alternatively, you could forgo the binary checks and look for AAAAB3NzaC1yc2EA at the start of an ssh-rsa key, but I would still verify that it's valid base64.
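A minimal sketch of that prefix-plus-base64 shortcut, assuming Python 3; `is_probably_rsa_key` and `RSA_PREFIX` are hypothetical names. It tells you the blob claims to be ssh-rsa and is decodable, nothing more.

```python
import base64
import binascii

# The 16 characters that every ssh-rsa key blob starts with: they decode to
# b'\x00\x00\x00\x07ssh-rsa' followed by the first (zero) byte of the next length field.
RSA_PREFIX = 'AAAAB3NzaC1yc2EA'

def is_probably_rsa_key(key_string):
    """Return True if key_string has the ssh-rsa prefix and is valid base64."""
    if not key_string.startswith(RSA_PREFIX):
        return False
    try:
        # validate=True rejects non-alphabet characters instead of skipping them
        base64.b64decode(key_string, validate=True)
    except binascii.Error:
        return False
    return True
```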

[edit] Clarification:
Per the specification, the first part of the key is a length-prefixed string. The length is packed as a big-endian unsigned int ('>I' for a Python struct). It's 7 here because the following string, 'ssh-rsa', is 7 bytes long. data[4:11] is the next 7 bytes (per the length prefix), but I edited the code above to use some descriptive variables to try to make this clearer. If you want to be thorough, you should also check for ssh-dss, and possibly pgp-sign-rsa and pgp-sign-dss, but they are far less common.
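Taking the length-prefix idea one step further, the whole ssh-rsa blob can be walked field by field. This sketch assumes the RFC 4253 (section 6.6) layout of string "ssh-rsa", mpint e, mpint n, each with a 4-byte big-endian length prefix. `read_string` and `parse_ssh_rsa` are hypothetical helpers, and the code does not check key size or minimal mpint encoding.

```python
import base64
import binascii
import struct

def read_string(data, offset):
    """Read one length-prefixed field at offset; return (value, new_offset)."""
    if offset + 4 > len(data):
        raise ValueError('truncated length field')
    (length,) = struct.unpack('>I', data[offset:offset + 4])
    start, end = offset + 4, offset + 4 + length
    if end > len(data):
        raise ValueError('truncated field')
    return data[start:end], end

def parse_ssh_rsa(key_string):
    """Return (e, n) from an ssh-rsa key blob; raise ValueError if it is malformed."""
    try:
        data = base64.b64decode(key_string, validate=True)
    except binascii.Error as exc:
        raise ValueError('not valid base64') from exc
    key_type, offset = read_string(data, 0)
    if key_type != b'ssh-rsa':
        raise ValueError('unexpected key type %r' % key_type)
    e_bytes, offset = read_string(data, offset)  # public exponent (mpint)
    n_bytes, offset = read_string(data, offset)  # modulus (mpint)
    if offset != len(data):
        raise ValueError('trailing data after modulus')
    # mpints are big-endian; e and n of a valid RSA key are positive
    return int.from_bytes(e_bytes, 'big'), int.from_bytes(n_bytes, 'big')
```

Each field's length must fit inside the decoded blob, and nothing may follow the modulus. That is stricter than the prefix check and still needs no regex.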

JimB
If I understand your code correctly, you're checking that 'key_string' is a base64-decodable sequence and then making sure it begins with 7 because all RSA pubkeys start with 7? What does `data[4:11] == type` mean?
Warlax