
I need help with a regular expression.

I have a very large collection of text files with different contents, but every file contains one hexadecimal key. Every key begins with "00" or "01", followed by exactly 16 hexadecimal bytes ("E4 34 F1 FB...", written without spaces as 32 hex digits). In some cases there is one separator character (":") after that prefix.

Here are some example keys:

00:C461F0538ECC84F1AF43DBBDC49E5DA3
00:F4F599D15353650F1566CFEB5CB891C1
011EC3991261BFD8D74BBCFE1E3108628C
003E05F7730347E43437F1FBCAB3A8B461
018FAE7FFB2DBB64F646F705525DEB25F8
00)339EDE5269DD018C2FD5338AD18C3A2F
00B8491FDF00C618A155350F47349E7B04

How can I extract these keys from strings with a regular expression in .NET (VB.NET or C#)?

Here are a couple of strings for testing:

KAJSDF00ASLJKHFLAKJSDHFLAK01JSH00:C461F0538ECC84F1AF43DBBDC49E5DA3DFLKJAHSDFJAVHBEVBERJHVBQEJHRVBQJERBV
JKLABDVJ01KBQLKJFBVQLEJKRBVL00:F4F599D15353650F1566CFEB5CB891C1QERBVJHQEBRVJHQBERFVHBQERVJHBQEJH
RVBQJHERVBJHQBRVJHQEBRVJHWEBRV011EC3991261BFD8D74BBCFE1E3108628CWKBERVJHWERFGUQHERULIFHQW
EIFH2FPO00I134FWFQWHEF34HFQREW018FAE7FFB2DBB64F646F705525DEB25F8F2347YQ3EFQO84R93U48UY8RTU13
R1R0100910R14UYR891UYFR1UEF98U1FPH00)339EDE5269DD018C2FD5338AD18C3A2F138294FH190324FU134UF19834YF
+1  A: 
 using System;
 using System.Text.RegularExpressions;
 string data = @"KAJSDF00ASLJKHFLAKJSDHFLAK01JSH00:C461F0538ECC84F1AF43DBBDC49E5DA3DFLKJAHSDFJAVHBEVBERJHVBQEJHRVBQJERBVJKLABDVJ01KBQLKJFBVQLEJKRBVL00:F4F599D15353650F1566CFEB5CB891C1QERBVJHQEBRVJHQBERFVHBQERVJHBQEJHRVBQJHERVBJHQBRVJHQEBRVJHWEBRV011EC3991261BFD8D74BBCFE1E3108628CWKBERVJHWERFGUQHERULIFHQWEIFH2FPO00I134FWFQWHEF34HFQREW018FAE7FFB2DBB64F646F705525DEB25F8F2347YQ3EFQO84R93U48UY8RTU13R1R0100910R14UYR891UYFR1UEF98U1FPH00)339EDE5269DD018C2FD5338AD18C3A2F138294F";
 // "00" or "01", an optional ':' separator, then 32 hex digits (16 bytes)
 for (Match match = Regex.Match(data, "0[01]:?[0-9A-F]{32}"); match.Success; match = match.NextMatch()) {
  Console.WriteLine(match.Value);
 }
Lucero
Hmm, 2 + 16 != 16
leppie
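Since the question mentions a very large collection of files that each contain one key, here is a minimal sketch of applying Lucero's pattern (with the length set to 32 hex digits) to every file in a folder. The folder, the `*.txt` extension and the `FindKeys` helper are hypothetical, and the sketch assumes each file fits comfortably in memory because it uses `File.ReadAllText`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Prefix "00" or "01", optional ':' separator, then 32 hex digits (16 bytes).
// Built once and reused for every file.
var keyPattern = new Regex("0[01]:?[0-9A-F]{32}", RegexOptions.Compiled);

// Hypothetical helper: returns the first key found in each *.txt file under 'folder'.
// Only the first match is kept because each file is said to contain exactly one key.
Dictionary<string, string> FindKeys(string folder)
{
    var keys = new Dictionary<string, string>();
    foreach (string path in Directory.EnumerateFiles(folder, "*.txt", SearchOption.AllDirectories))
    {
        Match m = keyPattern.Match(File.ReadAllText(path));
        if (m.Success)
            keys[path] = m.Value;
    }
    return keys;
}

// Demo: write one sample file (a test string from the question) to a temp folder and scan it.
string demoFolder = Path.Combine(Path.GetTempPath(), "key-scan-demo");
Directory.CreateDirectory(demoFolder);
File.WriteAllText(Path.Combine(demoFolder, "sample.txt"),
    "JKLABDVJ01KBQLKJFBVQLEJKRBVL00:F4F599D15353650F1566CFEB5CB891C1QERBVJHQEB");

foreach (var (file, key) in FindKeys(demoFolder))
    Console.WriteLine($"{Path.GetFileName(file)}: {key}");
```

Reusing one `Regex` instance, optionally with `RegexOptions.Compiled`, is a common choice when the same pattern runs over many inputs.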
+2  A: 
0[01][\:\(\)]?([0-9A-F]){16}

I don't have a regex parser to test this, but it should search for:

  • 0,
  • followed by a 0 or 1,
  • followed by the possible (but not compulsory) occurrence of a ':', '(' or ')'
  • followed by 16 consecutive characters with possible values of (0123456789ABCDEF)
Eoin Campbell
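If '(' and ')' really are possible separators (the comments below question that reading), this answer's idea can be sketched without the escapes and the group, and with 32 hex digits instead of 16. This is an illustration, not the answerer's own code; it runs the pattern over the five test strings from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// "00"/"01", then an optional ':', '(' or ')', then 32 hex digits (16 bytes).
// No backslashes are needed inside the character class, and no group is needed.
const string Pattern = "0[01][:()]?[0-9A-F]{32}";

string[] lines =
{
    "KAJSDF00ASLJKHFLAKJSDHFLAK01JSH00:C461F0538ECC84F1AF43DBBDC49E5DA3DFLKJAHSDFJAVHBEVBERJHVBQEJHRVBQJERBV",
    "JKLABDVJ01KBQLKJFBVQLEJKRBVL00:F4F599D15353650F1566CFEB5CB891C1QERBVJHQEBRVJHQBERFVHBQERVJHBQEJH",
    "RVBQJHERVBJHQBRVJHQEBRVJHWEBRV011EC3991261BFD8D74BBCFE1E3108628CWKBERVJHWERFGUQHERULIFHQW",
    "EIFH2FPO00I134FWFQWHEF34HFQREW018FAE7FFB2DBB64F646F705525DEB25F8F2347YQ3EFQO84R93U48UY8RTU13",
    "R1R0100910R14UYR891UYFR1UEF98U1FPH00)339EDE5269DD018C2FD5338AD18C3A2F138294FH190324FU134UF19834YF",
};

// Print the first key found on each line.
foreach (string line in lines)
    Console.WriteLine(Regex.Match(line, Pattern).Value);
```

Each test line yields one key, including `00)339EDE5269DD018C2FD5338AD18C3A2F` from the last line, which the ':'-only patterns skip.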
Why all the escaping backslashes in the character class? None of them is actually necessary. The parentheses around [0-9A-F] are not needed either. :)
Tomalak
This regex is overcomplicated and not optimised, which matters if the files are large. The only valid character after 00 or 01 is :, so there is no need for the extra matches. You don't need the group around [0-9A-F]. This also does not match lower-case hex. And every [0-9A-F] character is recorded as a separate capture of that group.
Stevo3000
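To illustrate the point about the group: in .NET, a capturing group under a quantifier keeps only its last iteration in `Group.Value`, while every iteration is stored in `Group.Captures`. The short 4-digit key below is an assumption to keep the output readable.

```csharp
using System;
using System.Text.RegularExpressions;

// Capturing group under a quantifier: matches four hex digits,
// but records each iteration as a separate Capture.
Match m = Regex.Match("00:C461", "0[01]:?([0-9A-F]){4}");
Console.WriteLine(m.Value);                    // 00:C461 (whole match)
Console.WriteLine(m.Groups[1].Value);          // 1 (last iteration only)
Console.WriteLine(m.Groups[1].Captures.Count); // 4 (one capture per iteration)

// Without the group there is nothing extra to record.
Match plain = Regex.Match("00:C461", "0[01]:?[0-9A-F]{4}");
Console.WriteLine(plain.Groups.Count);         // 1 (only group 0, the whole match)
```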
@Tomalak is right, but nevertheless, a good answer. Though you should change the count of 16 to 32. Here's my version: "0[0|1][:()]?(?<Key>[A-F\d]{32})".
Cerebrus
@Cerebrus: You don't need to use the pipe symbol in a character class (strictly speaking: it's wrong to do so).
Tomalak
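The named group from Cerebrus's version makes it easy to pull out the key without its prefix and separator. The sketch below assumes Tomalak's correction (`[01]` instead of `[0|1]`) and also shows why the pipe matters: inside a character class it is a literal '|'. Note that `\d` in .NET matches any Unicode decimal digit unless `RegexOptions.ECMAScript` is set; `[0-9A-F]` is the stricter alternative.

```csharp
using System;
using System.Text.RegularExpressions;

// Named group "Key" holds just the 32 hex digits; [01] replaces [0|1].
var keyRegex = new Regex(@"0[01][:()]?(?<Key>[A-F\d]{32})");

Match m = keyRegex.Match("JKLABDVJ01KBQLKJFBVQLEJKRBVL00:F4F599D15353650F1566CFEB5CB891C1QERBV");
Console.WriteLine(m.Value);               // prefix, separator and key
Console.WriteLine(m.Groups["Key"].Value); // key digits only

// Inside [...] the pipe is an ordinary character, not alternation.
Console.WriteLine(Regex.IsMatch("0|", "^0[0|1]$")); // True: "0|" is accepted
Console.WriteLine(Regex.IsMatch("0|", "^0[01]$"));  // False
```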
+3  A: 

The following regex will match your keys and is case-insensitive:

(?:00|01):?[a-fA-F0-9]{32}

This assumes that the OP meant a 32-character string. If a 16-character string is meant, change {32} to {16}.

Stevo3000
You're right, so why didn't you change the 16 to 32? Another thing: I think all of us thought that the OP meant that brackets are valid separators. ;-) +1
Cerebrus
@Cerebrus - Cheers. Changed to 32 characters. The separator wording wasn't overly clear.
Stevo3000
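Instead of listing both cases in the character class, the same effect can come from `RegexOptions.IgnoreCase`. A minimal comparison, using a hypothetical lower-case key (the question's keys are all upper-case):

```csharp
using System;
using System.Text.RegularExpressions;

// Stevo3000's pattern spells out both cases in the character class.
const string Explicit = "(?:00|01):?[a-fA-F0-9]{32}";
// A shorter equivalent that relies on RegexOptions.IgnoreCase instead.
const string Short = "0[01]:?[0-9A-F]{32}";

// Hypothetical input containing a lower-case key.
string lower = "xx00:c461f0538ecc84f1af43dbbdc49e5da3yy";

Console.WriteLine(Regex.Match(lower, Explicit).Value);
Console.WriteLine(Regex.Match(lower, Short, RegexOptions.IgnoreCase).Value);
Console.WriteLine(Regex.IsMatch(lower, Short)); // False: matching is case-sensitive by default
```

Both of the first two lines print the same key; which form to use is mostly a matter of taste, though the explicit class keeps the pattern self-describing when it is copied elsewhere.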