I can't figure out a regex that will match every word except MD5 hashes. I'm using [a-zA-Z0-9]+ to match every word. How do I extend it so that it skips anything matching a pattern like [a-fA-F0-9]{32}, which would match the MD5 hashes? Sample input:

8e85d8b3be426bc8d370facdb0ad3ad0
string
stringString
63994b32affec18c2a428cdfcb0e2823
stringSTRINGSTING333
34563994b32dddddddaffec18c2a
stringSTRINGSTINGsrting

Thanks for any help. :)

A: 

As already said, just keep every word that does not look like an MD5 hash.
First, split the string into words; then filter out the hashes:

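var s = "8e85d8b3be426bc8d370facdb0ad3ad0\nstring\nstringString\n63994b32affec18c2a428cdfcb0e2823\nstringSTRINGSTING333\n34563994b32dddddddaffec18c2a\nstringSTRINGSTINGsrting";

// Declare everything with var so nothing leaks into the global scope.
var words = [];
var words_all = s.split(/\s+/);
// Use an index loop: for...in on an array also visits inherited enumerable properties.
for (var i = 0; i < words_all.length; i++) {
  var word = words_all[i];
  // Keep the word unless it is exactly 32 hexadecimal digits.
  if (!/^[a-fA-F0-9]{32}$/.test(word)) { words.push(word); }
}
// words = ["string", "stringString", "stringSTRINGSTING333", "34563994b32dddddddaffec18c2a", "stringSTRINGSTINGsrting"]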

(This assumes, based on your original code, that you want to use JavaScript.)

mykhal
+1  A: 

This kind of thing is usually done with a negative lookahead:

/\b(?![0-9a-fA-F]{32}\b)[A-Za-z0-9]+\b/

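At the beginning of each word, the lookahead (?![0-9a-fA-F]{32}\b) tries to match exactly 32 hexadecimal digits followed by a word boundary. If that succeeds, the match attempt fails at that position and the word is skipped; otherwise [A-Za-z0-9]+ consumes the word as usual.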
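As a minimal sketch of how the lookahead could be applied to the sample input from the question, assuming JavaScript (as the other answer does): the g flag and the variable names input, wordButNotMd5, and nonHashWords are illustrative additions, not part of the original pattern.

```javascript
// Sample input from the question: one token per line.
const input = [
  "8e85d8b3be426bc8d370facdb0ad3ad0",
  "string",
  "stringString",
  "63994b32affec18c2a428cdfcb0e2823",
  "stringSTRINGSTING333",
  "34563994b32dddddddaffec18c2a",
  "stringSTRINGSTINGsrting",
].join("\n");

// Assumption: the g flag is added so match() returns every match,
// not just the first one.
const wordButNotMd5 = /\b(?![0-9a-fA-F]{32}\b)[A-Za-z0-9]+\b/g;

// match() returns null when nothing matches, so fall back to an empty array.
const nonHashWords = input.match(wordButNotMd5) || [];

console.log(nonHashWords);
```

The two 32-digit hashes are dropped, while the shorter hex-looking token 34563994b32dddddddaffec18c2a is kept because it does not have exactly 32 digits.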

Alan Moore
+1 for a pure regex solution, although [A-Za-z0-9] may need to be changed if you consider "Hello!" a word to be matched rather than just "Hello" (and not the exclamation point).
idealmachine
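
If punctuation should count as part of a word, one possible variation is to treat any run of non-whitespace characters as a word and to delimit words by whitespace instead of \b. This is only a sketch under that assumption; the sample text and the names text, tokenButNotMd5, and tokens are hypothetical.

```javascript
// Hypothetical input mixing punctuation with an MD5-shaped token.
const text = "Hello! 8e85d8b3be426bc8d370facdb0ad3ad0 world, ok";

// (?:^|\s) anchors at the start of a whitespace-delimited token.
// The lookahead rejects a token that is exactly 32 hex digits,
// i.e. 32 hex digits followed by whitespace or the end of the input.
// (\S+) then captures the whole token, punctuation included.
const tokenButNotMd5 = /(?:^|\s)(?![0-9a-fA-F]{32}(?:\s|$))(\S+)/g;

// matchAll() yields one match object per token; group 1 is the token itself.
const tokens = Array.from(text.matchAll(tokenButNotMd5), (m) => m[1]);

console.log(tokens);
```

The capture group is needed here because the leading whitespace is consumed by the match; \b cannot be used as the delimiter, since there is no word boundary between "!" and a following space.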