I need to extract parts of a string in a particular format. I have tried everything from split and substring to Pattern and Matcher, but every attempt fails one of the requirements.

Suppose

str = (((abc) shdj (def) iueexs (ghi)) mkek ONE(tree23) bjm
(twooo(bug OR bag)) mvnj THR-EE(<*>$##))

And the terms wanted are:

"Hard Coded Term1":abc
"Hard Coded Term2":def
"Hard Coded Term3":ghi
ONE:tree23
twooo:bug,bag
THR-EE:<*>$##

There should be a provision to hard-code the names of the terms, as in the case of the first three. Help! Help!

+1  A: 

You're in the neighborhood of doing language parsing. Just looking at it, it looks doable with a recursive descent parser, but with that one short example it's hard to tell for sure.

The tricky thing looks to be distinguishing shdj (def), which should result in a "hard coded term 'def'", from ONE(tree23), which should return "ONE:tree23".
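A minimal sketch of the recursive descent idea, written against the one sample string only. Everything here is an assumption rather than a specification: the class name GroupParser is made up, a group counts as a term only if it contains no nested group, a name is whatever run of letters, digits, '-' or '_' sits directly before the '(', unnamed terms are numbered in order of appearance, and OR is taken to be the only operator inside a value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original thread.
class GroupParser {
    private final String src;
    private int pos = 0;
    private int unnamed = 0;
    private final List<String> terms = new ArrayList<>();

    private GroupParser(String src) { this.src = src; }

    static List<String> parse(String src) {
        GroupParser p = new GroupParser(src);
        p.parseSequence();
        return p.terms;
    }

    // sequence := (group | any other char)*, stops at ')' or end of input.
    // Returns true if at least one nested group was seen.
    private boolean parseSequence() {
        boolean sawGroup = false;
        StringBuilder word = new StringBuilder(); // name chars directly before '('
        while (pos < src.length() && src.charAt(pos) != ')') {
            char c = src.charAt(pos);
            if (c == '(') {
                parseGroup(word.toString());
                sawGroup = true;
                word.setLength(0);
            } else {
                if (Character.isLetterOrDigit(c) || c == '-' || c == '_') {
                    word.append(c);
                } else {
                    word.setLength(0); // a space or symbol ends the candidate name
                }
                pos++;
            }
        }
        return sawGroup;
    }

    // group := '(' sequence ')'; only leaf groups produce a term.
    private void parseGroup(String name) {
        pos++; // consume '('
        int start = pos;
        boolean nested = parseSequence();
        String body = src.substring(start, pos);
        if (pos < src.length()) pos++; // consume ')'
        if (!nested) {
            // Assumed labelling scheme for terms without a name.
            String label = name.isEmpty() ? "\"Hard Coded Term" + (++unnamed) + "\"" : name;
            String value = String.join(",", body.trim().split("\\s+OR\\s+"));
            terms.add(label + ":" + value);
        }
    }

    public static void main(String[] args) {
        String str = "(((abc) shdj (def) iueexs (ghi)) mkek ONE(tree23) bjm\n"
                   + "(twooo(bug OR bag)) mvnj THR-EE(<*>$##))";
        for (String term : parse(str)) {
            System.out.println(term);
        }
    }
}
```

The shdj (def) versus ONE(tree23) distinction falls out of resetting the candidate name on any space: shdj is separated from its parenthesis, so (def) ends up unnamed, while ONE touches its parenthesis and is kept as the name.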

Charlie Martin
+1  A: 

Ugh, you need to first properly specify your requirements, preferably in BNF or equivalent. With that out of the way, you can find the hard-coded terms via a regexp such as (^|[( ])[(]([^ )]+)[)] (use the 2nd group), and the other terms with a regexp like ([0-9a-zA-Z_-]+)[(]([^()]+)[)] (use the 1st group as the name and the 2nd group as the value; you will need to process the 2nd group further to split on operands).
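A rough Java sketch of the regexp route, under stated assumptions. It folds both cases into one pattern with an optional name instead of using the two expressions above, and the class name TermExtractor, the caller-supplied list of hard-coded labels, the "Unnamed" fallback label, and the choice of OR as the only operand separator are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original thread.
class TermExtractor {
    // Group 1: optional name directly before '('.
    // Group 2: a body that contains no parentheses, so only innermost groups match.
    private static final Pattern TERM =
            Pattern.compile("([0-9A-Za-z_-]*)\\(([^()]+)\\)");

    static List<String> extract(String input, List<String> hardCodedLabels) {
        List<String> out = new ArrayList<>();
        int unnamed = 0;
        Matcher m = TERM.matcher(input);
        while (m.find()) {
            String name = m.group(1);
            if (name.isEmpty()) {
                // No name before '(': take the next hard-coded label, if any is left.
                name = unnamed < hardCodedLabels.size()
                        ? hardCodedLabels.get(unnamed)
                        : "Unnamed" + (unnamed + 1);
                unnamed++;
            }
            // Assumption: operands inside a value are separated by the word OR.
            String value = String.join(",", m.group(2).trim().split("\\s+OR\\s+"));
            out.add(name + ":" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "(((abc) shdj (def) iueexs (ghi)) mkek ONE(tree23) bjm\n"
                   + "(twooo(bug OR bag)) mvnj THR-EE(<*>$##))";
        List<String> labels = Arrays.asList(
                "\"Hard Coded Term1\"", "\"Hard Coded Term2\"", "\"Hard Coded Term3\"");
        for (String term : extract(str, labels)) {
            System.out.println(term);
        }
    }
}
```

Because the body excludes parentheses, an outer group such as (twooo(...)) never matches as a whole, and only twooo(bug OR bag) is reported. This works for the sample, but it will not validate that the parentheses balance, which is one reason a written grammar is still worth having.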

Tassos Bassoukos