I am going to start work soon on a new project at work. Essentially, there are many chemical compounds here, and each has its own prefix / identifier: for example, a couple of characters followed by a few digits, and that sort of thing, though they all vary.

I was wondering if there is an algorithm for matching these identifiers efficiently, as opposed to having a massive if/else chain.

I guess a hash map of key -> value, with the key being some mask, may be good, but I was hoping someone could suggest something a little more sophisticated that I could use.

Because it's not just for chemical compounds, the number of different values it could be is huge.

Thanks

A: 

Convert your formula into a String and then use regular expression matching. It will make your life easier, and you will learn regular expressions, which are quite handy.
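As a rough sketch of that idea, a small table of named patterns can replace the if/else chain. The identifier formats below (two uppercase letters followed by three to five digits, and a `B-NNNN-X` batch code) are invented for illustration; the real formats would come from your own data.

```python
import re

# Hypothetical identifier formats; the real prefixes and digit counts
# are assumptions and must be replaced with the actual ones.
PATTERNS = [
    ("compound", re.compile(r"[A-Z]{2}\d{3,5}")),
    ("batch", re.compile(r"B-\d{4}-[A-Z]")),
]

def classify(identifier):
    """Return the name of the first pattern matching the whole identifier, or None."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return name
    return None
```

Adding a new kind of identifier then means adding one entry to the table rather than another branch.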

Aurélien Ribon
A: 

If you want to do it professionally, create a grammar file and generate a parser using ANTLR.

seanizer
+2  A: 

Consider these facts:

1) Two molecules can have the same structural identifier, for example because of stereochemistry, or when comparing two complex molecules (especially ones with many benzene rings).

2) Consider the International Chemical Identifier: http://en.wikipedia.org/wiki/International_Chemical_Identifier. It defines an unambiguous representation of a molecule's structure, and you can extract the structural formula from it. For example:

1/C2H6O/c1-2-3/h3H,2H2,1H3

represents

CH3CH2OH (ethanol)  

3) You can check out MQL (Molecular Query Language).

4) Implementing it on your own may take a lot of time. There are some context-free grammars, but they are very complex; try to find some free Molecule Query
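To illustrate point 2, here is a minimal sketch that splits the identifier string shown above into its slash-separated layers and counts the atoms in the formula layer. The function names are made up for this example, and it only handles the simple form shown here (no charges, isotopes, or multi-component formulas); for real work a chemistry library would be a safer choice.

```python
import re

def split_inchi(inchi):
    """Split an InChI-style string into version, formula and tagged layers."""
    version, formula, *layers = inchi.split("/")
    # Each remaining layer starts with a one-letter tag such as 'c' or 'h'.
    return version, formula, {layer[0]: layer[1:] for layer in layers}

def element_counts(formula):
    """Count atoms in a simple formula such as 'C2H6O'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number means a single atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

version, formula, layers = split_inchi("1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(formula, element_counts(formula), layers)
```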

dfens