I want to store molecules in memory. These can be simple molecules:

Methane (CH4)
C-H bond length: 108.7 pm
H-C-H bond angle: 109.5 degrees

But also more complex molecules, like paracetamol (C8H9NO2):
[Image: structural formula of paracetamol]

How can I store molecules in memory, including all bond lengths and angles?

Is it a good idea to store atom structs in an array, or is there a better way?

A:

It looks like some kind of graph data structure:

  • A molecule has a set of atoms
  • Atoms are linked by bonds:
    • A bond can be single, double or triple
    • A bond has a length
    • A bond has an angle
  • It's a cyclic graph (for instance, the example in the question has a ring of alternating single and double bonds)
  • It's not a directed graph (if two atoms are bonded, it doesn't matter from which end you approach the bond)

Typically you'd store a graph as an array of nodes (atoms) and an array of edges (bonds), where both arrays hold pointers to structs.

A node (atom) would store the element.

An edge (bond) would store the following fields:

  • A pair of pointers to nodes
  • The type of bond (single/double/triple)
  • The bond length and angle

Since it's not a directed graph, your data structure should consider an edge between A and B to be equivalent to an edge between B and A. That is, for a given pair of atoms, you'd expect your edge array to contain a single edge (say, from A to B) and never a second, reversed edge from B to A.
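A minimal C sketch of this edge-list layout (plain C, which is also valid Objective-C). The names `Atom`, `Bond`, `Molecule`, `add_atom`, `add_bond`, `free_molecule` and the fixed capacities `MAX_ATOMS`/`MAX_BONDS` are hypothetical. Each bond is normalised so that the lower-index atom comes first, which is one way to guarantee that a given pair is stored only once. The sketch leaves the angle out of the bond struct, since a bond angle is defined by three atoms rather than by a single bond; that is an assumption of this sketch, not part of the answer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Bond order: single, double or triple, as listed in the answer. */
typedef enum { BOND_SINGLE = 1, BOND_DOUBLE = 2, BOND_TRIPLE = 3 } BondOrder;

typedef struct {
    size_t index;       /* position in the molecule's atom array */
    char element[3];    /* element symbol, e.g. "C", "H", "Cl" */
} Atom;

typedef struct {
    Atom *a, *b;        /* bonded atoms, normalised so a->index < b->index */
    BondOrder order;
    double length_pm;   /* bond length in picometres */
} Bond;

#define MAX_ATOMS 64    /* hypothetical fixed capacities, to keep the sketch short */
#define MAX_BONDS 128

typedef struct {
    Atom *atoms[MAX_ATOMS];  /* array of pointers to atom structs (nodes) */
    size_t n_atoms;
    Bond *bonds[MAX_BONDS];  /* array of pointers to bond structs (edges) */
    size_t n_bonds;
} Molecule;

/* Appends a heap-allocated atom; returns NULL if full or out of memory. */
Atom *add_atom(Molecule *m, const char *element) {
    if (m->n_atoms == MAX_ATOMS) return NULL;
    Atom *atom = malloc(sizeof *atom);
    if (atom == NULL) return NULL;
    atom->index = m->n_atoms;
    snprintf(atom->element, sizeof atom->element, "%s", element);
    m->atoms[m->n_atoms++] = atom;
    return atom;
}

/* Adds an undirected bond. The endpoints are swapped if needed so that the
   lower-index atom comes first; a bond between a pair that is already bonded
   (given in either order) is rejected by returning NULL. */
Bond *add_bond(Molecule *m, Atom *x, Atom *y, BondOrder order, double length_pm) {
    if (x == y || m->n_bonds == MAX_BONDS) return NULL;
    if (x->index > y->index) { Atom *tmp = x; x = y; y = tmp; }
    for (size_t i = 0; i < m->n_bonds; i++)
        if (m->bonds[i]->a == x && m->bonds[i]->b == y) return NULL;
    Bond *bond = malloc(sizeof *bond);
    if (bond == NULL) return NULL;
    bond->a = x;
    bond->b = y;
    bond->order = order;
    bond->length_pm = length_pm;
    m->bonds[m->n_bonds++] = bond;
    return bond;
}

/* Frees every atom and bond owned by the molecule. */
void free_molecule(Molecule *m) {
    for (size_t i = 0; i < m->n_atoms; i++) free(m->atoms[i]);
    for (size_t i = 0; i < m->n_bonds; i++) free(m->bonds[i]);
    m->n_atoms = m->n_bonds = 0;
}
```

For methane you would add one C and four H atoms, then four single bonds of 108.7 pm. Because the carbon is added first, `add_bond(&m, h, c, ...)` stores each bond as C-H, and a second call for the same pair is rejected.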

Tim Robinson
Good idea indeed. I'm terrible with pointers, though :'p Oh well...
Time Machine
A molecule is not a cyclic graph per se; it CAN be one, though.
rano
@rano, right; it's not an acyclic graph. Any code that operates on a molecule graph needs to cope with cycles.
Tim Robinson
`@property (nonatomic, retain) NSArray *atoms;` non-atomic atoms? hihi
Time Machine
It depends on the individual molecule. How can you decide from the start whether it will be cyclic or acyclic? (For that reason you are right that you should be able to cope with cycles, but you always have to when you can't say anything more about the graph.)
rano
Triple bonds? They must have invented those after I did chemistry at school.
Tim Robinson
@Tim: the suspected maximal bond order is 6. Triple bonds may be uncommon in nature, but they do occur outside the lab. Quick googling didn't turn up a date for the first characterization of triple bonds, but for quadruple bonds, Wikipedia suggests 1964.
Christoph
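On the cycle question from the comments: once a molecule's bonds are known, you can check whether its bond graph contains rings. A minimal C sketch using union-find (a swapped-in technique, not something proposed in the answer or comments): a bond whose two atoms are already connected closes a ring, so counting such bonds gives the number of independent rings. `count_rings` and `find_root` are hypothetical names, and bonds are assumed to be given as atom-index pairs with each pair listed once (a double bond counts as one bond).

```c
#include <stddef.h>

/* Union-find "find" with path halving: returns the representative of x. */
static size_t find_root(size_t *parent, size_t x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Counts independent rings in a molecule of n_atoms atoms whose bonds are
   given as index pairs bonds[2*k], bonds[2*k+1]. parent must have room for
   n_atoms entries and is used as scratch space. */
size_t count_rings(size_t n_atoms, const size_t *bonds, size_t n_bonds, size_t *parent) {
    for (size_t i = 0; i < n_atoms; i++) parent[i] = i;  /* every atom starts alone */
    size_t rings = 0;
    for (size_t k = 0; k < n_bonds; k++) {
        size_t ra = find_root(parent, bonds[2 * k]);
        size_t rb = find_root(parent, bonds[2 * k + 1]);
        if (ra == rb) rings++;   /* already connected: this bond closes a ring */
        else parent[ra] = rb;    /* otherwise merge the two fragments */
    }
    return rings;
}
```

For methane (one carbon bonded to four hydrogens) this returns 0; for a six-membered ring like the one in paracetamol it returns 1.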
A: 

Given that you're modeling a linked structure, it seems like a linked data structure would be the most appropriate representation. I'm afraid I'm not enough of a chemist to guess at all the data that would be interesting to store for each item though.

Offhand, though, it sounds a lot like a very slightly modified version of how you usually store a graph: nodes and links, with data associated with each. In a graph, the typical data associated with a link is the "cost" of that link, but in this case it sounds like you'd store a vector (an angle and a length).
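To make the "linked data structure" idea concrete, here is a hedged C sketch of an adjacency-list graph: each atom owns a singly linked list of links to its neighbours, and each link carries its own data (a length and an angle, in place of a graph's usual "cost"). The names `Node`, `Link`, `bond_atoms`, `degree` and `free_links` are hypothetical. The answer does not say what the angle is measured against, so the sketch simply stores whatever number you pass in. Unlike a plain edge array, each bond is recorded once at each end, so either atom can walk to the other.

```c
#include <stdlib.h>

typedef struct Link {
    size_t neighbour;     /* index of the atom at the other end */
    double length;        /* per-link data instead of a single "cost" */
    double angle;         /* stored as given; its reference is up to you */
    struct Link *next;    /* next link in this atom's list */
} Link;

typedef struct {
    int atomic_number;
    Link *links;          /* head of this atom's linked list of links */
} Node;

/* Pushes one directed link onto the front of node n's list. */
static int push_link(Node *n, size_t neighbour, double length, double angle) {
    Link *l = malloc(sizeof *l);
    if (l == NULL) return 0;
    l->neighbour = neighbour;
    l->length = length;
    l->angle = angle;
    l->next = n->links;
    n->links = l;
    return 1;
}

/* Links atoms i and j in both directions; returns 0 on allocation failure. */
int bond_atoms(Node *nodes, size_t i, size_t j, double length, double angle) {
    if (!push_link(&nodes[i], j, length, angle)) return 0;
    if (!push_link(&nodes[j], i, length, angle)) {
        /* undo the first half so the graph stays symmetric */
        Link *first = nodes[i].links;
        nodes[i].links = first->next;
        free(first);
        return 0;
    }
    return 1;
}

/* Number of atoms bonded to atom i (length of its link list). */
size_t degree(const Node *nodes, size_t i) {
    size_t d = 0;
    for (const Link *l = nodes[i].links; l != NULL; l = l->next) d++;
    return d;
}

/* Frees every link of the first n nodes. */
void free_links(Node *nodes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        Link *l = nodes[i].links;
        while (l != NULL) { Link *next = l->next; free(l); l = next; }
        nodes[i].links = NULL;
    }
}
```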

Jerry Coffin
why a vector when you can have a struct? : P
rano
@rano: sorry, I was using "vector" as it would be in physics, not really as it would be in programming. Whether you'd use a struct, `vector`, or something else to store it is a completely separate question.
Jerry Coffin
@rano: vector is also a mathematical term, denoting angle and length.
Paul Nathan
@Jerry Coffin @Paul Nathan sorry, my bad; I jumped to a conclusion too fast ^^
rano
A: 

Maybe you should have a look at Molecules by Brad Larson, a super nice open-source app for iPhone/iPod/iPad that has already inspired many programmers. I guess he uses a standard data format for molecules that could inspire yours.

rano
A: 

Are you going to be doing any kind of computation on these molecules, or just drawing them? If you intend to do any rigorous calculations, then I would not even try using Objective-C. For best performance, you should store the angles, lengths, and connections in plain arrays, and do all the bookkeeping necessary to keep track of what's what. Applications that use fancy data structures to hold the molecule information tend to run very poorly when a lot of computation is involved as well.

Derek
A: 

A molecule in your posting is a descriptor-like template for instances of a concrete molecule (each instance having its own Cartesian coordinates). Therefore, you need only one "molecule" descriptor per molecule type, however many instances of that type there are. Then you need an array of coordinates from which you can extract the coordinates that make up one molecule. For this single descriptor, you should create data structures that let you access the values as efficiently as possible: maybe a vector of (a,b,r,type) structs for bonds, a vector of (a,b,c,w,type) structs for angles, etc.
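A sketch of this descriptor-plus-coordinates idea in C. The field meanings are assumptions: `a`, `b`, `c` are taken to be atom indices within the molecule type, `r` a reference bond length, `w` a reference angle, and `type` an integer tag. The names `BondTerm`, `AngleTerm`, `MoleculeType`, `instance_coords` and `atom_position` are hypothetical. All instances of one type share a single coordinate array with `n_atoms * 3` doubles per instance, so instance `k` is just an offset into it.

```c
#include <stddef.h>

/* Per-type records following the (a,b,r,type) and (a,b,c,w,type) tuples. */
typedef struct { int a, b; double r; int type; } BondTerm;
typedef struct { int a, b, c; double w; int type; } AngleTerm;

/* One descriptor per molecule type, shared by all of its instances. */
typedef struct {
    size_t n_atoms;
    const BondTerm *bonds;   size_t n_bonds;
    const AngleTerm *angles; size_t n_angles;
} MoleculeType;

/* Start of instance k's coordinates in the shared array. */
const double *instance_coords(const MoleculeType *t, const double *all_xyz, size_t k) {
    return all_xyz + k * t->n_atoms * 3;
}

/* Position (x, y, z) of atom i within instance k. */
const double *atom_position(const MoleculeType *t, const double *all_xyz, size_t k, size_t i) {
    return instance_coords(t, all_xyz, k) + 3 * i;
}
```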

Regards

rbo

rubber boots