I have these entity kinds:
- Molecule
- Atom
- MoleculeAtom
Given a list(molecule_ids) whose length is in the hundreds, I need to get a dict of the form {molecule_id: list(atom_ids)}. Likewise, given a list(atom_ids) whose length is in the hundreds, I need to get a dict of the form {atom_id: list(molecule_ids)}.
Both of these bulk lookups need to happen really fast. Right now I'm doing something like:
atom_ids_by_molecule_id = {}
for molecule_id in molecule_ids:
    # One datastore query per molecule, so hundreds of queries per request.
    moleculeatoms = MoleculeAtom.all().filter('molecule =', db.Key.from_path('molecule', molecule_id)).fetch(1000)
    atom_ids_by_molecule_id[molecule_id] = [
        MoleculeAtom.atom.get_value_for_datastore(ma).id() for ma in moleculeatoms
    ]
Like I said, len(molecule_ids) is in the hundreds. I need to do this kind of bulk index lookup on almost every single request, and I need it to be FAST, and right now it's too slow.
Ideas:
- Will using a Molecule.atoms ListProperty do what I need? Consider that I am storing additional data on the MoleculeAtom node, and remember it's equally important for me to do the lookup in the molecule->atom and atom->molecule directions.
- Caching? I tried memcaching lists of atom IDs keyed by molecule ID, but I have tons of atoms and molecules, and the cache can't fit them all.
- How about denormalizing the data by creating a new entity kind whose key name is a molecule ID and whose value is a list of atom IDs? The idea is, calling db.get on 500 keys is probably faster than looping through 500 fetches with filters, right?
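To make the denormalization idea concrete, here is a minimal in-memory sketch of the two indexes I have in mind. It assumes plain integer IDs and uses ordinary dicts as a stand-in for the proposed index entities; build_indexes, bulk_lookup, and the sample pairs are hypothetical names and data for illustration, not part of my app or of the App Engine API. In the datastore, each dict entry would presumably become one entity keyed by the ID, so a bulk lookup would be one batch db.get over a list of keys instead of one filtered query per ID.

```python
from collections import defaultdict


def build_indexes(pairs):
    """Build both lookup directions from (molecule_id, atom_id) pairs.

    Each pair corresponds to one MoleculeAtom row. This would run when the
    rows are written, not on every request.
    """
    atoms_by_molecule = defaultdict(list)
    molecules_by_atom = defaultdict(list)
    for molecule_id, atom_id in pairs:
        atoms_by_molecule[molecule_id].append(atom_id)
        molecules_by_atom[atom_id].append(molecule_id)
    return dict(atoms_by_molecule), dict(molecules_by_atom)


def bulk_lookup(index, ids):
    """Return {id: list_of_related_ids} for the requested ids only.

    IDs with no index entry map to an empty list.
    """
    return {i: list(index.get(i, [])) for i in ids}


# Hypothetical sample data: molecules 1 and 2 share atoms 10 and 11.
pairs = [(1, 10), (1, 11), (2, 10), (2, 11), (3, 12)]
atoms_by_molecule, molecules_by_atom = build_indexes(pairs)

print(bulk_lookup(atoms_by_molecule, [1, 3]))
print(bulk_lookup(molecules_by_atom, [10, 12, 99]))
```

The cost of this layout is that every MoleculeAtom write has to update two index entries, and the extra data stored on MoleculeAtom still lives only on the original entities.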