This is going to be a "long one". I'm including as much code and explanation as possible ... I'm not above throwing out code if necessary.
I'm trying to implement a logical parser in a django query system. Where users can provide complex queries against tags that are applied to samples. This is essentially part of a scientific sample repository where users can apply defined tags (tissue-type, disease studied, etc.). They can then create persistent "baskets" of samples defined by a logical query on these tags.
#models.py
class Sample(models.Model):
name = models.CharField(max_length = 255)
class Tag(models.Model):
name = models.CharField(max_length = 255)
samples = models.ManyToManyField(Sample)
A quick example:
#example data:
Sample1 has TagA, TagB, TagC
Sample2 has TagB, TagC, TagD
Sample3 has TagA, TagC, TagD
Sample4 has TagB
#example query:
'TagB AND TagC AND NOT TagD'
would return Sample1. I use a crazy string-eval hack to create a set of Q()
objects:
def MakeQObject(expression):
"""
Takes an expression and uses a crazy string-eval hack to make the qobjects.
"""
log_set = {'AND':'&','OR':'|','NOT':'~'}
exp_f = []
parts = expression.split()
#if there is a ) or ( then we can't use this shortcut
if '(' in parts or ')' in parts:
return None
for exp in parts:
if exp in log_set:
exp_f.append(log_set[exp])
else:
exp_f.append("Q(tags__name__iexact = '%s')" % exp)
st = ' '.join(exp_f)
qobj = eval(st)
return qobj
However, this fails on anything that needs a complicated order of operations or grouping by (). Given the same example data the query: (TagA OR TagB) AND NOT TagD
should return Sample1, Sample4 but does not. I've implemented a "one-at-a-time" function which can take a single Sample object and perform the query. However, in my actual database I have ~40,000 Samples and ~400 tags (about ~7 per Sample) and the iterative technique takes ~4 minutes to complete on all Samples. So I calculate the baskets nightly and then just freeze them during the day. I worry that as I start curating more baskets, samples and tags this will scale poorly.
Any suggestions?