I have a biology database that I would like to query. I also have access to a terminology bank whose terms can be formalized as predicates. I would like to build a query language for this database using those predicates. How would you go about it? My plan is the following:

  1. Formalize the predicates.
  2. Translate them into a query language (SQL, SPARQL, or whatever the database requires).
  3. Build a domain-specific language with ANTLR or a similar tool.
  4. Translate from the language in step 3 to the one in step 2.
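
A minimal sketch of steps 1, 2, and 4, leaving the parser from step 3 aside. The predicate names (`is_tissue`, `longer_than`), the `specimen` table, and its columns are all invented for illustration; a real terminology bank and schema would replace them.

```python
import sqlite3

# Step 1: each expert term becomes a named predicate with a fixed arity.
# Step 2: each predicate is paired with a parameterized SQL condition.
# The table and column names below are hypothetical.
PREDICATES = {
    "is_tissue": ("tissue = ?", 1),
    "longer_than": ("length_mm > ?", 1),
}

def to_sql(calls):
    """Step 4: turn a list of (predicate, args) pairs into one SELECT."""
    conditions, params = [], []
    for name, args in calls:
        template, arity = PREDICATES[name]
        if len(args) != arity:
            raise ValueError(f"{name} expects {arity} argument(s)")
        conditions.append(template)
        params.extend(args)
    where = " AND ".join(conditions) or "1 = 1"
    return f"SELECT id FROM specimen WHERE {where} ORDER BY id", params

# Tiny in-memory database so the sketch can be run end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER, tissue TEXT, length_mm REAL)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [(1, "leg", 12.0), (2, "leg", 30.5), (3, "wing", 40.0)],
)

sql, params = to_sql([("is_tissue", ["leg"]), ("longer_than", [20])])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Keeping the values as bound parameters rather than pasting them into the SQL string means the vocabulary layer never has to worry about quoting.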

Is this a valid approach? Are there better ones? Any pointers would be much appreciated.

+1  A: 

Use BNF to get a head start on the language's syntax. GOLD Parser (http://www.devincook.com/) lets you experiment with the grammar. Once the BNF grammar is sorted out, you can attach actions to its rules. For example, one section of the grammar might handle a query such as 'fetch stats on limb where limb is leg' (an abstract example; I do not know whether a classification of a limb's genetic makeup exists, but you get the gist). Behind the scenes, the action for that rule would issue a SQL SELECT on a column name or alias from a predefined table. I could be wrong about the approach. Hope it helps! Tom :)
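
As a rough illustration of the idea, here is a hand-written parser for the example sentence, with the grammar given in BNF in the comments. It stands in for what a generated parser would do; it does not use GOLD Parser or ANTLR, and the grammar, the `TABLES` and `COLUMNS` mappings, and the schema names are assumptions made up for this sketch.

```python
import re

# Hypothetical grammar, in BNF:
#   <query>     ::= "fetch" <field> "on" <entity> [ "where" <attribute> "is" <value> ]
#   <field>     ::= "stats"
#   <entity>    ::= "limb"
#   <attribute> ::= "stats" | "limb"
#   <value>     ::= identifier

# Assumed mapping from DSL words to schema names (not from a real database).
TABLES = {"limb": "limb"}
COLUMNS = {"stats": "stats", "limb": "limb_type"}

def translate(query):
    """Parse one DSL sentence and return (sql, params)."""
    tokens = re.findall(r"[A-Za-z_]+", query.lower())
    pos = 0

    def expect(word):
        # Consume a fixed keyword or fail.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != word:
            raise SyntaxError(f"expected {word!r} at token {pos}")
        pos += 1

    def take(allowed=None):
        # Consume one word, optionally restricted to a known vocabulary.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of query")
        word = tokens[pos]
        if allowed is not None and word not in allowed:
            raise SyntaxError(f"unknown term {word!r}")
        pos += 1
        return word

    expect("fetch")
    field = take(COLUMNS)
    expect("on")
    entity = take(TABLES)
    sql = f"SELECT {COLUMNS[field]} FROM {TABLES[entity]}"
    params = []
    if pos < len(tokens):
        expect("where")
        attribute = take(COLUMNS)
        expect("is")
        params.append(take())  # the value is bound, never spliced into SQL
        sql += f" WHERE {COLUMNS[attribute]} = ?"
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return sql, params

print(translate("fetch stats on limb where limb is leg"))
```

Only words found in the mapping tables ever reach the SQL text, so the DSL vocabulary doubles as a whitelist.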

tommieb75
So you suggest I define the syntax of the DSL first, and then the rest. Maybe this is the right way to go; it will guide the rest of the effort. Is that your take? Thanks!
Dervin Thunk
Yes, that would be my take on it. Glad to be of help! :)
tommieb75
A: 

"I have a biology database that I would like to query."

That fact, in and of itself, means that you already have a set of predicates (namely, the predicates that define the meaning of the contents of the biology database you're dealing with).

Therefore, the step that you propose to "formalize the predicates" is superfluous, meaningless, irrelevant, ... (however you want to label it). They are already formalized by the database's mere existence.

Furthermore, any language that is based on the relational algebra should suffice to be able to query that database in any conceivable way.

In brief, I don't see what you're after.

Erwin Smout
How is this helpful? I understand the value of "you're doing it wrong" answers, but you have no idea what the 'database' mentioned is (it could be a SQL database in the traditional sense, a tarball of web pages having something to do with biology, or other unstructured text). Furthermore, clearly relational algebra will be at the heart of most sensible query languages, but one size does not fit all. Many languages coexist because they provide better expressibility or performance for somebody, and while creating a DSL may not be necessary, there isn't enough information here to conclude that.
Matt J
I do not NEED to "have an idea what the database mentioned is" in order to know that 'being a database' DOES IMPLY 'being a set of already formalized predicates', and that therefore, there is no need whatsoever to 'formalize these predicates'. Go get yourself an education before you downvote more knowledgeable people's answers.
Erwin Smout
Yikes; no need to shout :) I only mean that even if you're right, the specific information you have included in the answer thus far will not be helpful. The fact that this question exists suggests that, if a list of formalized predicates exists, the OP doesn't see it (as evidenced by step 1). No effort was made in the answer to seek more information to help the asker formalize the predicates or realize that such a list of predicates may exist already. "This is a dumb question; think harder" may well be true, but doesn't help anybody. Neither do appeals to authority.
Matt J
@Erwin: "Brief, I don't see what you're after." Yes, this part is clear. You should therefore refrain from commenting. You obviously have done a bit of db theory, but maybe just stopped at functional dependencies, right? Anyway, the predicates I'm referring to are natural-language words from an experts' vocabulary; this is the vocabulary (in a domain-specific query language) I would like to teach them to use, rather than SQL. Please, next time, think the question through and try to visualize the problem.
Dervin Thunk
+1  A: 

Take a look at Booleano.

Paul McGuire
A: 

I suggest you take a look at the i2b2 framework (www.i2b2.org). It's a graphical query language and query engine platform for patient databases.

It's probably hard to grasp it all at first, but do take a look at the CRC cell or web service in there. You'll see how they approached SQL generation from a clinical graphical query language in an interesting way (albeit not a very performance-friendly one :))
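
To give a feel for the general technique, here is a small sketch that compiles a nested tree of criteria, the kind of structure a graphical query editor might produce, into a SQL WHERE clause. This is not i2b2's actual code or data model; the tree shape, the `patient` table, and the column names are assumptions for illustration only.

```python
# Groups are ("and" | "or", [children]); leaves are (column, operator, value).
# Column names and the patient table are hypothetical.
ALLOWED_COLUMNS = {"age", "diagnosis", "sex"}
ALLOWED_OPERATORS = {"=", "<", ">", "<=", ">="}

def compile_node(node, params):
    """Return a SQL condition for node, appending bound values to params."""
    if node[0] in ("and", "or"):
        children = node[1]
        if not children:
            raise ValueError("empty group")
        joiner = f" {node[0].upper()} "
        parts = [compile_node(child, params) for child in children]
        return "(" + joiner.join(parts) + ")"
    column, operator, value = node
    if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported criterion: {node!r}")
    params.append(value)
    return f"{column} {operator} ?"

# "Older than 40 AND (asthma OR copd)", as the editor might store it.
tree = ("and", [
    ("age", ">", 40),
    ("or", [("diagnosis", "=", "asthma"), ("diagnosis", "=", "copd")]),
])
params = []
where = compile_node(tree, params)
print(f"SELECT patient_id FROM patient WHERE {where}")
print(params)
```

Each group becomes one parenthesized expression, so nesting in the editor maps directly onto nesting in the generated SQL.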

wsb3383