Hi all,
I'm learning JBoss Drools and I'm playing with the genetics data from the HapMap project (http://hapmap.ncbi.nlm.nih.gov/genotypes/latest/forward/non-redundant/). Each file in this directory is a table with the individuals across the top, the positions on the genome down the left, and, in each cell, the mutation observed for that individual at that position.
Using Drools, I'd like to find potential errors in these files (e.g. a child who carries no mutation from either of its parents).
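To make the kind of error mentioned above concrete, here is a minimal sketch of the check in plain Java (not Drools). It assumes each genotype is a two-letter string such as "AG" and that a call containing 'N' is missing; the class and method names are only for illustration.

```java
class TrioCheck {
    /** True if the call is unusable. ASSUMPTION: genotypes are two letters and 'N' marks a missing call. */
    public static boolean isMissing(String genotype) {
        return genotype == null || genotype.length() != 2 || genotype.indexOf('N') >= 0;
    }

    /** True if the child could have received one allele from each parent. */
    public static boolean isConsistent(String child, String father, String mother) {
        if (isMissing(child) || isMissing(father) || isMissing(mother)) {
            return true; // cannot decide, so do not report an error
        }
        char c1 = child.charAt(0);
        char c2 = child.charAt(1);
        // Either allele may come from either parent, so try both assignments.
        return (father.indexOf(c1) >= 0 && mother.indexOf(c2) >= 0)
            || (father.indexOf(c2) >= 0 && mother.indexOf(c1) >= 0);
    }

    public static void main(String[] args) {
        System.out.println(isConsistent("AG", "AA", "GG")); // true
        System.out.println(isConsistent("CC", "AA", "AG")); // false
    }
}
```

What I'm hoping for is to express this same condition as a rule over the facts in the session rather than as hand-written loops.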
1) I want to load this data into Drools. It can be a large amount of data (e.g. genotypes_chr2_YRI_r27_nr.b36_fwd.txt.gz is 20 MB gzipped). Will the data be stored in memory? Or does Drools store it somewhere? Or should I use a persistence system?
2) About the model:
I was thinking about putting the following classes in a StatefulKnowledgeSession:
class Individual
{
    private String name;
    //constructor, getters, setters etc...
}
class Position
{
    private String name;
    private String chromosome;
    private int position;
    //constructor, getters, setters etc...
}
class ObservedMutation
{
    private String individualName;
    private String positionName;
    private String observed;
    //constructor, getters, setters etc...
}
Or should ObservedMutation hold references to the other objects instead:
class ObservedMutation
{
    private Individual individual;
    private Position position;
    private String observed;
    //constructor, getters, setters etc...
}
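For question 1, this is roughly how I would produce the facts for the first (name-based) ObservedMutation variant: a minimal sketch that streams the gzipped table line by line, so the raw text never sits in memory as a whole. The class HapMapLoader, the metaColumns parameter, and the value 11 in main are my assumptions about the file layout (whitespace-separated columns, position name in the first column, a fixed number of per-position columns before the individuals), not something I have verified against every file.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;

class HapMapLoader {
    /** Name-based fact, mirroring the first ObservedMutation variant. */
    static final class ObservedMutation {
        final String individualName;
        final String positionName;
        final String observed;

        ObservedMutation(String individualName, String positionName, String observed) {
            this.individualName = individualName;
            this.positionName = positionName;
            this.observed = observed;
        }
    }

    /**
     * Reads a gzipped table and hands one fact per individual/position cell to the sink.
     * ASSUMPTION: the header row names the columns, the first column of each data row
     * is the position name, and the first metaColumns columns are not individuals.
     */
    public static void load(InputStream gzipped, int metaColumns,
                            Consumer<ObservedMutation> sink) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String header = in.readLine();
            if (header == null) {
                return; // empty file
            }
            String[] names = header.trim().split("\\s+");
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] cols = line.trim().split("\\s+");
                int n = Math.min(cols.length, names.length);
                for (int i = metaColumns; i < n; i++) {
                    sink.accept(new ObservedMutation(names[i], cols[0], cols[i]));
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long[] count = {0};
        try (InputStream file = new FileInputStream(args[0])) {
            // 11 leading columns is an assumption about the HapMap layout.
            // With Drools, the sink would presumably insert each fact into the session.
            load(file, 11, fact -> count[0]++);
        }
        System.out.println(count[0] + " facts");
    }
}
```

The facts themselves still end up wherever the sink puts them, which is exactly what I am unsure about: one object per cell adds up quickly for a whole chromosome.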
Thanks for your suggestions,
Pierre
Update: my first test: http://plindenbaum.blogspot.com/2010/07/rules-engine-for-bioinformatics-playing.html