On the Nucleotide BLAST search page, is there a way to programmatically obtain the databases listed in the "Choose Search Set" box, perhaps in XML format? (The programming language doesn't matter.)

Thanks in advance

+2  A: 

I don't think you can get this information through the NCBI web services, but you can extract it from the HTML of the search page itself.

Using XSLT:

<?xml version='1.0'  encoding="ISO-8859-1" ?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >

<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="//select[@id='DATABASE']"/>
</xsl:template>


<xsl:template match="select[@id='DATABASE']">
<xsl:for-each select=".//option">
<xsl:value-of select="@value"/>
<xsl:text>  </xsl:text>
<xsl:value-of select="."/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>

and xsltproc:

xsltproc --html stylesheet.xsl "http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome" 2> /dev/null

returns:

dbindex/9606/ref_contig dbindex/9606/alt_contig_HuRef dbindex/9606/rna  Human genomic plus transcript (Human G+T)
dbindex/10090/alt_contig dbindex/10090/ref_contig dbindex/10090/rna     Mouse genomic plus transcript (Mouse G+T)
nr      Nucleotide collection (nr/nt)
refseq_rna      Reference mRNA sequences (refseq_rna)
refseq_genomic  Reference genomic sequences (refseq_genomic)
chromosome      NCBI Genomes (chromosome)
est     Expressed sequence tags (est)
est_others      Non-human, non-mouse ESTs (est_others)
gss     Genomic survey sequences (gss)
htgs    High throughput genomic sequences (HTGS)
pat     Patent sequences(pat)
pdb     Protein Data Bank (pdb)
alu     Human ALU repeat elements (alu_repeats)
dbsts   Sequence tagged sites (dbsts)
wgs     Whole-genome shotgun reads (wgs)
env_nt  Environmental samples (env_nt)
Pierre
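
Since the question says any language is fine, here is a rough Python sketch of the same extraction the stylesheet performs, using only the standard library. It assumes, as the stylesheet does, that the page contains a select element with id DATABASE; the class and function names (DatabaseOptionParser, list_databases) are made up for illustration, and the page layout may have changed since this answer was written.

```python
from html.parser import HTMLParser


class DatabaseOptionParser(HTMLParser):
    """Collect (value, label) pairs from <option> tags inside <select id="DATABASE">."""

    def __init__(self):
        super().__init__()
        self.in_select = False      # True while inside the DATABASE select
        self.current_value = None   # value attribute of the option being read
        self.current_text = []      # text chunks of the option being read
        self.options = []           # collected (value, label) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "DATABASE":
            self.in_select = True
        elif tag == "option" and self.in_select:
            # HTML allows omitting </option>, so close any open option first.
            self._finish_option()
            self.current_value = attrs.get("value") or ""
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "option":
            self._finish_option()
        elif tag == "select" and self.in_select:
            self._finish_option()
            self.in_select = False

    def handle_data(self, data):
        if self.current_value is not None:
            self.current_text.append(data)

    def _finish_option(self):
        if self.current_value is not None:
            label = "".join(self.current_text).strip()
            self.options.append((self.current_value, label))
            self.current_value = None


def list_databases(html):
    """Return the (value, label) pairs of the DATABASE select in an HTML string."""
    parser = DatabaseOptionParser()
    parser.feed(html)
    parser.close()
    return parser.options
```

To use it, download the search page (for example with urllib.request.urlopen), decode the bytes to a string, and pass that string to list_databases. Each returned pair corresponds to one line of the xsltproc output above.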
+1  A: 

I'm not entirely sure what you intend to use this for, but the complete set of databases used by NCBI is on their FTP site (ftp://ftp.ncbi.nih.gov/blast/db/). If you're only interested in the database names, just look at the part of each file name before the first '.'; most of the databases are large enough to be split into several segments. To do a good chunk of the filtering (e.g. by organism), they use alias files that restrict one or more of these larger databases by GI number.
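
The "part before the first '.'" rule can be sketched in a few lines of Python. This is only an illustration: database_names is a made-up helper, and it expects a list of file names you have already obtained from the FTP directory (for example with ftplib). Any non-database files in that directory would need to be filtered out separately.

```python
def database_names(filenames):
    """Return the unique database names from a list of file names.

    The name is taken to be everything before the first '.', so the
    segments of one database collapse into a single entry. Order of
    first appearance is preserved.
    """
    names = []
    seen = set()
    for filename in filenames:
        name = filename.split(".", 1)[0]
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

For instance, a listing containing several segment files that all start with "nt." would yield a single "nt" entry.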