On the Nucleotide BLAST search page,
is there a way to programmatically obtain the list of databases shown in the "Choose Search Set" box, perhaps in XML format? (The programming language doesn't matter.)
Thanks in advance.
I don't think you can get this information through the NCBI Web services, but you can extract it from the HTML of the search page itself.
Using XSLT:
<?xml version='1.0' encoding="ISO-8859-1" ?>
<xsl:stylesheet
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >
  <xsl:output method="text"/>

  <!-- Start at the document root and visit only the database <select>. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//select[@id='DATABASE']"/>
  </xsl:template>

  <!-- Print one line per <option>: its value attribute, a space, its label. -->
  <xsl:template match="select[@id='DATABASE']">
    <xsl:for-each select=".//option">
      <xsl:value-of select="@value"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
and xsltproc:
xsltproc --html stylesheet.xsl "http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome" 2> /dev/null
returns:
dbindex/9606/ref_contig dbindex/9606/alt_contig_HuRef dbindex/9606/rna Human genomic plus transcript (Human G+T)
dbindex/10090/alt_contig dbindex/10090/ref_contig dbindex/10090/rna Mouse genomic plus transcript (Mouse G+T)
nr Nucleotide collection (nr/nt)
refseq_rna Reference mRNA sequences (refseq_rna)
refseq_genomic Reference genomic sequences (refseq_genomic)
chromosome NCBI Genomes (chromosome)
est Expressed sequence tags (est)
est_others Non-human, non-mouse ESTs (est_others)
gss Genomic survey sequences (gss)
htgs High throughput genomic sequences (HTGS)
pat Patent sequences(pat)
pdb Protein Data Bank (pdb)
alu Human ALU repeat elements (alu_repeats)
dbsts Sequence tagged sites (dbsts)
wgs Whole-genome shotgun reads (wgs)
env_nt Environmental samples (env_nt)
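If xsltproc is not available, the same extraction can be sketched in Python with only the standard library. This is a minimal sketch, not a tested scraper for the live page: it assumes the page still contains a select element with id DATABASE (as the stylesheet does), the class and function names are made up for illustration, and SAMPLE_HTML is a hand-written stand-in for the downloaded page.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect (value, label) pairs from the <option>s of one <select>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.current_value = None  # value attribute of the <option> being read
        self.current_text = []
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self._flush()  # HTML allows <option> without a closing tag
            self.current_value = attrs.get("value") or ""
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select" and self.in_select:
            self._flush()
            self.in_select = False

    def handle_data(self, data):
        if self.current_value is not None:
            self.current_text.append(data)

    def _flush(self):
        # Store the option that is currently open, if any.
        if self.current_value is not None:
            label = " ".join("".join(self.current_text).split())
            self.options.append((self.current_value, label))
            self.current_value = None


def list_databases(html, select_id="DATABASE"):
    """Return [(value, label), ...] for the options of the given <select>."""
    parser = SelectOptionParser(select_id)
    parser.feed(html)
    parser.close()
    return parser.options


# Hand-written stand-in for the BLAST page (assumption, not the real markup).
SAMPLE_HTML = """
<select id="DATABASE">
  <option value="nr">Nucleotide collection (nr/nt)</option>
  <option value="refseq_rna">Reference mRNA sequences (refseq_rna)</option>
</select>
"""

for value, label in list_databases(SAMPLE_HTML):
    print(f"{value}\t{label}")
```

To run it against the real page, download the HTML first (for example with urllib.request) and pass the decoded text to list_databases.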
I'm not entirely sure what you intend to use this for, but the complete set of databases used by NCBI is available on their FTP site: ftp://ftp.ncbi.nih.gov/blast/db/ If you're only interested in the database names, just look at the part of each file name before the first "."; most of the databases are large enough to be split into several segments, so the same name appears on multiple files. To do a good chunk of the filtering (e.g. by organism), they use alias files that restrict one or more of these larger databases by GI number.
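The "bit before the first ." rule can be sketched in a few lines of Python. The function name and the file names in the example listing are illustrative assumptions, not an actual directory listing from the FTP site; files without a "." are skipped on the assumption that they are not database volumes.

```python
def database_names(filenames):
    """Return the unique text before the first '.' of each name, in first-seen order."""
    names = {}
    for filename in filenames:
        if "." not in filename:
            continue  # assumed not to be a database file
        base = filename.split(".", 1)[0]
        if base:
            names[base] = None  # dict keys keep insertion order and drop repeats
    return list(names)


# Hypothetical listing; the real directory contents will differ.
listing = ["nt.00.tar.gz", "nt.01.tar.gz", "est.tar.gz", "README"]
print(database_names(listing))  # ['nt', 'est']
```

Segmented databases collapse to a single entry because every segment shares the same prefix.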