Can anyone with experience in the bioinformatics field comment on what types of programming jobs are available?
So far during my co-op terms (similar to paid internships), the work has been database joins, queries, and number crunching.
Is there more to the field than that?
...
Given these inputs:
my $init_seq = "AAAAAAAAAA"; # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );
I want to generate:
One thousand length-10 tags
Substitution rate for each position in a tag is 0.003
Yielding output like:
AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tag
Is there a compact ...
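One way to read the task above, sketched in Python rather than Perl: every position of the initial sequence is replaced independently with probability sub_rate. The function names mutate and generate_tags are made up for this sketch, and it assumes a substitution always picks a base different from the current one (the question does not say whether A may be "substituted" by A).

```python
import random

def mutate(seq, sub_rate, alphabet="ACGT", rng=random):
    """Return a copy of seq in which each position is substituted with probability sub_rate."""
    out = []
    for base in seq:
        if rng.random() < sub_rate:
            # Assumption: a substitution always yields a different base.
            out.append(rng.choice([b for b in alphabet if b != base]))
        else:
            out.append(base)
    return "".join(out)

def generate_tags(init_seq, sub_rate, nof_tags, rng=random):
    """Generate nof_tags independently mutated copies of init_seq."""
    return [mutate(init_seq, sub_rate, rng=rng) for _ in range(nof_tags)]

# Inputs taken from the question: length-10 seed, rate 0.003, 1000 tags.
tags = generate_tags("AAAAAAAAAA", 0.003, 1000)
```

Passing a seeded random.Random instance as rng makes the output reproducible, which is handy when comparing runs.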
I'm trying to get some results from UniProt, which is a protein database (details are not important). I'm trying to use a script that translates from one kind of ID to another. I was able to do this manually in the browser, but could not do it in Python.
In http://www.uniprot.org/faq/28 there are some sample scripts. I tried the Per...
I have the following two FASTA files:
file1.fasta
>0
GAATAGATGTTTCAAATGTACCAATTTCTTTCGATT
>1
GTTAAGTTATATCAAACTAAATATACATACTATAAA
>2
GGGGCTGTGGATAAAGATAATTCCGGGTTCGAATAC
file2.qual
>0
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
40 40 40 40 40 40 40 40 15 40 40
>1
40 40 40 40 40 40 40 40 40 40 40 40 40 40 40...
In a paper about the Life Science Identifiers (see LSID Tester, a tool for testing Life Science Identifier resolution services), Dr Roderic DM Page wrote:
Given the LSID urn:lsid:ubio.org:namebank:11815, querying the DNS for the SRV record for _lsid._tcp.ubio.org returns animalia.ubio.org:80 as the location of the ubio.org LSID se...
What is the best choice of operating system for bioinformatics work? Are most of the tools for 64-bit Windows, for Linux/Unix in general, or OS X?
...
The following script is for finding one motif in a protein sequence.
use strict;
use warnings;
my @file_data=();
my $protein_seq='';
my $h= '[VLIM]';
my $s= '[AG]';
my $x= '[ARNDCEQGHILKMFPSTWYV]';
my $regexp = "($h){4}D($x){4}D"; #motif to be searched is hhhhDxxxxD
my @locations=();
@file_data= get_file_data("seq.txt");
$protein_se...
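For comparison, the same hhhhDxxxxD pattern can be sketched in Python. This is not the rest of the truncated Perl script; find_motif is a hypothetical helper, and reporting 1-based positions and overlapping hits are assumptions made for this sketch.

```python
import re

H = "[VLIM]"                    # same class as $h in the Perl snippet
X = "[ARNDCEQGHILKMFPSTWYV]"    # same class as $x in the Perl snippet

# A lookahead wrapper lets finditer report overlapping occurrences too.
MOTIF = re.compile(f"(?=({H}{{4}}D{X}{{4}}D))")

def find_motif(protein_seq):
    """Return (1-based start, matched text) for every occurrence of hhhhDxxxxD."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein_seq)]

# Made-up sequence, for illustration only.
print(find_motif("AAVLIMDAAAADKK"))  # [(3, 'VLIMDAAAAD')]
```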
I'm keen on learning about bioinformatics.
I am ideally looking for a short course introduction, with some practical tasks I can get my teeth into immediately to see if there is any interest in it for me.
I already have a good understanding of molecular biology, so I should be able to skip most of the foundational work.
Any suggestion...
In my Copious Free Time, I collaborate with a number of scientists (mostly biologists) who develop software, databases, and other tools related to the work they do. Generally these projects are built on a one-off basis, used in-house, and eventually someone decides "oh, this could be useful to other people," so they release a binary or s...
I am trying to find corresponding keys in two different dictionaries. Each has about 600k entries.
Say for example:
myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }
I want to print out the value for Actinobacter (8924342) since its key also appears in myRDP.
T...
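A minimal sketch of one common approach, assuming the goal is simply "keys present in both dictionaries": Python's dictionary key views support set intersection, which avoids a nested loop over 600k x 600k entries. The helper name shared_values is hypothetical, and the sequence strings are copied with their elisions from the example above.

```python
my_rdp = {"Actinobacter": "GATCGA...TCA", "subtilus sp.": "ATCGATT...ACT"}
my_names = {"Actinobacter": "8924342"}

def shared_values(names, rdp):
    """Map each key found in both dictionaries to its value in names."""
    # dict.keys() returns a set-like view, so & is a hash-based intersection.
    return {key: names[key] for key in names.keys() & rdp.keys()}

for key, value in shared_values(my_names, my_rdp).items():
    print(key, value)  # Actinobacter 8924342
```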
I'm working on a small application and thinking about integrating BLAST or another local alignment search into it. My searching has only turned up programs that need to be installed and called as external programs.
Is there a way to do this short of implementing it from scratch? A pre-made library, perhaps?
...
I need a bit of help with this code. I know the sections that should be recursive, or at least I think I do, but I am not sure how to implement it. I am trying to implement a path-finding program from an alignment matrix that will find multiple routes back to the zero value. For example if you execute my code and insert CGCA as the first ...
Task:
to cluster a large pool of short DNA fragments in classes that share common sub-sequence-patterns and find the consensus sequence of each class.
Pool: ca. 300 sequence fragments
8 - 20 letters per fragment
4 possible letters: a,g,t,c
each fragment is structured in three regions:
5 generic letters
8 or more positions of g's...
Sorry about the odd title.
I am using eSearch & eSummary to go from
Accession Number --> gID --> TaxID
Assume that 'accessions' is a list of 20 accession numbers (I do 20 at a time because that's the maximum that NCBI will allow).
I do:
handle = Entrez.esearch(db="nucleotide", rettype="xml", term=accessions)
record = Entrez.read(ha...
Which functional programming languages have bioinformatics libraries easily available?
(Don't include multi-paradigm languages such as Ruby)
Update: Listing which major functional programming languages don't currently have easy access to bioinformatics libraries is also welcome.
...
I have code below that tries to identify the positions of the start and end codons of the given DNA sequences.
We define the start codon as ATG and the end codons as TGA, TAA, and TAG.
The problem I have is that the code below works only for the first two sequences (DM208659 and AF038953) but not the rest.
What's wrong with my approach be...
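Since the original code is cut off, here is only a rough Python sketch of the definitions given above, not a fix for it. The function find_orf is hypothetical, and it assumes the end codon must be in the same reading frame as the first ATG, which the question does not state explicitly.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}

def find_orf(seq):
    """Return (start, end) as 0-based indices of the first ATG and of the
    first in-frame stop codon after it. Returns None if there is no ATG,
    and (start, None) if no in-frame stop codon follows."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    # Step three bases at a time so only in-frame codons are examined.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return start, i
    return start, None

# Made-up sequence: the TGA overlapping the ATG is out of frame and is skipped.
print(find_orf("CCATGAAATAGCC"))  # (2, 8)
```

A search that ignores the reading frame would stop at the first TGA, TAA, or TAG anywhere after the ATG, which is one common reason such code appears to work on some sequences and not others.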
I'm working with an output list that contains the following information:
[start position, stop position, chromosome,
[('sample name', 'sample value'),
('sample name','sample value')...]]
[[59000, 59500, chr1,
[('cn_04', '1.362352462'), ('cn_01', '1.802001235')]],
[100000, 110000, chr1,
[('cn_03', '1.88726...
I am beginning to delve deeper into Perl, but am having trouble writing "Perl-ly" code instead of writing C in Perl. How can I change the following code to use more Perl idioms, and how should I go about learning the idioms?
Just an explanation of what it is doing: This routine is part of a module that aligns DNA or amino acid sequences...
Dear all,
I'm trying to generate GOFrame objects to generate a gene ontology mapping in R for unsupported organisms (see http://www.bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf).
However, following the instructions literally doesn't help me.
Here's the code I execute (R 2.9.2 on u...
Could you tell me how I can calculate the distance between DNA sequences in Java using the Levenshtein algorithm?
...
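The question asks for Java, but the algorithm itself is language-independent, so here is a short Python sketch of the standard dynamic-programming Levenshtein distance with unit costs for insertion, deletion, and substitution. It keeps only two rows of the matrix; the function name levenshtein is chosen for this sketch, and the loops translate directly to Java arrays.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b with unit-cost edits."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

print(levenshtein("ACGT", "AGT"))  # 1
```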