I am writing a Perl script that searches for a motif (substring) in a protein sequence (string). The motif to search for is hhhDDDssEExD, where:

  • h is any hydrophobic amino acid
  • s is any small amino acid
  • x is any amino acid
  • h, s, and x can each stand for more than one value

Can more than one value be assigned to one variable? If yes, how should I do that? I want to assign a list of multiple values to a variable.

+3  A: 

It seems like you want some kind of pattern matching. This can be done with strings using regular expressions.

Reuben Peeris
@shubster: In case you are totally new to this, "regular expression" is a specific term describing a type of pattern matching language using funny symbols that's well-suited to your task. Google "regular expression tutorial".
j_random_hacker
+1  A: 

Perl regular expressions are what you need.

dsm
+1  A: 

I am no great expert in Perl, so there is quite possibly a quicker way to do this, but it seems like the match operator "//" in list context is what you need. When you assign the result of a match operation to a list, the match operator takes on list context and returns a list of the parenthesis-delimited sub-expressions. If you specify global matching with the "g" flag, it returns a list of all the matches of each sub-expression. Example:

# print a list of each match for "x" in "xxx"
@aList = ("xxx" =~ /(x)/g);
print(join(".", @aList));

Will print out

x.x.x

I'm assuming you have a regular expression for each of those 5 types h, D, s, E, and x. You didn't say whether each of these parts is a single character or multiple, so I'm going to assume they can be multiple characters. If so, your solution might be something like this:

$h = ""; # Insert regex to match "h"
$D = ""; # Insert regex to match "D"
$s = ""; # Insert regex to match "s"
$E = ""; # Insert regex to match "E"
$x = ""; # Insert regex to match "x"

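# Wrap each repeated part in an outer group: ($h){3} alone would capture only the last repetition.
$sequenceRE = "((?:$h){3})((?:$D){3})((?:$s){2})((?:$E){2})($x)($D)";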

if ($line =~ /$sequenceRE/) {
    $hPart = $1;
    $sPart = $3;
    $xPart = $5;

    @hValues = ($hPart =~ /($h)/g);
    @sValues = ($sPart =~ /($s)/g);
    @xValues = ($xPart =~ /($x)/g);
}

I'm sure there is something I've missed, and there are some subtleties of Perl that I have overlooked, but this should get you most of the way there. For more information, read up on Perl's match operator and regular expressions.
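
As a rough illustration of how the capture groups behave, here is a minimal sketch. The character classes ([VLIM] for h, [AG] for s, any uppercase letter for x) and the test sequence are assumptions made up for this example, and D and E are treated as literal letters. Note that a quantified group such as ($h){3} keeps only its last repetition, so the sketch wraps each run in an outer capturing group.

```perl
use strict;
use warnings;

# Assumed placeholder classes, for illustration only.
my $h = qr/[VLIM]/;
my $s = qr/[AG]/;
my $x = qr/[A-Z]/;

my $line = "MKVLIDDDAGEEQDRT";   # made-up test sequence

my @parts;
# Groups: 1 = h run, 2 = DDD, 3 = s run, 4 = EE, 5 = x, 6 = final D.
if ($line =~ /((?:$h){3})(D{3})((?:$s){2})(E{2})($x)(D)/) {
    my ($hPart, $sPart, $xPart) = ($1, $3, $5);

    # Split each run back into its individual residues.
    my @hValues = ($hPart =~ /($h)/g);
    my @sValues = ($sPart =~ /($s)/g);

    @parts = (join(".", @hValues), join(".", @sValues), $xPart);
    print "h: $parts[0]  s: $parts[1]  x: $parts[2]\n";
}
```

With single-letter classes like these, the second matching step is mostly a way to split the run into a list; it matters more if each part can span several characters.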

A. Levy
@levy I didn't understand why you wrote $hPart = $1; $sPart = $3; $xPart = $5; and then matched those into @hValues etc.
shubster
I did the two step assignment-match in order to make it a little clearer what was happening. Perl can get a little cryptic if you make everything as terse as possible. I thought it would be better for the example to be readable than efficient.
A. Levy
+3  A: 

You can use character classes in your regular expression. The classes you mentioned would be:

 h -> [VLIM]
 s -> [AG]
 x -> [A-IK-NP-TV-Z]

The last one means "A to I, K to N, P to T, V to Z".

The regular expression for your example would be:

/[VLIM]{3}D{3}[AG]{2}E{2}[A-IK-NP-TV-Z]D/
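
A minimal sketch of how this pattern might be used to list every occurrence of the motif and where it starts. The protein sequence is invented for the example, and the 1-based position reporting is just one possible convention.

```perl
use strict;
use warnings;

# The motif hhhDDDssEExD written with the character classes above.
my $motif = qr/[VLIM]{3}D{3}[AG]{2}E{2}[A-IK-NP-TV-Z]D/;

# Hypothetical sequence, invented for this example.
my $protein = "MSTVLIDDDAGEEKDPLLMVMDDDGAEEWDK";

my @hits;
while ($protein =~ /($motif)/g) {
    # $-[1] is the 0-based offset where capture group 1 starts.
    push @hits, [ $-[1] + 1, $1 ];
}

printf "motif %s at position %d\n", $_->[1], $_->[0] for @hits;
```

The /g flag in scalar context resumes the search after the previous match, so overlapping occurrences would not be reported by this loop.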
Svante
A: 

I could be way off, but it sounds like you want an object with a built in method to output as a string.

If you start with a string, like the one you mentioned, you could pass the string to the class as a new object, use regular expressions like everyone has already suggested to parse out the chunks that you would then assign as variables to that object. Finally, you could have it output a string based on the variables of that object, for instance:

 $string = "COHOCOHOCOHOCOHOCOHOC";
 $sugar = new Organic($string);

 class Organic {
     public $hydro;
     public $carb;

     function __construct($chem) {
         // preg_match_all stands in for the undefined preg_find;
         // it returns how many times each pattern matches.
         $this->hydro = preg_match_all("/OHO/", $chem);
         $this->carb = preg_match_all("/C/", $chem);
     }

     function __toString() {
         return $this->carb . "=" . $this->hydro;
     }
 }

 echo $sugar;

Okay, that kind of fell apart in the end, and it was pseudo-PHP, not Perl. But if I understand your question correctly, you are looking for a way to get all of the info from the string but keep it tied to that string. That would be objects and classes.

Anthony
A: 

You probably want an array (or arrayref) or a pattern (qr//).

Or maybe Quantum::Superpositions.
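
A minimal sketch of the array and qr// suggestions combined: each set of allowed residues is kept in an array (or behind an array reference) and then turned into a character class. The residue lists, the [A-Z] class for x, and the test sequences are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Assumed example residue sets; adjust to the classification you use.
my @hydrophobic = qw(V L I M);
my @small       = qw(A G);

# An array reference holds the same list behind a single scalar.
my $small_ref = \@small;

# Turn each list into a character class.
my $h = '[' . join('', @hydrophobic) . ']';
my $s = '[' . join('', @$small_ref) . ']';

# (?:$h){3} rather than $h{3}: the latter would be read as a hash element.
my $motif = qr/(?:$h){3}D{3}(?:$s){2}E{2}[A-Z]D/;

# Made-up sequences: only the first contains the motif.
my @sequences = ("AAVLIDDDAGEEQDAA", "AAVLIDDDSSEEQDAA");
my @matched   = grep { /$motif/ } @sequences;

print "match: $_\n" for @matched;
```

Keeping the lists in arrays means the same values can be reused elsewhere, for example to validate input, without retyping the character class.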

ysth