Dear all,

The following code tries to generate random strings over K runs. We want each newly generated string to be totally different from its reference string.

To do that I tried to use "continue" to restart the random string generation. However, it doesn't seem to work. What's wrong with my approach below?

#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <time.h>
using namespace std;


// In this code we want to print a new string that is entirely different
// from the one in initVec


template <typename T> void  prn_vec(std::vector < T >&arg, string sep="")
{   // simple function for printing vector
    for (int n = 0; n < arg.size(); n++) {
        cout << arg[n] << sep; 
    }
}


int main  ( int arg_count, char *arg_vec[] ) {

    // This is reference string
    vector <string> initVec;
    initVec.push_back("A");
    initVec.push_back("A");
    initVec.push_back("A");
    initVec.push_back("A");

    vector <string> DNA;
      DNA.push_back("A");
      DNA.push_back("C");
      DNA.push_back("G");
      DNA.push_back("T");

    for (unsigned i =0; i< 10000; i++) {

       vector <string> newString;
       for(unsigned j=0; j<initVec.size(); j++) {

         int dnaNo = rand() % 4;
         string newBase = DNA[dnaNo];
         string oldBase = initVec[j];

         int sameCount = 0;
      if (newBase == oldBase) {
            sameCount++;
      }

         if (sameCount == initVec.size()) {
              continue;
         }

         newString.push_back(newBase);

       } 
       cout << "Run " << i << " : ";
       prn_vec<string>(newString);
       cout << endl;

    }

    return 0;
}
+1  A: 

continue does not skip the increment part of the for loop. It jumps straight to the increment, skipping only the rest of the loop body.

for(int i = 0; i < 10; i++)
{
  if(i == 3)
    continue;
  printf("%d ", i);
}

Is equivalent to:

int i = 0;
while(i < 10)
{
  if(i == 3)
    goto increment;
  printf("%d ", i);
increment:
  i++;
}

No backslash in the printf() since I couldn't figure out how to make the text editor let me type one. :)
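
One more point that matters for the nested loops in the question: continue only affects the innermost loop that encloses it. A minimal sketch, where the function name visitedPairs is made up for illustration:

```cpp
#include <string>

// Hypothetical helper: records every (i, j) pair that reaches the end
// of the inner loop body, as two digits followed by a space.
std::string visitedPairs()
{
    std::string out;
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 3; j++) {
            if (j == 1)
                continue;   // jumps to j++ of the inner loop; i is untouched
            out += char('0' + i);
            out += char('0' + j);
            out += ' ';
        }
    }
    return out;   // "00 02 10 12 "
}
```

Only the pairs with j == 1 are skipped; the outer loop still runs both of its iterations in full.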

unwind
sameCount == initVec.size() is never satisfied since it is always reinitialized to 0 for every character added to the newString. So no continue comes into effect.
dirkgently
+4  A: 

Your code looks fine at first glance, except of course for the continue part, unless I am missing a big part of your requirements. Read this before you use rand(). What you are trying to do is check whether the new string is the same as initVec or not, right? A simple comparison before you push it in or print it to the console would do.

int sameCount = 0;
if (newBase == oldBase) {
 sameCount++;
}
// sameCount can be 1 at most, 0 otherwise
// this check never returns true
if (sameCount == initVec.size()) {
continue;
}

The sameCount variable is reinitialized each time you create a new entry for newString, and it goes out of scope at the closing } of the for loop. So it never accumulates and cannot function as a proper check against duplicate generation. Ideally, you should use a std::set and keep inserting into it. Duplicates are not allowed, and you are saved a lot of trouble.
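
A rough sketch of the std::set idea, assuming strands are plain std::string values over "ACGT". The function name uniqueStrands and its parameters are made up for illustration:

```cpp
#include <cstdlib>
#include <set>
#include <string>

// Hypothetical helper: draws `attempts` random strands of length `len`
// over "ACGT" and keeps each distinct strand exactly once.
std::set<std::string> uniqueStrands(unsigned attempts, unsigned len)
{
    const std::string DNA("ACGT");
    std::set<std::string> seen;
    for (unsigned i = 0; i < attempts; i++) {
        std::string strand;
        for (unsigned j = 0; j < len; j++) {
            int dnaNo = (int)((double)std::rand() / ((double)RAND_MAX + 1) * 4);
            strand += DNA[dnaNo];
        }
        seen.insert(strand);   // has no effect if the strand is already present
    }
    return seen;
}
```

With len == 4 the set can never hold more than 256 strands, however many attempts are made.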

More on using rand() srand() and random number generation:

From the comp.lang.c FAQ:

[...]the low-order bits of many random number generators are distressingly non-random

If you want to keep your random numbers in the range

[0, 1, ... N - 1]

a better method compared to the simple rand() % N (as advised in the link) is to use the following:

(int)((double)rand() / ((double)RAND_MAX + 1) * N)
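
Wrapped up as a small function, that expression might look like the sketch below. The name randomBelow is made up here, and n is assumed to be positive:

```cpp
#include <cstdlib>

// Hypothetical helper: maps rand() onto 0 .. n-1 by scaling the whole
// value instead of taking the low-order bits with %.
// rand() / (RAND_MAX + 1) lies in [0, 1), so the product lies in [0, n).
int randomBelow(int n)
{
    return (int)((double)std::rand() / ((double)RAND_MAX + 1) * n);
}
```

For the DNA case, randomBelow(4) yields an index into "ACGT".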

Now, every time you run your program you will get the same set of 10000-odd random DNA strands. It turns out this is because:

It's a characteristic of most pseudo-random number generators (and a defined property of the C library rand) that they always start with the same number and go through the same sequence.

from another FAQ of comp.lang.c.

To get different strands across runs try the following:

#include <iostream>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;

int main(int arg_count, char *arg_vec[]) {

    // most pseudo-random number generators
    // always start with the same number and
    // go through the same sequence.
    // coax it to do something different!
    srand((unsigned int)time((time_t *)NULL));

    // This is the reference string
    string initVec("AAAA");
    // the family
    string DNA("ACGT");

    for (unsigned i = 0; i < 5; i++) {
        string newString;
        for (unsigned j = 0; j < initVec.size(); j++) {
            int dnaNo = (int)((double)rand() / ((double)RAND_MAX + 1) * 4);
            char newBase = DNA[dnaNo];
            newString += newBase;
        }
        // ideally push in a std::set
        // for now display everything that differs from the reference
        if (newString != initVec) {
            cout << "Run " << i << " : " << newString << endl;
        }
    }
    return 0;
}
dirkgently
The fail is to do with the continue; statement, when it continues, j is incremented to be j = initvec.size(); at the start of inner loop, - the exit condition for the inner loop is not reached.
NotJarvis
The inner loop exits when j == 4.
dirkgently
Correct - I was being a fool, but the newstring is still outputted to the console regardless of whether the newstring matches the reference one or not.
NotJarvis
read my entire answer and the comments
dirkgently
much better on last edit, can follow it now ;-) Upvoting
NotJarvis
thanks! pick faults, have fun. ;)
dirkgently
+1  A: 

dirkgently's answer is pretty comprehensive and now covers what I was trying to say.

I'd recommend that you don't use continue, though. Most coding standards recommend against it, for good reason: it makes flow control harder to follow.
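
One possible way to restart generation without continue is a do/while loop that regenerates the whole strand until it passes the check. This is only a sketch: it assumes "different" means "not identical to the reference", and the function name strandUnlike is made up for illustration:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: generates random strands over "ACGT" until one
// is not identical to `reference`, then returns it.
std::string strandUnlike(const std::string& reference)
{
    const std::string DNA("ACGT");
    std::string strand;
    if (reference.empty())
        return strand;   // nothing to generate, and avoids looping forever
    do {
        strand.clear();
        for (std::string::size_type j = 0; j < reference.size(); j++) {
            int dnaNo = (int)((double)std::rand() / ((double)RAND_MAX + 1) * 4);
            strand += DNA[dnaNo];
        }
    } while (strand == reference);   // exact match: throw it away and retry
    return strand;
}
```

The retry condition sits in one place at the bottom of the loop, so the flow reads top to bottom.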

NotJarvis
+2  A: 

Your algorithm is bogus. Whatever you are trying to do, you aren't doing it, and because there's not a single comment in there, I can't really tell where you went wrong.

Your inner loop:

for each element of initVec (4)
    create a random element
    set sameCount to 0
    if random element == current element of initVec, set sameCount to 1
    if sameCount == 4, do something (pointless as this never happens)
    add random element to newString

Adding to that, your "newString" isn't a string at all, but a vector of strings.

So, your problem isn't even the use of continue, it's that your algorithm is FUBAR.

DevSolar
Looks like a C++ newbie. These things happen. You can tone that down a bit :)
dirkgently
Well, I flamed his algorithm, not him. ;-) I guess I have seen code like this from alleged "professionals" a few times too often... I wanted to point out some techniques on how to use `continue` correctly, or avoid it outright, but I can't with that algorithm... :-/
DevSolar
+1  A: 

Have you realized that sameCount never becomes more than 1? Since initVec.size() is greater than 1, execution never hits continue.

    int sameCount = 0;
    // sameCount is 0
    if (newBase == oldBase) { // if this is true, sameCount becomes 1
        sameCount++;
    }
    // sameCount is 1 or 0
    if (sameCount == initVec.size()) { // always false if initVec is longer than 1
        continue;
    }

As others have already said, it is difficult to work out what you intended with this code. Could you please tell us what you mean by "totally different", for example?
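
To make the ambiguity concrete, here are two possible readings written as predicates. Both are guesses at the intent, and the names notIdentical and differsEverywhere are made up for illustration:

```cpp
#include <string>

// Reading 1: the two strings are simply not identical.
bool notIdentical(const std::string& a, const std::string& b)
{
    return a != b;
}

// Reading 2: no position holds the same base in both strings.
// Strings of different length are treated as not comparable (false);
// that is an arbitrary choice made for this sketch.
bool differsEverywhere(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (std::string::size_type i = 0; i < a.size(); i++) {
        if (a[i] == b[i])
            return false;
    }
    return true;
}
```

Against the reference "AAAA", the strand "AAAC" passes the first reading but fails the second, while "CGTC" passes both.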

Luppy
dirkgently already pointed this out.
NotJarvis