I have two questions:

1) Why is my code adding a newline (\n) at the beginning of the selected_line string?
2) Do you think the algorithm I'm using to return a random line from the file is good enough and won't cause any problems?

A sample file is:

line
number one
#
line number two

My code:

#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    srand(time(0));
    ifstream read("myfile.dat");
    string line;
    string selected_line;
    int nlines = 0;
    while(getline(read, line, '#')) {
        if((rand() % ++nlines) == 0)
            selected_line = line;
    }
    // this is adding a \n at the beginning of the string
    cout << selected_line << endl;
}

Thanks in advance for your help.

EDIT: OK, what some of you suggested makes a lot of sense. The string is probably being read as "\nmystring". So I guess my question now is: how would I remove the first \n from the string?

+1  A: 

Because you don't specify \n as a delimiter.

Pukku
+1  A: 

Your "random" selection is completely wrong. In fact, it will always select the first line: rand() % 1 is always 0.

There is no way to uniformly select a random line without knowing the number of lines present.

In addition, why are you using # as a delimiter? By default, std::getline reads a line (up to the next \n).

rlbond
So perhaps I should read the file twice: once to count the number of lines in the file, and again to read a random line based on the total number of lines? I am using # as a delimiter because I need to read paragraphs, not single lines ending in \n.
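For comparison with reading the file twice: the loop in the question evaluates `rand() % 1` only for the first record; the n-th record replaces the current pick when `rand() % n == 0`. That is the single-item form of reservoir sampling, which, assuming a uniform random source, selects each of N records with probability 1/N in one pass without knowing N in advance (record i survives with probability (1/i)·(i/(i+1))·…·((N−1)/N) = 1/N). The sketch below is that technique with C++11 `<random>` swapped in for `rand() %` to avoid modulo bias; `pickRandomRecord` is a hypothetical helper name.

```cpp
#include <cassert>   // assert, for quick checks
#include <cstddef>
#include <istream>
#include <random>
#include <sstream>   // std::istringstream, to try it without a file
#include <string>

// Hypothetical helper: one-pass, single-item reservoir sampling over the
// records in `in` separated by `delim`. Returns "" if there are no records.
std::string pickRandomRecord(std::istream& in, char delim, std::mt19937& gen)
{
    std::string record;
    std::string selected;
    std::size_t count = 0;
    while (std::getline(in, record, delim)) {
        ++count;
        // Keep this record with probability 1/count (the first is always kept).
        std::uniform_int_distribution<std::size_t> dist(0, count - 1);
        if (dist(gen) == 0) {
            selected = record;
        }
    }
    return selected;
}
```

Usage would be something like `std::mt19937 gen(std::random_device{}());` followed by `selected_line = pickRandomRecord(read, '#', gen);`. The leading `\n` discussed in the answers below is still present in the returned record.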
+1  A: 

The newline appears at the start of every record after the first one. This is because getline stops at the # character (and discards it), and the next call resumes where it left off, i.e. at the character just past the #, which in your input file is a newline. See question 13.16 of the C FAQ on using rand() effectively.

One suggestion is to read the entire file in one go, store the lines in a vector and then output them as required.

dirkgently
Yep - when you have the lines in a vector, it will be easy to pick one at random.
Pukku
A: 

Because # is your delimiter, the \n that immediately follows each # becomes the first character of the next record, which is why the \n appears in front of your line.

AlbertoPL
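The effect described in the answer above can be checked without a file. This sketch uses a hypothetical `splitRecords` helper and an `std::istringstream` holding the question's sample file (assumed to end with a newline) to split on '#' exactly as the question's loop does.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on `delim` the same way the question's
// loop does, i.e. with std::getline(stream, record, delim).
std::vector<std::string> splitRecords(const std::string& text, char delim)
{
    std::istringstream in(text);
    std::vector<std::string> records;
    std::string record;
    while (std::getline(in, record, delim)) {
        records.push_back(record);
    }
    return records;
}

// The sample file from the question, with its newline characters written out.
const std::string kSample = "line\nnumber one\n#\nline number two\n";
```

Splitting `kSample` on '#' yields two records, `"line\nnumber one\n"` and `"\nline number two\n"`: the second one starts with the newline that followed the `#`.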
A: 

1) You're not adding a \n to selected_line. Instead, by specifying '#' you are simply not removing the extra \n characters in your file. Note that your file actually looks something like this:

line\nnumber one\n#\nline number two\n

So line number two is actually "\nline number two\n".

2) No. If you want to randomly select a line then you need to determine the number of lines in your file first.

Naaff
I see. Is there an easy way to remove the "\n" from the beginning of the string, then?
To skip leading whitespace in an ifstream before you call getline, you can do something like this: while(isspace(read.peek())) read.ignore();
Naaff
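A sketch of the peek/ignore suggestion above applied before every read, assuming the '#'-separated format from the question; `readRecordsSkippingLeadingSpace` is a hypothetical helper name. It also discards leading spaces and tabs of a paragraph, and it keeps the trailing `\n` that precedes each `#`. `read >> std::ws` is a standard-library way to do the same leading-whitespace skip.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read records separated by `delim`, skipping any
// whitespace (including the '\n' left after each '#') before each getline.
std::vector<std::string> readRecordsSkippingLeadingSpace(std::istream& in, char delim)
{
    std::vector<std::string> records;
    std::string record;
    for (;;) {
        // peek() returns either EOF or an unsigned char value as an int,
        // so it is safe to pass to std::isspace; isspace(EOF) is false.
        while (std::isspace(in.peek())) {
            in.ignore();
        }
        if (!std::getline(in, record, delim)) {
            break;  // end of input
        }
        records.push_back(record);
    }
    return records;
}
```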
+1  A: 

What you probably want is something like this:

std::vector<std::string> allParagraphs;
std::string currentParagraph;
std::string line;

while (std::getline(read, line)) {
    if (line == "#") { // modify this condition, if needed
        // paragraph ended, store to vector
        allParagraphs.push_back(currentParagraph);
        currentParagraph.clear();
    } else {
        // paragraph continues...
        if (!currentParagraph.empty()) {
            currentParagraph += "\n";
        }
        currentParagraph += line;
    }
}

// store the last paragraph, as well
// (in case it was not terminated by #)
if (!currentParagraph.empty()) {
    allParagraphs.push_back(currentParagraph);
}

// this is not extremely random, but will get you started
// (assumes at least one paragraph was read: rand() % 0 is undefined)
size_t selectedIndex = rand() % allParagraphs.size();

std::string selectedParagraph = allParagraphs[selectedIndex];

For better randomness, you could opt for this instead:

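// RAND_MAX + 1.0 is computed in double, so it cannot overflow int
// (RAND_MAX + 1 would overflow where RAND_MAX == INT_MAX)
size_t selectedIndex
    = static_cast<size_t>(rand() / (RAND_MAX + 1.0) * allParagraphs.size());

This is because, with some rand() implementations, the least significant bits are much less random than the high-order bits.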
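If C++11 is available, `<random>` sidesteps both the modulo bias and the low-order-bit problem. A minimal sketch, assuming a `std::mt19937` engine seeded once (for example from `std::random_device`) and a hypothetical helper named `pickUniform`:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: pick one element of `items` uniformly at random.
// Precondition: `items` is not empty.
const std::string& pickUniform(const std::vector<std::string>& items, std::mt19937& gen)
{
    assert(!items.empty());
    // Closed range [0, size - 1]; every index is equally likely.
    std::uniform_int_distribution<std::size_t> dist(0, items.size() - 1);
    return items[dist(gen)];
}
```

With the vector built above, usage would be `std::mt19937 gen(std::random_device{}());` and then `std::string selectedParagraph = pickUniform(allParagraphs, gen);`. Seed the engine once and reuse it rather than re-seeding on every call.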

Pukku
Excellent Solution! Thank you very very much! I have learned a lot from this solution that you have posted. Thanks again!
nmuntz
You are welcome. I hope it wasn't homework :)
Pukku
no worries, I'm not a student.
A: 

You could use the substr method of the std::string class to remove the \n after you decide which line to use:

if ( line.substr(0,1) == "\n" ) { line = line.substr(1); }

As others have said, if you want to select the lines with uniform randomness, you'll need to read all the lines first and then select a line number. You could also use if (rand() % (++nlines+1)) which will select line 1 with 1/2 probability, line 2 with 1/2*1/3 probability, etc.

Plasmer
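A variant of the substr approach that strips newlines from both ends of the selected record, and also handles `\r\n` line endings, might look like this. `trimNewlines` is a hypothetical helper name; it only removes `\r` and `\n`, not spaces or tabs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: strip leading and trailing '\r' and '\n' characters.
std::string trimNewlines(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of("\r\n");
    if (first == std::string::npos) {
        return "";  // empty, or nothing but newline characters
    }
    const std::string::size_type last = s.find_last_not_of("\r\n");
    return s.substr(first, last - first + 1);
}
```

Applied to the question's code, this would be `selected_line = trimNewlines(selected_line);` just before printing; newlines inside a paragraph are left untouched.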