
I have the following data that looks like this for example:

34 foo
34 bar
34 qux
62 foo1
62 qux
78 qux

These are sorted based on the first column.

What I want to do is process the lines that start with 34, but I also want the file iteration to stop once there are no more 34s, without having to scan through the whole file. How would I do this?

The reason is that the number of lines to be processed is very large (~10^7), and the ones that start with 34 are only around 1-10% of them.

I am aware that I can grep the lines and write them to another file, but this is too tedious and uses extra disk space.

This code illustrates my failed attempt using "continue":

#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

int main () {
    string line;
    ifstream myfile ("mydata.txt");
    vector<vector<string> > dataTable;
    if (myfile.is_open())
    {
        while (! myfile.eof() )
        {
                stringstream ss(line);
                int FirstCol;
                string SecondCol;
                ss >> FirstCol >> SecondCol;

                if (FirstCol != 34) {
                   continue;
                }

                // This will skip those other than 34
                // but will still iterate through all the file
                // until the end.

                // Some processing to FirstCol and SecondCol

                cout << FirstCol << "\t" << SecondCol << endl;
        }
        myfile.close();
    }

    else cout << "Unable to open file";

    return 0;
}
+7  A: 

Use break instead of continue! continue returns to the head of the loop, only skipping the current iteration, while break leaves the loop for good.
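To make the difference concrete, here is a minimal sketch that uses a `std::vector<int>` in place of the file's first column. The function names are hypothetical; each one just counts how many loop bodies run to completion:

```cpp
#include <cassert>
#include <vector>

// `continue` cuts short only the current iteration:
// the loop still visits every remaining element.
int iterations_with_continue(const std::vector<int>& values, int skip_value) {
    int completed = 0;
    for (int v : values) {
        if (v == skip_value) continue;  // jump to the next iteration
        ++completed;
    }
    return completed;
}

// `break` ends the loop at the first match; later elements are never visited.
int iterations_with_break(const std::vector<int>& values, int stop_value) {
    int completed = 0;
    for (int v : values) {
        if (v == stop_value) break;     // leave the loop for good
        ++completed;
    }
    return completed;
}
```

For `{34, 34, 62, 62, 78}` with 62 as the trigger, the `continue` version completes 3 iterations (it still reaches the 78), while the `break` version completes only 2.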

On an unrelated note, your code has a bug that can make it hang if the file cannot be read for any reason (e.g. the user deletes it while your program is accessing it, or removes the USB stick the file is on). This is because a loop condition such as:

while (!file.eof())

is dangerous! If the file stream goes into an error state, eof() will never become true and the loop will run forever. You need to test whether the stream is still in a usable state, which is done simply by using its implicit conversion to a boolean value:

while (file)

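This keeps the loop running only as long as the stream has neither reached the end of the file nor entered an error state. Note that the state is checked before the next read, so the read inside the loop can still fail; the simplest fix is to make the read itself the loop condition, e.g. `while (getline(file, line))`.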
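One caveat worth illustrating: with `while (file)` the state is tested before each read, so the body still runs once after the final read fails. A minimal sketch comparing the two forms on any `std::istream` (the function names are hypothetical; a `std::istringstream` can stand in for the file):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Tests the stream *before* reading: the last getline() fails,
// but the body still runs and counts it.
int count_with_state_check(std::istream& in) {
    std::string line;
    int n = 0;
    while (in) {
        std::getline(in, line);
        ++n;  // also counts the final, failed read
    }
    return n;
}

// Puts the read in the condition: the body runs only after a successful
// read, and the loop stops on end-of-file or on any error.
int count_with_read_in_condition(std::istream& in) {
    std::string line;
    int n = 0;
    while (std::getline(in, line)) {
        ++n;
    }
    return n;
}
```

For the three-line input `"34 foo\n34 bar\n62 qux\n"` (ending in a newline), the first function returns 4 and the second returns 3.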

Konrad Rudolph
He can't just use break because he will never find the first entry where FirstCol == 34 unless the very first record happens to be 34.
John Dibling
+2  A: 

Assuming that the data in the file is sorted by the first column (as I noticed in your example), you should replace that if statement from

if (FirstCol != 34) 
{
    continue;
}

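with something like the following (the first check still skips the lines before the 34s; the second stops at the first larger value):

if (FirstCol < 34)
{
    continue;
}
if (FirstCol > 34)
{
    break;
}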
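A self-contained sketch of this approach, written against `std::istream` so that a `std::istringstream` can stand in for the file. `read_block` and its `lines_read` counter are hypothetical names, and the sketch assumes one `<int> <word>` record per line, sorted by the first column:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Returns the records whose first column equals `key`.
// `lines_read` reports how many records were consumed before stopping.
std::vector<std::pair<int, std::string>>
read_block(std::istream& in, int key, int& lines_read) {
    std::vector<std::pair<int, std::string>> out;
    lines_read = 0;
    int first;
    std::string second;
    while (in >> first >> second) {   // stops on end-of-file or a read error
        ++lines_read;
        if (first < key) continue;    // not at the key's block yet
        if (first > key) break;       // sorted input: no more matches can follow
        out.emplace_back(first, second);
    }
    return out;
}
```

For the records `15 boo`, `34 foo`, `34 bar`, `34 qux`, `62 foo1`, `62 qux`, `78 qux` and key 34, it returns the three 34 records with `lines_read == 5`: the `62 foo1` record that triggers the `break` has already been consumed, and the last two records are never read.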
Cătălin Pitiș
+1  A: 

Assuming the file is sorted by FirstCol, use a state variable that indicates whether or not you have found the first 34. Once you have found it, you can break out of the loop as soon as you reach a line whose first column is != 34.

For example, suppose your data is now:

15 boo
32 not
34 foo
34 bar
34 qux
62 foo1
62 qux
78 qux

...this code will do what you want:

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
    ifstream myfile ("mydata.txt");
    if (myfile.is_open())
    {
        bool found34 = false;
        bool done = false;
        int FirstCol;
        string SecondCol;

        // The read is part of the loop condition, so the loop ends on
        // end-of-file or a read error; `done` ends it once the 34s are over.
        while ( !done && myfile >> FirstCol >> SecondCol )
        {
            // Some processing to FirstCol and SecondCol
            cout << FirstCol << "\t" << SecondCol << endl;

            switch( FirstCol )
            {
            case 34 :
                found34 = true;
                cout << "Processing a 34" << endl;
                break; // leaves the switch only; the loop keeps going
            default :
                if( found34 )
                {
                    // we found all the 34's and now we're on to the next value, so we're done.
                    // A plain `break` here would only leave the switch, not the loop.
                    cout << "We're done." << endl;
                    done = true;
                }
                else
                {
                    // we haven't found the first 34 yet, so keep scanning until we do
                    cout << "Keep on looking for a 34..." << endl;
                }
                break;
            }
        }
        myfile.close();
    }

    else cout << "Unable to open file";

    return 0;
}
John Dibling
A switch with only one case and a default seems like an if-then-else to me...
Luc Touraille
John Dibling
Unfortunately, this code still has the dangerous bug that I mentioned related to the infinite loop (`while (!file.eof())`).
Konrad Rudolph
@Konrad - Fixed. Thx for pointing that out.
John Dibling
+1  A: 

Assuming line is supposed to contain input, it would be a good idea to read something into it! Change:

  while (! myfile.eof() )

to:

  while ( getline( myfile, line ) )
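Combining this with the question's `stringstream` parsing and the found-flag idea from the earlier answer, a minimal sketch might look like this. `second_columns_for` is a hypothetical name, and the input is assumed to keep all lines with the same key together (which sorted input guarantees):

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// getline() fills `line`, an istringstream parses it, and a found flag
// ends the loop as soon as the block of lines with `key` is over.
std::vector<std::string> second_columns_for(std::istream& in, int key) {
    std::vector<std::string> out;
    std::string line;
    bool found = false;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        int first_col;
        std::string second_col;
        if (!(ss >> first_col >> second_col)) {
            continue;  // skip blank or malformed lines
        }
        if (first_col == key) {
            found = true;
            out.push_back(second_col);
        } else if (found) {
            break;     // past the last matching line
        }
    }
    return out;
}
```

Reading the eight-line sample from John Dibling's answer with key 34 returns `foo`, `bar`, `qux` and stops right after consuming `62 foo1`, so the next `getline` on the same stream yields `62 qux`. Unlike a `> 34` comparison, the flag only needs matching lines to be contiguous.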
anon