Hi,

This is homework, so I hope you won't give me the direct answer or code, but rather guide me to the solution.

My problem: I have this XXX.html file containing thousands of lines of markup, but I only need to extract this portion:

<html>
...
<table>
    <thead>
        <tr>
            <th class="xxx">xxx</th>
            <th>xxx</th>                       <th>xxx</th>         </tr>
    </thead>
    <tbody>
        <tr class=xxx>
        <td class="xxx"><a href="xxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td>        <td class="xxx">ZZZZ</td>    </tr>    <tr class=xxx>
<td class="xxx"><a href="xxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td>        <td class="xxx">ZZZZ</td>    </tr>    <tr class=xxx>
<td class="xxxx"><a href="xxxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td>        <td class="xxxx">zzzz</td>    </tr>    <tr class=xxx>
<td class="xxx"><a href="xxxx" >ZZZ ZZ ZZZ</a></td>
    ... and so on

This is my code so far:

// after opening the file
while(!fileOpened.eof()){
    getline(fileOpened, reader);
    if(reader.find("ZZZ")){
        cout << reader << endl;
    }
}

EDIT:

"reader" is a string variable that holds each line of the HTML file. I need to read the ZZZZ value live, so it will change over time; what should I use instead of the "find" method? (Sorry for not mentioning this earlier.)

END OF EDIT

But instead of displaying the value I want, it displays other portions of the HTML file. Why? Is my method wrong? If so, how do I extract the ZZZZZ value?

Thanks

+2  A: 

std::string::find does not return a boolean value. It returns an index into the string where the substring match occurs if it is successful, else it returns std::string::npos.

So you would want to say:

    if (reader.find("ZZZ") != std::string::npos){
        cout << reader << endl;
    }
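
To see why the original condition printed unrelated lines, here is a minimal sketch comparing the two checks. The helper names `contains` and `naive_check` are made up for illustration; `naive_check` spells out what `if (reader.find("ZZZ"))` effectively tests, namely whether the returned index is non-zero.

```cpp
#include <string>

// Hypothetical helper: true only when `needle` occurs somewhere in `line`.
bool contains(const std::string& line, const std::string& needle) {
    return line.find(needle) != std::string::npos;
}

// What `if (reader.find("ZZZ"))` effectively tests: the returned index
// converted to bool, i.e. "index != 0". Since npos is a large non-zero
// value, a line with no match at all passes this check.
bool naive_check(const std::string& line, const std::string& needle) {
    return line.find(needle) != 0;
}
```

With these, `naive_check("<thead>", "ZZZ")` is true even though there is no match, while a line that starts with "ZZZ" (match at index 0) is the only kind that fails it.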
Charles Salvia
Sorry, I messed up the code. I will edit my question.
cpp_learner
A: 

In general, plain string matching won't work reliably for extracting values from an HTML file. A proper HTML parser is required; these are available for C++ as ready-made libraries.

Otherwise I'd suggest using a regex library (boost::regex until C++0x comes out). You'll be able to write better expressions to capture the part of the file you are interested in.

Reading by line probably won't work, since an HTML file could be one large line. Outputting each line you find would then simply emit the entire file. Instead, try regexes: look for small sections of the markup and output only those. The regex library will have a "match all" facility (I forget the exact name).
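
As a rough sketch of the "match all" idea: the example below uses `std::regex` from C++11, which grew out of Boost.Regex, and its `std::sregex_iterator` walks over every match in a string. The function name `extract_cells` and the pattern are assumptions chosen for illustration; the pattern assumes `<td>` cells are not nested and would need adjusting for the real file.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the text between every <td ...> and </td> pair in `html`.
// Assumes cells are not nested; [\s\S] is used so a cell may span lines.
std::vector<std::string> extract_cells(const std::string& html) {
    static const std::regex cell_re("<td[^>]*>([\\s\\S]*?)</td>");
    std::vector<std::string> cells;
    for (std::sregex_iterator it(html.begin(), html.end(), cell_re), end;
         it != end; ++it) {
        cells.push_back((*it)[1].str());  // capture group 1: the cell body
    }
    return cells;
}
```

Because the whole document is searched as one string, it no longer matters where the line breaks fall. A cell that wraps its value in a link would still come back with the `<a ...>` markup inside it, so a second pattern or further trimming would be needed for those.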

edA-qa mort-ora-y
It looks like there is a lot to study if I use boost::regex. I am just starting to learn C++, so it might take some time to implement. Is there a shorter/easier way for a beginner?
cpp_learner
Regular expressions, the thing that took me weeks/months to master =(
cpp_learner
Well, the HTML parsers are harder to use than regex. But I can say that learning regex will be well worth your time. They come up again and again and again.
edA-qa mort-ora-y
A: 

The skeleton code for reading lines from a file should look like this:

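if( !file.good() )
  throw "opening file failed!";

for(;;) {
  std::string line;
  // getline() fails only when nothing could be read, so a final line
  // without a trailing newline is still processed
  if( !std::getline(file, line) )
    break;
  // reading succeeded, process line
}

if( !file.eof() )
  throw "read error before reaching EOF";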

(That funny-looking loop checks its ending condition in the middle of the loop. There is no such loop construct in C++, so you have to use an endless loop with a break in the middle.)

However, as I said in a comment to your question, reading HTML code line-by-line isn't necessarily useful, as HTML doesn't rely on specific whitespace.
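
One way around the line-by-line problem is to read the whole stream into a single string and search that instead. This is only a sketch; the function name `read_all` is made up, and it works on any `std::istream`, including an opened `std::ifstream`.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Read everything remaining in `in` into one string, newlines included.
std::string read_all(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copies the stream's contents in one go
    return buffer.str();
}
```

The resulting string can then be searched with `find` or a regex without caring where the line breaks happen to be.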

sbi