Hi All,

Can somebody please help me fix my code? I can't see where I'm going wrong. It just doesn't do what it should be doing.

It should read a file line by line (each line contains one URL), then visit each URL, extract the title, URL, and body text, and save them to a file. But it just doesn't do anything. The only error I am getting is "Object reference not set to an instance of an object", which points to the following line of code:

    u = w.Document.Body.InnerText;

Here's the full code:

    OpenFileDialog of = new OpenFileDialog();
    of.Title = "app name - Select File";
    using (of)
    {
        try
        {
            Cursor = Cursors.WaitCursor;
            if (of.ShowDialog() == DialogResult.OK)
            {
                string[] file = File.ReadAllLines(of.FileName);

                foreach (string line in file)
                {
                    w.Navigate(line);
                    string t, d, u, path = @"file.txt";

                    t = w.DocumentTitle;
                    u = w.Document.Body.InnerText;
                    d = w.Url.AbsolutePath;
                    t = t.Substring(0, 250);
                    t = t.Replace("\"", "\\\"");

                    a.Text += "\n" + u;

                    File.AppendAllText(path,
                        "s[" + an + "] = \"" + t + "^" + u + "^" +
                        url1 + u + url2 + d + "\";" + Environment.NewLine);
                    an++;
                }
            }
            Cursor = Cursors.Default;
        }
        catch (Exception exception)
        {
            MessageBox.Show(exception.Message);
        }
    }

I'd appreciate any suggestions/help at all and thank you :)

jase

+1  A: 

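WebBrowser.Navigate is, IIRC, async: it returns before the page has loaded, so w.Document (or its Body) can still be null on the next line, which would explain the exception. It might be better here to use WebClient.DownloadString, or the HTML Agility Pack and its Load method?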
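
As a rough illustration of the WebClient.DownloadString route, here is a minimal sketch rather than a drop-in fix. The names PageScraper, ExtractTitle, Truncate, and Scrape are hypothetical, the regex-based title extraction is a simplification (a real HTML parser such as HTML Agility Pack would be more robust), and the output format is reduced to "title^url". It also assumes the 250-character cut in the question is meant as an upper bound, since t.Substring(0, 250) throws when the title is shorter than 250 characters.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class; none of these names come from the question.
static class PageScraper
{
    // Returns the decoded text of the first <title> element, or "" if there is none.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value).Trim() : "";
    }

    // Unlike Substring(0, max), this does not throw when s is shorter than max.
    public static string Truncate(string s, int max)
    {
        return s.Length <= max ? s : s.Substring(0, max);
    }

    // Downloads each URL synchronously and appends "title^url" to outputPath.
    public static void Scrape(string urlListPath, string outputPath)
    {
        using (var client = new WebClient())
        {
            foreach (string line in File.ReadAllLines(urlListPath))
            {
                string url = line.Trim();
                if (url.Length == 0) continue; // skip blank lines

                // DownloadString blocks until the whole page has been fetched,
                // so there is no "document not loaded yet" window.
                string html = client.DownloadString(url);
                string title = Truncate(ExtractTitle(html), 250);
                File.AppendAllText(outputPath, title + "^" + url + Environment.NewLine);
            }
        }
    }
}
```

Calling PageScraper.Scrape(of.FileName, "file.txt") from the button handler would replace the foreach loop in the question. Because the download is synchronous it will block the UI thread while it runs, so for long URL lists a background worker may be worth considering.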

Marc Gravell
Thanks, will try HTML Agility Pack again. Do you know of any documentation for HTML Agility Pack? The help file that came with one of the zips doesn't work at all.
baeltazor