I'm working on an application to parse robots.txt. I wrote a method that pulls the file from a web server and puts the output into a textbox. I would like the output to display one line of text for every line in the file, just as it would appear if you were viewing robots.txt normally; however, the output in my textbox is all of the lines of text run together without carriage returns or line breaks. So I thought I'd be crafty: make a string[] for all the lines, loop over it with foreach, and all would be well. Alas, that did not work, so then I tried System.Environment.NewLine, which is still not working. Here's the code as it stands now. How can I change this so I get all the individual lines of robots.txt as opposed to a bunch of text cobbled together?

public void getRobots()
{
    WebClient wClient = new WebClient();
    string url = String.Format("http://{0}/robots.txt", urlBox.Text);

    try
    {
        Stream data = wClient.OpenRead(url);
        StreamReader read = new StreamReader(data);
        string[] lines = new string[] { read.ReadToEnd() };

        foreach (string line in lines)
        {
            textBox1.AppendText(line + System.Environment.NewLine);
        }
    }
    catch (WebException ex)
    {
        MessageBox.Show(ex.Message, null, MessageBoxButtons.OK);
    }
}
+2  A: 

You need to make the textBox1 multiline. Then I think you can simply go

textBox1.Lines = lines;

but let me check that

w69rdy
It is multiline; multiline is not the issue. The robots.txt should output with a carriage return for every Disallow: or Allow: statement; instead, all of them are appended one right after the other. The text wraps onto multiple lines as the box fills up, but how can I get it to break where I want it to?
Stev0
Have you tried simply setting textBox1.Lines = lines?
w69rdy
Tried textBox1.Lines; it's still giving me the same problem. The output looks like this: Disallow:/etcDisallow:/adminDisAllow:/debugDisallow:/test, etc. I'd like it to display a single line for every Disallow or Allow statement.
Stev0
+1 For apparently not the problem in this case, but could have easily been correct from the question.
Fish
+6  A: 

You are reading the entire file into the first element of the lines array:

string[] lines = new string[] {read.ReadToEnd()};

So all your loop is doing is adding the whole contents of the file into the TextBox, followed by a newline character. Replace that line with these:

string content = read.ReadToEnd();
string[] lines = content.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);

And see if that works.

Edit: an alternative and perhaps more efficient way, as per Fish's comment below about reading line by line—replace the code within the try block with this:

Stream data = wClient.OpenRead(url);
StreamReader read = new StreamReader(data);

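string line;
// ReadLine() returns null at end of stream; Peek() can return -1 early on a network stream.
while ((line = read.ReadLine()) != null)
{
    textBox1.AppendText(line + System.Environment.NewLine);
}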
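
To see the difference between the two approaches without a web server or a form, here is a minimal console sketch. The class name `RobotsLineDemo`, its two helper methods, and the sample text are made up for illustration; only `String.Split`, `StringReader`, and `ReadLine()` are real framework calls.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RobotsLineDemo
{
    // Splits on Windows (\r\n) and Unix (\n) line endings, as in the answer above.
    public static string[] SplitLines(string content)
    {
        return content.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
    }

    // Reads line by line; ReadLine() recognizes \r\n, \n and \r on its own.
    public static List<string> ReadLines(TextReader reader)
    {
        List<string> lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical sample content with mixed line endings.
        string sample = "User-agent: *\r\nDisallow: /admin\nDisallow: /debug";

        // What the question's code does: one array element holding the whole file.
        string[] wrapped = new string[] { sample };
        Console.WriteLine(wrapped.Length);                            // 1

        Console.WriteLine(SplitLines(sample).Length);                 // 3
        Console.WriteLine(ReadLines(new StringReader(sample)).Count); // 3
    }
}
```

One difference worth knowing: if the file ends with a trailing newline, `Split` with `StringSplitOptions.None` yields an extra empty string at the end, whereas the `ReadLine()` loop does not.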
Mark B
This is making sense to me, but Split requires a char[]...how would I modify this? Should I do like a slice...negative index type thing looking for the : in every statement?
Stev0
No, `Split` will take a string array as well, as in my code: http://msdn.microsoft.com/en-us/library/tabh47cf.aspx
Mark B
The code as you've written it here is throwing a bunch of errors at me. I understand it can take a string[], but it's complaining about char[] and about converting a string to string[].
Stev0
Sorry, I missed out the second parameter. Edited, try again.
Mark B
That certainly fixed the exceptions, but the output still isn't quite what I wanted. I don't think robots.txt handles newlines the way my code does. I tried using "\n" and "\r" in place of Environment.NewLine. Does robots.txt have some kind of weird file format that doesn't represent newline characters in the same fashion as everything else? If you open it in a web browser it appears as it should, but WebClient in C# is having a hard time.
Stev0
I suppose it will depend on the environment the robots.txt was created in as to what line endings it has. In this case I assume you're using Windows, so `Environment.NewLine` will be `\r\n`, whereas text files created on Unix systems (which some robots.txt files will have been) will have a line ending of `\n`. I've edited the code in my answer, so it should now handle either.
Mark B
A slight addition: it would likely be more efficient, and result in cleaner code, to read the file into a collection line by line rather than splitting it after the fact. In this case the reader's ReadLine() method will also handle the different line endings for you, allowing you to do it in a simple "while not at end of file, read line into collection" loop.
Fish
+1 Yes, that is true—although in the case of a robots.txt file I'm not sure the performance difference will even be noticeable and I didn't want to confuse the issue.
Mark B
Working like a charm now. Thanks guys
Stev0
+1  A: 

Try using .ReadLine() in a while loop instead of .ReadToEnd(). I think you're just getting the entire file as one element in your lines array. Debug and check the length of lines[] to verify this.

Edit: Here's a bit of sample code. I haven't tested it, but I think it should work OK:

Stream data = wClient.OpenRead(url);
StreamReader read = new StreamReader(data);

List<string> lines = new List<string>();

string nextLine = read.ReadLine();  
while (nextLine != null)
{
    lines.Add(nextLine);
    nextLine = read.ReadLine();
}

textBox1.Lines = lines.ToArray();
C.McAtackney
+1  A: 

Try

public void getRobots()
{
    WebClient wClient = new WebClient();
    string robotText;
    string[] robotLines;
    System.Text.StringBuilder robotStringBuilder;

    // Download the whole file as a single string.
    robotText = wClient.DownloadString(String.Format("http://{0}/robots.txt", urlBox.Text));

    // Split on both Windows (\r\n) and Unix (\n) line endings.
    robotLines = robotText.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);

    robotStringBuilder = new System.Text.StringBuilder();

    // Rebuild the text with the platform's line ending so the TextBox shows one entry per line.
    foreach (string line in robotLines)
    {
        robotStringBuilder.AppendLine(line);
    }

    textBox1.Text = robotStringBuilder.ToString();
}
PhilPursglove