Hi there.

I am working on an application that downloads a text file from a web page. Example link: http://test.com/textfile.txt

This text file contains the following text:

1 Milk Stuff1.rar
2 Milk Stuff2.rar
3 Milk Stuff2-1.rar
4 Union Stuff3.rar

What I am trying to do is remove everything from each line except the "words" that start with 'Stuff' and end with '.rar'.

The problem is that most of the simple solutions, like using .Remove, .Split or .Replace, end up failing. For example, splitting the string on spaces returns this:

1
Milk
Stuff1.rar\n2
Milk
Stuff2.rar\n3
Milk
Stuff2-1.rar\n4
Union
Stuff3.rar\n

I bet it's not as hard as it looks, but I'd appreciate any help you can give me.

P.S. Just to be clear, this is what I want it to return:

Stuff1.rar
Stuff2.rar
Stuff2-1.rar
Stuff3.rar

I am currently working with this code:

            client.HeadOnly = true;
            string uri = "http://test.com/textfile.txt"; 

            byte[] body = client.DownloadData(uri);
            string type = client.ResponseHeaders["content-type"]; 
            client.HeadOnly = false; 

            if (type.StartsWith(@"text/")) 
            {
                string[] text = client.DownloadString(uri);

                foreach (string word in text)
                {
                    if (word.StartsWith("Patch") && word.EndsWith(".rar"))
                    {
                        listBox1.Items.Add(word.ToString());
                    }
                }
            }

This is obviously not working, but you get the idea.

Thank you in advance!

+2  A: 

I would be tempted to use a regular expression for this sort of thing.

Something like

Stuff[^\s]*\.rar

will pull out just the text you require.
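As a quick check of that idea, here is a minimal sketch that runs the pattern over the sample text from the question in one pass, with the dot escaped so that it only matches a literal period. The class and method names (`StuffMatcher`, `FindArchives`) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StuffMatcher
{
    // Hypothetical helper: returns every token that starts with "Stuff"
    // and ends with ".rar" anywhere in the given text.
    public static List<string> FindArchives(string text)
    {
        var results = new List<string>();

        // [^\s]* never matches whitespace, so a match cannot run across
        // a space or a line break. \. matches a literal period.
        foreach (Match match in Regex.Matches(text, @"Stuff[^\s]*\.rar"))
        {
            results.Add(match.Value);
        }

        return results;
    }

    public static void Main()
    {
        // Sample content copied from the question.
        string sample = "1 Milk Stuff1.rar\n2 Milk Stuff2.rar\n3 Milk Stuff2-1.rar\n4 Union Stuff3.rar\n";

        foreach (string name in FindArchives(sample))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because `Regex.Matches` scans the whole string, this variant should also work directly on the result of `DownloadString` without splitting it into lines first.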

How about a function like:

public static IEnumerable<string> GetStuff(string fileName)
{
    var regex = new Regex(@"Stuff[^\s]*\.rar");
    using (var reader = new StreamReader(fileName))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var match = regex.Match(line);
            if (match.Success)
            {
                yield return match.Value;
            }
        }
    }
}
Rob Levine
Thank you for the help. I decided to use the code above because it works and it doesn't take up much space. Thanks again, I appreciate it.
Nick
No problem - the fact you often get multiple suggestions and you get to choose the most applicable is one of SO's strengths IMHO.
Rob Levine
+5  A: 

This should work:

        using (var writer = File.CreateText("output.txt"))
        {
            foreach (string line in File.ReadAllLines("input.txt"))
            {
                var match = Regex.Match(line, "Stuff.*?\\.rar");

                if (match.Success)
                    writer.WriteLine(match.Value);
            }
        }
Pieter
Thanks a bunch! I didn't know you could use wildcards in regex, which actually makes a lot of sense. :D I will mark this as the answer as soon as I am allowed to. Thank you for the fast response.
Nick
A: 
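// Assumes text is a collection of lines (e.g. string[]), not one long string.
foreach (string line in text)
{
    if (line.EndsWith(".rar"))
    {
        // Keep everything from the last "Stuff" to the end of the line.
        int index = line.LastIndexOf("Stuff");
        if (index != -1)
        {
            listBox1.Items.Add(line.Substring(index));
        }
    }
}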
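
The loop above needs the text as individual lines, while `DownloadString` returns a single string. A minimal sketch of bridging that gap, assuming the file uses `\n` or `\r\n` line endings; the names `LineFilter` and `ExtractArchives` are hypothetical, and the results go into a list instead of `listBox1` so the snippet runs on its own:

```csharp
using System;
using System.Collections.Generic;

public static class LineFilter
{
    // Hypothetical helper: splits downloaded text into lines and keeps the
    // trailing "Stuff....rar" part of each line that ends with ".rar".
    public static List<string> ExtractArchives(string downloaded)
    {
        var results = new List<string>();

        // Split on both CR and LF and drop the empty entries this produces.
        string[] lines = downloaded.Split(
            new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string rawLine in lines)
        {
            // Trailing spaces would make EndsWith fail, so trim them first.
            string line = rawLine.TrimEnd();
            if (!line.EndsWith(".rar", StringComparison.Ordinal))
            {
                continue;
            }

            int index = line.LastIndexOf("Stuff", StringComparison.Ordinal);
            if (index != -1)
            {
                results.Add(line.Substring(index));
            }
        }

        return results;
    }
}
```

In the original setting, each returned string could then be passed to `listBox1.Items.Add`.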
Matthew Flaschen