ansaurus

Question

Extracting a string starting with x and ending with y

Answer 1

A:

Regex would work really well for this. Here's an example of using Regex in C# (and Java).

Joel 2010-03-29 01:15:57
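For the general "starts with x, ends with y" case, here is a minimal sketch of the regex approach as a small top-level C# program (C# 9 or later). The helper name `ExtractBetween` and the sample URLs are hypothetical. Both markers go through `Regex.Escape`, so the `.` in `.jpg` is matched literally rather than as a wildcard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every substring that starts with `start`
// and ends with `end`. Regex.Escape makes both markers literal, and the
// lazy ".+?" stops each match at the first `end` after its `start`.
static List<string> ExtractBetween(string text, string start, string end)
{
    string pattern = Regex.Escape(start) + ".+?" + Regex.Escape(end);
    return Regex.Matches(text, pattern)
                .Cast<Match>()
                .Select(m => m.Value)
                .ToList();
}

// Hypothetical input: two image URLs run together with no separator.
string input = "http://example.com/a.jpghttp://example.com/b.jpg";
List<string> found = ExtractBetween(input, "http://", ".jpg");

foreach (string url in found)
{
    Console.WriteLine(url);
}
```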

Answer 2

+1 A:

In your specific case, you could always split it by ".jpg". You will probably end up with one empty element at the end of the array, and you'll have to append ".jpg" back onto each file name if you need it. Apart from that, I think it would work.

I tested the following code and it worked fine:

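public void SplitTest()
{
    string test = "http://i594.photobucket.com/albums/tt27/34/444.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg";
    // Split on ".jpg"; RemoveEmptyEntries drops the empty string after the final ".jpg".
    // items ends up holding the two URLs without their ".jpg" suffix.
    string[] items = test.Split(new string[] { ".jpg" }, StringSplitOptions.RemoveEmptyEntries);
}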

It even gets rid of the empty entry...

Wagner Silveira 2010-03-29 01:16:28
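A runnable variant of the split test that also puts ".jpg" back onto each item, as the answer suggests may be needed. The LINQ `Select` step is an addition, not part of the tested code; written as top-level C# statements.

```csharp
using System;
using System.Linq;

string test = "http://i594.photobucket.com/albums/tt27/34/444.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg";

// Splitting on ".jpg" removes the delimiter, so it is appended back to each piece.
// RemoveEmptyEntries drops the empty string that follows the final ".jpg".
string[] urls = test.Split(new string[] { ".jpg" }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(part => part + ".jpg")
                    .ToArray();

foreach (string url in urls)
{
    Console.WriteLine(url);
}
```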

This works fine for the given example. However, it won't properly enforce the starting "http" requirement. For example, add "foobar.jpg" somewhere in the input and "foobar" ends up in the `items` result. That's easily solved by adding a `.Where(s => s.StartsWith("http"))` after the `Split`.

Ahmad Mageed 2010-03-29 01:33:43
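A small sketch of that `Where` fix, using a hypothetical input with "foobar.jpg" placed between the two URLs. Re-appending ".jpg" is an extra step that is not in the comment.

```csharp
using System;
using System.Linq;

// Hypothetical input: "foobar.jpg" sits between the two real URLs.
string test = "http://i594.photobucket.com/albums/tt27/34/444.jpgfoobar.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg";

// Plain split: "foobar" shows up as its own item.
string[] items = test.Split(new string[] { ".jpg" }, StringSplitOptions.RemoveEmptyEntries);

// With the suggested filter, only pieces that start with "http" survive;
// ".jpg" is then appended back to each of them.
string[] urls = items.Where(s => s.StartsWith("http"))
                     .Select(s => s + ".jpg")
                     .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, urls));
```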

Answer 3

+2 A:

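// Requires: using System.Text.RegularExpressions;
// subject is the string being searched
Regex RegexObj = new Regex("http://.+?\\.jpg");
Match MatchResults = RegexObj.Match(subject);
while (MatchResults.Success) {
    // Do something with it, e.g. read MatchResults.Value
    MatchResults = MatchResults.NextMatch();
}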

Martin Smith 2010-03-29 01:17:35

You definitely need the + as Dan has; otherwise you'll only match a filename of zero or one character.

Ben Voigt 2010-03-29 01:26:21
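To illustrate the comment, a sketch comparing a pattern with `.?` (assumed here to be what the answer had before the `+` was added) against `.+?`, on a hypothetical two-URL input:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a one-character file name and a longer URL.
string subject = "http://a.jpg http://example.com/photo.jpg";

// Assumed earlier pattern: ".?" allows at most one character
// between "http://" and ".jpg".
MatchCollection tooShort = Regex.Matches(subject, @"http://.?\.jpg");

// With ".+?" any non-empty run is allowed, matched lazily.
MatchCollection correct = Regex.Matches(subject, @"http://.+?\.jpg");

Console.WriteLine(tooShort.Count);
Console.WriteLine(correct.Count);
```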

Thanks for the response. It didn't work, though. I think it's because you forgot the + sign as found in Dan's response (as well as two \).

DMan 2010-03-29 01:27:47

@Ben - Good catch. @DMan - The double backslash is needed to escape the \ in C# strings. You can avoid it by putting the pattern in a verbatim string literal, prefixed with the @ symbol.

Martin Smith 2010-03-29 01:33:31

Sorry, I assumed that / needed to be escaped too, so I thought you escaped only the .jpg part and not the front.

DMan 2010-03-29 02:53:38
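A short check that the two spellings produce the same pattern, and that `/` needs no escaping in a .NET regex while the `.` before `jpg` does. The sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: "\\" in C# source becomes a single "\" in the pattern.
string escaped = "http://.+?\\.jpg";
// Verbatim literal: the @ prefix turns off C# escape processing.
string verbatim = @"http://.+?\.jpg";

Console.WriteLine(escaped == verbatim); // True: both hold http://.+?\.jpg

// "/" has no special meaning in a .NET regex pattern, so it needs no escape.
Console.WriteLine(Regex.IsMatch("http://x/y.jpg", verbatim)); // True

// The escaped "." only matches a literal dot, so "Xjpg" does not count.
Console.WriteLine(Regex.IsMatch("http://x/yXjpg", verbatim)); // False
```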

Answer 4

+4 A:

Regex is the easiest way to do this. If you're not familiar with regular expressions, you might check out Regex Buddy. It's a relatively cheap little tool that I found extremely useful when I was learning. For your particular case, a possible expression is:

(http://.+?\.jpg)

It probably requires some more refinement, as there are boundary cases that could trip this up, but it would work if the file is a simple list.
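One boundary case the `?` already handles: when URLs run together with no separator, a greedy `.+` runs on to the last `.jpg`. A sketch on a hypothetical list, reading each value through capture group 1 of the parenthesised pattern:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample list: URLs run together with no separator.
string list = "http://a.com/1.jpghttp://b.com/2.jpg";

// Lazy ".+?" stops at the first ".jpg", giving one match per URL.
string[] lazy = Regex.Matches(list, @"(http://.+?\.jpg)")
                     .Cast<Match>()
                     .Select(m => m.Groups[1].Value)
                     .ToArray();

// Greedy ".+" runs to the last ".jpg", swallowing both URLs as one match.
string[] greedy = Regex.Matches(list, @"(http://.+\.jpg)")
                       .Cast<Match>()
                       .Select(m => m.Groups[1].Value)
                       .ToArray();

Console.WriteLine(lazy.Length);
Console.WriteLine(greedy.Length);
```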

You can also test expressions quickly, for free, here.

Per your latest comment, if you have links to other non-images as well, then you need to make sure it doesn't start at the http:// for one link and read all the way to the .jpg for the next image. Since URLs are not allowed to have whitespace, you can do it like this:

(http://[^\s]+\.jpg)

This basically says, "match a string starting with http:// and ending with .jpg where there is at least one character between the two and none of those characters are whitespace".
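A sketch of the problem described above and the `[^\s]+` fix, on a hypothetical line containing a non-image link followed by an image link. One caveat: `[^\s]+` is still greedy, so two image URLs with no whitespace between them come back as one match; a lazy `[^\s]+?` separates them, as the last two calls show. The helper `Find` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of every match.
static string[] Find(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// Hypothetical input: a non-image link followed by an image link.
string text = "see http://a.com/page.html and http://b.com/pic.jpg";

// ".+?" starts at the first http:// and reads on to the next ".jpg".
string[] spanning = Find(text, @"http://.+?\.jpg");

// "[^\s]+" cannot cross the space, so the match starts at the image URL.
string[] noWhitespace = Find(text, @"http://[^\s]+\.jpg");

// Caveat: with no whitespace between image URLs, the greedy form merges them.
string joined = "http://a.com/1.jpghttp://b.com/2.jpg";
string[] greedyJoined = Find(joined, @"http://[^\s]+\.jpg");
string[] lazyJoined = Find(joined, @"http://[^\s]+?\.jpg");

Console.WriteLine(spanning[0]);
Console.WriteLine(noWhitespace[0]);
```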

Dan Bryant 2010-03-29 01:21:07

+1 for Regex Buddy - That's where I generated the C# in my post from!

Martin Smith 2010-03-29 01:22:31

I agree that in general the x{something goes here}y kind of issue is quite easy to solve with regex, and regex is probably the preferred approach. But in his specific case, a simple split on .jpg is still the simplest way to solve it.

Wagner Silveira 2010-03-29 01:26:46

@Wagner, it depends on the actual format of the file. If it's really a simple list with no delimiters or other text, then I agree, splitting on .jpg is simpler, though also more brittle. I would prefer Regex even for the simpler case for its flexibility if the requirements change.

Dan Bryant 2010-03-29 01:33:08

I agree. Sorry, I forgot to mention it, but there are a lot of non-images in there too, which may break it. Anyway, I've marked this as the answer, and I'm looking into your recommendation of Regex Buddy!

DMan 2010-03-29 01:34:29

As per your latest edit, you are a life saver! I had just run into the exact problem of it reading from http:// to the next .jpg! I was JUST going to parse it twice (first create a regex that started with "img source", then run your original regex on that afterwards), but now I don't have to :D RegexBuddy is useful. And so are you.

DMan 2010-03-29 02:52:51

Answer 5

+1 A:

The following LINQ query splits on "http:" and keeps only the values that end with ".jpg".

 var images = from i in imageList.Split(new[] {"http:"}, 
                                     StringSplitOptions.RemoveEmptyEntries)
              where i.EndsWith(".jpg")
              select "http:" + i;

juharr 2010-03-29 01:42:15
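A runnable version of the query on a hypothetical input. One caveat that depends on the input format: if the links are separated by spaces or newlines, each piece keeps that trailing whitespace and fails `EndsWith(".jpg")`. The `let t = i.Trim()` variant below is an addition, not part of the answer.

```csharp
using System;
using System.Linq;

// Hypothetical input mixing image and non-image links, with no separators.
string imageList = "http://a.com/1.jpghttp://a.com/page.htmlhttp://b.com/2.jpg";

// Split on "http:", keep pieces ending in ".jpg", and restore the prefix.
var images = (from i in imageList.Split(new[] { "http:" },
                                        StringSplitOptions.RemoveEmptyEntries)
              where i.EndsWith(".jpg")
              select "http:" + i).ToArray();

// Whitespace-separated input: the first piece ends in ".jpg " and is dropped.
string spaced = "http://a.com/1.jpg http://b.com/2.jpg";
var untrimmed = (from i in spaced.Split(new[] { "http:" }, StringSplitOptions.RemoveEmptyEntries)
                 where i.EndsWith(".jpg")
                 select "http:" + i).ToArray();

// Trimming each piece before the check keeps both URLs.
var trimmed = (from i in spaced.Split(new[] { "http:" }, StringSplitOptions.RemoveEmptyEntries)
               let t = i.Trim()
               where t.EndsWith(".jpg")
               select "http:" + t).ToArray();

Console.WriteLine(string.Join(Environment.NewLine, images));
```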
