For example, I have a string:

/div1/div2[/div3[/div4]]/div5/div6[/div7]

Now I want to split the content on "/" but ignore any "/" that appears inside "[ ]".

The result should be:

  1. div1
  2. div2[/div3[/div4]]
  3. div5
  4. div6[/div7]

How can I get this result using a regular expression? My programming language is JavaScript.

+3  A: 

You can't do this with regular expressions because it's recursive. (That answers your question, now to see if I can solve the problem elegantly...)

Edit: aem tipped me off! :D

Works as long as every [ is followed by /. It does not verify that the string is in the correct format.

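// Requires: using System.Linq; (for Select and ToArray)
// Hide every "[/" as "[", split on the remaining slashes, then restore "[/".
string temp = text.Replace("[/", "[");
string[] elements = temp.Split('/').Select(element => element.Replace("[", "[/")).ToArray();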
280Z28
you can do nested matching, see my answer.
CptSkippy
The fact that you *can* does not necessarily mean that you *should*
Chris Lutz
This will not work with "[div1/div2]". If text.Replace uses a regex for replacement rather than just a string, you might be able to pull off the same trick with a lookbehind. Not being a C# programmer, I don't know how to do this, but that's my recommendation.
Chris Lutz
+2  A: 

You can first translate the two-character sequence [/ into another character or sequence that you know won't appear in the input, then split the string on / boundaries, then re-translate the translated sequence back into [/ in the result strings. This doesn't even require regular expressions. :)

For instance, if you know that [ won't appear on its own in your input sequences, you could replace [/ with [ in the initial step.
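
In JavaScript, the language the question asks about, these three steps could be sketched roughly as follows. The function name splitByPlaceholder and the placeholder character "\u0000" are assumptions made for illustration; any sequence that cannot occur in the input would do. It only works when every non-delimiter slash directly follows a [.

```javascript
// Hypothetical helper: split on "/" while protecting every "[/" pair.
// Assumes the placeholder "\u0000" never appears in the input.
function splitByPlaceholder(text) {
  const PLACEHOLDER = "\u0000";
  return text
    .split("[/").join(PLACEHOLDER)   // 1. hide every "[/"
    .split("/")                      // 2. split on the remaining slashes
    .filter((part) => part !== "")   // drop the empty piece before the leading "/"
    .map((part) => part.split(PLACEHOLDER).join("[/")); // 3. restore "[/"
}

console.log(splitByPlaceholder("/div1/div2[/div3[/div4]]/div5/div6[/div7]"));
// [ 'div1', 'div2[/div3[/div4]]', 'div5', 'div6[/div7]' ]
```

Using split and join for the replacement avoids having to escape the bracket in a regex.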

aem
A: 

Without knowing which regex engine you are targeting, I can only guess what would work for you. If you are using .NET, have a look here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

If you're using Perl, have a look here: http://search.cpan.org/~abigail/Regexp-Common-2.122/lib/Regexp/Common/balanced.pm

Deliria
I use C# and Javascript.
Mike108
A: 

An experimental example using PHP and a split approach, tested only on the sample string.

$str = "/div1/div2[/div3[/div4]]/div5/div6[/div7]/div8";
// split on "/"
$s = explode("/",$str);
foreach ($s as $k=>$v){
    // if no [ or ] in the item
    if( strpos($v,"[")===FALSE && strpos($v,"]") ===FALSE){
        print "\n";
        print $v."\n";
    }else{
        print $v . "/";
    }
}

output:

div1
div2[/div3[/div4]]/
div5
div6[/div7]/
div8

Note: there is a "/" at the end of each bracketed item, so a bit of trimming will give the desired result.
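
A JavaScript variant of the same split-then-reassemble idea might look like the sketch below. It is not a port of the PHP above: instead of printing, it counts unclosed brackets and glues pieces back together while the count is non-zero, so no trimming is needed. The name splitAndRejoin is hypothetical, and the sketch assumes the brackets in the input are balanced.

```javascript
// Hypothetical helper: split on every "/", then glue pieces back together
// while we are still inside an unclosed "[".
// Assumes balanced brackets; it does not validate the input.
function splitAndRejoin(text) {
  const result = [];
  let current = "";
  let depth = 0; // number of "[" seen so far without a matching "]"
  for (const piece of text.split("/")) {
    // At depth 0 the slash was a real delimiter; otherwise put it back.
    current = depth > 0 ? current + "/" + piece : piece;
    for (const ch of piece) {
      if (ch === "[") depth++;
      else if (ch === "]") depth--;
    }
    // Back at the root level: the accumulated item is complete.
    if (depth === 0 && current !== "") {
      result.push(current);
      current = "";
    }
  }
  return result;
}

console.log(splitAndRejoin("/div1/div2[/div3[/div4]]/div5/div6[/div7]/div8"));
// [ 'div1', 'div2[/div3[/div4]]', 'div5', 'div6[/div7]', 'div8' ]
```

Because it tracks depth rather than looking at neighbouring characters, this version also copes with slashes that are nested but not directly preceded by a [.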

ghostdog74
A: 

s/\/(div\d{0,}(?:\[.*?\])?)/$1\n/

beggs
That matches nothing in Dreamweaver or The Regulator
CptSkippy
He didn't specify a language... this will work in PHP and Perl. Check it at: http://regex.powertoy.org/ Try `\/(div\d{0,}(?:\[.*?\])?)` in The Regulator.
beggs
Since he notes that this should be C#... I tested it in http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx and `\/(div\d{0,}(?:\[.*?\])?)` works.
beggs
+2  A: 

This works...

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string testCase = "/div1/div2[/div3[/div4]]/div5/div6[/div7]";
        //string pattern = "(?<Match>/div\\d(?:\\[(?>\\[(?<null>)|\\](?<-null>)|.?)*(?(null)(?!))\\])?)";
        string pattern = "(?<Match>div\\d(?:\\[(?>\\[(?<null>)|\\](?<-null>)|.?)*(?(null)(?!))\\])?)";

        Regex rx = new Regex(pattern);

        MatchCollection matches = rx.Matches(testCase);

        foreach (Match match in matches)
          Console.WriteLine(match.Value);

        Console.ReadLine();

    }
}

Courtesy of... http://retkomma.wordpress.com/2007/10/30/nested-regular-expressions-explained/

CptSkippy
+1  A: 

Judging by your posting history, I'll guess you're talking about C# (.NET) regexes. In that case, this should work:

Regex.Split(target, @"(?<!\[)/");

This assumes every non-delimiter / is immediately preceded by a left square bracket, as in your sample data.

You should always specify which regex flavor you're working with. This technique, for example, requires a flavor that supports lookbehinds. Off the top of my head, that includes Perl, PHP, Python and Java, but not JavaScript.

EDIT: Here's a demonstration in Java:

public class Test
{
  public static void main(String[] args)
  {
    String str = "/div1/div2[/div3[/div4]]/div5/div6[/div7]";

    String[] parts = str.split("(?<!\\[)/");
    for (String s : parts)
    {
      System.out.println(s);
    }
  }
}

output:

div1
div2[/div3[/div4]]
div5
div6[/div7]
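
For the JavaScript case the question asks about, the same split could be sketched as below, assuming an engine that implements ES2018 lookbehind assertions; older JavaScript engines reject the pattern. The variable name simpleParts is just for illustration, and the filter step drops the empty string produced by the leading slash.

```javascript
// Assumes an engine with ES2018 lookbehind support.
// Split on every "/" that is not immediately preceded by "[".
const simpleParts = "/div1/div2[/div3[/div4]]/div5/div6[/div7]"
  .split(/(?<!\[)\//)
  .filter((s) => s !== ""); // remove the empty piece before the first "/"

console.log(simpleParts);
// [ 'div1', 'div2[/div3[/div4]]', 'div5', 'div6[/div7]' ]
```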

Of course, I'm relying on some simplifying assumptions here. I trust you'll let me know if any of my assumptions are wrong, Mike. :)

EDIT: Still waiting on a ruling from Mike about the assumptions, but Chris Lutz brought up a good point in his comment to 280Z28. At the root level in the sample string, there are two places where you see two contiguous /divN tokens, but at every other level the tokens are always isolated from each other by square brackets. My solution, like 280Z28's, assumes that will always be true, but what if the data looked like this?

/div1/div2[/div3/div8[/div4]/div9]/div5/div6[/div7]

Now we've got two places where a non-delimiter slash is not preceded by a left square bracket, but the basic idea still holds. Starting from any point at the root level, if you scan forward looking for square brackets, the first one you find will always be a left (or opening) bracket. If you scan backward, you'll always find a right (or closing) bracket first. If both of those conditions are not true, you're not at the root level. Translating that to lookarounds, you get this:

/(?![^\[\]]*\])(?<!\[[^\[\]]*)

I know it's getting pretty gnarly, but I'll take this over that godawful recursion stuff any day of the week. ;) Another nice thing is that you don't have to know anything about the tokens except that they start with slashes and don't contain any square brackets. By the way, this regex contains a lookbehind that can match any number of characters; the list of regex flavors that support that is very short indeed, but .NET can do it.
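
As a rough illustration, the same pattern can be tried in JavaScript, assuming an engine that implements ES2018 lookbehind (which places no length limit on the lookbehind). The names ROOT_SLASH and splitAtRootSlashes are hypothetical.

```javascript
// Assumes an engine with ES2018 lookbehind support.
// A slash is treated as a delimiter when:
//   - scanning forward, the next bracket (if any) is not "]"
//   - scanning backward, the nearest bracket (if any) is not "["
const ROOT_SLASH = /\/(?![^\[\]]*\])(?<!\[[^\[\]]*)/;

function splitAtRootSlashes(text) {
  // filter() drops the empty piece produced by the leading slash
  return text.split(ROOT_SLASH).filter((s) => s !== "");
}

console.log(splitAtRootSlashes("/div1/div2[/div3/div8[/div4]/div9]/div5/div6[/div7]"));
// [ 'div1', 'div2[/div3/div8[/div4]/div9]', 'div5', 'div6[/div7]' ]
```

Note that the check only looks at the nearest bracket on each side, so a slash sitting between two bracketed groups at a nested level, as in /a[/b[/c]/d[/e]], would still be treated as a delimiter.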

Alan Moore
That expression doesn't completely fulfill the request. With extension (e.g. @"(?<!\[)/div\d") it will capture the root elements but not the nesting.
CptSkippy