I have a string like:

[a b="c" d="e"]Some multi line text[/a]

The d="e" part is optional. I want to convert such a string into:

<a b="c" d="e">Some multi line text</a>

The names a, b and d are constant, so I don't need to capture them. I only need the values c and e and the text between the tags, so that I can create the equivalent XML expression. How can I do that, given that part of the tag is optional?

A: 

If you are actually thinking of processing (pseudo-)HTML with regexes,

don't

SO is full of posts proposing regexes for HTML/XML, and of answers explaining why this is a bad idea.

Suppose your multiline text ("which can be anything") contains

[a b="foo" [a b="bar"]]

a regex cannot detect this.

See the classic answer in: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

which has:

I think it's time for me to quit the post of Assistant Don't Parse HTML With Regex Officer. No matter how many times we say it, they won't stop coming every day... every hour even. It is a lost cause, which someone else can fight for a bit. So go on, parse HTML with regex, if you must. It's only broken code, not life and death. – bobince

Seriously. Find an XML or HTML DOM and populate it with your data. Then serialize it. That will take care of all the problems you don't even know you have got.
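As a rough illustration of the "populate a DOM, then serialize" advice, here is a minimal sketch using LINQ to XML (`System.Xml.Linq.XElement`). It assumes c, e and the text have already been extracted (for example with a regex such as those in the other answers); `ToXml` is a hypothetical helper, and the code uses C# 9 top-level statements. The point is that the serializer, not your code, handles escaping of characters like `<` and `&` in the text.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: builds <a b="..." d="...">text</a> via LINQ to XML.
// "b" and "d" are the fixed attribute names from the question; passing null
// for d leaves that attribute out.
string ToXml(string b, string? d, string text)
{
    var element = new XElement("a", new XAttribute("b", b), text);
    if (d != null)
    {
        // Appended after b, so the output keeps the order b, d.
        element.SetAttributeValue("d", d);
    }
    // DisableFormatting avoids indentation; the writer escapes < and & in the text.
    return element.ToString(SaveOptions.DisableFormatting);
}

Console.WriteLine(ToXml("c", "e", "Some multi line text"));
// <a b="c" d="e">Some multi line text</a>
Console.WriteLine(ToXml("c", null, "x < y & z"));
// <a b="c">x &lt; y &amp; z</a>
```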

peter.murray.rust
Regarding the first point: it's not that simple. [ can also appear within the text, so you can't just do that.
Priyank Bolia
Regarding the second point: I am not parsing XML; I am converting some tags in the text to an XML-like syntax.
Priyank Bolia
But you can have "anything" in the content. That can include punctuation, HTML markup, snippets of HTML, etc. You will run into the same problems as parsing XHTML or XML with regex.
peter.murray.rust
I have deleted my trivial solution as the OP now makes it clear that the regex has to deal with arbitrarily complex content.
peter.murray.rust
The other part has already been taken care of. You are confusing things again: I am not parsing HTML, and there can't be HTML markup or scripts per se. It's text that I need to convert to an XML-like form (e.g. converting text to SSML). That is why we don't have <a b="c" d="e"> in the first place; instead we have our own text notation, [a b="c" d="e"], to mark parts of the text.
Priyank Bolia
A: 

Can the multi-line text contain [ or ]? If not, you can simply replace [ with < and ] with > using string.Replace; no regex needed.
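A minimal sketch of that plain string replacement (`BracketsToAngles` is a hypothetical name; C# 9 top-level statements assumed). The second call shows why it only works when the text itself never contains brackets.

```csharp
using System;

// Hypothetical helper: swaps every [ for < and every ] for >, no regex involved.
// Safe only if the text between the tags never contains [ or ].
string BracketsToAngles(string input) => input.Replace('[', '<').Replace(']', '>');

Console.WriteLine(BracketsToAngles("[a b=\"c\" d=\"e\"]Some multi line text[/a]"));
// <a b="c" d="e">Some multi line text</a>

// Brackets inside the text are converted too, which is usually not wanted:
Console.WriteLine(BracketsToAngles("[a b=\"c\"]see [1][/a]"));
// <a b="c">see <1></a>
```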

Update: If the text can be anything except [/a], you can replace

^\[a([^\]]+)](.*?)\[/a]$

with

<a$1>$2</a>

I haven't escaped ] and / in the regex; escape them if your regex flavor requires it, which gives

^\[a([^\]]+)\](.*?)\[\/a\]$
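One detail worth checking in .NET: `.` does not match a newline unless `RegexOptions.Singleline` is set, so for genuinely multi-line text the pattern above needs that option. A minimal sketch (C# 9 top-level statements; `ConvertTag` is a hypothetical name) using the same pattern and replacement:

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer; group 1 keeps the leading space before b=.
const string Pattern = @"^\[a([^\]]+)](.*?)\[/a]$";

// Hypothetical helper. Singleline makes "." in (.*?) match "\n" as well.
string ConvertTag(string input) =>
    Regex.Replace(input, Pattern, "<a$1>$2</a>", RegexOptions.Singleline);

string multiLine = "[a b=\"c\" d=\"e\"]Some multi\nline text[/a]";
Console.WriteLine(ConvertTag(multiLine));

// Without Singleline the pattern cannot cross the line break, so nothing matches
// and the input comes back unchanged:
Console.WriteLine(Regex.Replace(multiLine, Pattern, "<a$1>$2</a>") == multiLine); // True
```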
Amarghosh
The multi-line text can be anything.
Priyank Bolia
See my update.
Amarghosh
A: 

For HTML tags, please use an HTML parser.

For [a]...[/a], you can do the following:

// RegexOptions.Singleline lets "." match newlines, so multi-line text is captured
Match m = Regex.Match(@"[a b=""c"" d=""e""]Some multi line text[/a]",
                      @"\[a b=""([^""]+)"" d=""([^""]+)""\](.*?)\[/a\]",
                      RegexOptions.Singleline);

m.Groups[1].Value   // "c"
m.Groups[2].Value   // "e"
m.Groups[3].Value   // "Some multi line text"

Here is a Regex.Replace version (though I don't prefer it):

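string inputStr = @"[a b=""[[[[c]]]]"" d=""e[]""]Some multi line text[/a]";
// Group 2 includes the leading " d=" and is optional, so $2 is empty when d is absent.
// RegexOptions.Singleline lets "." in (.*?) match across line breaks.
string resultStr = Regex.Replace(inputStr,
                                 @"\[a( b=""[^""]+"")( d=""[^""]+"")?\](.*?)\[/a\]",
                                 @"<a$1$2>$3</a>",
                                 RegexOptions.Singleline);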
S.Mark
First of all, I am not parsing HTML; it's text with some tags that need to be converted to XML. Second, is there a more direct way, such as using the Regex.Replace function?
Priyank Bolia
Added a Regex.Replace version.
S.Mark
You missed part of the question: the d="e" part is optional. I don't think your Regex.Replace will work.
Priyank Bolia
If $2 is empty, then d="$2" shouldn't appear in the output.
Priyank Bolia
Please take another look; I was still editing.
S.Mark
The answer is not quite what I was looking for, as it matches the whole attribute text instead of just the attribute values. I figured it out: the best approach is to use a Match with a MatchEvaluator delegate in the Regex.Replace method. Accepting your answer, as it was the most helpful.
Priyank Bolia
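The approach described in the last comment (Regex.Replace with a MatchEvaluator that works from the captured values) might look roughly like the sketch below. It is not the OP's actual code: the named groups, the hypothetical `ConvertTags` helper, and the choice to build each element with LINQ to XML (so the text gets escaped) are assumptions. C# 9 top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Capture only the values: group "b", optional group "d", and the text.
// Singleline lets (?<text>.*?) span line breaks; the lazy quantifier stops at
// the first [/a], so nested [a] tags are not supported.
var tagPattern = new Regex(
    @"\[a b=""(?<b>[^""]*)""(?: d=""(?<d>[^""]*)"")?\](?<text>.*?)\[/a\]",
    RegexOptions.Singleline);

// Hypothetical helper: the lambda is the MatchEvaluator called once per match.
string ConvertTags(string input) => tagPattern.Replace(input, match =>
{
    var element = new XElement("a",
        new XAttribute("b", match.Groups["b"].Value),
        match.Groups["text"].Value);
    // Emit d only when the optional group actually took part in the match.
    if (match.Groups["d"].Success)
    {
        element.SetAttributeValue("d", match.Groups["d"].Value);
    }
    return element.ToString(SaveOptions.DisableFormatting);
});

Console.WriteLine(ConvertTags("Before [a b=\"c\" d=\"e\"]Some text[/a] after"));
// Before <a b="c" d="e">Some text</a> after
Console.WriteLine(ConvertTags("[a b=\"c\"]x & y[/a]"));
// <a b="c">x &amp; y</a>
```

Testing `Groups["d"].Success` avoids emitting an empty d="" when the attribute is absent, which was the concern raised in the comments about $2.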