I have a custom tag for a Flash object that I want to include in CMS content. When I read the content, I would like to grab each custom tag and everything in between its opening and closing tags.

Custom tag:

<myflash filename="test.swf" width="500" height="400">
  <param name="wmode" value="somevalue"></param>
  <param name="bgcolor" value="#ffffff"></param>
  <var name="id" value="testid"></var>
</myflash>

Now I need a regular expression that will read this entire block of code from the content. There can be more than one custom tag in a single piece of content.

Can anyone help, please?

Kind regards,

Vipul

+3  A: 

You can start with a very simple regex:

<myflash[^>]*>(.*?)</myflash>

Just make sure to use the "non-greedy" capture (.*?), so that the ".*" matches as little as possible.

Also, use RegexOptions.Singleline, so that the dot matches every character, including \n:

Regex re = new Regex("<myflash[^>]*>(.*?)</myflash>", RegexOptions.Singleline);
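
To make the multiple-tags case concrete, here is a minimal sketch of running that pattern over content containing two custom tags. The class name, method name, and sample content are hypothetical, and adding IgnoreCase is an assumption that tag names may vary in case.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MyFlashRegexDemo
{
    // Singleline lets '.' match '\n'; IgnoreCase (an assumption) tolerates <MyFlash>.
    private static readonly Regex MyFlashBlock = new Regex(
        "<myflash[^>]*>(.*?)</myflash>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns every complete <myflash>...</myflash> block found in the content.
    public static List<string> ExtractBlocks(string content)
    {
        var blocks = new List<string>();
        foreach (Match m in MyFlashBlock.Matches(content))
        {
            // m.Value is the whole block, tags included;
            // m.Groups[1].Value would be only the inner markup.
            blocks.Add(m.Value);
        }
        return blocks;
    }

    public static void Main()
    {
        // Hypothetical CMS content with two custom tags.
        string content =
            "<p>Intro</p>\n" +
            "<myflash filename=\"test.swf\" width=\"500\" height=\"400\">\n" +
            "  <param name=\"wmode\" value=\"somevalue\"></param>\n" +
            "</myflash>\n" +
            "<p>Middle</p>\n" +
            "<myflash filename=\"other.swf\" width=\"100\" height=\"80\">\n" +
            "  <var name=\"id\" value=\"testid\"></var>\n" +
            "</myflash>\n";

        foreach (string block in ExtractBlocks(content))
        {
            Console.WriteLine(block);
            Console.WriteLine("----");
        }
    }
}
```

Because the capture is non-greedy, each match stops at the first closing tag it meets, so the two blocks come back separately rather than as one span. This sketch would not cope with nested myflash tags.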
Ferdinand Beyer
This expression is not working, maybe because the tag has <param></param> tags inside it.
Use RegexOptions.Multiline
majkinetor
The PARAM tags shouldn't matter. Did you use the Singleline flag? You might want to use IgnoreCase too, if your tags don't always use lowercase names. If that doesn't work, we would need to see your code, because the regex does exactly what you asked for.
Alan Moore
@majkinetor, the Multiline flag won't change anything. It allows ^ and $ to match the beginning and end, respectively, of logical lines as well as the beginning and end of the whole string.
Alan Moore
Yeah... the point was actually to see whether the dot operator consumes newlines. I don't know why I connected that with Multiline :)
majkinetor
Note that the `>` is allowed in attribute values.
Gumbo
The single-/multiline options are not just badly named, they shouldn't exist at all. They're a Perl-historical artifact, and in Perl 6 they've finally been done away with. Who knows how long the rest of us will be stuck with them. :-/
Alan Moore
@Gumbo: No it isn't -- it must be encoded as an entity (&gt;), although browsers will tolerate it.
Ferdinand Beyer
+3  A: 

Regex is, IMO, the wrong tool for processing XML. Why not use XmlDocument or XDocument etc? If that is HTML (note no "X"), then the HTML Agility Pack may be useful.

With both XmlDocument and the HTML Agility Pack you can use XPath, so you can simply use .SelectNodes("//myflash"). XDocument has a similar but different method: .Descendants("myflash").
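
As a rough illustration of the XDocument route, here is a minimal sketch. It assumes the CMS content is well-formed XML (plain HTML such as an unclosed <br> would make the parser throw) and wraps the fragment in a hypothetical <root> element so that there is a single root. The class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MyFlashXmlDemo
{
    // Returns every <myflash> element in the content, in document order.
    public static List<XElement> FindMyFlash(string content)
    {
        // A CMS fragment may have several top-level elements, so wrap it
        // in a hypothetical <root> element before parsing.
        XDocument doc = XDocument.Parse("<root>" + content + "</root>");
        return doc.Descendants("myflash").ToList();
    }

    public static void Main()
    {
        // Hypothetical CMS content; assumed to be well-formed XML.
        string content =
            "<p>Intro</p>" +
            "<myflash filename=\"test.swf\" width=\"500\" height=\"400\">" +
            "<param name=\"wmode\" value=\"somevalue\"></param>" +
            "<param name=\"bgcolor\" value=\"#ffffff\"></param>" +
            "<var name=\"id\" value=\"testid\"></var>" +
            "</myflash>";

        foreach (XElement flash in FindMyFlash(content))
        {
            // Casting an XAttribute to string yields its value (or null if absent).
            Console.WriteLine("file: " + (string)flash.Attribute("filename"));

            foreach (XElement param in flash.Elements("param"))
            {
                Console.WriteLine("  param " + (string)param.Attribute("name")
                                  + " = " + (string)param.Attribute("value"));
            }

            // flash.ToString() gives the whole block as markup, tags included.
        }
    }
}
```

Element names in XML are case-sensitive, so the name passed to Descendants has to match the tag exactly as it is written in the content.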

Marc Gravell
+1 No regex for markup!
Andrew Hare
-1 That isn't the answer ... You provide the answer first, then any additional notes. Notes without answers are no good.
majkinetor
@majkinetor - how does .SelectNodes("//myflash") not answer it? It is the work of 2 seconds to discover .InnerXml and .OuterXml, for example. The reason I didn't include this is because the route is different for each of the 3 options, and that choice depends on a: xml vs html (not specified in the question), and b: XmlDocument vs XDocument (which depends on the .NET version, not specified in the question). So go on then: how would you unambiguously answer it?
Marc Gravell
It's not, because the man asked for a regex, not XPath. Instead of speculating about the methods he uses (your advice is sound, that's not the problem), it's better to answer the real question, then offer an alternative (or semantically better) method.
majkinetor
@majkinetor - right, and if somebody asks for a hammer to put some screws in, do you hand them a hammer? Or do you tell them about screwdrivers?
Marc Gravell
I give them a hammer and tell them about the screwdriver :P
majkinetor
A: 

As Marc Gravell says, regexes are not suited to parsing HTML (or XML). See "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why. You are much better off using an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples of how to use parsers in many languages (there are at least two examples using C#).

Chas. Owens