I've tried to understand a few examples, including questions here, so I apologise if this seems like a duplicate, but I cannot find a regular expression I can understand.
I have some HTML to parse using an XML parser, but I want to strip out the <head> element from this content, as the rest is valid enough for normal XML parsing.
Everything from <head> to </head>, including the content in between, must be removed without affecting the surrounding HTML (the <html> and <body> tags, etc.).
For reference, this is the markup, including the head section I want removed:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
<html>
<head>
<link rel="stylesheet" type="text/css" href="/style/stylesheet.css" />
<meta name="description" content="Information" />
<base target="_top">
</head>
<body>
<!-- Body Here -->
</body>
</html>
I also need to strip the DOCTYPE; if this can be done using a regex, that would be great. The head is always the same. I want to remove only the span from <head> to </head> inclusive and, if possible, remove the DOCTYPE from the text as well.
This also needs to work in Silverlight, so it should use System.Text.RegularExpressions or something similar.
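
To make the goal concrete, here is a minimal sketch of the kind of call being asked for. It is not a verified solution. It assumes the head never contains a literal </head> inside a comment or script, and that the DOCTYPE has no internal subset (no nested > characters). The class and method names (HtmlCleaner, StripHeadAndDoctype) are hypothetical, and the patterns have not been tested on Silverlight itself:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names here are illustrative only.
public static class HtmlCleaner
{
    public static string StripHeadAndDoctype(string html)
    {
        // Remove everything from <head ...> to </head> inclusive.
        // Singleline lets '.' match newlines; '.*?' is lazy so the match
        // stops at the first </head>. '\b' keeps <header> from matching.
        string result = Regex.Replace(
            html,
            @"<head\b[^>]*>.*?</head\s*>",
            string.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Remove the DOCTYPE declaration (assumes it contains no nested '>').
        result = Regex.Replace(
            result,
            @"<!DOCTYPE[^>]*>",
            string.Empty,
            RegexOptions.IgnoreCase);

        // Trim the leading whitespace left behind so the text starts at <html>.
        return result.Trim();
    }
}
```

The returned string would still contain a blank line where the head used to be, which an XML parser should ignore. Whether the remaining body is well-formed enough to parse depends on the actual content.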