Hi,

I need to parse an .aspx file (from disk, not the page as rendered in the browser), make a list of all the server-side ASP.NET controls present on the page, and then create an XML file from that list. What would be the best way to do this? Also, are there any libraries available for it?

For example, if my .aspx file contains

<asp:label ID="lbl1" runat="server" Text="Hi"></asp:label>

my XML file would be

<controls>
<ID>lbl1</ID>
<runat>server</runat>
<Text>Hi</Text>
</controls>

A: 

ASPX files should be valid XML, so maybe XSLT would be a good solution. The W3Schools site has a good introduction and reference. You could then call the XSLT from a simple program that picks the required file(s).

Alternatively, you could use LINQ to XML to load the ASPX file(s) and iterate over the controls with LINQ queries.

Graham Clark
Thanks for replying, but do you know of any site with sample code or an example project to start off with? I am a complete newbie in .NET :)
Ubaid
+2  A: 

XML parsers wouldn't understand the ASP.NET directives and code blocks: <%@, <%=, etc.
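To see the problem concretely, here is a minimal sketch that feeds markup to XmlDocument.LoadXml and reports whether the parser accepted it. The class and method names (DirectiveCheck, IsWellFormedXml) are made up for illustration.

```csharp
using System;
using System.Xml;

public static class DirectiveCheck
{
    // Returns true when the markup loads as XML, false when the parser rejects it.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            // "<%" is not a legal way to start an XML construct, so a page
            // that opens with a directive ends up here.
            return false;
        }
    }
}
```

Calling DirectiveCheck.IsWellFormedXml("<%@ Page Language=\"C#\" %><div></div>") returns false, while plain well-formed markup such as "<div><span>ok</span></div>" returns true.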

You'd probably be best off using regular expressions to do this, likely in three stages.

  1. Match every opening tag on the page.
  2. For each tag, match the tag prefix and control type.
  3. For each tag that matches (2), match its attributes.

So, starting from the top, we can use the following regex:

(?<tag><[^%/](?:.*?)>)

This matches any tag that doesn't start with <% or </, and it does so lazily (a greedy expression would run past the end of the tag and swallow the content that follows). The following could be matched:

<asp:Content ID="ph_PageContent" ContentPlaceHolderID="ph_MainContent" runat="server">
<asp:Image runat="server" />
<img src="/test.png" />
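A minimal sketch of this first stage on its own, in case it helps to try the pattern in isolation. TagScanner and FindOpeningTags are illustrative names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagScanner
{
    // Stage 1 pattern: "<", then any character except "%" or "/",
    // then as little as possible up to the next ">".
    private static readonly Regex TagExpr =
        new Regex("(?<tag><[^%/](?:.*?)>)", RegexOptions.CultureInvariant);

    // Returns every opening or self-closing tag found in the markup.
    public static List<string> FindOpeningTags(string markup)
    {
        var tags = new List<string>();
        foreach (Match match in TagExpr.Matches(markup))
        {
            tags.Add(match.Groups["tag"].Value);
        }
        return tags;
    }
}
```

Two limits to be aware of: "." does not match a newline by default, so a tag that spans several lines is skipped, and a ">" inside an attribute value cuts the match short.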

For each of those captured tags, we then want to extract the tag prefix and the control type:

<(?<tag>[a-z][a-z0-9]*):(?<type>[a-z][a-z0-9]*)

Named capture groups make it easy to pull out the tag prefix and type. The pattern only matches server tags (prefix:type), so standard HTML tags are dropped at this point.

<asp:Content ID="ph_PageContent" ContentPlaceHolderID="ph_MainContent" runat="server">

Will yield:

{ tag = "asp", type = "Content" }
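The second stage can be sketched on its own like this. ServerTagParser and TryParse are hypothetical names. The digit class here is 0-9, so that a control type containing a zero is captured in full.

```csharp
using System.Text.RegularExpressions;

public static class ServerTagParser
{
    // Stage 2 pattern: "<prefix:type", matched case-insensitively.
    private static readonly Regex ServerTagExpr = new Regex(
        "<(?<tag>[a-z][a-z0-9]*):(?<type>[a-z][a-z0-9]*)",
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

    // Returns true for a server tag and hands back its prefix and type;
    // returns false (with empty strings) for ordinary HTML tags.
    public static bool TryParse(string tag, out string prefix, out string type)
    {
        Match match = ServerTagExpr.Match(tag);
        prefix = match.Success ? match.Groups["tag"].Value : string.Empty;
        type = match.Success ? match.Groups["type"].Value : string.Empty;
        return match.Success;
    }
}
```

With the asp:Content tag above, TryParse returns true with prefix "asp" and type "Content"; with the plain img tag it returns false.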

With that same tag, we can then match any attributes:

(?<name>\S+)=["']?(?<value>(?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

Which yields:

{ name = "ID", value = "ph_PageContent" },
{ name = "ContentPlaceHolderID", value = "ph_MainContent" },
{ name = "runat", value = "server" }
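And a sketch of the third stage in isolation, using the attribute pattern exactly as given. AttributeParser and Parse are made-up names for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AttributeParser
{
    // Stage 3 pattern: name=value, with optional single or double quotes.
    private static readonly Regex AttributeExpr = new Regex(
        "(?<name>\\S+)=[\"']?(?<value>(?:.(?![\"']?\\s+(?:\\S+)=|[>\"']))+.)[\"']?",
        RegexOptions.CultureInvariant);

    // Returns the attributes of a single tag, in the order they appear.
    public static List<KeyValuePair<string, string>> Parse(string tag)
    {
        var attributes = new List<KeyValuePair<string, string>>();
        foreach (Match match in AttributeExpr.Matches(tag))
        {
            attributes.Add(new KeyValuePair<string, string>(
                match.Groups["name"].Value,
                match.Groups["value"].Value));
        }
        return attributes;
    }
}
```

Note that the value group needs at least two characters, so empty or one-character attribute values are not handled cleanly by this pattern.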

So putting that all together, we can create a quick function that can create an XmlDocument for us:

// Requires: using System; using System.Text.RegularExpressions; using System.Xml;
public XmlDocument CreateDocumentFromMarkup(string content)
{
  if (string.IsNullOrEmpty(content))
    throw new ArgumentException("'content' must have a value.", "content");

  RegexOptions options = RegexOptions.CultureInvariant | RegexOptions.Compiled | RegexOptions.IgnoreCase;
  // Stage 1: any opening tag that is not a <% ... %> block or a closing tag.
  Regex tagExpr = new Regex("(?<tag><[^%/](?:.*?)>)", options);
  // Stage 2: the prefix and control type of a server tag, e.g. asp:Content.
  Regex serverTagExpr = new Regex("<(?<tag>[a-z][a-z0-9]*):(?<type>[a-z][a-z0-9]*)", options);
  // Stage 3: name=value attribute pairs inside a tag.
  Regex attributeExpr = new Regex("(?<name>\\S+)=[\"']?(?<value>(?:.(?![\"']?\\s+(?:\\S+)=|[>\"']))+.)[\"']?", options);

  XmlDocument document = new XmlDocument();
  XmlElement root = document.CreateElement("controls");
  // Attach the root element; without this the returned document is empty.
  document.AppendChild(root);

  // Helper that builds <name>value</name>. The parameter is called 'doc'
  // so that it does not clash with the local variable 'document'.
  Func<XmlDocument, string, string, XmlElement> creator = (doc, name, value) => {
    XmlElement element = doc.CreateElement(name);
    element.InnerText = value;

    return element;
  };

  foreach (Match tagMatch in tagExpr.Matches(content)) {
    Match serverTagMatch = serverTagExpr.Match(tagMatch.Value);

    // Skip plain HTML tags; only prefix:type tags are recorded.
    if (serverTagMatch.Success) {
      XmlElement controlElement = document.CreateElement("control");

      controlElement.AppendChild(
        creator(document, "tag", serverTagMatch.Groups["tag"].Value));
      controlElement.AppendChild(
        creator(document, "type", serverTagMatch.Groups["type"].Value));

      XmlElement attributeElement = document.CreateElement("attributes");

      // Every attribute becomes a child element named after the attribute.
      foreach (Match attributeMatch in attributeExpr.Matches(tagMatch.Value)) {
        attributeElement.AppendChild(
          creator(document, attributeMatch.Groups["name"].Value, attributeMatch.Groups["value"].Value));
      }

      controlElement.AppendChild(attributeElement);
      root.AppendChild(controlElement);
    }
  }

  return document;
}

The resultant document could look like this:

<controls>
  <control>
    <tag>asp</tag>
    <type>Content</type>
    <attributes>
      <ID>ph_PageContent</ID>
      <ContentPlaceHolderID>ph_MainContent</ContentPlaceHolderID>
      <runat>server</runat>
    </attributes>
  </control>
</controls>
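Since the question was about reading the .aspx from disk and writing an XML file, here is a rough sketch of that last step. AspxExport and ConvertFile are hypothetical names, and the conversion itself is passed in as a delegate so the sketch stays self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

public static class AspxExport
{
    // Reads the markup from disk, converts it with the supplied function,
    // and writes the resulting XML document to xmlPath.
    public static void ConvertFile(
        string aspxPath,
        string xmlPath,
        Func<string, XmlDocument> createDocument)
    {
        string markup = File.ReadAllText(aspxPath);
        XmlDocument document = createDocument(markup);
        document.Save(xmlPath);
    }
}
```

Usage would be something like AspxExport.ConvertFile("Default.aspx", "controls.xml", CreateDocumentFromMarkup), assuming the method is reachable from the call site; both file names are just placeholders.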

Hope that helps!

Matthew Abbott
What about embedded user controls? The controls inside them will not be picked up by your program. +1 for the solution.
Yauheni Sivukha
Thanks Matthew, I'll check this out soon. Thanks a million!
Ubaid
@Yauheni, for user controls, wouldn't you parse the .ascx file instead?
Matthew Abbott