I'm writing a program to add some code to HTML files.

I was going to use a series of IndexOf calls and loops to find what is essentially "<body ...>X" (where X is the spot I'm looking for).

It occurred to me that there might be a more elegant way of doing this.

Does anyone have any suggestions?

What it looks like currently:

<body onLoad="JavaScript:top.document.title='Abraham L Barbrow'; if (self == parent) document.getElementById('divFrameset').style.display='block';">

What it should look like when I'm done:


<body onLoad="JavaScript:top.document.title='Abraham L Barbrow'; if (self == parent) document.getElementById('divFrameset').style.display='block';">
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-9xxxxxx-1");
pageTracker._trackPageview();
} catch(err) {}</script>
+4  A: 

I would recommend using HtmlAgilityPack to parse the HTML into a DOM and work with that.

modosansreves
I think this may be like using a cannon to swat a fly. The HTML is trivial, I guess. If I were doing a search in Windows it would just be IndexOf("<body*>"), but IndexOf does not support wildcards.
Crash893
DOM manipulation really is the safest way of doing this, crash893. Using string manipulation on the HTML serialization of an object is like using a hammer to draft blueprints.
Daniel Papasian
+1 for analogy battle
Janie
I can do it in 17 lines without a DOM. It might be safer, but it's like driving a tank to church; I don't need to be that safe.
Crash893
The start and end tags for the body element are optional in HTML, so unless you want to deal with the subset of pages where the tags are explicitly included … just use a proper parser!
David Dorward
+1  A: 

You might want to look at using the Html Agility Pack

http://www.codeplex.com/htmlagilitypack

BigBlondeViking
you beat me to it..
BigBlondeViking
+2  A: 

If the HTML files are valid XHTML you could always use the XmlDocument class to interpret it. You could then easily look for the body element and append a child element to it. This would place the element right before the closing </body> tag.

Joshua
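
A minimal sketch of the XmlDocument approach described above, assuming the input really is well-formed XHTML. The class and method names (XhtmlInjector, AppendScriptToBody) are made up for illustration. Setting XmlResolver to null is an assumption made here to keep a DOCTYPE from triggering a DTD download; with it, named entities such as &nbsp; will fail to parse.

```csharp
using System;
using System.Xml;

public static class XhtmlInjector
{
    // Hypothetical helper: appends a <script> element as the last child of <body>.
    public static string AppendScriptToBody(string xhtml, string scriptText)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the original layout intact
        doc.XmlResolver = null;        // assumption: do not fetch external DTDs
        doc.LoadXml(xhtml);            // throws XmlException if not well-formed

        // local-name() matches <body> whether or not the XHTML namespace is declared
        XmlNode body = doc.SelectSingleNode("//*[local-name()='body']");
        if (body == null)
            throw new InvalidOperationException("No <body> element found.");

        // create the element in the same namespace as <body>
        XmlElement script = doc.CreateElement("script", body.NamespaceURI);
        script.SetAttribute("type", "text/javascript");
        script.AppendChild(doc.CreateTextNode(scriptText));

        body.AppendChild(script);
        return doc.OuterXml;
    }
}
```

To place the element right after the opening tag instead, body.PrependChild(script) could be used in place of body.AppendChild(script).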
+5  A: 

I'm not sure I'm understanding you, but do you mean this?

// Given an HTML document in "htmlDocument", and new content in "newContent"
string newHtmlDocument = htmlDocument.Replace("</body>", newContent+"</body>");

And it's probably obvious I don't know C#... You'd probably want to match the "body" tag case-insensitively with a regex.

markwatson
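
A rough sketch of the case-insensitive variant suggested above, using System.Text.RegularExpressions. The names ClosingBodyInjector and InsertBeforeBodyClose are hypothetical, and the pattern assumes the closing tag appears literally in the file (it is optional in HTML).

```csharp
using System.Text.RegularExpressions;

public static class ClosingBodyInjector
{
    // Matches </body>, </BODY>, </Body > and so on.
    private static readonly Regex ClosingBody =
        new Regex(@"</body\s*>", RegexOptions.IgnoreCase);

    // Hypothetical helper: inserts newContent just before the first closing body tag.
    public static string InsertBeforeBodyClose(string html, string newContent)
    {
        // A MatchEvaluator is used so that "$" in newContent is not treated
        // as a substitution pattern; the count of 1 limits it to the first match.
        return ClosingBody.Replace(html, m => newContent + m.Value, 1);
    }
}
```

If no closing tag is found, the input comes back unchanged, so a caller may want to check for that.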
Or you could use an entire DOM parser like some of the other answers. But why load extra crap if this is all you have to do?
markwatson
That's exactly what I want to do, but instead of adding it before the closing </body> tag, I want to add it after the opening <body> tag. The problem is that the body tag isn't always just <body>; it can have attributes.
Crash893
In that case you may want to use one of the DOM parser solutions. You can match the opening body tag with a regex, but it doesn't work perfectly in all cases. That's been my experience anyway...
markwatson
I think all I really need to do is find "<body" and then walk the pointer until I see a ">", but I don't know how to do that.
Crash893
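
The "find <body, then walk to the next >" idea from the comment above could be sketched roughly like this. BodyTagInjector and InsertAfterBodyOpenTag are made-up names, and the sketch assumes no attribute value inside the body tag contains a ">" character and that "<body" does not appear earlier in a comment or script.

```csharp
using System;

public static class BodyTagInjector
{
    // Hypothetical helper: inserts newContent right after the opening <body ...> tag.
    public static string InsertAfterBodyOpenTag(string html, string newContent)
    {
        // find the start of the opening tag, ignoring case (<body, <BODY, ...)
        int start = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            return html; // no body tag: leave the document untouched

        // walk forward to the first '>' after that position
        int end = html.IndexOf('>', start);
        if (end < 0)
            return html; // tag never closes: leave the document untouched

        return html.Insert(end + 1, newContent);
    }
}
```

This handles the onLoad example from the question, but as other comments point out, a parser is the more robust choice once the input gets less predictable.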
+1  A: 

I'm not sure whether the example content you want to add after the tag is the correct one or not, but if it is, I'm seeing two problems:

  1. The Google Analytics code should be added just before the end tag, not the opening tag. That ensures that you don't have to wait for it to load before loading your own code.
  2. If you're adding some other javascript, why not add that in an external file, and execute that one onload instead?

Hope that's of some help :)

Emil Stenström
A: 

This is what I've got.

Feel free to make suggestions.

 private void button1_Click(object sender, EventArgs e)
        {
            OpenFileDialog OFD = new OpenFileDialog();
            OFD.Multiselect = true;
            OFD.Filter = "HTML Files (*.htm*)|*.HTM*|" +
          "All files (*.*)|*.*";

            if (OFD.ShowDialog() == DialogResult.OK)
            {
                foreach (string s in OFD.FileNames)
                {
                    Console.WriteLine(s);
                    AddAnalytics(s);
                }
                MessageBox.Show("done!");
            }
        }
        private void AddAnalytics(string filename)
        {

            string Htmlcode = "";
            using (StreamReader sr = new StreamReader(filename))
            {
                Htmlcode = sr.ReadToEnd();
            }
            if (!Htmlcode.Contains(textBox1.Text))
            {
                Htmlcode = Htmlcode.Replace("</body>", CreateCode(textBox1.Text) + "</body>");

                using (StreamWriter sw = new StreamWriter(filename))
                {
                    sw.Write(Htmlcode);
                }
            }
        }

        // Builds the Google Analytics snippet for the given tracker ID, e.g. "UA-9909000-1"
        private string CreateCode(string Number)
        {
            StringBuilder sb = new StringBuilder();
            sb.AppendLine();
            sb.AppendLine("<script type=\"text/javascript\">");
            sb.AppendLine("var gaJsHost = ((\"https:\" == document.location.protocol) ? \"https://ssl.\" : \"http://www.\");");
            sb.AppendLine("document.write(unescape(\"%3Cscript src='\" + gaJsHost + \"google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E\"));");
            sb.AppendLine("</script>");
            sb.AppendLine("<script type=\"text/javascript\">");
            sb.AppendLine("try {");
            sb.AppendLine(string.Format("var pageTracker = _gat._getTracker(\"{0}\");", Number));
            sb.AppendLine("pageTracker._trackPageview();");
            sb.AppendLine("} catch(err) {}</script>");
            sb.AppendLine();
            return sb.ToString();
        }
    }
Crash893
+2  A: 
// Requires: using System.IO; using HtmlAgilityPack;
public string AddImageLink(string emailBody, string imagePath)
{
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(emailBody);

    // get body using xpath query ("//body")
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//body");
    if (node == null)
    {
        // no <body> element found; return the input unchanged
        return emailBody;
    }

    // create the new nodes
    HtmlNode linkNode = doc.CreateElement("a");
    // note: without a scheme such as http://, browsers treat this as a relative URL
    linkNode.Attributes.Add("href", "www.splash-solutions.co.uk");

    HtmlNode imgNode = doc.CreateElement("img");
    imgNode.Attributes.Add("src", imagePath);

    // nest the image inside the link
    linkNode.AppendChild(imgNode);

    // append the link as the last child of the body
    node.AppendChild(linkNode);

    StringWriter writer = new StringWriter();
    doc.Save(writer);
    return writer.ToString();
}
nemath