ansaurus

Question

Answer 1

+1 A:

You could try looking at the count of images for every node.

    public static XmlNode FindNodeWithMostImages(XmlNodeList nodes) {

        var greatestImageCount = 0;
        XmlNode nodeWithMostImages = null;

        foreach (XmlNode node in nodes)
        {
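            // "*/child::img" counts <img> elements that are grandchildren of the node
            // (children of its child elements); ".//img" would count every descendant.
            var currentNodeImageCount = node.SelectNodes("*/child::img").Count;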

            if (currentNodeImageCount > greatestImageCount)
            {
                greatestImageCount = currentNodeImageCount;
                nodeWithMostImages = node;
            }
        }

        return nodeWithMostImages;
    }

Jason Rowe 2010-01-03 23:09:31

I guess that's the only way, huh? Little more elegant with LINQ I think, but I guess that's on the right track.
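A LINQ version of the same idea might look roughly like the sketch below. It assumes the same `System.Xml` types and the same XPath as the answer above; the class name `ImageCounter` is made up for illustration.

```csharp
using System.Linq;
using System.Xml;

public static class ImageCounter
{
    // Returns the node with the most matching <img> elements, or null when
    // the list is empty or no node contains any images.
    public static XmlNode FindNodeWithMostImages(XmlNodeList nodes)
    {
        return nodes.Cast<XmlNode>()
            .Select(n => new { Node = n, Count = n.SelectNodes("*/child::img").Count })
            .Where(x => x.Count > 0)
            .OrderByDescending(x => x.Count)   // stable sort: ties keep their original order
            .Select(x => x.Node)
            .FirstOrDefault();
    }
}
```

Like the loop, this returns the first node among those tied for the highest count, since LINQ to Objects sorting is stable.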

Mark 2010-01-03 23:30:36

A while back I looked into recursive LINQ and found this extension. You might be able to do something like this example: http://codepaste.net/gf3q5a

Jason Rowe 2010-01-04 00:10:47

Answer 2

+1 A:

XPath 1.0 does not provide the ability to sort a node-set. You will need to combine XPath with something else that can do the sorting.

Here is an example XSLT solution that finds all elements that contain descendant <img> elements, and then sorts them by the count of their descendant <img> elements in descending order.

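    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>

        <xsl:template match="/">
            <!-- if you only want <a>, select //a[descendant::img] instead -->
            <xsl:for-each select="//*[descendant::img]">
                <!-- data-type="number" is required: the default is a text sort, which would put 10 before 9 -->
                <xsl:sort select="count(descendant::img)" data-type="number" order="descending" />

                <!-- Example output to demonstrate which elements have been selected -->
                <xsl:value-of select="name()"/><xsl:text> has </xsl:text>
                <xsl:value-of select="count(descendant::img)" />
                <xsl:text> descendant images&#10;</xsl:text>

            </xsl:for-each>
        </xsl:template>

    </xsl:stylesheet>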

It wasn't clear to me from your question and examples whether you want to find any element with descendant <img> elements or just <a> elements with descendant <img> elements.

If you only want <a> elements with descendant <img> elements, adjust the XPath in the for-each to select: //a[descendant::img]
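For comparison, the same select-then-sort idea can be sketched in C#, letting XPath do the selection and LINQ do the ordering. This is only an illustration under assumptions: it uses `System.Xml` rather than HtmlAgilityPack, and `ImageSorter` and `ElementsByImageCount` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class ImageSorter
{
    // Selects every element that has at least one descendant <img> and
    // orders the results by descendant image count, largest first.
    public static List<XmlNode> ElementsByImageCount(XmlDocument doc)
    {
        return doc.SelectNodes("//*[descendant::img]")
            .Cast<XmlNode>()
            .OrderByDescending(n => n.SelectNodes("descendant::img").Count)
            .ToList();
    }
}
```

Because the count is compared as an integer here, there is no text-versus-number sorting pitfall to worry about.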

Mads Hansen 2010-01-04 02:23:20

Oh, sorry. I made a few changes to the question and it became more and more apparent that XPath wasn't quite sufficient. I was hoping that the tags `c#` and `htmlagilitypack` would hint that I prefer using those technologies, as that's what the rest of my app is written in. This is kind of neat though ;) Hopefully the comments below the Q clear up your other questions.

Mark 2010-01-04 02:46:52

Answer 3

A:

Current solution:

    private static int Count(HtmlNodeCollection nc) {
        return nc == null ? 0 : nc.Count;
    }

    private static void BuildList(HtmlNode node, ref List<HtmlNode> list) {
        var sortedNodes = from n in node.ChildNodes
                          orderby Count(n.SelectNodes(".//a[@href and img]")) descending
                          select n;
        foreach (var n in sortedNodes) {
            if (n.Name == "a") list.Add(n);
            else if (n.HasChildNodes) BuildList(n, ref list);
        }
    }

Example usage:

    private static void ProcessDocument(HtmlDocument doc, Uri baseUri) {
        var linkNodes = new List<HtmlNode>(100);
        BuildList(doc.DocumentNode, ref linkNodes);
        // ...

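It's a bit inefficient though, because the XPath count for each child walks that child's whole subtree, so deeper nodes get recounted at every level of the recursion, but oh well.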
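One way the recounting could be avoided is to count every subtree once, bottom-up, and then reuse the stored counts while building the list. The sketch below shows the idea only: it uses `XElement` from `System.Xml.Linq` as a stand-in for `HtmlNode` so that it is self-contained, and `LinkCollector`, `CountLinks` and `CollectLinks` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinkCollector
{
    // True for <a href="..."> elements with a direct <img> child,
    // mirroring the XPath test a[@href and img].
    private static bool IsImageLink(XElement e)
    {
        return e.Name.LocalName == "a"
            && e.Attribute("href") != null
            && e.Element("img") != null;
    }

    // Post-order pass: records, for every element, how many image links
    // lie strictly below it, so each subtree is walked only once.
    private static int CountLinks(XElement e, Dictionary<XElement, int> counts)
    {
        int total = 0;
        foreach (var child in e.Elements())
            total += CountLinks(child, counts) + (IsImageLink(child) ? 1 : 0);
        counts[e] = total;
        return total;
    }

    // Same traversal as BuildList above, but sorting on the stored counts.
    private static void BuildList(XElement e, Dictionary<XElement, int> counts, List<XElement> list)
    {
        foreach (var child in e.Elements().OrderByDescending(c => counts[c]))
        {
            if (child.Name.LocalName == "a") list.Add(child);
            else if (child.HasElements) BuildList(child, counts, list);
        }
    }

    public static List<XElement> CollectLinks(XElement root)
    {
        var counts = new Dictionary<XElement, int>();
        CountLinks(root, counts);
        var list = new List<XElement>();
        BuildList(root, counts, list);
        return list;
    }
}
```

The dictionary is keyed by element reference, so the lookup during sorting is cheap and the whole thing stays roughly linear in the number of elements, apart from the per-level sorts.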

Mark 2010-01-12 06:51:03

Order nodes by most images?