I've never heard of this being done, but is it possible to retrieve a node from the DOM using JS and then find out which line of the source file that node occurred on?

I'm open to anything: alternative browsers, plugins/add-ons, etc. It doesn't need to be cross-browser per se.

I would assume that this would be possible somehow considering that some JS debuggers are capable of finding the line number within a script tag, but I'm not entirely sure.

+1  A: 

This can be done. Start by getting the highest node in the document like this:

var htmlNode = document.getElementsByTagName('html')[0];
var node = htmlNode;
while (node.previousSibling !== null) {
    node = node.previousSibling;
}
var firstNode = node;

(This code was tested and retrieved both the doctype node and the comments above the html node.)

Then you loop through all nodes (both siblings and children). In IE, you'll only see the elements and comments (not text nodes), so it'll be best to use FF or Chrome or something (you said it wouldn't have to be cross-browser).

When you get to each text node, parse it and count the line breaks it contains. The running total when you reach your node, plus one, is the line that node starts on.
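
A minimal sketch of that walk, assuming nothing beyond the standard `nodeType`, `nodeValue` and `childNodes` properties. The name `lineOfNode` is made up for illustration. Treat the result as an approximation: line breaks inside a tag (for example between attributes) never show up in any text node, and parsers may drop some whitespace, such as the whitespace before the `<html>` tag.

```javascript
// Sketch only: counts "\n" characters in the text and comment nodes that
// precede `target` in document order. `lineOfNode` is a hypothetical name.
function lineOfNode(root, target) {
    var breaks = 0;
    var found = false;

    function visit(node) {
        if (found) { return; }
        if (node === target) { found = true; return; }
        // nodeType 3 = text, 8 = comment; both can contain line breaks
        if (node.nodeType === 3 || node.nodeType === 8) {
            breaks += (node.nodeValue.match(/\n/g) || []).length;
        }
        var children = node.childNodes || [];
        for (var i = 0; i < children.length && !found; i++) {
            visit(children[i]);
        }
    }

    visit(root);
    return found ? breaks + 1 : -1; // 1-based line, or -1 if target was not reached
}
```

In a browser you would call it as `lineOfNode(document, document.getElementById('whatever'))`, so that the doctype and any comments above the html node are included in the walk.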

Gabriel McAdams
A: 

Something like this?

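// serialize the document to a string first; substr/indexOf don't exist on elements
var wholeDocument = document.getElementsByTagName('html')[0].outerHTML;
var findNode = document.getElementById('whatever');
var documentUpToFindNode = wholeDocument.substr(0, wholeDocument.indexOf(findNode.outerHTML));
// match() returns null when there are no newlines, so fall back to an empty array
var nlsUpToFindNode = (documentUpToFindNode.match(/\n/g) || []).length;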
Ollie Saunders
`findNode.outerHTML` isn't necessarily unique to that element...
J-P
value + 1... or + x if the page is rendered in standards mode with a DOCTYPE and/or additional comments before the HTML open tag.
scunliffe
`outerHTML` is a non-standard, IE-only property so this won't work in other browsers, and as noted by @J-P, is not guaranteed to be unique to the element.
Tim Down
A: 

You could try: -

 - start at the 'whatever' node, 
 - traverse to each previous node back to the beginning of the document, concatenating the HTML of each node as you go, 
 - then count the new lines in your collected HTML.
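
The last step, counting the new lines in the collected HTML, could look something like this. It is only a sketch: `lineAfter` is a made-up helper name, and it assumes you have already built the string of HTML that precedes your node.

```javascript
// Hypothetical helper: given all the markup that comes before a node,
// return the 1-based line number on which that node would start.
function lineAfter(html) {
    // treat \r\n, a lone \r and a lone \n each as a single line break
    var breaks = html.match(/\r\n|\r|\n/g) || [];
    return breaks.length + 1;
}
```

So `lineAfter(collectedHtml)` gives the line of the 'whatever' node, where `collectedHtml` stands for whatever string your traversal produced.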

Post the code once you nut it out, coz that's a good question :)

ekerner
+2  A: 

Ok, forgive me for how large this is. I thought this was a very interesting question but while playing with it, I quickly realized that innerHTML and its ilk are quite unreliable wrt maintaining whitespace, comments, etc. With that in mind, I fell back to actually pulling down a full copy of the source so that I could be absolutely sure I got the full source. I then used jquery and a few (relatively small) regexes to find the location of each node. It seems to work well although I'm sure I've missed some edge cases. And, yeah, yeah, regexes and two problems, blah blah blah.

Edit: As an exercise in building jquery plugins, I've modified my code to function reasonably well as a standalone plugin with an example similar to the html found below (which I will leave here for posterity). I've tried to make the code slightly more robust (such as now handling tags inside quoted strings, such as onclick), but the biggest remaining bug is that it can't account for any modifications to the page, such as appending elements. I would probably need to use an iframe instead of an ajax call to handle that case.
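
The core matching idea can be isolated as a small sketch: if a node is the Nth element with its tag name in the DOM, look for the Nth opening tag with that name in the source. The helper name `findTagLine` is made up for illustration, and it assumes the lines have already been sanitized so that script bodies and comments can't produce false hits, and that `tagName` is a plain tag name with no regex metacharacters.

```javascript
// Hypothetical helper: returns the 0-based index of the source line holding
// the `index`-th (1-based) opening tag named `tagName`, or -1 if the source
// contains fewer such tags.
function findTagLine(lines, tagName, index) {
    // the lookahead keeps "<b" from also matching "<body"; "$" covers a tag
    // whose attributes continue on the next line
    var re = new RegExp('<' + tagName + '(?=[\\s>/]|$)', 'gi');
    var count = 0;
    for (var i = 0; i < lines.length; i++) {
        var matches = lines[i].match(re);
        if (matches) {
            count += matches.length;
            if (count >= index) { return i; }
        }
    }
    return -1;
}
```

With jQuery, the `index` argument would correspond to `$(tag).index(node) + 1`.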

<html>
    <head id="node0">
    <!-- first comment -->
        <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
        <style id="node1">
/*          div { border: 1px solid black; } */
            pre { border: 1px solid black; }
        </style>
    <!-- second comment -->
        <script>
            $(function() {

                // fetch and display source
                var source;
                $.ajax({
                    url: location.href,
                    type: 'get',
                    dataType: 'text',
                    success: function(data) {
                        source = data;


                        var lines = data.split(/\r?\n/);
                        var html = $.map(lines, function(line, i) {
                            return ['<span id="line_number_', i, '"><strong>', i, ':</strong> ', line.replace(/</g, '&lt;').replace(/>/g, '&gt;'), '</span>'].join('');
                        }).join('\n');

                        // now sanitize the raw html so you don't get false hits in code or comments
                        var inside = false;
                        var tag = '';
                        var closing = {
                            xmp: '<\\/\\s*xmp\\s*>',
                            script: '<\\/\\s*script\\s*>',
                            '!--': '-->'
                        };
                        var clean_source = $.map(lines, function(line) {
                            if (inside && line.match(closing[tag])) {
                                var re = new RegExp('.*(' + closing[tag] + ')', 'i');
                                line = line.replace(re, "$1");
                                inside = false;
                            } else if (inside) {
                                line = '';
                            }

                            if (line.match(/<(script|xmp|!--)/)) {
                                tag = RegExp.$1;
                                line = line.replace(/<(script|xmp|!--)[^>]*.*(<(\/(script|xmp)|--)?>)/i, "<$1>$2");
                                var re = new RegExp(closing[tag], 'i');
                                inside = ! (re).test(line);
                            }
                            return line;
                        });

                        // nodes we're looking for
                        var nodes = $.map([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], function(num) { return $('#node' + num) });

                        // now find each desired node in both the DOM and the source
                        var line_numbers = $.map(nodes, function(node) {
                            var tag = node.attr('tagName');
                            var tags = $(tag);
                            var index = tags.index(node) + 1;

                            var count = 0;
                            for (var i = 0; i < clean_source.length; i++) {
                                var re = new RegExp('<' + tag, 'gi');
                                var matches = clean_source[i].match(re);
                                if (matches && matches.length) {
                                    count += matches.length;
                                    if (count >= index) {
                                        console.debug(node, tag, index, count, i);
                                        return i;
                                    }
                                }
                            }


                            return count;
                        });

                        // saved till end to avoid affecting source html
                        $('#source_pretty').html(html);
                        $('#source_raw').text(source);
                        $('#source_clean').text(clean_source.join('\n'));

                        $.each(line_numbers, function() { $('#line_number_' + this).css('background-color', 'orange'); });
                    }
                });

                var false_matches = [
                    "<div>",
                    "<div>",
                    "</div>",
                    "</div>"
                ].join('');

            });
        </script>
    </head>
    <!-- third comment -->
    <body id="node2">
        <div>
            <pre id="source_pretty">
            </pre>
            <pre id="source_raw">
            </pre>
            <pre id="source_clean">
            </pre>
        </div>

        <div id="node3">
            <xmp>
                <code>
                // <xmp> is deprecated, you should put it in <code> instead
                </code>
            </xmp>
        </div>

    <!-- fourth comment -->
<div><div><div><div><div><div><span><div id="node4"><span><span><b><em>
<i><strong><pre></pre></strong></i><div><div id="node5"><div></div></div></div></em>
</b></span><span><span id="node6"></span></span></span></div></span></div></div></div></div></div></div>


        <div>
            <div>
                <div id="node7">
                    <div>
                        <div>
                            <div id="node8">
                                <span>
    <!-- fifth comment -->
                                    <div>
                                        <span>
                                            <span>
                                                <b>
                                                    <em id="node9">
                                                        <i>
                                                            <strong>
                                                                <pre>
                                                                </pre>
                                                            </strong>
                                                        </i>
                                                        <div>
                                                            <div>
                                                                <div>
                                                                </div>
                                                            </div>
                                                        </div>
                                                    </em>
                                                </b>
                                            </span>
                                            <span>
                                                <span id="node10">
                                                </span>
                                            </span>
                                        </span>
                                    </div>
                                </span>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </body>
</html>
Rob Van Dam
Maybe you and I should keep trying this code out and tweak it until we get it right. What browser did you try this on?
leeand00
(since I didn't ask for a cross-browser solution but rather just something that worked).
leeand00
+1 for effort! :)
leeand00
Um... wait, I can't guarantee that the nodes will have ids. The reason I want to do this is to automate the removal of inline style attributes from nodes in a large number of HTML pages (that I have inherited). I'd also like to replace those inline attributes with a class, and add that class to a CSS file or style attribute. This is just part of that process.
leeand00
For what I plan to do overall, see: http://andrew-leer.com/?p=15
leeand00
I only tested this in Firefox on Linux, but given that it's using jquery I'd be surprised if it wasn't fairly cross-browser. And it doesn't require ids (it would actually be a lot faster to just find the id="foo"; I just used ids for my example out of laziness).
Rob Van Dam
Just replace nodes with `var nodes = $('div, script, span');` (and get the tag via `$(node).attr('tagName')`) to see that it works with any arbitrary node that you might have in the dom, so for your case `var nodes = $('[style]');`
Rob Van Dam
@Rob Alright! Nice, Rob! Do you mind if I send this link to the jQuery guys? Who knows... maybe they'll be interested. Also, is that your website that you posted the code on? My website's the same way... it's "coming soon"... yup.
leeand00