views:

346

answers:

4

Does the DOM have a hash-table of elements whose keys are the elements' ids?
I want to know if document.getElementById looks up a hash table or traverses the entire tree.
Is this behavior different across browsers?

A: 

It shouldn't be hard to test.

If it's tree-based then making a very deep tree (via Javascript) should be a good test case.

David Toso
Please please answer the question and dont post to get rep.
Srinivas Reddy Thatiparthy
Yes, please make that test, on at least 3 major browsers.
Kobi
@Srinivas: have you seen my rep? I could care less.
David Toso
Is there some unwritten rule that I shouldn't contribute to the discussion unless my life depends on gaining XP^H^Hrep points?
David Toso
@David: what ever the reason,i and MOST of SO users post answers when they know answer.nothing personal please :)
Srinivas Reddy Thatiparthy
This answer would be a good comment, but I guess it's posted as an answer because you state that you care more about reputation than you otherwise might.
Anonymous
+2  A: 

The specific implementation isn't defined in the HTML spec so it could easily vary browser to browser. For example IE documentation states

Returns a reference to the first object with the specified value of the ID or NAME attribute.

so I'd be tempted to say it does a search (or it just throws out elements in cases of hash collisions).

EDIT Also keep in mind there are other data structures (like trees) that allow for access time somewhere between constant and linear.

R0MANARMY
Remember that IE's implementation -- the bit you quoted -- is broken, completely at odds with the behavior specified by the W3C. They fixed this in IE8, fortuntely.
T.J. Crowder
Good point. To be fair W3C says **...Behavior is not defined if more than one element has this ID.** Technically IE can't be wrong wrong if the correct behavior isn't defined.
R0MANARMY
@ROMANARMY: The wrong bit is intermixing `name` with `id`. If you have duplicate IDs in your documents, as you say, whatever the browser wants to do (including ignoring all of them) is perfectly fine.
T.J. Crowder
+4  A: 

Implementations are free to do whatever they like, but since id is defined as a unique value, it would seem sensible to use a hash map or similiar lookup mechanism rather than traversal. What seems sensible from the outside, though, may not be when you get into the plumbing of building a complex web browser with many (sometimes conflicting) imperatives.

I did an easy but very simplistic test (see page at end of answer). It's very simplistic not least because we don't know that browsers don't cache previous results.

Chrome 4.1.249.1059 reports:

ID: 0.0082ms per lookup
Tag: 0.0249ms per lookup

So, dramatically faster by ID than tag name.

IE7 reports:

ID: 2.4438ms per lookup
Tag: 0.0437ms per lookup

So dramatically faster by tag name than ID (but remember IE7 has a broken concept of getElementById; this is fixed in IE8).

IE8 (on a different machine, don't compare absolutes, we're looking at diffs within the browser tested) reports:

ID: 1.1335ms per lookup
Tag: 0.0287ms per lookup

So about the same as IE7.

Firefox 3.6.3 reports:

ID: 0.0042ms per lookup
Tag: 0.006ms per lookup

So it doesn't care that much (on repeated requests; again, it may be caching).

Opera 10.5.1 reports:

ID: 0.006ms per lookup
Tag: 1.467ms per lookup

Dramatically faster by ID than tag name.

Make of those results what you will. A more complex test case would be needed to really infer the mechanisms. Of course, in at least two of those cases (Firefox and Chrome), we can go look at the source. CMS kindly points us to the WebKit and Firefox implementations (and looking at it, my suspicion about caching may have been on the money).

Test page:

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Test Page</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
#log p {
    margin:     0;
    padding:    0;
}
</style>
<script type='text/javascript'>
window.onload = pageInit;
function pageInit() {
    document.getElementById('btnGo').onclick = btnGoClick;
}
function btnGoClick() {

    log("Testing...");
    setTimeout(run, 0);
}

function run() {
    var count, time;

    try {
        // Warm up
        testid(10);
        testtag(10);

        // Test
        count = 10000
        time = testid(count);
        log("ID: " + (time / count) + "ms per lookup");
        time = testtag(count);
        log("Tag: " + (time / count) + "ms per lookup");
    }
    catch (e) {
        log("Error: " + (e.message ? e.message : String(e)));
    }
}

function testid(count) {
    var start;

    start = new Date().getTime();
    while (count-- > 0) {
        if (!document.getElementById('fred')) {
            throw "ID 'fred' not found";
        }
    }
    return new Date().getTime() - start;
}

function testtag(count) {
    var start;

    start = new Date().getTime();

    while (count-- > 0) {
        if (document.getElementsByTagName('em').length == 0) {
            throw "Tag 'em' not found";
        }
    }
    return new Date().getTime() - start;
}

function log(msg) {
    var log = document.getElementById('log');
    log.innerHTML += "<p>" + msg + "<\/p>";
}
</script>
</head>
<body><div>
<input type='button' id='btnGo' value='Go'>
<div id='log'></div>
<hr>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<!-- repeat the above a couple of thousand times; I had about 2,200 -->
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<em id='fred'>test</em></span></span></span></div>
</div></body>
</html>
T.J. Crowder
Here's the live example for anyone who's interested http://jsbin.com/esemi3/
R0MANARMY
@ROMANARMY: That needs a *lot* more content (see the inline comment in the HTML).
T.J. Crowder
@T.J. - Nice test, I tried to add the content on jsbin, but seems that they have a size limit, I've uploaded it [here](http://dl.dropbox.com/u/35146/js/tests/getElementById.html).
CMS
According to the question, you'll do better comparing `getElementById('log')` and `getElementById('fred')` - one is close to the beginning, and the other is after many elements (optimally, it'd been deeper too). If the search is linear you might see a difference. You are assuming `getElementsByTagName` is linear, but is also building a large collection.
Kobi
@Kobi: A very good point. Can't do that right now, but may do it in a few hours.
T.J. Crowder
I was tempted to mark @CMS's answer but i have to give it to Crowder. The answer i infer is - it depends. At least the first lookup doesn't seem to come from a hastable atleast for IE. Interesting how getElementsByTag is faster than id. We'll have to relook at some of our scripts.
sash
+9  A: 

I know about the Firefox and WebKit DOM implementations, both of them use a hashtable to lookup the elements, digging into the source of them you can give a look to the internal implementations:

WebKit implementation, Document.cpp, uses the hashtable if the id is unique, otherwise it traverses the document to get the first match:

Element* Document::getElementById(const AtomicString& elementId) const
{
    if (elementId.isEmpty())
        return 0;

    m_elementsById.checkConsistency();

    Element* element = m_elementsById.get(elementId.impl());//<-- hastable lookup
    if (element)
        return element;

    if (m_duplicateIds.contains(elementId.impl())) {
        // We know there's at least one node with this id,
        // but we don't know what the first one is.
        for (Node *n = traverseNextNode(); n != 0; n = n->traverseNextNode()) {
            if (n->isElementNode()) {
                element = static_cast<Element*>(n);
                if (element->hasID() &&
                element->getAttribute(element->idAttributeName()) == elementId) {
                    m_duplicateIds.remove(elementId.impl());
                    m_elementsById.set(elementId.impl(), element);
                    return element;
                }
            }
        }
        ASSERT_NOT_REACHED();
    }
    return 0;
}

Firefox implementation, nsDocument.cpp

CMS
+1 Interesting. So, using the same `id` twice not only makes your document invalid, it also makes performance worse.
bobince
@CMS the hashtable might just be used for caching. We'll only know if we run tests on a large DOM.
sash