The Interview Version
A whiteboard implementation of getElementsByClassName is still a common interview code challenge. There are plenty of algorithms, but this is one I’ve written for the purpose. It addresses the fundamentals with its speed, its closure, its mild use of recursion, and, finally, a regular expression. Remove the comments and white space, and you’re left with code that offers decent performance and is still easy to learn. To me, that’s beautiful code.
In order to determine which nodes contain a class, the class attribute of each node must be examined. There is no way around that. We must examine every node, but we need not iterate over every node. We can avoid non-element node types and the script tag (optional). We can also leverage innerHTML and regular expressions to remove large descendant branches of the DOM tree if our classes aren’t in them.
Regular expressions are acceptably fast in JavaScript. It may be true that the regex tests are the bottleneck here, but the total performance is, again, acceptable. My implementation is extremely slow compared to native methods though. The best-known non-native algorithms, including the tree walker often attributed to Douglas Crockford, also outperform my implementation significantly. Even still, we end with a respectable method that’s easy to present and explain.
My Interview Implementation
getElementsByClassName = function(_) { //A var r = new RegExp("\\b"+_+"\\b",""), elements = []; var _class = function(e) { return e.className || ""; } var f = function(context) { //B if (typeof context == "undefined") { return f(document.getElementsByTagName('html')[0]); }; //C var c = _class(context); if ( (c != "") && (r.test(c)) ) { elements.push(context); }; var children = context.childNodes; for (o in children) { var e = children[o]; if (e.nodeType != 1) { continue; } f(e); }; return true; }; f(); return elements; };
John Resig, creator of jQuery, has at least one (aged) post dedicated to a speed comparison between the native version (a host object) of getElementsByClassName and some some of the other well-known implementations, including a variation of the dom walker attributed to Douglas Crockford. You can find those implementations within that post, or on other sites like QuirksMode.
Out of curiosity, I wanted to see where my solution would fit in. So I did some very primitive profiling on Firefox 9.0.1. My test, shown below, was very simple. I initialized the various methods on the Yahoo home page, and looked up the class view_default.
var x = new Date().getTime();
console.log(document.getElementsByClassName('view_default'));
console.log(new Date().getTime() - x);
The results were consistent with Resig’s. The native implementation completed in roughly 22MS. Crockford’s completed in roughly 47MS. My interview implementation crept in at 55MS. 55MS is slow, relatively, but still quite fast compared to many of the implementations that are floating around out there.
Further Analysis
Below is the implementation attributed to Douglas Crockford (in a comment posted to QuirksMode). The author (whomever it may be) has done a very nice job of making this efficient. Each line is purposeful. And the use of primitive structures—avoiding regular expressions, innerHTML, and other complex objects—makes this fast. Naturally, I was curious to see if I could improve it.
Douglas Crockford’s Implementation
function walkTheDOM(node, func) { func(node); node = node.firstChild; while (node) { walkTheDOM(node, func); node = node.nextSibling; } } function getElementsByClassName(className) { var results = []; walkTheDOM(document.body, function (node) { var a, c = node.className, i; if (c) { a = c.split(' '); for (i = 0; i < a.length; i++) { if (a[i] === className) { results.push(node); break; } } } }); return results; }
I tried several things. I added a regular expression test instead of using the loop on the string properties. That didn’t help; It cost me several milliseconds. I put the traverse function inside (altering the location of the reference). That bought me a few milliseconds. I avoid running the function on the node if it’s not an element (type 1). That bought me several milliseconds. I tried to avoid running the function on the node if it was a script. The test cost me one or two milliseconds. So, overall, I was able to shave off about 5 or 6 more milliseconds. Here is my reworking:
Tweaking Mr. Crockford’s Solution
function getElementsByClassName(className) { var results = []; var traverse = function (node, func) { if (node.nodeType == 1) { func(node); } node = node.firstChild; while (node) { traverse(node, func); node = node.nextSibling; } } traverse(document.body, function (node) { var a, c = node.className, i; if (c) { a = c.split(' '); for (i = 0; i < a.length; i++) { if (a[i] === className) { results.push(node); break; } } } }); return results; }