Hand-coded JavaScript: getElementsByClassName

The Interview Version

A whiteboard implementation of getElementsByClassName is still a common interview code challenge. There are plenty of algorithms, but this is one I’ve written for the purpose. It addresses the fundamentals with its speed, its closure, its mild use of recursion, and, finally, a regular expression. Remove the comments and white space, and you’re left with code that offers decent performance and is still easy to learn. To me, that’s beautiful code.

In order to determine which nodes contain a class, the class attribute of each node must be examined. There is no way around that. We must examine every node, but we need not iterate over every node. We can avoid non-element node types and the script tag (optional). We can also leverage innerHTML and regular expressions to remove large descendant branches of the DOM tree if our classes aren’t in them.

Regular expressions are acceptably fast in JavaScript. It may be true that the regex tests are the bottleneck here, but the total performance is, again, acceptable. My implementation is extremely slow compared to native methods though. The best-known non-native algorithms, including the tree walker often attributed to Douglas Crockford, also outperform my implementation significantly. Even still, we end with a respectable method that’s easy to present and explain.

My Interview Implementation

getElementsByClassName = function(_) {
    
    //A
    var r = new RegExp("\\b"+_+"\\b",""), elements = [];
    
    var _class = function(e) {
        return e.className || "";
    }
    
    var f = function(context) {
        
        //B
        if (typeof context == "undefined") { 
            return f(document.getElementsByTagName('html')[0]);
        };

        //C
        var c = _class(context);
        if ( (c != "") && (r.test(c)) ) { elements.push(context); };
        
        var children = context.childNodes;   
        for (o in children) {
            var e = children[o];
            if (e.nodeType != 1) { continue; }
            f(e);
        };
    
        return true;    
    };
    
    f();
    return elements;

};

John Resig, creator of jQuery, has at least one (aged) post dedicated to a speed comparison between the native version (a host object) of getElementsByClassName and some some of the other well-known implementations, including a variation of the dom walker attributed to Douglas Crockford. You can find those implementations within that post, or on other sites like QuirksMode.

Out of curiosity, I wanted to see where my solution would fit in. So I did some very primitive profiling on Firefox 9.0.1. My test, shown below, was very simple. I initialized the various methods on the Yahoo home page, and looked up the class view_default.

var x = new Date().getTime();
console.log(document.getElementsByClassName('view_default'));
console.log(new Date().getTime() - x);

The results were consistent with Resig’s. The native implementation completed in roughly 22MS. Crockford’s completed in roughly 47MS. My interview implementation crept in at 55MS. 55MS is slow, relatively, but still quite fast compared to many of the implementations that are floating around out there.

Further Analysis

Below is the implementation attributed to Douglas Crockford (in a comment posted to QuirksMode). The author (whomever it may be) has done a very nice job of making this efficient. Each line is purposeful. And the use of primitive structures—avoiding regular expressions, innerHTML, and other complex objects—makes this fast. Naturally, I was curious to see if I could improve it.

Douglas Crockford’s Implementation

function walkTheDOM(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walkTheDOM(node, func);
        node = node.nextSibling;
    }
}

function getElementsByClassName(className) {
    var results = [];
    walkTheDOM(document.body, function (node) {
        var a, c = node.className,
            i;
        if (c) {
            a = c.split(' ');
            for (i = 0; i < a.length; i++) {
                if (a[i] === className) {
                    results.push(node);
                    break;
                }
            }
        }
    });
    return results;
}

I tried several things. I added a regular expression test instead of using the loop on the string properties. That didn’t help; It cost me several milliseconds. I put the traverse function inside (altering the location of the reference). That bought me a few milliseconds. I avoid running the function on the node if it’s not an element (type 1). That bought me several milliseconds. I tried to avoid running the function on the node if it was a script. The test cost me one or two milliseconds. So, overall, I was able to shave off about 5 or 6 more milliseconds. Here is my reworking:

Tweaking Mr. Crockford’s Solution

function getElementsByClassName(className) {
    var results = [];
    
    var traverse = function (node, func) {
            if (node.nodeType == 1) {
                func(node);
            }
            node = node.firstChild;
            while (node) {
                traverse(node, func);
                node = node.nextSibling;
            }
        }

    traverse(document.body, function (node) {
        var a, c = node.className, i;
        if (c) {
            a = c.split(' ');
            for (i = 0; i < a.length; i++) {
                if (a[i] === className) {
                    results.push(node);
                    break;
                }
            }
        }
    });
    return results;
}

GridHelper: A random walk

The following chunk of code will print out the total successful variations for walking a grid one square at a time, when it has been declared that we begin in the upper-left square, that we are trying to reach the lower-right square, that we cannot move diagonally, and that we cannot revisit a square. The GridHelper object accepts an x bound (positive int), a y bound (positive int), a “force” attribute (positive int), and an option to print the paths as they’re found (true|false).

#!/usr/bin/php
<?php

    class GridHelper {
        private $b1 = 0;
        private $b2 = 0;
        private $b_as_string;
        private $force = 0;
        
        private $debug;
    
        private $d1 = 1;
        private $d2 = 1;
        
        private $saved_paths = array(); //successful paths
        private $visited = array("1,1"=>true); //visited points (in a path)
        private $path = ""; //StringBuffer of x,y paths 
        private $count = 0;
        /*$path = x,y x1,y1, x2,y2, 4,4 */
        
        private $fail = 0; //total failures since last success
        
        function __construct($x,$y,$force,$debug) {
            $this->b1 = $x;
            $this->b2 = $y;
            $this->force = $force;
            $this->b_as_string = join(",",array($x,$y));
            $this->debug = $debug;
        }
        
        public function __tostring() {
            return "GridHelper";
        }
    
        public function seek() {
            while ($this->fail < $this->force) { //200,000
                $this->reset();
                $this->path();
                $this->count += 1;
            }
            //foreach($this->saved_paths as $path) { print $path."\n"; }
            return count($this->saved_paths);
        }
        
        public function path() {
            $point = $this->getNextPoint();
            $point_as_string = join(",",$point);
            if ($point_as_string == "0,0") { 
                $this->fail += 1; 
                return; 
            }
            if ($point_as_string == $this->b_as_string) { 
                if (!isset($this->saved_paths[$this->path])) {
                    $this->saved_paths[$this->path] = $this->path;
                    if($this->debug){print $this->count.": ".$this->path."\n";}
                    
                    $this->fail = 0;
                }
                return; 
            }
            return $this->path();
        }
        
        public function reset() {
            $this->d1 = 1;
            $this->d2 = 1;
            $this->path = "";
            $this->visited = array("1,1"=>true);
        }
        
        
        private function getNextPoint() {
            $x = array();
            
            //Add in our choices
            if (($this->d1 != 1)&&
                (!isset($this->visited[join(",",array($this->d1 -1,$this->d2))]))) { 
                    array_push($x,array($this->d1 -1,$this->d2)); 
                }
            if (($this->d1 != $this->b1)
                &&(!isset($this->visited[join(",",array($this->d1 +1,$this->d2))]))) { 
                    array_push($x,array($this->d1 +1,$this->d2)); 
                }
            if (($this->d2 != 1)
                &&(!isset($this->visited[join(",",array($this->d1,$this->d2 -1))]))) { 
                    array_push($x,array($this->d1,$this->d2 -1)); 
                }
            if (($this->d2 != $this->b2)
                &&(!isset($this->visited[join(",",array($this->d1,$this->d2 +1))]))) { 
                    array_push($x,array($this->d1,$this->d2 +1)); 
                }

            $l = count($x);
            $new_point_array = array(0,0); //Default is no choice
            if ($l > 0) { //If the length of X is > 0, we have choices.             
                shuffle($x);
                $new_point_array = $x[array_rand($x,1)];
                $this->d1 = $new_point_array[0];
                $this->d2 = $new_point_array[1];
                $new_point_array_as_string = join(",",$new_point_array);
                $this->visited[$new_point_array_as_string] = true;
                $this->path .= $new_point_array_as_string." ";
            } 
            return $new_point_array;
        }
    
    }
    
    //GridHelper(x-count,y-count,#of failures to test through,show paths)
    $g = new GridHelper(4,3,200,true);
    print $g->seek()."\n";

    exit;
?>