Proposed coding convention for closures

By now, many of us have gotten used to using closures in JavaScript to define a scope that holds private variables and utility functions so that we don't have to put these in the global namespace. The idiomatic code looks like this:

(function() {
      var private_var;    // Visible only inside this function
      function helper_function() { /* ... */ }
      
      // export an object or function to the global namespace
})();  // Invoke the outer function

The outer function exists only to create a scope to hold our internal variables. It has no name and is invoked exactly once, immediately after being defined. The fact that this is a function expression (rather than a statement-like function declaration) means that we can invoke it immediately after defining it. The parentheses before the function keyword and after the closing } are required because otherwise the JavaScript interpreter would think that this was a function declaration and would complain about the missing function name.

That unusual opening parenthesis before "function" serves another important purpose in this idiom. It alerts us to the fact that this function is being used idiomatically, that it exists solely to create a scope and that it is going to be invoked immediately.
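The parsing rule is easy to check directly. The sketch below is an illustration only: parses() is a hypothetical helper, not part of the idiom, which hands source text to the Function constructor (this compiles the text as a function body without running it) and reports whether a SyntaxError was thrown.

```javascript
// Hypothetical helper: does `source` compile as a function body?
// new Function() compiles the text but does not execute it.
function parses(source) {
    try {
        new Function(source);
        return true;
    } catch (e) {
        if (e instanceof SyntaxError) return false;
        throw e;  // anything else is unexpected
    }
}

// "function" is the first token, so a declaration (with a name) is expected.
var withoutParens = parses("function() { var private_var; }();");  // false

// The leading parenthesis makes the parser read a function expression.
var withParens = parses("(function() { var private_var; })();");   // true
```

Only the second form compiles, which is why the parentheses are required when the function keyword would otherwise begin the statement.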

So far, so good. But I've finally gotten around to reading Douglas Crockford's JavaScript: The Good Parts and it has made me think that an additional explicit naming convention would be helpful. (As an aside, Crockford's short book is worth a read, though I find that I disagree with some of his coding conventions. I'm tempted to write a review titled "JavaScript: The Good Parts: The Good Parts"...)

When the function keyword is the first token in a new JavaScript statement, the interpreter expects to see a function declaration, not a function expression. That's why we needed the idiomatic parentheses in the code above. But when the function keyword is used as part of an assignment or as an argument to some other function, those parentheses are not required. Crockford's book includes code like this (page 37):

var myObject = function() {
    var value = 0;

    return { // 7 lines of code omitted here
    };
}();

When I first read this assignment statement, it appears to me (despite the name of the variable) that a function value is being assigned. In fact, however, the function is merely there to establish a scope for private variables. The function is invoked as soon as it is defined, and it is the return value of the function that is assigned. The problem is that I don't realize this until I read all the way down to the end of the function. Appendix E takes this to an extreme. The code begins:

var json_parse = function() {

It looks like we're creating a function and assigning it to the variable json_parse. But five pages later we see:

}();

Now we realize that the value assigned to json_parse is not the function we thought it was, but the function returned by that function.
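To make the distinction concrete, here is a small sketch. It is not taken from Crockford's book; the names counter, increment and notInvoked are made up for illustration. The two assignments look the same from the top, and only the last line of each reveals what is actually stored.

```javascript
// Illustrative only: counter and increment are hypothetical names.
var counter = function() {
    var value = 0;              // private to this scope
    return {
        increment: function() {
            value += 1;
            return value;
        }
    };
}();                            // the trailing () that is easy to miss

var notInvoked = function() {
    return {};
};                              // no trailing (): this one really is a function

var kinds = [typeof counter, typeof notInvoked];  // ["object", "function"]
```

With a function body several pages long, nothing before the closing line tells the reader which of these two cases applies.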

Another example appears on page 40:

String.method('deentityify', function() {
   // 25 lines omitted
}());

It appears at first that we're passing a function as the second argument of the invocation of String.method. It is not until we read all the way through this function that we realize that we're passing the result of invoking the function.

So, how can this code be improved? One way would be to use the idiomatic parentheses around function expressions that are going to be immediately invoked even when they are not necessary. That would turn the code above into this:

String.method('deentityify', (function() {
   // 25 lines omitted
})());

I think that would be helpful, but I think we can do better. Function expressions are allowed to have names, and those names are only visible within the body of the function (allowing such a function to invoke itself recursively, for example). So let's say that when we're going to define a function for the purpose of creating a scope we make that explicit by giving it a dummy name like "scope" or "closure" or "invocation". This results in code like:

var myObject = function scope() {
   // code omitted
}();

String.method('deentityify', function invocation() {
   // 25 lines omitted
}());
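One reason a dummy name is harmless: in a standards-conforming engine, the name of a function expression is bound only inside the function body. The sketch below illustrates that under this assumption; the property nameSeenInside is made up for the demonstration.

```javascript
var myObject = function scope() {
    // Inside the body, the name refers to the function itself.
    return { nameSeenInside: typeof scope };
}();

// Outside the body the name is not defined, so nothing leaks
// into the enclosing namespace.
var nameSeenOutside = typeof scope;

var seen = [myObject.nameSeenInside, nameSeenOutside];  // ["function", "undefined"]
```

The same dummy name can therefore be reused for every scope-creating function in a file without any conflict.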

Your thoughts are welcome in the comments. Has anyone else proposed a convention like this? What's the best name to use for these functions?

Update: I suppose we could also simply adopt a comment-based convention:

var myObject = /* return value of */ function () {
   // code omitted
}();

String.method('deentityify', /* result of */ function() {
   // 25 lines omitted
}());

Update 2: It turns out that Douglas Crockford is about 2 months ahead of me. In March he updated jslint to (optionally) issue warnings about immediate invocation of function expressions unless the entire invocation appears in parentheses, and also to warn if a function is parenthesized and is not immediately invoked. So Crockford's convention looks like the following:

var myObject = (function () {
   // code omitted
}());

String.method('deentityify', (function() {
   // 25 lines omitted
}()));

Note that Crockford wraps the invocation in parens, not just the function. That is, he uses ()) at the end instead of (the more commonly used) )(). He has said that he'll update his book to follow these conventions in the next printing.
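The two placements of the closing parenthesis behave identically; the difference is purely one of convention. A minimal sketch (the variable names are illustrative):

```javascript
// Parentheses around the function only, then the invocation: )()
var functionWrapped = (function() { return "invoked"; })();

// Parentheses around the whole invocation, as Crockford prefers: ())
var invocationWrapped = (function() { return "invoked"; }());

var same = functionWrapped === invocationWrapped;  // true
```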

New version of Jude, plus Java 1.5 server JVM bug

I've just released Jude version 1.07. This is a relatively minor bug-fix release. Thanks to B.L. for reporting the bugs and helping to isolate them.

Interestingly, one of the bugs reported against the previous version was an ArrayIndexOutOfBoundsException at a spot where such an exception really was not possible. This had me really puzzled: I could not duplicate it. But when I discovered that inserting debugging println() calls made it go away, I realized that this was a JVM problem and not my bug. It turns out that in Java 5 (we tested u17 and u18), on Linux at least, running with the -server option causes this spurious exception. Running with -client (which is the default for most installations, I think) does not. The exception never occurred at precisely the same spot in a run, leading me to think it was a GC bug. Unfortunately, I've got no idea how to isolate a bug like this with a simple test case so that I can report it.

New ECMAScript version numbering scheme

Per a post today on the es-discuss mailing list, the next version of the JavaScript standard will be ECMAScript 5. This version was previously called ECMAScript 3.1, and is a relatively small and long-overdue update to the language. Version 4 of the standard has been in the planning stages for 10 years or more, but those plans have been scrapped. To avoid confusion with those old plans, however, there will be no version 4 of the standard.

$SAFE is Proc-local

While researching Ruby's new-in-1.9 Object methods untrusted?, untrust, and trust, I discovered something I did not know about the $SAFE variable: in addition to being Thread-local, it is also Proc-local. Proc objects (both procs and lambdas) have their own copy of $SAFE, and they run at whatever $SAFE level was in effect when the Proc was created, not the safe level that is in effect when they are invoked. This was discussed three years ago over at _why's old blog.

There is a corollary, however, that was demonstrated but not discussed in a recent ruby-core post by Shugo Maeda: if you set $SAFE inside a proc or a lambda, that setting is local, and does not change the global safe level. Googling for "safe_eval" implementations shows that everyone uses Thread.new to create a sandbox with a locally elevated safe level. It turns out, however, that an ordinary lambda will also work. (One can argue, however, that a thread gives extra safety because it can be monitored and killed to guard against runaway code that doesn't terminate.)

In any case, here is the method I defined to try this out. Pass it a safe level (it defaults to 4) and a block of code, and it will execute the block at that level without altering the global safe level and without creating a new thread:

# Execute a block at the specified safe level. Tested with Ruby 1.9 and 1.8.7
def safely(level = 4)
  sandbox = lambda do # Set up a sandbox 
    $SAFE = level     # Go to the specified safe level for this lambda only
    yield             # Invoke the block at that level
  end
  sandbox.call        # Invoke the sandbox without changing $SAFE globally
end

# Use this method like this. 
x = safely { eval('"untrusted code"') }
[x, x.tainted?, x.untrusted?]  # => ["untrusted code", true, true]

I'm not convinced that this method is actually useful in practice, but I thought it was interesting, and it reminds me again how cool it is to be able to define methods that accept blocks and act like control statements.

Bending the Arc of History

President-elect Obama sure writes and delivers a great speech! My favorite line from his victory speech last night:

put their hands on the arc of history and bend it once more toward the hope of a better day.

I love the image of bending the arc of history. Certainly appropriate to the occasion last night. Let's hope that under the Obama administration we can continue to bend the arc of history toward (to pick one example) reduced CO2 emissions!

site upgrade

I've migrated my site to a new webhost, and have upgraded to a new version of my blogging software. If you've visited over the last couple of days, you may have seen the site in a chaotic state, but I think things have settled down now.

Method Chaining Part 2

The comments on my last post about method chaining in JavaScript were spectacular, and I want to publicly thank all who took the time to read my code and think about it. The final version of the code (which you can see below the fold) is much stronger thanks to their comments.

In the 5th edition of my JavaScript book I made the embarrassing mistake of recommending a constructor and method chaining technique that only works for shallow class hierarchies--it works when class B extends A, for example, but not when C extends B and B extends A.

The technique I recommended was to put a superclass property in the prototype object of a class, and then to chain to a superclass constructor by calling this.superclass(). To see why this fails, imagine that we're creating an instance of class C (which extends B, which extends A). The constructor C() chains to B() by calling this.superclass(). The constructor B() is invoked on the same instance of C, however, so when it attempts to chain to A() by calling this.superclass(), it just ends up invoking itself. This incorrect chaining technique is discussed in sections 9.5.1 and 9.5.2, and is also used in examples 9-10 and 9-11 at the very end of the chapter. I blogged about this mistake and a possible workaround almost two years ago.
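The failure can be reproduced in a few lines. This is a sketch of the flawed technique rather than the code from the book: the classes A, B and C, the use of Object.create() to set up the prototypes, and the call counter (added so the demonstration stops cleanly instead of overflowing the stack) are all assumptions made for illustration.

```javascript
// Sketch of the flawed technique; all names here are illustrative.
var callsToB = 0;

function A() { this.initializedByA = true; }

function B() {
    callsToB += 1;
    // Guard so the demonstration stops instead of overflowing the stack.
    if (callsToB > 3) throw new Error("B() keeps invoking itself");
    this.superclass();          // intended to mean A()
}
B.prototype = Object.create(A.prototype);
B.prototype.superclass = A;

function C() {
    this.superclass();          // C.prototype.superclass is B
}
C.prototype = Object.create(B.prototype);
C.prototype.superclass = B;

var b = new B();                // two levels: B() chains to A() as intended

callsToB = 0;
var failure = null;
try {
    new C();                    // three levels: inside B(), this.superclass is still B
} catch (e) {
    failure = e.message;        // "B() keeps invoking itself"
}
```

Because the lookup of this.superclass always starts from the instance, an instance of C finds C.prototype.superclass first, no matter which constructor is currently running.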

Now, however, O'Reilly is preparing to do a reprint of the book, and I have an opportunity to fix this mistake. Below is revised code from examples 9-10 and 9-11. I've renamed the defineClass() method to Class() and have modified it so that it automatically does constructor chaining (I was inspired by dojo for this change). More importantly, I've simplified method chaining by defining a global function named chain(). If a method overrides a method defined by a superclass (or a "mixin" class), it can invoke that overridden method by invoking chain() like this: chain(this,arguments). (The second argument must be the arguments object of the overriding method, and the first argument must be the object on which that method was invoked.)

The code is below the fold. I think this is interesting JavaScript, and I'd love to have it checked for errors before it goes into print again... Please leave a comment if you think it could be improved! Update: comments are now closed; spammers have struck.
