Closures in Java 7 After All

| No Comments

I've been focusing on JavaScript recently, so I missed this when it first came out:
Closures for Java. I think it is interesting that the motivation for finally doing this is to facilitate APIs for concurrency.
Sun will not be using any of the existing closures proposals as a starting point, but their initial ideas are perhaps closest to the FCM (first-class methods) proposal.

There are further details here including the ominous admission by Sun that they don't feel they can get any JSRs (for closures, Project Coin, or Java 7) approved by the JCP until they resolve their dispute with Apache. In the meantime, development of closures and the Coin extensions is happening outside of the JCP in the OpenJDK.

In related news, the schedule for OpenJDK7 has slipped and a final release is now due in September 2010. Note that this is the schedule for the JDK7, not for Java 7.

A module loader with simple dependency management

| 4 Comments

I've written another version, require2.js, of my CommonJS module loader function require(). This one has two interesting features.

First, you can "pre-load" modules by mapping the module filename to the module function in the require._module_functions object. If you do this, then the module will not need to be loaded over the network. For example:

require._module_functions['math.js'] = function(require,exports,module) {
     // Code for the math module goes here
};

Second, this new version of require() has a require._print hook, which, if set to a suitable function, will print out the text of all modules it loads, wrapped in a function and assigned to the require._module_functions map as above. You can even define a require._minimize hook if you want to do code minimization on your modules.

I've defined another script display_requirements.js that defines suitable _print and _minimize functions.

So, here's the upshot. For relatively simple applications that load modules statically at startup, use the require2.js script for loading modules. It's inefficient, but works well during the development phase. Then, when you're getting ready to deploy your application, load the display_requirements.js script after loading require2.js but before you actually call require() anywhere. This will cause a big chunk of code to appear at the bottom of your web page--pre-loaded minimized versions of all the modules you used. Cut-and-paste this code into a new file named requirements.js (or even paste it at the bottom of require2.js) and load the requirements.js script in place of the display_requirements.js script. Now you can continue to use require() as you have always done, but it won't have to hit the network to load your modules.

I haven't done much deployment of real-world web applications, and am not qualified to say whether a system like this would actually be helpful in practice. But it was an easy tweak to my existing code, so there it is.

CommonJS Modules implementation

| 9 Comments

I've implemented the CommonJS Modules 1.0 specification with the code in this file. It appears to pass the compliance tests when run in Firefox and Chrome on Linux, and also when run standalone in TraceMonkey, Rhino or V8.

Note that this implementation does not use the namespace probing technique I described in my previous post.

Update: One of the things that really surprised me when testing my implementation was to discover that the CommonJS spec requires (this is not explicit in the specification text, but it is explicit in the conformance tests) the require() function to return the actual exports object of a module, and not make a defensive copy of it.

Suppose a program includes module A which includes modules B and C. Module C can require B and then add, replace, or remove methods from B's API. Later, when the program includes Module B directly, it will get the modified version of B. In order to correctly use this modified module B, the programmer will have to read the documentation for module C!
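
The scenario can be reproduced with a toy registry. This is only an illustration of the shared-exports behavior, not my implementation: tinyRequire, modules and loaded are hypothetical names, and the module bodies are made up.

```javascript
// Hypothetical two-module registry; tinyRequire returns the live exports object.
var modules = {
    B: function(require, exports) {
        exports.greet = function() { return "hello from B"; };
    },
    C: function(require, exports) {
        var b = require("B");
        // C replaces a method in B's API.
        b.greet = function() { return "hello from C's patch"; };
    }
};
var loaded = {};

function tinyRequire(id) {
    if (!Object.prototype.hasOwnProperty.call(loaded, id)) {
        loaded[id] = {};
        modules[id](tinyRequire, loaded[id]);
    }
    return loaded[id];   // the actual exports object, no defensive copy
}

tinyRequire("B");
tinyRequire("C");
var greeting = tinyRequire("B").greet();   // "hello from C's patch"
```

Anyone who requires B after C has run sees C's replacement, because every caller receives the same object.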

Update 2: There are great comments to this post, including a link to Luke Smith's blog post that argues that the synchronous nature of CommonJS modules is not a good fit for client-side scripting. In response I wanted to make clear that I posted this code because I thought it was interesting, not because I think that client-side programmers should go out and start using it right away.

Functions as Namespaces, and How to Peek Inside

| 18 Comments

It has become common in modern JavaScript programming to use functions as namespaces. If you put your code inside a function, then your variables and functions are local to the containing function and do not clutter up the global scope.

var value = (function() {  // Wrapper function creates a local scope or namespace
    // your code goes here
    return value;  // Export a value from the namespace
}());  // Invoke the wrapper function to run your code

Now suppose that you have some JavaScript code as a string--you've just loaded it using XMLHttpRequest, for example. You're going to evaluate the code, and you might want to evaluate it in a namespace so that it doesn't define functions and variables in the global scope. This is easy: just wrap it in a function before evaluating it. In this case, the Function() constructor is even more handy than the eval() function:

var code = ....;  // A string of JS code to evaluate
var f = new Function(code);   // Wrap it in a function
f();    // And run the function

The problem with doing this is that the function creates a sealed namespace and we can't see what is inside. If the code defines something useful like a function or a class, we can't access it, and it does us no good.

Here's a trick I've just discovered. (I'm sure someone else has thought of this, but I haven't seen it used or described elsewhere). Before you wrap your code in a function add this line to it:

return function(s) { return eval(s); };

Now, when you invoke the wrapper function, it returns this evaluator function to you. The returned function evaluates a string in the scope of the namespace, so you can use it to peek into the namespace and extract whatever values you want!

If your string of code defines a constructor function named Set() that you want to use, you can run the code in a namespace and then extract Set from the namespace like this:

var code = readFile("Set.js");  // A string of JS code to evaluate
// Define and invoke a wrapper function with special suffix code.
// The return value is a namespace evaluator function and we treat
// it as a namespace object.
var setns = new Function(code + "return function(s) { return eval(s); };")(); 
var Set = setns("Set");  // Import the Set function from the namespace.
var s = new Set();  // Use the class we just imported

And what if there are 3 values you want to extract from the namespace?

// Extract an object containing 3 values from the namespace
var sets = setns('({Set:Set, BitSet:BitSet, MultiSet:MultiSet})');  // Parens make eval() parse an object literal, not a block
var bs = new sets.BitSet();
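
Here is a self-contained version of the trick that can be run without any external file. The module source, Counter and howMany are made up for the example; only the evaluator suffix comes from the text above.

```javascript
// A made-up "module" as a string; in practice this would come from a file or XHR.
var source =
    "var count = 0;\n" +
    "function Counter() { count++; this.id = count; }\n" +
    "function howMany() { return count; }\n";

// Append the evaluator suffix and invoke the wrapper function.
// The leading newline guards against source that ends in a line comment.
var ns = new Function(source + "\nreturn function(s) { return eval(s); };")();

var Counter = ns("Counter");   // import a single value
var a = new Counter();
var b = new Counter();

// Import several values at once. The parentheses force eval() to parse
// an object literal rather than a block statement.
var api = ns("({Counter: Counter, howMany: howMany})");
var total = api.howMany();     // 2

// Private state is reachable only through the evaluator.
var leaked = typeof count;     // "undefined" when run in a fresh scope
var peeked = ns("count");      // 2
```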

I've defined a namespace() function for loading code and doing this kind of namespacing automatically:

Google Closure Library and Optimizer

| No Comments

Google has open-sourced the JavaScript library and optimizer they use in Gmail and other web applications. They call it "Closure" and you can read about it here.

While the optimizer looks very cool, I think the library is most impressive. It's a really large code base, and at least some of it has been thoroughly field-tested in Gmail and similar applications. You can get the code like this:

svn checkout http://closure-library.googlecode.com/svn/trunk/ closure-library-read-only

The Closure library is intended to be used with the Closure optimizer, which removes unused code, so the APIs are broad with lots of methods--there is no sense that the Closure developers were skimping on API in order to pack everything into a small download bundle. Also, and perhaps for the same reason, the code is not full of micro-optimizations. Compared to the jQuery and YUI code (for example) the Closure code is straightforward and easy to understand.

My Ruby Book on your iPhone. Cheap!

| 3 Comments

O'Reilly has just released The Ruby Programming Language as a standalone iPhone app! Looks like you can get your hands on it for just $5. (The cover price of the print edition is $40.) I don't have an iPhone, but if you do, I'd love to hear how the book looks and works for you in that format.

At the same time, O'Reilly has dropped the ebook price for other readers from $30 to $10. Amazon's Kindle price is still $18 today. I don't know if that is going to go down or not. (Update 8/27: the Kindle price is now $8.)

I'm not sure how I feel about O'Reilly using my book for their marketing experiments, but it's certainly a good deal for readers. So I ask for your help in making up for the reduced price with a massive increase in sales volume! :-)

And if I can engage in some more self-promotion: reviewers on Amazon think that this is the best book I've ever written. The 26 reviewers have all given it a 5-star rating, and the most recent review goes over the top and calls it "the best ever written programming language book". Wow. You know you want it on your iPhone!

Update: my editor says that he thinks this new ebook price is a temporary sale.

Function objects in JavaScript are callable: you can invoke them. Other, non-function objects are allowed to be callable, however. Host objects in IE (things like window.alert()) are callable, but are not native function objects. (Other browsers implement DOM methods as true native function objects.) Also, a number of browsers have followed Firefox's lead in making RegExp objects callable even though they are not functions. Although you can invoke any callable object like a function, the difference is that callable objects don't have the function methods call() and apply() (and bind() in ECMAScript 5).

Today, the typeof operator returns "function" for true function objects, and returns "object" for IE's callable host objects. Most browsers return "object" for callable RegExps, but Safari returns "function", and Google Chrome is likely to follow Safari's lead on this. If you want to be sure that something is a true function (and not a regexp) you can use something like this:

function isFunction(x) { 
    return Object.prototype.toString.call(x) === "[object Function]";
}

Today, however, there is no reliable way to write an isCallable() function.

Things change in the ECMAScript 5 specification. The typeof operator is required to return "function" for any native or host object that is callable. When IE implements the spec, we can expect "typeof window.alert" to evaluate to "function". The problem is that browsers like Firefox and Opera that have callable regexps are unlikely to implement the spec: the typeof operator on a regular expression will continue to return "object" for those browsers. The committee writing the specification was aware of this problem, but ran out of time to fix it.

So today typeof x === "function" is close to an isFunction() test, but it fails for regular expressions in some browsers. In ECMAScript 5, typeof will be close to an isCallable() test, but it will fail for regular expressions in some other browsers.

Fortunately, the isFunction() test above should continue to work in ECMAScript 5, and there is a way to write a reliable isCallable() function in ECMAScript 5. It relies on the fact that the Array.prototype.forEach() method checks its argument for callability even when invoked on an empty array. (So this isCallable() function assumes that browser vendors implement the Array.prototype.forEach() method as specified.) Here it is:

Object.isCallable = function(o) {
    // Array.prototype.forEach throws TypeError for non-callable arguments
    try {
        [].forEach(o);  // o will never be invoked, but it will be tested for callability
        return true;
    } catch (x) {
        if (x instanceof TypeError) return false;
        else throw x;
    }
};
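
A quick usage check, written as a standalone function so that it does not modify Object. The name isCallable and the sample values are only for illustration, and the results assume an engine that implements forEach() as specified.

```javascript
// Same technique as above, as a standalone function.
function isCallable(o) {
    try {
        [].forEach(o);   // the array is empty, so o is tested but never invoked
        return true;
    } catch (x) {
        if (x instanceof TypeError) return false;
        throw x;
    }
}

var results = [
    isCallable(function() {}),   // true
    isCallable(Math.max),        // true: built-in functions are callable
    isCallable({}),              // false
    isCallable("alert"),         // false: a string naming a function is not callable
    isCallable(null)             // false
];
```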

I was recently writing some documentation for the Array.forEach() method (part of ES5, but most browsers other than IE support it now) and worrying about the fact that there is no clean way to terminate the iteration prematurely. Nothing like the break statement, that is. If you really want to get out of the loop, the function you pass to forEach() has to throw something. And the forEach() method won't catch it for you, so you've got to write your own try block.

Then, when working with the new array predicate method Array.some(), I realized that we don't have to think of it as a predicate method. If we ignore the return value, it is an iterator method that works just like Array.forEach() except that if your function returns true (or any truthy value) then the loop terminates. So inside of the function you pass, a plain return statement with no value is like using a continue statement. And "return true" is like a break statement, causing the loop to terminate. The implicit return that occurs at the end of the function body returns undefined, which is like returning false--it keeps the loop going.

The Array.every() method is not so useful this way: you have to explicitly return a truthy value to keep the loop going, so an implicit return at the end of the function body would act like a break statement.
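
A small example of both behaviors; the arrays and variable names are made up for illustration.

```javascript
// some() used as an iterator with "continue" and "break".
var visited = [];
[1, 2, 3, 4, 5].some(function(x) {
    if (x % 2 === 0) return;   // plain return: like continue
    if (x > 3) return true;    // truthy return: like break
    visited.push(x);
    // falling off the end returns undefined, so the loop keeps going
});
// visited is now [1, 3]

// every() stops as soon as the function returns a falsy value, so the
// implicit return at the end of the body acts like break.
var seen = [];
[1, 2, 3].every(function(x) {
    seen.push(x);
});
// seen is now [1]
```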

The problem I see with using some() in this way is a stylistic one: the name really doesn't look like the name of an iterator the way that "each" and "every" do.

Good algorithms are better than clever code

| 5 Comments

Yesterday, I posted an entry about a clever way to implement string multiplication in JavaScript using Array.prototype.join().

In comments, redraiment challenged me, suggesting that an implementation based on string doubling would be more efficient. Sure, I thought, for really large values of n, but surely for small n, using the native join() method would be better, wouldn't it?

It turns out that writing a good algorithm is better than being overly clever (at least in these days of really good JIT interpreters). Here's my new string multiplication code:

String.prototype.times = function(n) {
    var s = this, total = "";
    while(n > 0) {
        if (n % 2 == 1) total += s;
        if (n == 1) break;
        s += s;
        n = n>>1;
    }
    return total;
};

By my simple benchmarks, this implementation is significantly faster than using join(), even when only multiplying by 1 or 2. I've tested it in Firefox 3.5, IE 8 and Safari 3.
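
If you want to repeat the comparison yourself, a rough harness might look like the sketch below. It is not the benchmark I ran: timesDoubling, timesJoin and timeIt are hypothetical standalone names, and the timings will vary by engine and machine, so no numbers are shown.

```javascript
// Doubling algorithm from above, as a standalone function.
function timesDoubling(str, n) {
    var s = str, total = "";
    while (n > 0) {
        if (n % 2 == 1) total += s;   // low bit set: include the current power
        if (n == 1) break;
        s += s;                       // s now holds twice as many copies
        n = n >> 1;
    }
    return total;
}

// The join()-based version from the earlier post.
function timesJoin(str, n) {
    return new Array(n + 1).join(str);
}

// Crude timer: run f(str, n) reps times and return elapsed milliseconds.
function timeIt(f, str, n, reps) {
    var start = Date.now();
    for (var i = 0; i < reps; i++) f(str, n);
    return Date.now() - start;
}

var doublingMs = timeIt(timesDoubling, "js", 2, 100000);
var joinMs = timeIt(timesJoin, "js", 2, 100000);
```

Whatever the timings, it is worth confirming first that the two functions return identical strings for the values of n you care about.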

String Multiplication in JavaScript

| 13 Comments

In Ruby, the "*" operator used with a string on the left and a number on the right does string repetition. "Ruby"*2 evaluates to "RubyRuby", for example. This is only occasionally useful (when creating lines of hyphens for ASCII tables, for example) but it seems kind of neat. And it sure beats having to write a loop and concatenate n copies of a string one at a time--that just seems really inefficient.

I just realized that there is a clever way to implement string multiplication in JavaScript:

String.prototype.times = function(n) {
    return Array.prototype.join.call({length:n+1}, this);
};

"js".times(5) // => "jsjsjsjsjs"

This method takes advantage of the behavior of the Array.join() method for arrays that have undefined elements: each undefined element is converted to the empty string, so joining n+1 of them yields n copies of the separator. But it doesn't even bother creating an array with n+1 undefined elements. It fakes it out using an object with a length property and relies on the fact that Array.prototype.join() is defined generically. Because this object isn't an array, we can't invoke join() directly, but have to go through the prototype and use call(). Here's a simpler version that might be just as efficient:

String.prototype.times = function(n) { return (new Array(n+1)).join(this);};

When you call the Array() constructor with a single numeric argument, it just sets the length of the returned array, and doesn't actually create any elements for the array.
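
The three forms below all produce the same result, which is what makes the trick work. The variable names are only for illustration.

```javascript
// join() treats undefined and missing elements as empty strings,
// so three "elements" yield two separators.
var explicit = [undefined, undefined, undefined].join("x");   // "xx"
var sparse = new Array(3).join("x");                          // "xx"
var fake = Array.prototype.join.call({ length: 3 }, "x");     // "xx"

// The sparse array has a length but no elements of its own.
var hasFirst = 0 in new Array(3);                             // false
```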

I've only tested these in Firefox. I'm assuming that either is more efficient than anything that involves an actual loop, but I haven't run any benchmarks.

Proposed coding convention for closures

| 6 Comments

By now, many of us have gotten used to using closures in JavaScript to define a scope that holds private variables and utility functions so that we don't have to put these in the global namespace. The idiomatic code looks like this:

(function() {
      var private_var;    // Visible only inside this function
      function helper_function() { ... }
      
      // export an object or function to the global namespace
})();  // Invoke the outer function

The outer function exists only to create a scope to hold our internal variables. It has no name and is invoked exactly once, immediately after being defined. The fact that this is a function expression (rather than a statement-like function declaration) means that we can invoke it immediately after defining it. The parentheses before the function keyword and after the closing } are required because otherwise the JavaScript interpreter would think that this was a function declaration and would complain about the missing function name.

That unusual opening parenthesis before "function" serves another important purpose in this idiom. It alerts us to the fact that this function is being used idiomatically, that it exists solely to create a scope and that it is going to be invoked immediately.
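
The parsing rule can be checked mechanically by asking the Function() constructor to compile each variant. The helper name parses is made up for this sketch.

```javascript
// Returns true if src compiles as a function body, false on a SyntaxError.
function parses(src) {
    try {
        new Function(src);   // compile only; the body is never run
        return true;
    } catch (e) {
        if (e instanceof SyntaxError) return false;
        throw e;
    }
}

var bare = parses("function() { }();");             // false: parsed as a declaration with no name
var wrapped = parses("(function() { })();");        // true
var wrappedAll = parses("(function() { }());");     // true
var assigned = parses("var x = function() { }();"); // true: already an expression
```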

So far, so good. But I've finally gotten around to reading Douglas Crockford's JavaScript: The Good Parts and it has made me think that an additional explicit naming convention would be helpful. (As an aside, Crockford's short book is worth a read, though I find that I disagree with some of his coding conventions. I'm tempted to write a review titled "JavaScript: The Good Parts: The Good Parts"...)

When the function keyword is the first token in a new JavaScript statement, the interpreter expects to see a function declaration, not a function expression. That's why we needed the idiomatic parentheses in the code above. But when the function keyword is used as part of an assignment or as an argument to some other function, those parentheses are not required. Crockford's book includes code like this (page 37):

var myObject = function() {
    var value = 0;

    return { // 7 lines of code omitted here
    };
}();

When I first read the assignment statement it appears to me (despite the name of the variable) that it is a function value being assigned. In fact, however, the function is merely there to establish a scope for private variables. The function is invoked as soon as it is defined, and it is the return value of the function that is assigned. The problem is that I don't realize this until I read all the way down to the end of the function. Appendix E takes this to an extreme. The code begins:

var json_parse = function() {

It looks like we're creating a function and assigning it to the variable json_parse. But five pages later we see:

}();

Now we realize that the value assigned to json_parse is not the function we thought it was, but the function returned by that function.

Another example appears on page 40:

String.method('deentityify', function() {
   // 25 lines omitted
}());

It appears at first that we're passing a function as the second argument of the invocation of String.method. It is not until we read all the way through this function that we realize that we're passing the result of invoking the function.

So, how can this code be improved? One way would be to use the idiomatic parentheses around function expressions that are going to be immediately invoked even when they are not necessary. That would turn the code above into this:

String.method('deentityify', (function() {
   // 25 lines omitted
})());

I think that would be helpful, but I think we can do better. Function expressions are allowed to have names, and those names are only visible within the body of the function (allowing such a function to invoke itself recursively, for example). So let's say that when we're going to define a function for the purpose of creating a scope we make that explicit by giving it a dummy name like "scope" or "closure" or "invocation". This results in code like:

var myObject = function scope() {
   // code omitted
}();

String.method('deentityify', function invocation() {
   // 25 lines omitted
}());
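
A runnable sketch of the naming convention, showing that the dummy name does not escape into the enclosing scope in engines that follow the specification. The counter object and its properties are invented for this example.

```javascript
var counter = function scope() {
    var value = 0;   // private to the scope function
    return {
        increment: function() { return ++value; },
        scopeTypeInside: typeof scope   // the name is bound inside the function body
    };
}();

var first = counter.increment();        // 1
var second = counter.increment();       // 2
var inside = counter.scopeTypeInside;   // "function"
var outside = typeof scope;             // "undefined", assuming no other variable named scope exists
```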

Your thoughts are welcome in the comments. Has anyone else proposed a convention like this? What's the best name to use for these functions?

Update: I suppose we could also simply adopt a comment-based convention:

var myObject = /* return value of */ function () {
   // code omitted
}();

String.method('deentityify', /* result of */ function() {
   // 25 lines omitted
}());

Update 2: It turns out that Douglas Crockford is about 2 months ahead of me. In March he updated jslint to (optionally) issue warnings about immediate invocation of function expressions unless the entire invocation appears in parentheses, and also to warn if a function is parenthesized and is not immediately invoked. So Crockford's convention looks like the following:

var myObject = (function () {
   // code omitted
}());

String.method('deentityify', (function() {
   // 25 lines omitted
}()));

Note that Crockford wraps the invocation in parens, not just the function. That is, he uses ()) at the end instead of (the more commonly used) )(). He has said that he'll update his book to follow these conventions in the next printing.

New version of Jude, plus Java 1.5 server JVM bug

I've just released Jude version 1.07. This is a relatively minor bug-fix release. Thanks to B.L. for reporting the bugs and helping to isolate them.

Interestingly, one of the bugs reported against the previous version was an ArrayIndexOutOfBoundsException at a spot where such an exception really was not possible. This had me really puzzled--I could not duplicate it. But when I discovered that inserting debugging println() calls made it go away, I realized that this was a JVM problem and not my bug. It turns out that in Java 5 (we tested u17 and u18) on Linux (at least) running with the -server option would cause this spurious exception. Running with -client (which is the default for most installations, I think) would not cause it. The crash never occurred at precisely the same spot in a run, leading me to think it was a GC bug. Unfortunately, I've got no idea how to isolate a bug like this with a simple test case so that I can report it.

New ECMAScript version numbering scheme

Per a post today on the es-discuss mailing list, the next version of the JavaScript standard will be ECMAScript 5. This version was previously called ECMAScript 3.1, and is a relatively small and long-overdue update to the language. Version 4 of the standard has been in the planning stages for 10 years or more, but those plans have been scrapped. To avoid confusion with those old plans, however, there will be no version 4 of the standard.

