js2-mode: a new JavaScript mode for Emacs
AI Notes
One half of a larger plan to make it possible to write Emacs extensions
in JavaScript instead of Emacs-Lisp — js2-mode was the editor side, and
Ejacs (the runtime) would land later the same year. The technical move
that made js2-mode survive was the parser: Steve had ported Mozilla's
Rhino recursive-descent JavaScript
parser to Emacs Lisp — roughly ten thousand lines of elisp — so syntax
highlighting and structure recognition came from a real JavaScript
parse tree, not font-lock regex heuristics. "It doesn't use heuristics
or guesswork; it's exactly the same parser used by JavaScript engines."
The post is candid about the parts that didn't work: a month of
incremental-parsing work abandoned in favour of asynchronous full-restart
parsing; 1,500 lines of doomed-indent.el thrown out after
the cc-engine approach proved intractable, replaced with a "bounce
indenter" cycling among likely indentation points (built on Karl
Landström's lightweight 200-line approach). Other features include code
folding, comment/string filling, syntax-error highlighting, strict-mode
warnings, and jsdoc highlighting.
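The bounce idea can be shown in isolation. This is a minimal sketch, not js2-mode's code: it assumes the candidate columns have already been computed, however that is done, and ranked from most to least likely. `nextBounceColumn` is a hypothetical name.

```javascript
// Hypothetical sketch of a "bounce" indenter: each TAB press moves the line to
// the next plausible indentation column, wrapping around at the end.
function nextBounceColumn(candidates, currentColumn) {
  // Drop duplicates but keep the caller's ranking (a Set preserves insertion order).
  const columns = [...new Set(candidates)];
  if (columns.length === 0) return currentColumn; // nothing to bounce between
  const i = columns.indexOf(currentColumn);
  // Not on a candidate yet: jump to the best guess; otherwise advance and wrap.
  return i === -1 ? columns[0] : columns[(i + 1) % columns.length];
}

// Four TAB presses starting from column 2, with candidates ranked 4, 0, 8.
let column = 2;
const visited = [];
for (let press = 0; press < 4; press++) {
  column = nextBounceColumn([4, 0, 8], column);
  visited.push(column);
}
console.log(visited); // [ 4, 0, 8, 4 ]
```

The cycling itself is trivial. What it buys is that a wrong first guess costs one extra keystroke rather than a manual fix.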
One of the most-cited pieces of Steve's engineering output. Steve is
the original author of js2-mode.el; mooz/js2-mode
is the canonical maintained fork today, and
johan/js2-mode-yegge mirrors the original. The larger
"write Emacs in JavaScript" plan never fully landed, but the editor side
outlived the project that birthed it.
Related listings
-
2008
Ejacs: a JavaScript interpreter for Emacs
The companion project — same year, same parser, same goal. js2-mode is the editor; Ejacs is the interpreter. Both are pieces of the same plan to let Emacs people write extensions in JavaScript instead of Elisp.
-
2008
XEmacs is Dead. Long Live XEmacs!
Same month, same Steve, same Emacs subculture — the eulogy for XEmacs. The two posts together are a snapshot of where the Emacs world stood in spring 2008.
-
2008
Emergency Elisp
Steve's crash course in writing Elisp under duress, from earlier the same year — the prose context for the kind of programmer who would happily port a Rhino parser to ten thousand lines of elisp.
From the peanut gallery
-
My javascript.el/configuration/font combination has the author as "Karl Landström" (that's an o with an umlaut).
This looks very cool. Next time I have some Javascript hacking to do, I'll definitely grab it. -
I would be interested in seeing the source code as an example of how a competent coder would do unit testing in emacs. Hence this nudge to try and persuade you to upload it.
-
If only it weren't so late...
I will be playing with this tomorrow, looking forward to it. -
Highlighting still breaks when I try to open JQuery 1.2.3 (uncompressed).
Nevertheless, pretty nifty. -
This sounds great and I'm definitely going to be your guinea pig!
For some reason my formatting conventions are really wacky ATM so hopefully I can be a real pain. -
James Gosling (one of the authors of emacs) has the following to say: "Emacs was a really good idea in 1978. That was like 30 years ago. Stop using it!"
http://www.builderau.com.au/tags/0,339028020,4000198303o,00.htm
Power to ya Stevey. -
Very nice piece of software, indeed.
I see in the comments (in the code, not here) that you're planning to support script tags inside HTML. I hope you give a nice priority to that.
@julio: don't pay attention to Gosling, he's getting old. -
Thank You Steve!
Have you considered submitting this for inclusion in Emacs? I don't know how Google would treat the copyright assignment or if they would allow it at all. I think it would be really great if this came stock with Emacs! -
Hey, interesting post.
On parsing languages in emacs, I wonder if you also considered handing the parse off to an external process? My language mode helper, flyparse-mode, works this way (similar to what jrockway describes in this thread, actually):
* On idle, if the current buffer is dirty, it gets written to a file.
* An external ANTLR parser is invoked on that file. The output of the parser, a sexp-encoded AST, is written to another temporary file.
* I use emacs's (load .. ) to load the AST (this is the fastest way I could figure how to get it out of the parser, into emacs).
* Once the AST is in emacs, language mode helper functions can query it for useful information. The tree includes buffer-offset information so cursor positions can easily be translated to logical positions.
* The buffer-offsets are stored as relative offsets, on the nodes of the tree, so that between parses the tree is easily (and inexpensively) kept up-to-date with the current state of the buffer.
This all works well in practice, with no noticeable delays. I haven't tried to use it for syntax highlighting, though. -
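The last bullet is what keeps updates cheap. When offsets are stored relative to the parent, an insertion only changes the nodes that contain it and the start offset of each sibling that follows it. Descendants of a shifted node never change. This sketch assumes a hypothetical node shape `{ relStart, length, children }`. flyparse-mode's real AST is sexp-encoded and may store offsets differently, and only insertions are handled here.

```javascript
// Hypothetical node shape: relStart is the offset from the parent's start.
function node(relStart, length, children = []) {
  return { relStart, length, children };
}

// Absolute start of the last node on a root-to-node path.
function absoluteStart(path) {
  return path.reduce((sum, n) => sum + n.relStart, 0);
}

// Account for `n` characters inserted at absolute offset `pos`, where
// absStart <= pos <= absStart + tree.length. Nodes containing the insertion
// grow; later siblings shift; their descendants need no change at all.
function insertText(tree, absStart, pos, n) {
  tree.length += n;
  for (const child of tree.children) {
    const childStart = absStart + child.relStart;
    if (pos <= childStart) {
      child.relStart += n; // child lies after the insertion: shift it as a whole
    } else if (pos < childStart + child.length) {
      insertText(child, childStart, pos, n); // insertion falls inside this child
    }
    // children that end at or before pos are untouched
  }
}

// "var a = f(b);" with nodes for the call "f(b)" (offset 8) and its argument "b".
function sampleTree() {
  const arg = node(2, 1);         // "b" at 8 + 2 = 10
  const call = node(8, 4, [arg]); // "f(b)"
  const root = node(0, 13, [call]);
  return { root, call, arg };
}

const t1 = sampleTree();
insertText(t1.root, 0, 4, 2); // text becomes "var xxa = f(b);"
console.log(absoluteStart([t1.root, t1.call, t1.arg])); // 12

const t2 = sampleTree();
insertText(t2.root, 0, 10, 1); // text becomes "var a = f(Xb);"
console.log(absoluteStart([t2.root, t2.call, t2.arg])); // 11
```

In the first edit the argument node is never visited. Its absolute position moves only because its parent's offset did.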
Concerning keeping parsing and highlighting separate, and doing parsing asynchronously, you might find CodeMirror's approach interesting.
-
Two things:
1. I would really like it to highlight some minor errors like those JSLint detects, i.e. a missing final semicolon (the one after the closing brace) in "var x = function() { doSomeThing(); };". Things like that break the browser after you concatenate all your JavaScript files together and minify them.
2. The indentation works badly in one particular case, when mixing hashes and functions etc. Try to construct an Ajax.Request with a hash of options, and defining an onComplete handler inside of the hash - you will end up with the whole hash indented far to the right, where it would be much more reasonable for it to be indented only one level more than the base "new Ajax.Request" statement. -
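The hazard in the first item comes from automatic semicolon insertion (ASI). JavaScript does not insert a semicolon before a line that begins with `(`. Once files are joined, a following immediately-invoked function expression is therefore parsed as a call on the previous function expression. The demonstration below uses two hypothetical file contents. `new Function` is used only to evaluate the joined source.

```javascript
// Two "files" that each run fine on their own.
const fileA = "var x = function () { return 'A'; }"; // no trailing semicolon
const fileB = "(function () { return 'B'; })()";     // an IIFE, common in libraries

// Join the way a naive build step would, evaluate, and report typeof x.
function runBundle(...files) {
  const source = files.join("\n") + "\nreturn typeof x;";
  try {
    return new Function(source)();
  } catch (e) {
    return e.name; // report the error type instead of crashing
  }
}

console.log(runBundle(fileA, fileB));       // "TypeError": parsed as one call chain
console.log(runBundle(fileA + ";", fileB)); // "function": the semicolon ends the statement
```

The newline between the files does not help. ASI only inserts a semicolon when the next token cannot continue the current statement, and `(` can.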
Great blog Steve, I have spent this past month reading all your posts from the last few years.
I am not a programmer but a trader. I was a cs/ee geek who loves LISP.
I always install emacs at the workplace to the chagrin of my colleagues.
Keep up the informative posts. If I get a chance to learn some javascript for kicks I'll give your mode a try.
regards -
Nice.
I loaded it up this morning with a javascript file swimming in regexes. Every other extension I threw at it would barf.
So writing a mode which actually worked was at the bottom of my todo list. Until now.
Maybe you should consider hosting it with git (or mercurial). Getting your users hacking on it may speed up development. -
Hi Steve!
I just skipped to the bottom here when you started talking about continuations and the like. Writing recursive-descent parsers that use continuations is not actually that much work if you have a language w/ higher-order functions and closures -- the trick is to use a combinator library for parsing rather than writing it out by hand.
The really sweet thing about this approach is that you add features (e.g., incremental parsing) to your parser by adding them to your combinators: the parser definition itself remains the same! -
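This is a minimal sketch of the combinator style in JavaScript, which has the closures the comment relies on. It shows only how a recursive-descent parser is assembled from small higher-order functions. It does not implement the continuation or incremental-parsing features mentioned in the comment. All names and the tiny nested-array grammar are hypothetical.

```javascript
// A parser is a function (input, pos) -> { value, pos } on success, or null.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

const regex = (re) => (input, pos) => {
  const sticky = new RegExp(re.source, "y"); // "y": match only at lastIndex
  sticky.lastIndex = pos;
  const m = sticky.exec(input);
  return m ? { value: m[0], pos: pos + m[0].length } : null;
};

const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const r = p(input, pos);
    if (r === null) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

const alt = (...parsers) => (input, pos) => {
  for (const p of parsers) {
    const r = p(input, pos);
    if (r !== null) return r; // first alternative that succeeds wins
  }
  return null;
};

const many = (p) => (input, pos) => {
  const values = [];
  // Stop on failure, or on a match that consumed nothing (avoids looping forever).
  for (let r = p(input, pos); r !== null && r.pos > pos; r = p(input, pos)) {
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

const map = (p, f) => (input, pos) => {
  const r = p(input, pos);
  return r === null ? null : { value: f(r.value), pos: r.pos };
};

// Build a rule only when it runs, so rules can refer to each other recursively.
const lazy = (makeParser) => (input, pos) => makeParser()(input, pos);

// Grammar: value := integer | "[" [ value { "," value } ] "]"
const integer = map(regex(/-?\d+/), Number);
const value = lazy(() => alt(integer, array));
const elements = map(
  seq(value, many(map(seq(lit(","), value), ([, v]) => v))),
  ([first, rest]) => [first, ...rest]
);
const empty = map(lit(""), () => []);
const array = map(seq(lit("["), alt(elements, empty), lit("]")), ([, items]) => items);

function parse(text) {
  const r = value(text, 0);
  if (r === null || r.pos !== text.length) throw new SyntaxError("parse error");
  return r.value;
}

console.log(JSON.stringify(parse("[1,[2,-3],[]]"))); // [1,[2,-3],[]]
```

The grammar rules read almost like the BNF comment above them. A cross-cutting feature such as memoization would be added inside `seq`, `alt` and friends rather than in the rules, which is the property the comment points to.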
You don't have to restart the parser if it hasn't reached where you're changing things. Heck, you don't even have to stop the parser until it reaches that point.
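The suggestion above can be sketched as a time-sliced background job. The parser works in bounded slices, and an edit forces a restart only when it lands before the point the parser has already reached. The class and its line-by-line "parsing" are hypothetical stand-ins. The sketch does not model the Emacs input-pending check that Steve describes later in the thread.

```javascript
// Hypothetical time-sliced parse driver. Splitting lines stands in for real parsing.
class BackgroundParse {
  constructor(text) {
    this.restarts = 0;
    this.start(text);
  }

  start(text) {
    this.text = text;
    this.pos = 0;    // everything before pos has been parsed
    this.items = []; // results produced so far
  }

  // Do at most `budget` units of work, then return so the editor can take input.
  // Returns true once the whole text has been processed.
  run(budget) {
    for (let i = 0; i < budget && this.pos < this.text.length; i++) {
      let end = this.text.indexOf("\n", this.pos);
      if (end === -1) end = this.text.length;
      this.items.push(this.text.slice(this.pos, end));
      this.pos = end + 1;
    }
    return this.pos >= this.text.length;
  }

  // An edit at `offset` can only invalidate work the parser has already done.
  edit(offset, newText) {
    if (offset < this.pos) {
      this.restarts++;
      this.start(newText); // the edit touched parsed text: start over
    } else {
      this.text = newText; // the edit is ahead of the parser: keep going
    }
  }
}

const job = new BackgroundParse("a\nb\nc\nd");
job.run(2);                // parsed "a" and "b"; pos is now 4
job.edit(6, "a\nb\nc\nD"); // an edit at offset 6 is ahead of the parser
console.log(job.run(10), job.items, job.restarts); // true [ 'a', 'b', 'c', 'D' ] 0
```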
-
Stevey, that you are using an obsolete Latin-1 encoding is not my problem ;-) Guess you're on windows, right? Calling myself Karl Landstrom may still be the only safe alternative... but it feels so 20th century.
-
I second Steve Atkinson's git suggestion. I'd love to be able to clone this and hack on it.
-
You've got an alert in your page that prints "you fixed me big boy". Thought you might like to know about it.
Firebug finds it in your scribat around this:
this.i[uid] = i; this.q(); alert('you fixed me big boy')},
Hope this helps -
Thanks Steve! This is very useful.
I currently use a mishmash of ECB (Emacs Code Browser), Senator and the JavaScript mode that comes out of the box with Aquamacs to emulate an IDE, and it works pretty well. Your mode is definitely more interesting than the default, but for some reason it doesn't play nicely with ECB. For example, it doesn't show a list of variables in ECB's Methods window and shows all the methods as collapsed by default. Is integration with ECB something you'll be looking at at all? -
Hey! Nice. As a Swede, I well sympathize with Karl Landström. For some reason, that bastardization happens every now and then.
Anyway, his URL is http://www.brgeight.se/.
Cheers. -
Since you brought up prototype.js early in the post -- are there plans to add bla = function() {...} parsing to IM-JavaScript-IDE scanning? Otherwise 2000+ lines of Prototype seem to define only 2 functions ($H and $).
-
Wouldn't it be better to use the default emacs highlight faces as far as possible? One of the many things I like about emacs is consistency, and having the same highlighting scheme applied to all languages is nice (and there are fewer variables to change if I want to adjust it).
-
+1 for distributed version control; I would love to clone your repo.
I'm quite curious as to how you're handling unit tests. I've written a handful of unit testing frameworks in elisp (for my own sadly deprecated rhtml-mode), and the language seems to lend itself to a very different testing approach from your classic xUnit style.
I also haven't used any of my frameworks on large elisp projects, so I'm sure making it useful on that level involves its own set of challenges. -
Hmph... you keep pimping for ECMAScript. I came of age before AJAX and Web 2.0 and as such have been prejudiced against that language. I may have to look into it, especially if there's some decent elisp for it.
-
Doesn't seem to work with XEmacs. Is XEmacs dying? Should I just give in and switch back to GNU Emacs? After 15 years?
-
It would probably be better to continue these discussions in the Wiki - http://code.google.com/p/js2-mode/w/list
In any case...
I uploaded a new version today with some fixes and new features based on all your comments so far, here and in the Wiki. Nothing big, but progress is nice.
XEmacs: it's not dead, but it appears to be dying. GNU Emacs (as of version 22 and even more so with 23) has essentially caught up with XEmacs and surpassed it in some ways. There are minor differences here and there, but it's worth switching.
Supporting XEmacs is hard for GUI stuff because it diverges dramatically from FSF Emacs in handling for input events, keystrokes, fonts, colors, widgets and other UI-related stuff. At this point it would be best for Emacs in general if XEmacs users could wean themselves off it. (With permanent props to the XEmacs developers for pushing the envelope for so long.)
ceesaxp: yes, I'm working on better imenu support, including parsing down into idiomatic declarations such as those found in prototype.js. look for it in an upcoming release.
andy freeman: emacs has no threads, so all I can do is pause and check for user input occasionally. It tells me if there is any input pending, but NOT what kind of input it is, so I don't have enough info to know whether to stop parsing. If someone knows the Emacs input system well enough to tell me how to look into the input queue, I'll make use of it.
Still looking into JQuery, mmm-mode, ECB, etc.
As for Semantic, although I think it's a great idea in principle, I find it to be one of the most annoying packages on the planet. It's complicated to install, isn't bundled with its dependencies, leaves crap everywhere in your filesystem, isn't smart enough to know it's trying to write to read-only filesystems, and so on ad infinitum. I needed a full Ecma-compliant parser for the bigger project anyway, so I figured I might as well use it for the IDE. -
Re: Semantic is annoying
If you were frustrated by CEDET, it would have been nice if you had contributed to the mailing list.
As far as Javascript support is concerned, you are right. I don't include the mode for it. CEDET doesn't include major modes, and I only stick in stuff folks explicitly agree to.
As for the other stuff, the CVS version probably fixes all that stuff. I have limited testing options, so I'm dependent on others to let me know if different platforms don't build or install correctly. -
There's an interesting paper here on Pretty Printing that might be of some interest.
A Prettier Printer - Philip Wadler -
This is interesting because I've had several recent encounters with the highly scary cc-mode.el. So far it's the most viable parsing option I've found. I'm using it in conjunction with ectags to give myself a half-decent shot at extracting useful semantic information from a large C++ project.
The main problem I'm having is resolving scope: when multiple tags with the same name exist in the project, such as Fooozle::GetWidget, Fortle::GetWidget and Zarkon::GetWidget, and my cursor hovers over a GetWidget, which GetWidget am I looking at?
Amazingly cc-mode can actually help with this, and with some tweaking it appears it can get it right (mostly by correctly identifying the enclosing scope), although I've found no way to get it to treat Thing::Thong as a single identifier, so I'm having to write a lashup out of the functions cc-mode.el gives me.
When it works, though, it should kick etags out of the park, as I've found exuberant ctags simply to be the best practical parser out there, despite it needing tweaking.
It's not incremental though :(
Sorry to go on, but you are one of the few people on the planet who can truly appreciate a cc-mode war story ;-) -
I feel obligated to note that, given a little knowledge of UTF-8, you could have figured out the penultimate character of the name, which the post displayed as "Karl LandstrÃ¶m". I'll dump a bit of it here in the following paragraphs.
First, take the last name as displayed and get the character codes:
javascript:"Landstr%C3%83%C2%B6m".split("").map(function(v){return%20v.charCodeAt(0);})
From that I get:
76,97,110,100,115,116,114,195,182,109
The two that correspond to our mystery character(s) are 195,182 by pulling off the "Landstr" and the "m".
Since all the cool kids use UTF-8 because it's reasonably efficient and doesn't half-pretend to be a fixed-width encoding like UTF-16 does (let's just ignore all that non-BMP stuff outside the first 65536 characters, eh? NOT), it's probably UTF-8 (also because that's the default across Linux distros, and this is Emacs). (There's also the not-so-little matter of the rest of the name being ASCII, which also might rule out UTF-16 unless you translated when doing the blog post, but UTF-16 seems unlikely for the previous reason anyway.)
In UTF-8, every character from U+0000 to U+007F is just the equivalent byte, like ASCII -- easy to deal with. Everything above that has the high bit set in a first byte and then follows with some number of trailing bytes, noted by the value of the first byte. In binary, the two values are:
0b11000011 0b10110110
The number of 1s after the leading 1 in the first byte says how many further bytes to read, so we have only one more. In the leading byte, the bits indicating the character follow the first 0 (the one that ends the run of leading 1s), so we have 00011. Trailing bytes start with 10 and are then followed by bits indicating the character, so we have 110110. Concatenating, we get:
0b00011110110
In decimal this is 246; in hexadecimal it's 0xF6. This is the Unicode code point U+00F6, which when I plug into JavaScript gets me:
javascript:"\u00f6"
...returning an o with an umlaut. For me this was the intuitively obvious choice for a character to fit there, if one were to be there, so I declare victory and assume the name is:
Karl Landström
Unicode and UTF-8 are awesome; if only those fool Windows and OS X people hadn't deigned to anoint UTF-16 as their platforms' wide character encoding we might all be happily using UTF-8 today. -
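The same decoding can be done mechanically. Below is a sketch of a single-character UTF-8 decoder; the function name is hypothetical. It is checked against the comment's bytes 195, 182 and, for comparison, against the built-in `TextDecoder`. Reading the same two bytes as Latin-1, one byte per character, reproduces the garbled form.

```javascript
// Decode one UTF-8 encoded character from an array of byte values.
function decodeUtf8Char(bytes) {
  const lead = bytes[0];
  if (lead < 0x80) return lead; // 0xxxxxxx: a single ASCII byte
  // The run of leading 1 bits gives the sequence length: 110 -> 2, 1110 -> 3, 11110 -> 4.
  let length = 0;
  while (lead & (0x80 >> length)) length++;
  if (length < 2 || length > 4 || bytes.length < length) {
    throw new Error("invalid or truncated UTF-8 sequence");
  }
  // Payload bits in the lead byte are those after the 0 that ends the run of 1s.
  let codePoint = lead & (0xff >> (length + 1));
  for (let i = 1; i < length; i++) {
    if ((bytes[i] & 0xc0) !== 0x80) throw new Error("expected a 10xxxxxx continuation byte");
    codePoint = (codePoint << 6) | (bytes[i] & 0x3f); // keep the low 6 bits
  }
  return codePoint;
}

const cp = decodeUtf8Char([195, 182]);
console.log(cp, cp.toString(16), String.fromCodePoint(cp)); // 246 f6 ö
console.log(new TextDecoder().decode(Uint8Array.from([195, 182]))); // ö
console.log(String.fromCharCode(195, 182)); // Ã¶ (the same bytes read as Latin-1)
```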
Fantastic job. Thanks Stevey. I haven't been coding js but will try your js mode next time.
Regarding your reply to Karl Landström:
«Unicode and UTF-8 are awesome; if only those fool Windows and OS X people hadn't deigned to anoint UTF-16 as their platforms' wide character encoding we might all be happily using UTF-8 today.»
Not sure what you mean exactly... but as far as I know, OS X espouses utf-8, not utf-16. (so, in your phrasing, Mac OS X has actually not deigned to anoint UTF-16, but Windows has. (and, not quite sure “deigned” is the proper word here, since you seem to regard utf-8 as superior. So, Windows has not deigned to utf-8. (further, the use of deign and anoint in “deigned to anoint ...” is contradictory)))
I looked it up on Wikipedia ( http://en.wikipedia.org/wiki/Utf-8#Mac_OS_X ) and it seems to verify that Mac OS X is utf-8.
Also, regarding the choice of utf-16 and utf-8... if your text is mostly in European languages, utf-8 is more efficient in storage. But if you write in Chinese, utf-8 increases the storage by a factor of 1.5 or so, in comparison to utf-16.
The reason that Mac OS X chooses utf-8, I'd guess, is largely compatibility with the Unixes, i.e. Unix tools are almost all utf-8. -
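The factor of 1.5 can be checked directly. This sketch counts UTF-8 bytes with `TextEncoder` and UTF-16 bytes as two per code unit, since a JavaScript string's `length` counts UTF-16 code units. It assumes no byte-order mark.

```javascript
// Encoded size of a string in UTF-8 and in UTF-16 (no BOM), in bytes.
function encodedSizes(text) {
  const utf8 = new TextEncoder().encode(text).length;
  const utf16 = text.length * 2; // .length counts UTF-16 code units
  return { utf8, utf16 };
}

console.log(encodedSizes("Landström")); // { utf8: 10, utf16: 18 }
console.log(encodedSizes("中文字符"));   // { utf8: 12, utf16: 8 }: 12 / 8 = 1.5
```

Common Chinese characters sit in the Basic Multilingual Plane and take three bytes in UTF-8 versus two in UTF-16. That gives exactly 1.5 for pure Chinese text, and less once ASCII markup or punctuation is mixed in.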
Thanks Steve, you rock! Now I don't have to quit emacs to do javascript editing. Great Work!
-
Turning off comments since I started to see some spam come in, and it's tedious to clean up.
Please use the Wiki at http://code.google.com/p/js2-mode for further discussion. I've been releasing updates with major bug fixes, enhancements and usability tweaks about once a week. -
I suspect that Karl Landström's parents were typographers, not mathematicians.
-
There is also a js+rhino rewrite of jsdoc called jsdoc-toolkit. We recently started using the version 2 beta, which seems pretty good.
Tom -
Amazing work! Thank you thank you thank you!
I've been using a modified version of Karl's mode for a long time but I'm not too happy with its indentation. I spent all day yesterday trying to figure out how CC Mode indentation works, then asked a question on their mailing list and someone pointed me to your page.
I still think that the best idea for indentation is to use CC Mode... It works pretty damn well and it's heavily customizable. As far as I can see, it fails only in one case: when you write literal objects/arrays or anonymous functions in an argument list. I think this can be fixed and I'll continue to look into it.
Anyway, thanks again for this great work and I'm eagerly waiting for the "bigger project"! :-D -
Amazing, excellent, thank you very much! :-)
-
Hello, Steve!
Could you please tell me how you test your elisp code? I didn't find any tests in the svn repository. Do you use some library, or maybe assert from the 'cl library? -
@Tomtt, @Phil, @Dmitry etc.
For largish emacs code I've been writing, I have been using elk-test.el for unit testing.
It's not fantastic nor is my use of it, but it's probably a step in the right direction. -
Wow, great work! This fixes all the gripes I have with my emacs setup for javascripts and have been meaning to deal with for ages.
-
Why does this interfere with yasnippet?
Yas is incredibly useful, and I'd like to be able to use it alongside js2-mode.
Trouble is, js2-mode grabs the Tab key for changing indentation, so I can't expand my snippets while editing JavaScript. -
Interestingly, I am doing something similar for Perl (using PPI, a Perl-parser written in Perl). Anyway, to avoid porting PPI to Lisp, I do the highlighting in an external process and send the offsets and face names back to emacs. This is nice because it doesn't block emacs at all, although I admit it's sad to see your syntax highlighter using 100% CPU while you're typing in your code. (Right now, the naive interface doesn't cancel requests when a new one comes in, so you end up highlighting "f" "fo" "foo", ... when you type "foo". That can be worked around though.)
Anyway, it would be interesting to see your javascript syntax highlighting happen in an external process.
*sigh* I miss the days when font-lock could efficiently handle all the popular languages :)
My code is at:
http://git.jrock.us/?p=Server-Stylish.git;a=summary
— jrockway · 4:36 AM, March 31, 2008
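One workaround for the "f", "fo", "foo" problem assumes the external highlighter handles one request at a time. Keep at most one pending request and overwrite it whenever a newer one arrives, so intermediate states are skipped. This sketch does not cancel the job already in flight. `CoalescingHighlighter` and the `startJob(text, done)` callback shape are hypothetical.

```javascript
// Hypothetical sketch: coalesce highlight requests so only the newest one waits.
class CoalescingHighlighter {
  // startJob(text, done) launches highlighting and must call done() when finished.
  constructor(startJob) {
    this.startJob = startJob;
    this.busy = false;
    this.pending = null; // newest text waiting for the highlighter, if any
  }

  request(text) {
    if (this.busy) {
      this.pending = text; // replaces any older request that never started
      return;
    }
    this.busy = true;
    this.startJob(text, () => this.finished());
  }

  finished() {
    this.busy = false;
    if (this.pending !== null) {
      const next = this.pending;
      this.pending = null;
      this.request(next);
    }
  }
}

// Simulate typing "foo" while the first request is still running.
const started = [];
const finishers = [];
const highlighter = new CoalescingHighlighter((text, done) => {
  started.push(text);
  finishers.push(done);
});
highlighter.request("f");
highlighter.request("fo");
highlighter.request("foo");
finishers.shift()(); // the external process finishes "f"
console.log(started); // [ 'f', 'foo' ]: "fo" was never sent
```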
Thanks all - this is why I released it early; I wanted a new stream of bug reports and feature requests.
Baishampayan - I'll look at JQuery.
AriT93 - I think it's still a long way (stability-wise, and integration-wise) from inclusion in Emacs. Maybe in a year or two.
sztywny - I'll add in a missing-semicolon warning asap.
barry kelly - it can't be as bad as you think (or perhaps as bad as I think), since my parser has the same responsiveness as the ones in Eclipse and IntelliJ, and they're written in Java. They don't do it synchronously either.
Karl - sorry, and I'll make a note of the proper spelling. :) And thanks for the indenter!
James Gosling - Java was a great idea fifteen years ago...
— Steve Yegge · 11:33 AM, March 31, 2008
A note about the ECB comment. ECB uses either the CEDET/Semantic javascript parser, or the imenu parser.
ECB support could be handled by changing the configuration for contrib/wisent-javascript.el in CEDET to point at the new js2 mode hook. This may not work if the syntax table is too different though.
Alternately, this javascript mode could probably generate a very nice set of CEDET/Semantic tags which could be used instead, thus enabling local context parsing and smart completion via the Semantic APIs.
Also, a note on the incremental parser discussion in the blog post. Semantic handles incremental parsing by chopping up the buffer under overlays in the first pass. Incremental parses group individual changes under the overlays, reparse those overlays, and splice the results back together. It handles most cases well, and is fast even in big buffers.
— Eric Ludlam · 12:31 PM, March 31, 2008
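Here is a rough sketch of the reparse-and-splice idea, with one swap. Semantic tracks top-level regions with overlays that move as the buffer changes. This sketch instead finds unchanged chunks by comparing their text. Chunks are approximated as blank-line-separated blocks. `ChunkedParser` and `parseChunk` are hypothetical.

```javascript
// Hypothetical sketch: reparse only the top-level chunks whose text changed,
// and splice cached results back in for the rest.
class ChunkedParser {
  constructor(parseChunk) {
    this.parseChunk = parseChunk; // stand-in for a real parser of one top-level form
    this.cache = new Map();       // chunk text -> parse result
    this.parses = 0;              // number of chunks actually parsed so far
  }

  parse(buffer) {
    const next = new Map();
    const results = buffer.split(/\n\s*\n/).map((chunk) => {
      let result = this.cache.get(chunk);
      if (result === undefined) {
        result = this.parseChunk(chunk); // new or edited chunk: parse it
        this.parses++;
      }
      next.set(chunk, result);
      return result;
    });
    this.cache = next; // forget chunks that no longer exist
    return results;
  }
}

const parser = new ChunkedParser((chunk) => ({ text: chunk, length: chunk.length }));
parser.parse("function a() {}\n\nfunction b() {}\n\nfunction c() {}");
const after = parser.parse("function a() {}\n\nfunction b() { x(); }\n\nfunction c() {}");
console.log(parser.parses, after.map((r) => r.length)); // 4 [ 15, 21, 15 ]
```

Only the edited `b` chunk is parsed a second time. The text-keyed cache is simpler than overlays, but it pays for a full scan of the buffer on every pass.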
Not to be snarky, but 5000 lines/sec... what are you running that on, a 90MHz Pentium? Or is emacs elisp really that bad for performance? (I guess it must be.)
That deeply sucks. For one thing, you don't need to create a full AST for syntax highlighting. For basic lexical highlighting, a lexer that can track its line starting states will do; for semantic markup, such as highlighting methods, classes, etc., and providing some kind of code insight / intellisense, you do need to parse declarations, but you can largely skip tokens between '{' and '}' at the top level.
To be frank, though, if elisp could only do 5000 lines/sec for highlighting, I think I'd rather write my own editor & macro system. It would be less depressing.
— Barry Kelly · 6:51 AM, March 31, 2008
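The line-start-state technique Barry describes works like this in miniature. Remember the lexer state at the start of every line. After an edit, relex from the changed line, but only until the state at a line boundary matches what was stored before. This sketch assumes the only construct that spans lines is a `/* ... */` comment; strings, regex literals and `//` comments are ignored. The names are hypothetical.

```javascript
// State at a line boundary: "code", or "comment" when inside /* ... */.
function lexLine(line, state) {
  let inComment = state === "comment";
  for (let i = 0; i < line.length; i++) {
    if (!inComment && line.startsWith("/*", i)) {
      inComment = true;
      i++; // skip the second character of the delimiter
    } else if (inComment && line.startsWith("*/", i)) {
      inComment = false;
      i++;
    }
  }
  return inComment ? "comment" : "code"; // state at the start of the next line
}

class LineStateLexer {
  constructor(lines) {
    this.lines = lines;
    this.startStates = ["code"]; // startStates[i]: state at the start of line i
    for (let i = 0; i < lines.length; i++) {
      this.startStates.push(lexLine(lines[i], this.startStates[i]));
    }
  }

  // Replace line k and relex only as far as the change propagates.
  // Returns the number of lines relexed.
  editLine(k, text) {
    this.lines[k] = text;
    let relexed = 0;
    for (let i = k; i < this.lines.length; i++) {
      const end = lexLine(this.lines[i], this.startStates[i]);
      relexed++;
      if (end === this.startStates[i + 1]) break; // converged: later lines unaffected
      this.startStates[i + 1] = end;
    }
    return relexed;
  }
}

const lexer = new LineStateLexer(["a();", "b();", "c();", "d();"]);
console.log(lexer.editLine(0, "a(); // tweak")); // 1: state after line 0 unchanged
console.log(lexer.editLine(1, "b(); /* open"));  // 3: lines 2 and 3 are now in a comment
```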