The Emacs Problem
AI Notes
Charles G. wrote Steve an email observing that Lisp doesn't really seem like the right language for text processing — and wondered if someone would one day rewrite Emacs with Ruby as the embedded interpreter. Steve answered in public. The reply walks the long way round: text processing is regular expressions, sure, but also tree-walking and transformation, and text data wants to be tree-structured, which is why everyone is sliding toward XML whether they want to or not. XML is a tree-structured language whose author knew nothing about programming languages, so every executable-XML framework — Ant, Jelly, Cocoon — drifts toward Turing-completeness and reinvents Lisp badly along the way. Code is data, data is code, and configuration files, logs, web pages, and mini-languages are all going to converge there no matter how long the journey takes.
The last third turns to why Emacs isn't advancing despite all of that. Contributing to it is a Cathedral problem (FSF paperwork, RMS's standards, the long history of forks like XEmacs that didn't heal), the rendering engine is too primitive to ever do PostScript or a real browser, and the next generation of programmers has been pulled away by Eclipse and IntelliJ. It opens a question Steve would return to for the next twenty years: how do you move a beloved system forward when its community is structured to hold it still?
Related listings
-
2005
Effective Emacs
Same year — the optimistic practitioner's manual that this essay's architecture-critique sits beside as the sobering counterweight.
-
2005
My .emacs file
Same year — the catalog of what Emacs already does well, written by the same person describing here what it can never do without a near-complete rewrite.
-
2006
Lisp is Not an Acceptable Lisp
A year later — the same impasse looked at from the other direction: the Lisp world's libraries and tooling are the dual of the Emacs problem.
Where it was argued
- Hacker News Feb 2014
From the peanut gallery
-
Talking of Lisp, XML, and data-is-code-is-data, have you seen this?
http://agentsheets.com/lisp/XMLisp/
Looks like an interesting alternative to SAX/DOM approaches. It uses the reflection-layer of the Common Lisp Object System to transform XML elements to and from Lisp objects on the fly.
I suspect the Lisp hardcore would say 'Pah! Just use sexprs — that's what they were intended for'. But this still looks like an interesting glue layer.
-
And talking of using Common Lisp to rewrite Emacs, have you taken a look at Hemlock, which comes with CMUCL?
It's still not Emacs. Its incompleteness does a fair job of proving your point about how hard it would be to completely rebuild all of the Emacs utilities in CL, but it *is* kinda nifty.
-
We used XML/XPath/XSLT for a project to scrape prices from vendor web sites a couple years ago. It was fun in a, "I can't believe we're making this Elephant dance" kind of way. I can't imagine actually wanting to use that technology for something I wasn't planning to throw away, however.
XSLT isn't a real language (or it wasn't then). Xalan's XSLT extensions allowed us to embed Javascript, which we needed for ... regular expressions! XPath worked great for finding the nodes we wanted, but then we wanted to find specific text in the nodes and do some simple cleanup. So, now we had XSLT with Javascript, being executed by a Java program running Xalan's XML libraries. I just went back and read through the code again, and it's just as ugly as I remembered it being. On top of that, it uses Xalan extensions, so it isn't really portable anymore.
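For comparison, here is a minimal sketch of the same division of labor (a path query to find the nodes, a regular expression to clean up the leaf text) in a single general-purpose language. It uses Python's standard library, not the Xalan/XSLT/Javascript stack described above; the page fragment, class names, and price format are all invented for illustration, and `xml.etree.ElementTree` supports only a limited subset of XPath.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vendor page fragment; the markup and class names are invented.
PAGE = """
<html>
  <body>
    <div class="item"><span class="name">Widget</span><span class="price"> Now only $1,299.50! </span></div>
    <div class="item"><span class="name">Gadget</span><span class="price">Price: $18.00 (was $25.00)</span></div>
  </body>
</html>
"""

# Assumed price format: a dollar sign, digits with optional commas, two decimals.
PRICE_RE = re.compile(r"\$([\d,]+\.\d{2})")


def scrape_prices(xml_text):
    """Return (name, price) pairs: path query for the nodes, regex for the leaf text."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.findall(".//div[@class='item']"):
        name = item.find("span[@class='name']").text.strip()
        raw = item.find("span[@class='price']").text
        match = PRICE_RE.search(raw)  # first price mentioned in the text
        if match:
            results.append((name, float(match.group(1).replace(",", ""))))
    return results


prices = scrape_prices(PAGE)
```

Real vendor pages are rarely well-formed XML, so an actual scraper would likely need a forgiving HTML parser in front of this step.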
I think you've hit on the fundamental flaw. They are trying to build another lisp, but this one is uglier, way more verbose and not the slightest bit symmetric. I mean, they have good intentions: XSLT is just executable XML, but there isn't any way to extend it, short of vendor specific extensions. I really like the idea of having configuration files in Lisp. That would make parsing them almost trivial, although I think you would still want standard "Perl'ish" regular expressions for the verbose bits of data at the leaves of the tree. That would be easy to add in a real language though! Not to mention the ability, as you say, to just define functions with the same names as your nodes. You could even use macros to parameterize them for different sorts of transformations.
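The "functions with the same names as your nodes" idea can be sketched roughly as follows. This is Python standing in for Lisp, with nested tuples playing the role of s-expressions; the node names, the handler functions, and the `evaluate` helper are all hypothetical. A Lisp would not need the lookup table, since the reader and evaluator do that work already.

```python
# A config "document" as nested tuples standing in for s-expressions:
# (node-name, child, child, ...). All node names here are hypothetical.
CONFIG = ("server",
          ("host", "example.internal"),
          ("port", "8080"),
          ("timeout-seconds", "30"))


def server(*settings):
    # Children arrive already evaluated, as (key, value) pairs.
    return dict(settings)


def host(name):
    return ("host", name)


def port(text):
    return ("port", int(text))


def timeout_seconds(text):
    return ("timeout", float(text))


# Node names map to functions; the table exists only because names like
# "timeout-seconds" are not valid Python identifiers.
HANDLERS = {
    "server": server,
    "host": host,
    "port": port,
    "timeout-seconds": timeout_seconds,
}


def evaluate(node):
    """Evaluate a tree by calling the function named after each node."""
    if not isinstance(node, tuple):
        return node  # leaf: plain text
    name, *children = node
    return HANDLERS[name](*(evaluate(child) for child in children))


settings = evaluate(CONFIG)
```

Swapping in a different `HANDLERS` table gives a different transformation over the same tree, which is roughly the role the comment assigns to macros.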
Sounds great, when do we start?
-
> Sounds great, when do we start?
Just as sooooon as I find a Lisp that doesn't cause nausea, diarrhea and severe stomach cramps.
Not entirely clear that such a thing exists, but I'm working on it. I keep having to wait for the symptoms to disappear before trying the next one.
Seriously, though: before trying anything with Lisp, I have a big checklist of criteria that it needs to pass. (Or at least provide some sort of "out" so you can fix it yourself, e.g. by bridging to C libraries). Stuff like decent concurrency support, asynchronous I/O, good tools and cross-platform support, etc. etc.
I really wish someone else had done this evaluation already. I only have so much free time for it. But I've found a few useful links; Pascal Costanza's Highly Opinionated Guide to Lisp is one (in no small part because of all the links it has at the bottom). And Cliki is kind of useful.
But I haven't yet found an evaluation of any Lisp's suitability for what we think of as "production" work. Feel free to help! Maybe we need to start a Wiki on it. Call it Blub, though, so nobody gets scared. :-)
Short answer: not any time soon, I think. Java and C++ are still the king and queen, respectively.
(Note: C++ programmers who read that are thinking "hey, why do we have to be the queen?". That's why. :-)
-
Brian: Yeah, Hemlock looks interesting. Actually, so do Guile Emacs and JEmacs.
JEmacs could be really cool if it ever got finished. It's mostly a proof-of-concept, and Per Bothner isn't working actively on it anymore. He wants someone to take over the development on it. The code is clean but complex — Kawa (unlike pretty much all other JVM "scripting" languages) goes to great lengths to ensure static typing so it can be efficiently compiled to bytecode. And Kawa actually supports at least five different languages in its framework. So it'd be a significant undertaking to ramp up on the code and start pushing it forward. Not that I don't think about trying! Having a reasonable Emacs that you can run in the same JVM as the Java app you're developing would be nothing short of incredible.
I'll download Hemlock and actually try it out. All in all, I generally prefer Scheme to Common Lisp, though I wish for all of CL's library functions. I think Per is actually planning on adding namespaces to Kawa, and offering all the Common Lisp stuff to your Scheme code by explicitly prefixing it, e.g.:
is equivalent to the Scheme expression:
Being able to mix and match Scheme libraries, CL libraries, elisp libraries, and Java APIs together seamlessly (which he's working on) will make Kawa a serious contender in the Lisp-implementations lineup at some point. By the time Arc comes out, he'll be able to support that pretty well too.
I'm stalling, though. I really need to download CMUCL and start hacking with it. All my fun hacking has been elisp or scheme lately.
-
Tree Regular Expressions (in SCSH Scheme)
XML is great stuff if you're working on text with a high content/markup ratio, like written documents. Take a look at the source of this blog page if you want an example. Vast swathes of text with a few P and EM elements. The problems occur when the text/markup ratio approaches zero.
Interview question for people who claim major XML chops: Why does XML have both attributes and elements?
-
I keep on hearing about how great lisp is, and lispers rant on and on about it, but unfortunately all the good lisp examples are badly out of date, or in some horrible state (emacs vs xemacs and the personal politics of RMS), or apparently almost completely unusable according to your previous entries on languages. So what is a programmer to do? If everyone evolves to lisp, the argument is just go straight to lisp, but if the language is so great, why isn't more evangelism being done on it?
Most statements involving lisp tend to be what the SEC would call 'forward looking statements' or what I'll call 'niche statements'. That is, people describe what COULD be, or describe what works for a small number of people (why it never grows beyond that is never really talked about).
In the meantime, on my Mac, I have the best UI development system ever conceived, and it was built using Objective C. It's also shaping up to be one of the most advanced UI programming environments available, period (CoreImage: screw 8 bit integers for each RGBA, how about FLOATS for each component?!). In the meantime, the best face on lisp is ... emacs?!
I liked your entry on languages, but I was left with the distinct feeling that you don't like any programming language right now, any follow up there?
-
The fundamental problem appears to be a chicken-and-egg problem, or just a game of chicken.
Nobody wants to use a language unless it's rock-solid: stable, fast, well-documented, well-specified, bug free (at least in complying with its specification), portable, etc. Oh, and you already have to know it, which limits the field a bit. Most people don't want to learn new things; they'd rather build new things, even if what they're building is something that already exists. Easier than learning the old thing.
I personally dislike most languages so much that I would hesitate to use them unless absolutely forced into it. (Even then, it would just be ho-hum, and not all THAT bad. Getting a dumb language to do what you want can be a fun challenge in itself — that's got to be part of why the existing popular languages are so popular.)
The game-of-chicken is that a language can't actually become rock-solid without a big community. The bigger the community, the more solid it gets. Hence, the most popular languages are the most solid: they're popular because they're solid, and they're solid because they're popular. It feeds on itself, reinforcing the desire to use the existing languages, no matter how awkward the languages are at saying things.
I care a lot about how easy or hard it is to say things in a programming language, because I've decided that I hate giant systems. They suck. The gianter a system, the more bugs it'll have, the harder it'll be to learn, the worse its availability will be, and the more people you'll need to hire — to make the system even more giant than it already is.
Higher-level languages provide mechanisms that allow you to say certain common things (such as: "give me all the elements of this vector that have a customer whose name starts with an S", although there are many other examples than just data-structure queries) much more conveniently than you can in C++ or Java. Not that you can't say them in C++ or Java — it just takes longer. C++ sometimes lets you say those things better than Java — but then screws you over with having to hand-manage memory and do all this other hooey, and on the balance, it's worse. Much worse.
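As a rough illustration of that kind of query, here is a minimal Python sketch. The `Order` type and the sample data are invented; the point is only that the filter is a single expression, where a hand-written loop with an explicit accumulator would take several lines.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type, invented for this example.
    order_id: int
    customer_name: str


orders = [
    Order(1, "Sam"),
    Order(2, "Alice"),
    Order(3, "Sita"),
    Order(4, "bob"),
]

# "All the elements of this vector that have a customer whose name starts with an S"
s_orders = [order for order in orders if order.customer_name.startswith("S")]
```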
Over time, if you use a language that doesn't let you say things compactly (in other words, a language that doesn't let your refactoring make your code base smaller), then your system will grow giant, and then lots of problems will happen: longer builds, slower ramp-up, slower innovation, more bugs, lower availability, etc. Many developers (junior and senior) don't realize that it doesn't really have to be this way; it's one of the reasons I write this blog in the first place.
My position at the moment is that I'd like to see teams use the highest-level language that they can just barely tolerate. If everyone switched from C++ to Java, I would be overjoyed, and in no small part because stupid-ass porting projects like RHEL3 would disappear, freeing up engineers to do, you know, engineering, rather than a bunch of frigging porting work. Like, duh. Can't anyone at this company see how much pain C++ is causing us? No, because you have no accountability for porting your systems. Let dev-services do it... as if that'll work. There are half a dozen other reasons that are just as valid as the porting one. C++ is killing us. It's a virus that, once it's entered your company, will expand until it pushes everything else out, and you will become paralyzed. You all know this at some level, but you let it happen anyway.
Java just may be the only language out there today that's higher-level than C++ and still suitable for building a really massive service like CMS. That's because it's rock-solid (it really is, amazingly so nowadays), which stems from its popularity. I trash on J2EE a lot, because I'm squarely in the "Better, Faster, Lighter Java" camp, which is a subset of the "use small, reusable tools, not giant-ass frameworks". That's a long discussion that you can have with Peter D., who agrees and feels even more strongly about it than I do. :-)
But Java in general is a damn good language platform. Could be better, sure, but there's a lot to be said for it. And two or three very promising developments in Java make it even more compelling: Java 1.5 (which adds some really nice features), AspectJ (which adds some very powerful features and now seems mature enough for production, if approached carefully), and the JVM scripting languages, which if used cautiously, can greatly improve your ability to do things like unit testing, builds, debugging, configuration, scripting, prototyping, and other *auxiliary* work that we all do all the time.
If everyone switched to Java here, I'd be one happy camper. I'd say the same thing about Perl, except that Perl is so excruciatingly detestable (and broken in many ways) that I couldn't bear to recommend it. But I'd secretly still be pretty happy about Perl over C++.
Moving up the language power ladder, Python and Ruby are great for many tasks, and I think we could start using them in moderation. I picked Ruby somewhat arbitrarily, since it seems a little cleaner, but they're both good. Python is more "solid" by the definitions I gave above. Everything else — the ML family, Haskell, Smalltalk, and a bunch of others — none of them seem like they're solid enough for any production work at all here.
Python and Ruby are solid enough to use for auxiliary coding, but I'm not convinced either of them is solid enough (by which, again, I mean mature, stable, fast, etc.) to write (say) OMS or CMS. Java is solid enough, although you'll still have your share of serious engineering headaches that have little to do with the language — distribution, scaling, etc. Unless maybe you use a language like Erlang that makes these problems part of the language, but I remain a bit skeptical there.
So Java. Java is what I'd recommend. And Ruby or Python for anything you can get away with building using them. Nothing higher up the power curve is suitable today...
...except for Lisp. Lisp is the one (possible) exception. It appears pretty solid, and it happens to be very near (or at) the top of the power continuum. I'm holding out hope that some version of Lisp will be solid enough to use here. I'm looking at them. These things take time. What I want could best be described as "Common Scheme". Failing that, Common Lisp seems like the main contender.
Our interviewing process hires C++ programmers; that's what it's been carefully tuned to do (whether we deliberately set out to do that or not), and most of them don't know Lisp, let alone anything between Lisp and C++. The problem with Lisp, of course, is that it's so far up there that most programmers think of it as a complete joke, not even worth mentioning in the same breath as "work that one gets paid for".
And you're right, Emacs is the only application most people associate with Lisp. However, you don't get to see the production code from most companies. There are a lot of companies using Lisp, but it turns out that rather than trumpeting this, they hide the fact very carefully. They view it as a secret strategic advantage, and they don't want anyone (except potential hires) to know they're using it. Also, it doesn't do a thing for your product marketing if you say it's written in Lisp — that's more likely to scare people away than get them to use it. So Lisp actually appears to be a well-kept secret in the industry.
At least it appears to be. I haven't written anything big in it, so I can only speak from a sort of investigative standpoint. It could be a year, or even five years, before I really know. In the meantime, switching to Java is an absolutely outstanding thing for people to do. There are no doubts about it, no mysteries — it's as rock-solid as you need it to be. And it's not even scary.
-
"Common Scheme" being an industrial-strength, portable Scheme implementation with widespread support? Already satirized (and by Guy Steele no less):
http://zurich.ai.mit.edu/pipermail/rrrs-authors/1998-May/002343.html
The thread is an argument between the minimalist and maximalist groups of Scheme language designers.
-
In this entry you present us with a false dilemma. Either we write our configuration files in ad-hoc languages that eventually become turing-complete monstrosities, or we write them in s-expressions. Either our logfiles are XML and we are consigned to the world of XSLT/XPath, or they are lisp programs that know how to execute themselves.
I would like to present a third alternative:
{
  "date" => "2005-02-21T18:57:39",
  "millis" => 1109041059800,
  "sequence" => 1,
  "logger" => nil,
  "level" => :SEVERE,
  "class" => "java.util.logging.LogManager$RootLogger",
  "method" => "log",
  "thread" => 10,
  "message" => "A very very bad thing has happened!",
  "exception" => {
    "message" => "java.lang.Exception",
    "frame" => {
      "class" => "logtest",
      "method" => :main,
      "line" => 30
    }
  }
}
LISP isn't the only language that has hierarchical data structures.
No, a hash table isn't executable and can't transform itself like a LISP program could, but it's not clear to me why it should. A stack trace is not inherently an executable thing, any more than my license plate number is. No, I can't subclass a hash table, but I can subclass this:
So: yeah, hopefully we can do better than XML/XSLT. But what makes LISP the answer? And what does it really buy you when your data, which by itself is not meaningfully executable, is expressed in a syntax that could be executable?
-
The sections "Starting with Syntax" and "Redundancy Is Good" argue that XML is better for document processing, which is true.
The "Family Matters" section argues that, because (for example) XSLT is a domain-specific language for processing XML documents, it's a better choice than a general language for processing s-expressions. But each of the listed technologies is getting expanded and expanded because they don't support "real" programming language concepts:
"Many users have requested the ability to return a conditional value based on a boolean expression. XPath 2.0 MUST provide a conditional expression..."
"As part of the XSLT 1.1 work done on extension functions, a proposal to author XSLT extension functions in XSLT itself was deferred for reconsideration in XSLT 2.0. This would allow the functions in an extension namespace to be implemented in "pure" XSLT, without resulting to external programming languages."
"If a ::marker pseudo-element has its 'content' property set to normal, the following algorithm should be used to generate the computed value of the property."
"It is the ideal time right now for the W3C XPath 2.0 working group to make the decision to provide the necessary support for higher-order functions as part of the standard XPath 2.0 specification. In case this golden opportunity is missed, then generic templates and libraries will be used in the years to come."
-
I think Derek pegged it better than anyone when he commented earlier that XML is better for high content/markup ratio, and Lisp is better when it's mostly markup. I hadn't thought of it this way before, but it immediately rang true.
By way of background, you should be aware that Paul Prescod is a person who is trying to let the world know, in no uncertain terms, that Python people can be even snottier than Lisp people. He's one of the handful of folks that come to mind when I describe the Python community as "frosty".
So of course he's going to try very hard to dissociate XML from s-expressions, because if XML is really s-expressions (and you fail to take Derek's observation into account), then you could easily draw the conclusion that Lisp beats Python for XML processing, and Paul would very much like for nobody to draw that conclusion.
The XSLT-article guy is just whacked out on acid. He seems to think that by zooming out to a satellite's-eye view in his XSLT examples, he will give the impression that XSLT is as compact as Haskell. I can't help but feel, looking at all those tiny colored blobs, that I'm about to fall twenty thousand feet to my death, impaled on angle brackets, and that the colors are the spatters of everyone else who's fallen on them so far — including blue-blooded XSLT aficionados.
-
Josh: if you only focus on logfiles and other similarly inert-seeming data clumps, then Ruby's fine, and the solution you suggest seems fine.
The only nitpick I'd make is that defining the log entries as a language-specific hash makes it more difficult to process the entries in some other language, whereas with XML and Lisp, both of them are syntactically relatively straightforward, and you can in fact convert trivially and lexically between the two of them. But it's mostly splitting hairs.
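To make the "convert lexically between the two" point concrete, here is a minimal sketch in Python that turns an XML element into s-expression text. The `to_sexpr` helper is hypothetical, the `(@ ...)` attribute form is loosely modeled on SXML, and the sketch does not escape quotes inside text, so treat it as an illustration and not as a complete converter.

```python
import xml.etree.ElementTree as ET


def to_sexpr(element):
    """Render an element as (tag (@ (attr "value") ...) children...)."""
    parts = [element.tag]
    if element.attrib:
        attrs = " ".join(f'({key} "{value}")' for key, value in element.attrib.items())
        parts.append(f"(@ {attrs})")
    if element.text and element.text.strip():
        parts.append(f'"{element.text.strip()}"')
    for child in element:
        parts.append(to_sexpr(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"


# Hypothetical log record, loosely based on the fields in the hash example.
RECORD = (
    '<record level="SEVERE">'
    "<thread>10</thread>"
    "<message>A very very bad thing has happened!</message>"
    "</record>"
)

sexpr = to_sexpr(ET.fromstring(RECORD))
# sexpr is:
# (record (@ (level "SEVERE")) (thread "10") (message "A very very bad thing has happened!"))
```

Going the other way is a similar recursive walk over the parsed s-expression, which is why the two notations are often treated as interchangeable for data like this.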
For more complex documents, e.g. web pages or word-processor documents, I might be inclined to go with a Lisp dialect, if I were designing it from scratch. But who knows. Ruby's very nice too. If only it had native-code compilers and preemptive multithreading and a macro system and all those other goodies Lisp has...
-
p.s. it's important to realize that you can fairly cleanly divide programming languages into "scripting languages" and "programming languages", and I really understand the distinction now. Scripting languages are all a bunch of miserable hacks: Perl, Python, Ruby, Groovy, Tcl, Rexx... you name it. They all start life with no formal grammar or parser, no formal bytecode or native-code generation on the backend, no lexical scoping, no formal semantics, no type system, nothing. And that's where most of them wind up. They may grow and evolve in the right directions, and they all eventually become pleasant to work with, after enough hacks piled on, but all of them are plagued with fundamental problems.
C, C++, Java, Objective-C, C#, Pascal, Lisp, and Scheme (to name a few) are all REAL languages, in all the senses I mentioned in the previous paragraph.
Notice that in "real" languages, you may or may not have good string-processing, or garbage collection, or OOP constructs, or first-class functions, or anything else. All those concerns are orthogonal to whether the language was built with a compiler framework in mind or not. And there's absolutely no reason someone shouldn't be able to create a "scripting language" that has a solid foundation.
What's the difference? Why do all those formal doo-hickeys matter?
Performance! Compiled languages are fast. Lisp is WAY faster than Ruby, over the long haul. Smokes it.
I care about other stuff besides performance, of course, and compilation gives you other benefits as well. But most of all, it gives me this feeling of security, knowing that the formal syntax and semantics are all well-specified. I think that's what I mean by "solid". (And despite the general ugliness of XML and friends, its formal specification is pretty good.)
Chris: thanks for the pointer to XMLisp! It looks really nifty. Actually, AgentSheets itself looks kind of cool. Backing the link up a level:
http://agentsheets.com/lisp/
I wonder if their product is all written in Lisp? There's no mention of this anywhere on their website — which I suppose is probably great for marketing.
How'd you hear about this? Just a Google search?
— Steve Yegge · February 22, 2005 11:48 PM
I had never heard the term "s-expression" before. I saw this article on Google though. Seems somewhat relevant.
— Joel H · February 25, 2005 10:54 PM