First, let me clear up a few misconceptions from the previous answers. One of them said “Try writing an operating system with Lisp”, as though this would be somehow harder. In fact, one of the nicest operating systems ever done was on “The Lisp Machines” (in Zeta-Lisp), the hardware and software following the lead of “The Parc Machines” and Smalltalk — and we in turn had been very influenced by the Lisp model of programming and implementation. (These operating systems, in both Smalltalk and Lisp, were both better (a claim) and easier to write (simpler to demonstrate) than the standard ones of today.)
Another interesting answer assumed that “the test of time” is somehow a cosmic optimization. But as every biologist knows, Darwinian processes “find fits” to an environment, and if the environment is lacking, then the fits will be lacking. Similarly, if most computer people lack understanding and knowledge, then what they will select will also be lacking. There is abundant evidence today that this is just what has happened.
But neither of these has anything to do with my praise of Lisp (and I did explain what I meant in more detail in “The Early History of Smalltalk”).
To start with an analogy, let’s notice that a person who has learned calculus fluently can in many areas out-think the greatest geniuses in history. Scientists after Newton were qualitatively more able than before, etc. My slogan for this is “Point of view is worth 80 IQ points” (you can use “context” or “perspective” etc.). A poor one might subtract 80 IQ points! (See above.) A new, more powerful one makes some thinking possible that was too difficult before.
One of our many problems with thinking is “cognitive load”: the number of things we can pay attention to at once. The cliché is 7±2, but for many things it is even less. We make progress by making those few things more powerful.
This is one of the reasons mathematicians like compact notation. The downside is the extra layers of abstraction and new cryptic things to learn — this is the practice part of violin playing — but once you can do this, what you can think about at once has been vastly magnified. There were 20 Maxwell’s Equations in their original form (in terms of partial differentials and Cartesian coordinates). Today the four equations we can think about all at once are primarily due to their reformulation by Heaviside to emphasize what is really important about them (and what is likely to be problematic — e.g. the electric and magnetic fields should probably be symmetric with respect to movement, etc.).
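For concreteness, Heaviside’s reformulation in modern vector notation (here in SI units) is:

```latex
\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0}, \qquad
\nabla \cdot \mathbf{B} = 0, \qquad
\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad
\nabla \times \mathbf{B} = \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
```

The near-symmetry between $\mathbf{E}$ and $\mathbf{B}$ — broken only by the source terms $\rho$ and $\mathbf{J}$ — is exactly the kind of structural fact the compact notation lets you see all at once.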
Modern science is about experiencing phenomena and devising models whose relationships with the phenomena can be “negotiated”. The “negotiation” is necessary because what’s inside our heads, and our representation systems, etc., have no necessary connection to “what’s out there”.
Taking this point of view, we can see there can be a “bridge science” and “bridge scientists” because engineers build bridges and these furnish phenomena for scientists to make models of.
Similarly, there can be a “computer science” and “computer scientists” because engineers build hardware and software and these furnish phenomena for scientists to make models of. (In fact, this was a large part of what was meant by “computer science” in the early 60s — and it was an aspiration — still is — not an accomplished fact).
The story behind Lisp is fun (you can read John McCarthy’s account in the first History of Programming Languages). One of the motivations was that he wanted something like “Mathematical Physics” — he called it a “Mathematical Theory of Computation”. Another was that he needed a very general kind of language to make a user interface AI — called “The Advice Taker” — that he had thought up in the late 50s.
He could program — most programs were then in machine code, Fortran existed, and there was a language that had linked lists.
John made something that could do what any programming language could do (relatively easy), but did it in such a way so that it could express the essence of what it was about (this was the math part or the meta part or the modern Maxwell’s Equations part, however you might like to think of it). He partly did this — he says — to show that this way to do things was “neater than a Turing Machine”.
Another observation about this is that the “slope” from the simplest machine structures to the highest level language was the steepest ever — meaning that the journey from recognizable hardware to cosmic expression is a rocket jump!
As is often the case — especially in engineering — a great scientific model is superior to what exists, and can lead to much better artifacts. This was certainly true here. Steve Russell (later famous for being the main inventor and programmer of “SpaceWar”) looked at what John had done, and said: “That’s a program. If I coded it up we’d have a running version”. As John remarked: “He did, and we did”!
The result was “unlimited programming in an eyeful” (the bottom half of page 13 in the Lisp 1.5 manual). The key was not so much “Lisp” but the kinds of thinking that this kind of representational approach allowed and opened up regarding all kinds of programming language schemes.
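That half page is McCarthy’s eval/apply, written in Lisp itself. As a rough sketch of the idea — not the Lisp 1.5 code, and with Python lists standing in for S-expressions; the function names and the choice of subset here are illustrative — a metacircular-style evaluator for a tiny Lisp can itself fit in an eyeful:

```python
# A minimal sketch of a McCarthy-style evaluator for a tiny Lisp subset.
# Symbols are Python strings, lists are Python lists, and environments are
# plain dicts. The special forms (quote, atom, eq, car, cdr, cons, cond,
# lambda) echo the Lisp 1.5 primitives; everything else is an assumption
# made for brevity (e.g. no define, no dotted pairs, lexical dict envs).

def l_eval(exp, env):
    """Evaluate a tiny-Lisp expression in the given environment."""
    if isinstance(exp, str):                       # symbol: look it up
        return env[exp]
    if not isinstance(exp, list):                  # number etc.: self-evaluating
        return exp
    op = exp[0]
    if op == "quote":                              # (quote x) -> x, unevaluated
        return exp[1]
    if op == "atom":                               # true iff value is not a list
        return not isinstance(l_eval(exp[1], env), list)
    if op == "eq":
        return l_eval(exp[1], env) == l_eval(exp[2], env)
    if op == "car":
        return l_eval(exp[1], env)[0]
    if op == "cdr":
        return l_eval(exp[1], env)[1:]
    if op == "cons":
        return [l_eval(exp[1], env)] + l_eval(exp[2], env)
    if op == "cond":                               # (cond (test result) ...)
        for test, result in exp[1:]:
            if l_eval(test, env):
                return l_eval(result, env)
        return []
    if op == "lambda":                             # (lambda (params) body)
        return ("closure", exp[1], exp[2], env)
    # Application: evaluate the operator and operands, then apply.
    _, params, body, saved = l_eval(op, env)
    args = [l_eval(a, env) for a in exp[1:]]
    return l_eval(body, {**saved, **dict(zip(params, args))})
```

For example, `l_eval([["lambda", ["x"], ["cons", "x", ["quote", ["b"]]]], ["quote", "a"]], {})` evaluates `((lambda (x) (cons x '(b))) 'a)` and yields `["a", "b"]`. The point is not this particular sketch but what the original made visible: once a language can describe its own evaluator this compactly, the whole design space of languages opens up for the same treatment.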
A fun thing about this is that once you’ve grokked it, you can think right away of better programming languages than Lisp, and you can think right away of better ways to write the meta descriptions than John did. This is the “POV = 80 IQ points” part.
But this is like saying that once you’ve seen Newton, it becomes possible to do electrodynamics and relativity. The biggest feat in science was Newton’s!
This is why “Lisp is the greatest!”