Haskell: Pure and Lazy, yet Functional
“Avoid Success at All Costs!”
Haskell is truly a unique language. It is very concise—almost mind-bogglingly so—and elegant—in the sense that pure mathematics is elegant. It is sort of hard to learn if you jump straight into it, because it is very foreign, even more so than Lisp or Scheme, for example. If you take Lisp and build a strong type system and add a little more syntax (and take out a lot of parentheses), you get a language called ML (or OCaml, one of the nicer variants of that language family). OCaml is very different from C/C++/Java/Perl, but Haskell is even more different from typical languages, for two additional reasons: lazy evaluation and purity.
Non-strict or Lazy Evaluation
In most cases (I/O with the user being a notable exception), you don’t specify the order of evaluation of your program! Instead the compiler generates code that “gets around to” evaluating your functions when the value of each function is needed. This is called lazy evaluation. “Lazy evaluation enables new forms of modularity; in particular, separating generation from selection.”  It took me a while to understand how this “evaluating of things only when needed” would provide more modularity in your program, but it does!
It allows you to divide the problem up into smaller conceptual units which get combined in an efficient way . For example you can write your code in this clean (seemingly inefficient) style, where each piece does something self-contained to an entire list or string: (a) read an entire 1GB file, (b) split the file into a list of words (c) make a new list that has the word “the” filtered out, (d) append the words back together into a string (e) write the whole string to a file.
The compiler, however, glues this all together as if you had written a single chunk of code that (a) reads single characters from the 1GB file until (b) you have a full word (c) throwing out the word “the” in the process, (d/e) and then writing each letter of the current word to the file. Notice how this “single chunk” style never reads in the entire file, only enough to look at a word or character at a time (or maybe 1k chunks if you have buffering set, for efficiency)! Also notice how an ordinary “single-chunk” style of programming would not allow you to split up each of the subtasks (a-e) into totally separate functions  that you can use elsewhere in your program: they are stuck directly inside your while or for loop! One of the only (and bad) ways to get around this in C++, for example, is with macros (Or write your own lazy streams/lazy strings library). This one is a bit subtle.
I have always loved laziness because it allows you to write a simple "normal" function that transforms other functions into totally different functions just by applying or composing it. Eager languages would require macros or explicit force/delay. Pure/lazy functions are much more macro- (and math-) like than eager languages allow for directly. Much of the time, a pure/lazy functional language doesn’t even need macros; clear concepts and their effecient implementation can expressed directly in the language.
minlist = head . sort
Note that minlist doesn’t sort the whole list (order N log(N)), instead it just floats out the smallest element in order N time.
In most programming languages you would have to make a new function with explicit storage (to track the current best minimum value) and a for loop. If you were only looking at pieces of the list (say, numbers from a very large file) you cannot write a subroutine to find this minimum element in the list (since the subroutine would require you to have an entire list passed as an argument), instead you must write the minimum element finding code directly (or with a macro) into every part of your code that needs it.
You can use iterators to solve this, and indeed this is why all these imperative/object-oriented languages have had to add iterators to the language: to allow you to write code that doesn’t evaluate everything too soon, to be more lazy. The cost of iterators (at least to me) is you actually have to change gears mentally and syntactically. You now have to (a) know the syntax (which varies from language to language) and (b) make some kind of class with some methods, write all these little bits of code, and of course, if you never reuse the code, it was a complete waste, you should have just coded it directly into your loop.
With laziness, in Haskell, by default, all your functions are perfectly reusable in any new situation without (a) more typing or more syntax and (b) without any forethought or distraction from the problem your were orginally trying to solve.
Haskell is even more weird than all these other languages because it is “purely functional”. That means you cannot mutate variables or introduce “side effects” anywhere in your program (short of I/O). This is good and bad. One of the major benefits is that you never have those bugs when some code changes something in place over here in your program and you don’t notice it until it breaks something way over there. In addition, the compiler is guaranteed certain things about each function and can perform certain optimizations at compile time. The bad part is that there are certain things that are more natural or more efficient just to “vary” variables in place, like the computer hardware does underneath.
If you write “a = 4” then you cannot later say, “a = 5”. If you think about it, the C/C++/Java/Perl code “a = a+1” would confuse any non-programmer who took any algebra. No number can be itself plus one! In this sense, Haskell “=” always mean equals: you can’t have a function that returns different values for the same input, no matter how hard you try. “Purity is more important than, and quite independent of, laziness,” according to one of the lead GHC (Glasgow Haskell Compiler) architects, Simon Peyton Jones,  below.
That’s not all
These two features and restrictions are combined with a bunch of other carefully thought out, powerful features that give you back more than what you gave up coming from other programming languages (Higher-order polymorphic types with type classes, algebraic data types, monads; pattern matching, expression based, declarative style programming; built in support for literate programming, clean, lightweight syntax, do-it-youself infix operators, sections; and much more...). In return, you can build more powerful, more elegant, easier to read programs that are much more easily extended and experimented with, prototypes that more quickly help you find the end design.
The hardest thing about learning Haskell is perspective. There is an old saying that goes something like "Any FORTRAN programmer worth his salt can write FORTRAN programs in any language." But one thing they got right when they designed the language is that it is so different that you can only program in it ‘in Haskell,” thinking in Haskell. Your program won’t work unless you understand Haskell. The old ways of writing programs (C/C++/Java/Perl/FORTRAN) don’t work in Haskell.
But that’s okay. The way a bunch of friends and I learned Haskell was exactly this road: Lisp/Scheme, then OCaml, then Haskell. This makes the brain-bending steps a lot more natural. In fact, I learned Python after I first learned Lisp (in a sophomore level CS class at Caltech), but I never really “got” Lisp until I relearned how to write better code in Python by reading David Mertz’ chapters on this.
A Functional Style of Programming
With Lisp or Scheme the perspective starts shifting with a functional style of programming where functions map data to data instead of "do this; do that; then do that"—it’s more mathematical. In addition, you can (frequently, not just as a clever gimmick) pass snippets of code as data, put functions in lists, and return new functions created at run-time . Instead of using while or for loops to express control structure, you use recursion and pass or return values to process data, instead of changing it in place. This seems inefficient but there are ways that the compiler can re-use registers and generate efficient loops instead of making expensive recursive calls that push (and possibly overflow) the stack.
With ML/OCaml, the additional shifted perspective that comes is understanding the "smart" type system and getting your program to compile. At first it’s kind of difficult, but then you start to understand what you’re doing wrong and it all comes together. The difficulty comes because the compiler is so "smart" that you don’t have to specify the types for any parts of your program (declaring variables, arguments to functions, return types) that the compiler can infer exactly for itself, from the little bits of information it picks up here and there. For example, it sees "(some expression) + (some other expression)". This means that (some expression) and (some other expression) have to be the same numeric type. Of course this gives you a lot of freedom, but you still should give types to your functions for documentation purposes.
Try Haskell Today, for Free!
Knowing that there is a more natural progression of programming languages wouldn’t stop me from recommending learning Haskell for any level of programmer. You will be richly rewarded, although it is sometimes a lot of work to get there! In addition, you will always be learning new ways of doing things, solving mental puzzles that are fun to wrap your brain around . At the same time, I hear Haskell is surprisingly good for learning to program (I think it is a surprisingly good language to relearn how to program in), although it has the nasty “side effect” of forcing you to see certain shortcomings of every programming language (including itself sometimes) you use or go back to. It’s so good it starts to cut both ways.
In summary, Simon Peyton Jones says, “A smallish, rather pointy-headed userbase makes Haskell nimble. Haskell has evolved rapidly and continues to do so. Motto: Avoid success at all costs.” Haskellers are so happy and comfortable with the power (and limitations in some areas) of “their” language that they are not worried about its success: It may not become popular but it’s a joy to use and it lets me program in ways I didn’t know even existed before!
 (This set of slides is a bad introduction to the language because it assumes you already know what he’s talking about. I put it here because I hate quoting things without giving a reference.)
 Unix hackers will recognize this as (1) an opportunity to use the Unix utilities “sed” and “tr”; (2) as well as an example of lazy pipes. Indeed Haskell is doing a sort of “interfunction” concurrency, but on the programming language level, instead of the process level!
 Sure you can use some function calls, but you still have to write the code to collect characters into words in the while loop (or make calls to very implementation-specific functions), and you can reuse that code directly elsewhere in your program, but it’s not general enough to be useful (see_if_it_makes_the_word_the(char,current_word). You can write a function from input file f1 and output file f2 but not a function from string to string because you need the whole (1GB) string in RAM to do anything—unless you pass the file handle around, which is I/O specific, and not a general purpose string-to-string function.
 Not as inefficient as it sounds. Implementation-wise, think of it as creating an instance of an object, passing the constructor a few variables to capture the state; the method with the captured state is the only part of the object we care about. (Using this implementation technique, the object is sometimes called a closure.)
 I remember learning about monads in a class on Haskell. There were only a handful of students in the class, all sitting on the front row, and a few friends walked by the classroom and noted that we were all scratching our heads at the same time. Literally.