I want to answer this because I think the technique is just too clever. :)
The details differ for every data structure, but let's start by considering a tree. A singly linked list is a "unary" tree: each node has zero or one child, zero if it is the end of the list. Records of records (without cross-references) are trees whose number of children depends on the type of record.
If you want to replace a big tree T with a bigger tree T' that consists of a new root node whose only child is T, then you can just create the new node and set its "child" pointer to T. You don't have to copy T; you just reference it. If T is actually a linked list, you have just prepended an element to the head of the list. The old list works just like it did before (as long as it is singly linked).
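Here is a minimal sketch of that in Python (the `Node` type and names are illustrative, not any particular library's API):

```python
from dataclasses import dataclass
from typing import Optional

# An immutable singly linked list node; None in "child" marks the end.
@dataclass(frozen=True)
class Node:
    value: int
    child: Optional["Node"] = None

# The old list T: 1 -> 2 -> 3
t = Node(1, Node(2, Node(3)))

# The "bigger" T': one new node whose only child is T.
t_prime = Node(0, t)

# No copying happened: T' shares every node of T.
assert t_prime.child is t
```

Note the `is` check: the new list does not hold a copy of T, it holds T itself.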
If you want to replace a big tree T with a same-size tree T'' that has one leaf replaced, then you have to do some surgery. You start by replacing the leaf and its immediate parent, but when constructing the parent, you can just reference unchanged structures in the old T. Continuing this process recursively up the tree, you find that you have to create new nodes all along the path from the replaced leaf to the root, but none of the others. For a well-balanced tree with N elements, you have to replace about logN nodes. If T is a linked list of length N, you have to replace all N of the nodes.
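That "surgery" is usually called path copying. A small Python sketch, assuming a plain binary tree type of my own invention (this replaces the leftmost leaf just to keep the example short):

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Tree:
    value: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def set_leftmost(t: Tree, new_value: int) -> Tree:
    """Replace the leftmost leaf, rebuilding only the nodes on the path to it."""
    if t.left is None:
        return replace(t, value=new_value)
    return replace(t, left=set_leftmost(t.left, new_value))

old = Tree(1, Tree(2, Tree(3)), Tree(4))
new = set_leftmost(old, 99)

assert new.left.left.value == 99
assert new.right is old.right  # subtree off the path is shared, not copied
assert old.left.left.value == 3  # the old tree is untouched
```

Only the root and the nodes on the left spine are new objects; everything hanging off that path is shared with the old tree.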
For linked lists, this is a huge asymmetry: changing the head node costs almost nothing, but changing the last node costs as much as creating a whole new list. For balanced trees, it is a good trade-off: log N is small compared to N, and you get the safety and portability of (effectively) making defensive copies everywhere, as well as retaining a complete undo history if you want it. When working with immutable data, you really need to think about your data structures, so that, for example, you only modify a list at its cheap end (the head).
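The asymmetry is easy to see in code. A sketch with the same kind of illustrative `Node` type as above: replacing element i rebuilds nodes 0 through i and shares everything after them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    value: int
    child: Optional["Node"] = None

def set_at(node: Node, i: int, new_value: int) -> Node:
    """Replace element i, copying nodes 0..i and sharing the rest."""
    if i == 0:
        return Node(new_value, node.child)
    return Node(node.value, set_at(node.child, i - 1, new_value))

t = Node(1, Node(2, Node(3)))

cheap = set_at(t, 0, 9)   # one new node; the entire tail is shared
assert cheap.child is t.child

costly = set_at(t, 2, 9)  # every node on the way down is rebuilt
assert costly is not t and costly.child is not t.child
```

Changing the head allocates one node; changing the last element allocates N of them.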
There are at least two things that could break this. (1) If your data are not *all* strictly immutable, structural sharing creates a confusing mess of cross-references: mutations cause hard-to-predict action at a distance. (2) The data structures must not contain any cycles (they must be Directed Acyclic Graphs, or DAGs). Structural sharing would break a doubly linked list: while the "next" pointer would be correct in the new list, the shared nodes' "prev" pointers would still point into the old one.
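Point (1) is easy to demonstrate with ordinary Python dicts and lists (no special library, just a shallow copy that shares a mutable payload):

```python
# A shallow "new version" shares the inner mutable list with the old one.
old_version = {"tags": ["a", "b"]}
new_version = {**old_version, "name": "x"}

# Mutating through one reference...
new_version["tags"].append("c")

# ...changes the supposedly untouched old version too: action at a distance.
assert old_version["tags"] == ["a", "b", "c"]
```

With strictly immutable data this situation cannot arise, which is exactly why structural sharing is safe there.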
Note that Haskell is not the only language that does this. I'm mostly familiar with Scala's immutable collections, and I know that the same thing has been implemented in Clojure (with a nice "mutable until frozen" feature), and I'm pretty sure that all of these ideas were developed in the ML family of languages in the '70s and '80s.