Monadically Speaking: Benjamin’s Adventures in PLT Wonderland

October 23, 2009

To Scheme, or Not to Scheme: Scheming Schemers and Non-Scheming Schemers, or Keeping the Fun in Scheme

Filed under: Continuations, Multi-paradigm Programming, Programming Language Theory, Scheme — Benjamin L. Russell @ 8:25 pm

Do you use the Scheme programming language? If so, do you program mainly in a serious mood to write applications, or in a crafty mood to have fun? In other words, do you consider yourself a non-Scheming Schemer, or a Scheming Schemer? I consider myself a Scheming Schemer: I program in Scheme mainly in a crafty mood just to have fun. To quote Alan Perlis from the dedication in SICP:

“I think that it’s extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are. I think we’re responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don’t become missionaries. Don’t feel as if you’re Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don’t feel as if the key to successful computing is only in your hands. What’s in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more.”

Alan J. Perlis (April 1, 1922-February 7, 1990)

Nevertheless, as many of you probably know, some recent developments in the evolution of the Scheme programming language have reduced the influence of the language. In particular, the R5RS vs. R6RS schism and the replacement of the 6.001 course at MIT, based on Scheme, with a 6.01 course, based on Python, are two events that have created much controversy.

Concerned over such events, I recently posted a thread entitled “Ideas for an SICP’?” on the comp.lang.scheme USENET newsgroup, asking for suggestions for an alternative to SICP with similar content, as follows:

[W]ith Scheme replaced by Python at MIT, this special role of Scheme
as the vehicle for teaching sophisticated students of computer science
has been greatly diminished. Arguments against SICP as being too
difficult for an introductory textbook notwithstanding, the presence
and usage of that textbook contributed greatly to the significance of
Scheme as a tool in teaching introductory computer science.

It seems that SICP could use a replacement. What is needed is an
alternative textbook to use Scheme in a role that cannot be fulfilled
by such languages as Python, in order to foster creativity and
originality in programming for future freethinking hackers. In
addition, such an alternative textbook would need to be actively used
by leading educational institutions of introductory computer science
in raising a new generation of future Scheme hackers.

Does anybody have any suggestions for a plan that could lead to the
birth and growth of such an alternative leading textbook? Many
programmers tend to be strongly influenced by the first textbook that
they encounter in learning programming; whether that language is
Scheme or Python could have great effect on the future influence of
such languages. The SICP phenomenon has been done once; why not give
rise to a new SICP’ phenomenon?

There were several responses. In particular, one user, Ray, responded as follows:

[N]obody who’s grown up with the web, and who thinks of
computers as being primarily communications devices, will
believe that that makes a language anything other than a
crippled toy if you can’t interface with the hardware
capabilities of the machine, enabling you to do something
as “simple” as writing a web browser in it, managing network
connections, handling Graphical UI elements, and rendering
text and graphics on the screen.

[...]

Now, consider what Scheme’s got that Python doesn’t got.
It comes down to syntactic abstraction and continuations.
Courses based on SICP don’t use them, so MIT had nothing
to lose by going to Python.

Perhaps. But has Scheme really lost the essence of its appeal?

I don’t think so. Recently, I started reading a guide by Shiro Kawai on the Gauche implementation of Scheme (in Japanese, a language that I can also read), and opened a chapter on continuations. (For those of you who do not know, a continuation is a control mechanism in Scheme that represents the rest of a computation as a value; that value can be assigned to a variable, allowing the process to be “continued” (hence the name) from the point where the continuation was saved.) Since I had not yet fully understood continuations, I found the chapter extremely interesting, and could not stop reading. At one point, I encountered the following procedure for calculating the factorial function, written in continuation-passing style (a.k.a. “CPS”), in which the continuation, rather than being assigned directly to a variable, is explicitly passed as a parameter:

(define (fact/cps n cont)
  (if (= n 0)
      (cont 1)
      (fact/cps (- n 1) (lambda (a) (cont (* n a))))))

This procedure returns the same results as the following (much simpler) one (listed on the same page), which does not use a continuation:

(define (fact n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))
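
The claim that the two procedures agree is easy to check by machine. The following is only a minimal sketch: “agree-up-to?” is a helper name that I made up for illustration, and the identity function is passed as the initial continuation so that “fact/cps” simply hands back its result.

```scheme
;; The two definitions from above, repeated so that this runs on its own.
(define (fact n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))

(define (fact/cps n cont)
  (if (= n 0)
      (cont 1)
      (fact/cps (- n 1) (lambda (a) (cont (* n a))))))

;; Hypothetical helper: #t when both procedures agree for 0, 1, ..., limit.
(define (agree-up-to? limit)
  (let loop ((n 0))
    (cond ((> n limit) #t)
          ((= (fact n) (fact/cps n (lambda (a) a))) (loop (+ n 1)))
          (else #f))))

(agree-up-to? 10)
```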

In particular, I stared at the mysterious “(a)” variable in the following piece of the code above, wondering what that variable represented:

(fact/cps (- n 1) (lambda (a) (cont (* n a))))

Suddenly, it dawned upon me: The “(a)” variable holds the value that is passed to the continuation (“(lambda (a) (cont (* n a)))”) when that continuation is eventually called, namely the factorial of n - 1. The continuation, in turn, captures what remains to be done at that point of execution (multiply by n, then hand the product to “cont”), and is itself passed as a parameter to the recursive call! In short, the continuation was a microcosm of the execution context of the procedure at that point in time, encapsulated in a lambda abstraction!
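
One way to see what “a” holds is to record it each time one of these continuations is called. This is just a sketch of my own, not something from the guide: “fact/cps-logged”, “seen”, and “a-log” are names invented for illustration, and the logging is done with a plain “set!” on a list.

```scheme
;; Each entry is (n a): the n captured by a continuation, and the a it received.
(define seen '())

(define (fact/cps-logged n cont)
  (if (= n 0)
      (cont 1)
      (fact/cps-logged (- n 1)
                       (lambda (a)
                         ;; Record what this continuation was handed.
                         (set! seen (cons (list n a) seen))
                         (cont (* n a))))))

(define result (fact/cps-logged 3 (lambda (a) a)))

;; Entries in the order in which the continuations were called.
(define a-log (reverse seen))
```

With 3 as the argument, “result” is 6 and “a-log” is ((1 1) (2 1) (3 2)): the continuation built while n was 1 is called first, and each “a” is the factorial of n - 1.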

Here, “fact/cps” is the name of the procedure, which stands for “factorial in CPS (continuation-passing style) form.”

Suppose we call “fact/cps” in the simplest case:  a value of 0 for the first parameter, and, as the second parameter, a function that simply returns the parameter passed to it:

(fact/cps 0 (lambda (a) a))

Then, in “fact/cps”, “(= n 0)” is true in the if-statement, so “(cont 1)” is called, where “cont” is simply the second parameter of the enclosing “fact/cps” function, or “(lambda (a) a)” (the continuation), which returns the value of the parameter “a”, which is 1 in this case, so 1 is returned.

Let’s be a little bolder, and use a value of 1 for the first parameter:

(fact/cps 1 (lambda (a) a))

This time, “(= n 0)” is false in the if-statement, so the following recursive call is made (substituting 1 for n):

(fact/cps (- 1 1) (lambda (a) (cont (* 1 a))))

In the recursive call, “(- 1 1)” is substituted for the first parameter of “fact/cps”, and “(lambda (a) (cont (* 1 a)))” is substituted for the second parameter of “fact/cps”.  This is the same as the following call (reducing the first parameter to a value):

(fact/cps 0 (lambda (a) (cont (* 1 a))))

However, since we had passed an identity function, “(lambda (a) a)” for “cont” in the enclosing function, this reduces to the following call (expanding “cont” to “(lambda (a) a)”):

(fact/cps 0 (lambda (a) ((lambda (a) a) (* 1 a))))

Here, the explicit continuation “(lambda (a) ((lambda (a) a) (* 1 a)))” first takes whatever is handed to it as the parameter “a”, and passes “(* 1 a)” to the inner function (lambda abstraction, actually, but we’ll dismiss that point here), so we reach “((lambda (a) a) (* 1 a))”.  But this is just the identity function applied to “(* 1 a)”.  So this part just reduces to “(* 1 a)”, which is the same as just “a”, which is whatever value is passed to this continuation, so this continuation is just the identity function.

So when “fact/cps” is recursively called with 0 for the first parameter n and this identity continuation “(lambda (a) ((lambda (a) a) (* 1 a)))” for the second parameter cont, we first reach “(= n 0)” as the condition in the if-statement, which is true.  This leads to evaluating the following:

(cont 1)

But “(cont 1)” is just this identity continuation called with 1, so it just returns the parameter 1.  So, “(fact/cps 1 (lambda (a) a))” evaluates to just that value:

1
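
The same unfolding can be written out for a slightly larger argument. In the sketch below, the names “k0” through “k3” are mine, introduced only to label the continuation that exists at each step of “(fact/cps 3 (lambda (a) a))”; every expression at the bottom is one step of the same computation, so all of them should evaluate to 6.

```scheme
(define (fact/cps n cont)
  (if (= n 0)
      (cont 1)
      (fact/cps (- n 1) (lambda (a) (cont (* n a))))))

(define k0 (lambda (a) a))              ; the continuation we pass in
(define k1 (lambda (a) (k0 (* 3 a))))   ; built while n is 3
(define k2 (lambda (a) (k1 (* 2 a))))   ; built while n is 2
(define k3 (lambda (a) (k2 (* 1 a))))   ; built while n is 1

;; Successive steps of the same computation.
(define steps
  (list (fact/cps 3 k0)
        (fact/cps 2 k1)
        (fact/cps 1 k2)
        (fact/cps 0 k3)
        (k3 1)))
```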

Of course, we didn’t need to pass the identity function “(lambda (a) a)” as the second parameter to “fact/cps”.  We could have formatted the output, for example, by passing a formatting function, instead (to borrow the syntax of Gauche Scheme):

(lambda (a) (format #t "The factorial value is ~a." a))

Then we could have invoked “fact/cps” with that formatting function for the function to be passed as the continuation, as follows:

(fact/cps 3 (lambda (a) (format #t "The factorial value is ~a." a)))

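At the Gauche prompt, this would have displayed the following (the text printed by “format”, followed immediately by the “#<undef>” value that “format” returns):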

The factorial value is 6.#<undef>

Alternatively, we could have chosen to multiply whatever was returned by 2, just to screw up the function, as follows:

(fact/cps 3 (lambda (a) (* 2 a)))

This would have returned the following:

12

Hey, why not combine the two functions, and get Scheme to say something funny?

(fact/cps 3 (lambda (a) (format #t "I do solemnly swear that the factorial value is ~a." (* 2 a))))

Scheme then would have returned the following:

I do solemnly swear that the factorial value is 12.

Despite (or maybe because of?) all this exploratory monkey-business, in the “Aha!” moment described above, I felt an ecstasy of enlightenment that I do not often experience elsewhere (er, elsewhen, rather).

Such “Aha!” moments are crucial to appreciating the fun in computer science. They are commonly found whenever a deeper level of understanding is achieved by contemplating something which is not obvious at first. I have noticed that they are found more easily in Scheme than in some other programming languages, because of the flexibility that the language allows in even esoteric expressions.

In order to enjoy programming, one must appreciate the fun in programming, and it is difficult to appreciate this factor without experiencing an “Aha!” moment. The deeper the understanding, the more intense the exhilaration associated. Continuations and syntactic abstraction, in particular, are very abstruse (some may even say “arcane”) topics, especially when first encountered, and can require relatively deep understanding. Hence, by providing opportunities to learn such concepts, Scheme can provide an ideal opportunity to experience the fun of programming.  Thus, the continued need for Scheme.
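
For a small taste of why continuations reward this kind of contemplation, here is a minimal sketch of a continuation that is saved in a variable and then re-entered, as opposed to the explicitly passed continuations of “fact/cps” above. The names “saved-k”, “passes”, and “count-to-three” are made up for illustration; “call-with-current-continuation” is the standard procedure.

```scheme
(define saved-k #f)   ; will hold the captured continuation
(define passes 0)     ; how many times the body after the capture has run

(define (count-to-three)
  (let ((x (call-with-current-continuation
            (lambda (k)
              (set! saved-k k)   ; remember "the rest of this let"
              1))))
    (set! passes (+ passes 1))
    (if (< x 3)
        (saved-k (+ x 1))        ; jump back, binding x to a new value
        (list x passes))))

(define outcome (count-to-three))
```

Here “outcome” ends up as (3 3): the body of the “let” runs three times, with x bound to 1, 2, and 3 in turn, even though there is no loop in sight.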

Not all books that use Scheme adopt this approach.  An alternative approach is to structure the curriculum so that the learning becomes a linear process, rather than a series of leaps, so that all the parts fit together neatly like solving a jigsaw puzzle, rather than like climbing a mountain.

This is not a criticism of that approach, but not everybody prefers it.  In one critique, José Antonio Ortega Ruiz, in the post “A Scheme bookshelf” on his blog, “programming musings,” contrasts one well-known textbook that uses this alternative approach, How to Design Programs (a.k.a. “HtDP”), with SICP, as follows:

The most cited alternative to SICP is How to Design Programs by Felleisen, Findler, Flatt and Krishnamurthi. Its authors have even published a rationale, The Structure and Interpretation of the Computer Science Curriculum, on why they think SICP is not well suited to teaching programming and how their book tries to fix the problems they’ve observed. I won’t try to argue against such eminent schemers, but, frankly, my feeling is that HtDP is a far, far cry from SICP. HtDP almost made me yawn, and there’s no magic to be seen.

Ortega Ruiz is somewhat harsh in his critique of HtDP.  After all, according to the authors of the paper explaining the rationale behind the book, The Structure and Interpretation of the Computer Science Curriculum, HtDP was created in the first place to rectify two major perceived problems with SICP (page 9 of the paper):

… SICP doesn’t state how to program and how to manage the design
of a program. It leaves these things implicit and implies that students can discover a
discipline of design and programming on their own. The course presents the various
uses and roles of programming ideas with a series of examples. Some exercises then
ask students to modify this code basis, requiring students to read and study code;
others ask them to solve similar problems, which means they have to study the
construction and to change it to the best of their abilities. In short, SICP students
learn by copying and modifying code, which is barely an improvement over typical
programming text books.

SICP’s second major problem concerns its selection of examples and exercises. All
of these use complex domain knowledge….

While these topics are interesting to students who use computing in electrical
engineering and to those who already have significant experience of programming
and computing, they assume too much understanding from students who haven’t
understood programming yet and they assume too much domain knowledge from
any beginning student who needs to acquire program design skills. On the average,
beginners are not interested in mathematics and electrical engineering, and they do
not have ready access to the domain knowledge necessary for solving the domain
problems. As a result, SICP students must spend a considerable effort on the
domain knowledge and often end up confusing domain knowledge and program design
knowledge. They may even come to the conclusion that programming is a shallow
activity and that what truly matters is an understanding of domain knowledge.

While these are all valid points, Ortega Ruiz’s last clause, “there’s no magic to be seen,” actually describes the key conflict here.  What exactly is this “magic”?  To be borderline facetious, according to Arthur C. Clarke,

Any sufficiently advanced technology is indistinguishable from magic.

Arthur C. Clarke, “Profiles of The Future”, 1961 (Clarke’s third law)
English physicist & science fiction author (1917 – 2008)

So maybe we’re actually referring to “any sufficiently advanced technology.”  What do we mean by “sufficiently advanced”?  I would suggest (to use a recursive definition), “advanced to the point that understanding the material requires a depth of understanding that cannot be achieved superficially.”

Whether or not this “magic” is to be used in pedagogy actually relates to the fundamental design philosophy behind HtDP, as opposed to that behind SICP.  The above-referenced paper explaining the rationale behind HtDP, “The Structure and Interpretation of the Computer Science Curriculum,” states the following (page 11):

The recipes also introduce a new distinction into program design: structural
versus generative recursion. The structural design recipes in the first half of the book
match the structure of a function to the structure of a data definition. When the
data definition happens to be self-referential, the function is recursive; when there
is a group of definitions with mutual cross-references, there is a group of function
definitions with mutual references among the functions. In contrast, generative
recursion concerns the generation of new problem data in the middle of the problem
solving process and the re-use of the problem solving method.

Compare insort and kwik, two standard sort functions:

;; (listof X) -> (listof X)
(define (insort l)
  (cond
    [(empty? l) empty]
    [else
      (place
        (first l)
        (insort (rest l)))]))

;; (listof X) -> (listof X)
(define (kwik l)
  (cond
    [(empty? l) empty]
    [else
      (append (kwik (larger (first l) l))
              (first l)
              (kwik (smaller (first l) l)))]))

The first function, insort, recurs on a structural portion of the given datum, namely,
(rest l). The second function, kwik, recurs on data that are generated by some other
functions. To design a structurally recursive function is usually a straightforward
process. To design a generative recursive function, however, almost always requires
some ad hoc insight into the process. Often this insight is derived from some
mathematical idea. In addition, while structurally recursive functions naturally terminate
for all inputs, a generative recursive function may diverge. HtDP therefore suggests
that students add a discussion about termination to the definition of generative
recursive functions.
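
The distinction drawn in the quotation can be tried out in plain Scheme. The following is only my own sketch, not the paper’s code: the paper leaves “place”, “larger”, and “smaller” undefined, so the helper “keep”, the body of “place”, and the name “pivot” are all assumptions of mine. Unlike the quoted “kwik”, this version sorts numbers in ascending order, wraps the pivot in a list before appending, and draws both sublists from the rest of the list, so that each recursive call receives a strictly shorter list and the function terminates.

```scheme
;; Hypothetical helper: the elements of lst that satisfy pred, in order.
(define (keep pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
        (else (keep pred (cdr lst)))))

;; Insert x into an already sorted list (assumed meaning of "place").
(define (place x sorted)
  (cond ((null? sorted) (list x))
        ((<= x (car sorted)) (cons x sorted))
        (else (cons (car sorted) (place x (cdr sorted))))))

;; Structural recursion: the recursive call is on (cdr l), a piece of the input.
(define (insort l)
  (if (null? l)
      '()
      (place (car l) (insort (cdr l)))))

;; Generative recursion: the recursive calls are on newly generated lists.
(define (kwik l)
  (if (null? l)
      '()
      (let ((pivot (car l))
            (others (cdr l)))
        (append (kwik (keep (lambda (x) (< x pivot)) others))
                (list pivot)
                (kwik (keep (lambda (x) (>= x pivot)) others))))))

(insort '(3 1 4 1 5 9 2 6))
(kwik '(3 1 4 1 5 9 2 6))
```

Both calls should produce the same sorted list. The shape of “insort” follows directly from the shape of a list, whereas “kwik” depends on the idea of partitioning around a pivot, which is exactly the kind of “ad hoc insight” that the paper mentions.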

HtDP takes pains to remove the requirement for this “ad hoc insight” into the problem-solving process.  The authors of the book then make the following claim (same page):

Distinguishing the two forms of recursion and focusing on the structural case
makes our approach scalable to the object-oriented (OO) world.

That may be so, but that contrasts sharply with the spirit of the original quotation by Alan Perlis above:

Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are. I think we’re responsible for stretching them, setting them off in new directions, and keeping fun in the house.

So we have two sharply contrasting approaches:  one uses Scheme for fun, and the other uses Scheme for scalability.  Which is preferable is basically a matter of taste.

In the post quoted above, Ray contrasted the advantages of Scheme and Python as follows:

[S]cheme still has no standard means of managing network connections
or rendering anything on the screen.  Python has these things.

Now, consider what Scheme’s got that Python doesn’t got.
It comes down to syntactic abstraction and continuations.
Courses based on SICP don’t use them, so MIT had nothing
to lose by going to Python.

SICP doesn’t use syntactic abstraction.  In the first
edition, this was because Scheme didn’t have them yet.
In the current edition …. well, here’s the footnote
about macros from the current edition, page 373:

Practical Lisp systems provide a mechanism that
allows users to add new derived expressions and
specify their implementation as syntactic
transformations without modifying the evaluator.
Such a user-defined transformation is called a
/macro/. Although it is easy to add an elementary
mechanism for defining macros, the resulting
language has subtle name-conflict problems. There
has been much research on mechanisms for macro
definitions that do not cause these difficulties.
See, for example, Kohlbecker 1986, Clinger and
Rees 1991, and Hanson 1991.

(Aside: “practical” lisp systems have them; the dialect
covered in the book does not.  Students can and do draw
the obvious conclusion….)

Granted, specific implementations do provide facilities for managing network connections and rendering to the screen.  But there is no single main implementation of Scheme that everybody uses, and the libraries that provide these facilities are not necessarily portable across implementations.  Therefore, Scheme as a language (as opposed to a specific implementation) does not have them.

But is that necessarily bad?  I don’t think so.  I think that the whole point of Scheme, as a language, is, to quote Alan Perlis above, that we are “[not] responsible for the successful, error-free perfect use of these machines, [but] for stretching them, setting them off in new directions, and keeping fun in the house.”

To sum up, when I first approached SICP, I found it too challenging to digest.  I had to quit reading it repeatedly, and then return to it later, and I still have read only a portion of the book.  But I became entranced with the magic of computer science as demonstrated by such creatures as tail recursion in Scheme in SICP.  And it was precisely this magic that kept me returning to computer science in general, and to Scheme in particular.

On the other hand, books such as HtDP are very comforting and reassuring.  While SICP sometimes makes me wonder why I am such an idiot, HtDP makes me feel as if I am no longer an idiot.  I no longer need to think for hours and hours during my sleep about how to overcome a particular problem.  Books such as HtDP make the material very straightforward.  However, by doing so, they also remove all the magic, and break the spell.

I feel that an intermediate approach is better.  The magic is necessary, but the sorcery in SICP can be too much at first.  However, the jigsaw-puzzle approach of HtDP seems too straightforward.  There is not enough exploration to maintain interest after a certain level of reader sophistication.  Paradoxically, although I can read HtDP much, much faster than SICP, I also fall asleep reading it that much faster, and actually haven’t read very far in it.  A gentler approach than that of SICP, which still offers more exploration than HtDP, would be a better compromise.

Also, I feel that the greatest strength of Scheme lies in its flexibility for exploratory programming.  Scheme shares one quality with such addictive games as Tetris:  It is relatively simple to learn, yet extremely difficult to master.  Writing procedures to calculate such functions as the factorial function or the Fibonacci sequence is deceptively simple at first.  But when the student ventures into such deeper areas as tail recursion, continuations, and syntactic abstractions, the procedures can become tantalizingly complex.
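
To give a flavor of two of these deeper areas, here is a small sketch, under my own naming, of a tail-recursive factorial and of a hygienic macro written with “syntax-rules”. The names “fact-iter”, “swap!”, and “swap-demo” are made up for illustration. The macro deliberately uses a temporary called “tmp” while the caller also has a variable called “tmp”, which is the sort of name conflict that the SICP footnote quoted above warns about and that hygienic macros are designed to avoid.

```scheme
;; Tail recursion: the recursive call is the last thing the loop does, so the
;; running product is carried in acc instead of in pending multiplications.
(define (fact-iter n)
  (let loop ((i n) (acc 1))
    (if (= i 0)
        acc
        (loop (- i 1) (* i acc)))))

;; Syntactic abstraction: a new form that exchanges the values of two variables.
(define-syntax swap!
  (syntax-rules ()
    ((_ a b)
     (let ((tmp a))
       (set! a b)
       (set! b tmp)))))

;; The caller's tmp and the macro's tmp do not interfere with each other.
(define (swap-demo)
  (let ((tmp 1) (other 2))
    (swap! tmp other)
    (list tmp other)))

(fact-iter 5)
(swap-demo)
```

Here “(fact-iter 5)” evaluates to 120, and “(swap-demo)” evaluates to (2 1) in an implementation with hygienic “syntax-rules”.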

To conclude, shouldn’t Scheme really be a language for scheming programmers to figure out mainly how to have fun?  I like to be a Scheming Schemer, always scheming plots for stretching the lambda abstractions to set them off in new directions, mainly just to have fun.  Are you a Scheming Schemer?

August 26, 2009

Conquering the Fear of Reading Research Papers: Computer Science Research Papers for Non-Computer Scientists

Any non-mathematician, non-computer scientist layperson who has approached programming languages originally spawned in academia, such as Haskell or Scheme, has no doubt been intimidated by the academic rigor and density of many research papers on such subjects. Even papers labeled “gentle,” such as “A Gentle Introduction to Haskell” [1], can turn out to be less “gentle” than expected for those not familiar with the field.

Although I myself do have a background in computer science, as a patent translator, and not a mathematician, computer scientist, or programmer, I tend to approach such papers as more of an amateur programming language theory hobbyist and writer. Having tried to read a number of such papers, I have discovered that although many of them can be difficult to approach, some tend to be more approachable than others.

In particular, most recently, I was surprised to encounter a rather lengthy research paper on the history of one such programming language, Simula, which, although detailed, turned out to be surprisingly approachable in not assuming advanced technical knowledge of the field (although it still required great attention to detail): “Compiling SIMULA: A Historical Study of Technological Genesis” [2].

This paper, rather than focusing solely on the technical development of the language, conducts a sociotechnical analysis of the broader historical background surrounding the Simula project. There are no formulas or even code snippets; instead, even though the paper is a research paper published in the IEEE Annals of the History of Computing, it is written as a history paper which just happens to be about the historical background of a programming language. Even more surprising, according to the endnote of the paper, the author, Jan Rune Holmevik, at the time of publication, was a graduate student in history at the University of Trondheim, Norway, and the paper itself “was written as part of [his] dissertation thesis in history Hovudfag [3] at the University of Trondheim” (page 36). In other words, this is a detailed research paper on a programming language, published in an academic journal, which does not assume any ability to program.

Until reading this paper, I had assumed that rigorous research papers on computer science published in academic journals were either written by mathematicians or computer scientists, or at least assumed a background in either mathematics or computer science to read. While this paper is definitely thoroughly researched, documented, and described, and approaches its topic in excruciating detail, it does not assume any background in either mathematics or computer science.

In other words, one need not necessarily be a mathematician or a computer scientist to write, much less read, a research paper on computer science; in fact, there are even some very thorough and detailed research papers on computer science published in academic journals which do not assume any background in either mathematics or computer science, such as this paper.

This discovery came as a revelation.

If it is possible to write a research paper without a background in either mathematics or computer science, then it must definitely also be possible to read at least some such papers. Furthermore, this is most likely not the only such research paper.

In fact, so far, I have encountered a number of computer science research papers which similarly require no or very little background in mathematics or computer science. Here is a list of some other interesting, yet approachable, research papers which (1) are either devoid of, or substantially devoid of, mathematical formulae; (2) are either devoid of, or substantially devoid of, code snippets; and (3) are either devoid of, or substantially devoid of, technical content assuming a background in mathematics or computer science:

i) “The Structure and Interpretation of the Computer Science Curriculum” [4]. By Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi.
Published in the Journal of Functional Programming in 2004, this paper discusses the motivation and design rationale for the book How to Design Programs [5] by comparison and contrast with the book Structure and Interpretation of Computer Programs [6].

ii) “Haskell: Batteries Included” [7]. By Duncan Coutts, Isaac Potoczny-Jones, and Don Stewart. Published in Proceedings of the first ACM SIGPLAN symposium on Haskell (2008).
This paper outlines the motivation for and structure of the Haskell Platform, a “Haskell for the masses” versioned packaging system of Haskell and included libraries. Although this paper does not specifically assume familiarity with mathematics or computer science, it does make use of such technical terminology as “libraries,” “packages,” “source code,” and “package description file.”

iii) “Teaching Programming Languages in a Post-Linnaean Age” [8]. By Shriram Krishnamurthi. Published in First SIGPLAN Workshop on Undergraduate Programming Language Curricula in 2008.
This paper claims that programming languages should be viewed as aggregations of features, rather than languages defined by taxonomies, and asserts that the term “paradigm” is ill-defined and should play no role in classifying programming languages. The paper also addresses the issue of the split between textbooks that are “rich in mathematical rigor but low on accessibility, and those high on accessibility but lacking rigor (and, often, even wrong),” and offers an alternative.

iv) “The Early History of Smalltalk” [9]. By Alan C. Kay. Published in History of Programming Languages: The second ACM SIGPLAN conference on History of programming languages in 1993.
This paper describes the historical background behind the evolution of Smalltalk, a pure object-oriented language. It also, in part, describes the visit of Steve Jobs, Jef Raskin, and others from what was then Apple Computer, Inc. to the Xerox PARC laboratory, which led to the subsequent development of the Macintosh user interface, based on the Smalltalk user interface.

v) “Design Principles Behind Smalltalk” [10]. By Daniel H. H. Ingalls. Published in BYTE Magazine, August 1981.
This paper is a non-technical exposition of the design principles behind the Smalltalk-80 programming system. Illustrated with descriptive figures, the paper focuses not just on the programming language issues behind Smalltalk as a language of description, but also the user interface issues behind Smalltalk as a language of interaction. In particular, the paper describes how the research, in two- to four-year cycles, has paralleled the scientific method in repeatedly making an observation, formulating a theory, and making a prediction that can be tested, and summarizes key concepts as one- to two-sentence nutshell statements.

Lastly (but not leastly), the following paper, although containing a significant number of code snippets and assuming some background in computer science, is sufficiently interesting to be worthy of special mention; the first few sections of it can be safely read by a reader unfamiliar with the subject matter:

vi) “A History of Haskell: Being Lazy With Class” [11]. By Paul Hudak, John Hughes, Simon Peyton Jones, and Philip Wadler. Published in The Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III) in 2007.
This paper provides a very interesting description of the motivation and historical background of the functional programming language Haskell. In particular, the paper describes the influence of a precursor to Haskell, Miranda (pages 3 to 4); mentions how Gerry Sussman and Guy L. Steele briefly considered the idea of introducing lazy evaluation in Scheme (page 3); provides a timeline of the development of Haskell (page 7); and then proceeds to describe the syntax and semantics (pages 11 to 28), implementations and tools (pages 28 to 35), and applications and impact (pages 35 to 46). I sometimes return to this paper when I feel frustrated with the dryness of many other papers on Haskell, since this is one of the few papers on the language which conveys a sense of the excitement surrounding the birth and early development of the language; many other papers on Haskell tend to focus solely on technical issues, without discussing the role of human beings in the context.

As pointed out in Holmevik [2], programming languages do not “evolve in a technical or scientific vacuum” (page 35). This point is often ignored in many other papers about Haskell; luckily, it is dealt with in depth in this paper.

In my experience, becoming accustomed to reading research papers is a gradual, rather than instantaneous, process: After reading a number of such papers, one tends to become used to reading them; to recognize that failure to understand the content is not necessarily the fault of the reader, but often that of a woolly exposition that either fails to describe prerequisites to the content, or does not describe them adequately; and to realize that the best research papers are not necessarily those that describe the most difficult content, but those that offer the most readily understandable exposition of the material to the intended target audience.

One must understand that many research papers are not necessarily written to be easy to read, but to fulfill a specific need, such as a part of a requirement for a degree, and are hence qualitatively different from most textbooks, which tend to be written so as to be easy to understand for a broader audience. Hence, it is actually quite normal for a research paper of mediocre quality to be written in such a way as to expect the reader to fill in the prerequisite content, which may be assumed but not stated. (Of course, the best research papers tend to fill in any prerequisite content.)

Often, the best researchers are not the best writers; many of them cannot understand why a topic which is of trivial difficulty to them can possibly be of non-trivial difficulty for another reader. A reader aware of this fact can often approach research papers with a better plan for mastering the content therein.

Lastly, if I might add a personal expectation, I tend to enjoy reading papers that acknowledge that a programming language is an artifact resulting from a complex interplay of many different human desires, needs, and expectations surrounding its birth, and does not develop in a social vacuum. Computers do not design programming languages; people do. Therefore, discussing a programming language as if it were merely a logical extension of prior developments in syntax and semantics ignores a significant factor in its evolution. I have found that the best research papers tend to be those that, while providing a rigorous treatment of the subject material, do not assume any prerequisite material not normally possessed by the intended target audience; acknowledge that some readers may not be as intelligent as the author and may find certain points that seem trivial to the author to be non-trivial; provide sufficient elucidation to cope accordingly; and discuss the human issues surrounding the design and evolution of the language.

[1] Hudak, Paul. “A Gentle Introduction to Haskell, Version 98.” New York, NY: ACM SIGPLAN Notices 27:5 (1992): 1-52. <http://portal.acm.org/ft_gateway.cfm?id=130698&type=pdf&coll=GUIDE&dl=GUIDE&CFID=50053868&CFTOKEN=12610081>. An updated, free 1998 version is also available at <http://www.haskell.org/tutorial/>.

[2] Holmevik, Jan Rune. “Compiling SIMULA: A Historical Study of Technological Genesis.” Washington, D.C.: Annals of the History of Computing 16:4 (1994): 25-36. <http://www.idi.ntnu.no/grupper/su/publ/simula/holmevik-simula-ieeeannals94.pdf>.

[3] Regarding the term “Hovudfag,” Holmevik writes (page 36, footnote), “Hovudfag may be regarded as the Norwegian equivalent to a master’s degree, although it carries considerably more workload and normally takes two to three years to complete.”

[4] Felleisen, Matthias, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi. “The Structure and Interpretation of the Computer Science Curriculum.” Cambridge: Journal of Functional Programming 14:4 (2004): 365-378. <http://www.cs.brown.edu/~sk/Publications/Papers/Published/fffk-htdp-vs-sicp-journal/>.

[5] Felleisen, Matthias, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi. How to Design Programs. Cambridge, MA: The MIT Press, 2003. <http://www.htdp.org/>.

[6] Abelson, Harold and Gerald Jay Sussman with Julie Sussman. Structure and Interpretation of Computer Programs, Second Edition. Cambridge, MA: The MIT Press and New York: McGraw-Hill, 1996. <http://mitpress.mit.edu/sicp/full-text/book/book.html>.

[7] Coutts, Duncan, Isaac Potoczny-Jones, and Don Stewart. “Haskell: Batteries Included.” Victoria, BC, Canada: Proceedings of the first ACM SIGPLAN symposium on Haskell (2008): 125-126. <http://www.cse.unsw.edu.au/~dons/papers/haskell31-coutts.pdf>.

[8] Krishnamurthi, Shriram. “Teaching Programming Languages in a Post-Linnaean Age.” Cambridge, MA: First SIGPLAN Workshop on Undergraduate Programming Language Curricula (2008): 81-83. <http://www.cs.brown.edu/~sk/Publications/Papers/Published/sk-teach-pl-post-linnaean/>.

[9] Kay, Alan C. “The Early History of Smalltalk.” Cambridge, Massachusetts: History of Programming Languages: The second ACM SIGPLAN conference on History of programming languages (1993): 69-95. <http://portal.acm.org/citation.cfm?id=154766.155364&coll=GUIDE&dl=GUIDE&CFID=45415434&CFTOKEN=84716013>. Also available at <http://gagne.homedns.org/~tgagne/contrib/EarlyHistoryST.html>.

[10] Ingalls, Daniel H. H. “Design Principles Behind Smalltalk.” BYTE Magazine, August 1981. <http://www.fit.vutbr.cz/study/courses/OMP/public/software/sqcdrom2/Documents/DesignPrinciples/DesignPrinciples.html>.

[11] Hudak, Paul, John Hughes, Simon Peyton Jones, and Philip Wadler. “A History of Haskell: Being Lazy With Class.” San Diego, CA: The Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III) (2007): 12-1 – 12-55. <http://research.microsoft.com/en-us/um/people/simonpj/papers/history-of-haskell/history.pdf>.

August 25, 2009

Paradigm Shift: Back to the Past, and No Small Talk About Smalltalk

Filed under: Object-oriented Programming, Programming Language Theory, Smalltalk — Benjamin L. Russell @ 7:01 pm

Those who have been reading my posts may have noticed this trend, but there has been a decided shift in the nature of my posts starting on June 18, 2009.

Specifically, prior to this date, the majority of my posts focused on Haskell, Scheme, and category theory, with a focus on purely functional programming. While I am still interested in purely functional programming, one of my other major interests is in creating a three-dimensional virtual world with some innovative functionality (I have something specific in mind).

At first, I was intent on finding a way to create such a world using a purely functional programming language. However, most purely functional programming languages do not have enough libraries to enable easy creation of such a world. Furthermore, in order to create such a world, most purely functional programming languages would require a rather sophisticated knowledge of linear algebra, which is one area of mathematics that my visiting discrete mathematics professor in college did not adequately cover, and which I have never had enough time to study fully on my own; by contrast, my favorite areas of mathematics are all related to set theory, recursive function theory (a.k.a. “computability theory”), and philosophical logic.

Therefore, I began searching for a programming language which would enable feasible writing of a three-dimensional virtual world without requiring explicit knowledge of linear algebra. Furthermore, since I was interested in programming language theory, I wanted a programming language that was at least based on a general-purpose programming language, as opposed to a domain-specific language.

After an intermittent search that lasted several months, I eventually came across a tool called “Cobalt,” based on Croquet, further based on Squeak, a dialect of Smalltalk. Unfortunately, Smalltalk is a pure object-oriented language, not a functional programming language, and after repeated attempts at approaching Squeak, the basis for Cobalt, I found the GUI-based interface rather difficult to get used to, having come from an Emacs-based textual environment. In addition, having come from a functional programming background, I found the concept of object-oriented programming highly counter-intuitive. (Apparently, I’m not the only person who has experienced this problem; similar arguments have been advanced by Paul Hudak [1] and Jonathan A. Rees [2].)

In short, I was encountering a paradigm-shift problem (with apologies to Shriram Krishnamurthi, who claims [3] that paradigms are ill-defined and hence significantly meaningless).

Here, I was faced with a dilemma: If I tried using a functional programming language, I would probably need to do a lot of work in writing the necessary libraries for the three-dimensional manipulation of graphical objects, which would additionally require learning linear algebra, which, between my full-time translation job and my busy weekends, I simply did not have enough time to learn. On the other hand, if I tried using Squeak, then every time I tried to learn the language, I would feel uncomfortable with the object-oriented paradigm, and with the GUI-based environment, and keep returning to such familiar programming languages as Scheme and Haskell, and to such development environments as Emacs.

After some thought, I realized that the problem with learning Squeak did not have to do with any inherent difficulty in Squeak itself; rather, I needed, at least temporarily, to unlearn functional programming and unlearn working in Emacs. In short, I needed to restore a blank mental slate. Well, where did I first learn functional programming and Emacs? Ah, that’s right: in college.

Although I couldn’t actually un-attend college, I could, in a sense, restore my mental state to just before attending college: What I needed to do was to go back to the past, mentally speaking, to just before college, and approach Squeak with a fresh mind. To borrow Scheme terminology, I needed to resume the continuation in the process of my life from just before attending college: Then it would be straightforward.

One night, just after midnight on Thursday, June 18, 2009, I was walking back home, reminiscing: Let’s see … what was I doing back then. Going back in time …

2009,
2008,
2007 (changed jobs again, and became patent translator),
2006 (became patent checker, then changed jobs, and became software project leader),
2005,
2004 (moved from Manhattan to Tokyo),
2003,
2002 (political difficulties at work; job further downgraded to English teacher),
2001 (WTC disaster in Manhattan, where I lived; severe downsizing at workplace resulted; job downgraded to Materials Coordinator),
2000,
1999 (became Localization Coordinator at Berlitz),
1998,
1997 (first moved to Manhattan from New Rochelle, NY),
1996 (began first major job as Systems Engineer in White Plains; moved to New Rochelle from Jersey City),
1995 (first moved to Jersey City from New Haven),
1994 (graduated from college),
1993 (took courses in recursive function theory, philosophical logic, and the lambda calculus, especially enjoying the lambda calculus; first exposure to Haskell in auditing a course on Haskell),
1992 (finished leave of absence and self-study of discrete mathematics; took a course in axiomatic set theory),
1991 (began leave of absence and self-study of discrete mathematics),
1990 (took a course on Pascal and hated it; embarked on Computer Science major; started learning Common Lisp and Scheme: hated Common Lisp because of all the funcalls and idiosyncrasies, but enjoyed Scheme because of the relative simplicity and regularity of structure of the language; started learning Emacs; learned how much I did not know, and how stupid I was, and became chronically depressed),
1989 (moved from Tokyo to New Haven, and matriculated at college).

1989. Ah, there: Continue from the continuation of my life-process at that point: the early afternoon of August 31, 1989, just before leaving for Narita Airport to go to New York to take the bus therefrom to New Haven to begin my (dreaded) college studies.

No Emacs. No Scheme. No Haskell. No category theory. No chronic depression. Return of math phobia. Return of Japanese popular music. Return of a simple mind which is not depressed because it does not know how much it does not know. Return of interest in multimedia. Aha!

Multimedia: the missing link! At that time, I was very interested in the Fujitsu FM Towns, a Japanese personal computer modeled on the Macintosh, the interface of which was based on Smalltalk [4]! Proceeding to Smalltalk from this continuation would be relatively trivial!

Sometimes, one needs to move backward in order to move forward.

So I decided to resume my continuation from this point on, with a fresh mind.

Resuming continuation….

I awoke, as if from a trance.

The next day, I returned to my computer, continuing the continuation. Suddenly, this strange text-based interface on my screen called “Emacs” seemed like a monstrosity that some text-based hacker must have concocted just for the sheer challenge of mastering arcane keystroke-combinations. Yuck! There must be a way to do programming without having to master arcane keystroke-combinations.

Let’s see; where can I find a point-click-drag interface that allows me to program without having to use a textual editor … preferably, one similar to the graphical user interface of the Fujitsu FM Towns, based on the user interface of the Macintosh….

Aha! What’s this mouse-face icon on my desktop labelled “Squeak”? Double-clicking on the icon labelled “Squeak”….

Hmm … a colorful background with illustrations. Sound. Multimedia. Point. Click. Drag. How intuitive: just like the Macintosh interface! Hmm … some research shows that it is an implementation of a language called “Smalltalk,” the interface of which was the basis for the Macintosh … what a coincidence … how curious…. I wonder who put it here….

Hmm … found a note here. It says, “Note to myself: Learn Squeak and Cobalt, and build a virtual world using Cobalt, using ideas described on the attached sheet.” Sure; why not?

[1] Hudak, Paul. “[Haskell-cafe] a regressive view of support for imperative programming in Haskell.” Online posting. 8 Aug. 2007. 25 Aug. 2009. <news://gmane.comp.lang.haskell.cafe>. Also available at <http://www.haskell.org/pipermail/haskell-cafe/2007-August/030178.html>.

[2] Rees, Jonathan A. “JAR on Object-Oriented.” Online posting. 11 May 2003. 25 Aug. 2009. <http://mumble.net/~jar/articles/oo.html>.

[3] Krishnamurthi, Shriram. “Teaching Programming Languages in a Post-Linnaean Age.” Cambridge: _2008 SIGPLAN Workshop on Programming Language Curriculum_ (2008): 81-83. <http://www.cs.brown.edu/~sk/Publications/Papers/Published/sk-teach-pl-post-linnaean/paper.pdf>.

[4] Kay, Alan C. “The Early History of Smalltalk.” Cambridge, Massachusetts: _History of Programming Languages: The second ACM SIGPLAN conference on History of programming languages_ (1993): 69-95. <http://portal.acm.org/citation.cfm?id=154766.155364&coll=GUIDE&dl=GUIDE&CFID=45415434&CFTOKEN=84716013>. Also available at <http://gagne.homedns.org/~tgagne/contrib/EarlyHistoryST.html#29>.

August 24, 2009

Thinking in Scheme and Checking a Patent Claim: A Cross-disciplinary Application of Scheme-based Reasoning

Filed under: Programming Language Theory, Scheme — Benjamin L. Russell @ 4:27 pm

(The content of this post is based on the content of my post [1] entitled “Re: [semi-OT] possible benefits from training in Scheme programming in patent translation” on the USENET newsgroup comp.lang.scheme.)

As a follow-up to my previous post, “How Scheme Can Train the Mind: One Reason that MIT Should Reinstate Scheme and 6.001,” here is an example of the Scheme-style pattern of thinking applied to checking a patent claim.

Most readers of this blog do not read Japanese; therefore, this example assumes that another translator has already translated a patent claim from Japanese to English, and that I need to check the translation for accuracy. In order to do so, I need to break down the claim into its semantic components and determine whether the semantics of the translation and the original are equivalent. Here is a contrived example, assuming that somebody has developed an in vitro, as opposed to in vivo, remote control device and receptor that can be optionally mounted on the ear (similarly to the ear-mounted transmitter/receivers worn by Agents in the motion picture saga The Matrix) and used as a controller for a video gaming device; please note that since I am not actually writing a Scheme program, but only using thought patterns derived from writing S-expressions in Scheme programs, this is merely pseudo-Scheme, not actual Scheme:

What is claimed is:

1. An in vitro remote-control device, the device comprising:
        a mounting unit for mounting the remote-control device on a portion of the head of a user; 
        a transmitter unit, further comprising:
             a neurotransmitter unit transmitting neural impulses to a brain of the user; 
             a neuro-digital conversion unit converting neural impulses to digital signals; 
        a receiver unit, further comprising:
             a neuroreceptor unit receiving neural impulses from the brain;
             a digital-neuro conversion unit converting digital signals to neural impulses; and 
        a power source unit converting thermal radiation from brain cells into electricity, the electricity powering the remote-control device;
   wherein:
        the transmitter unit transmits digital signals in response to neural impulses; and
        the receiver unit transmits neural impulses in response to digital signals.

The above-mentioned claim translates roughly into my personal variety of pseudo-Scheme as follows (apologies for any deviations from the syntax or semantics of actual Scheme):

(claim-define (_in-vitro_-remote-control-device mounting-unit transmitter-unit receiver-unit power-source-unit)
              (claim-comprising
               (mounting-unit
                (lambda (head user)
                  (mount head user)))
               (transmitter-unit
                (lambda (neurotransmitter-unit neuro-digital-conversion-unit)
                  (claim-define (neurotransmitter-unit neural-impulses brain)
                                (transmit neural-impulses brain user))
                  (claim-define (neuro-digital-conversion-unit neural-impulses digital-signals)
                                (convert neural-impulses digital-signals))))
               (receiver-unit
                (lambda (neuroreceptor-unit digital-neuro-conversion-unit)
                  (claim-define (neuroreceptor-unit neural-impulses brain)
                                (receive neural-impulses brain))
                  (claim-define (digital-neuro-conversion-unit digital-signals neural-impulses)
                                (convert digital-signals neural-impulses))))
               (power-source-unit
                (lambda (thermal-radiation brain-cells electricity)
                  (power-convert thermal-radiation electricity brain-cells))))
              (claim-wherein
               (transmitter-unit
                (lambda (digital-signals neural-impulses)
                  (transmit digital-signals neural-impulses)))
               (receiver-unit
                (lambda (neural-impulses digital-signals)
                  (transmit neural-impulses digital-signals)))))

This pattern of thinking can greatly simplify verifying equivalence between an English translation of a claim having a complex structure and the original Japanese claim.
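The structural comparison described above can also be sketched as a small program. What follows is only an illustration, written in Haskell rather than pseudo-Scheme, and it assumes that the element names of both versions have already been normalized to common English labels. The names Element, paths, missingFrom, original, and translation are hypothetical, and the faulty translation is invented for the example:

```haskell
-- A claim element: a name plus the sub-elements it comprises.
-- (Element, paths, and missingFrom are hypothetical names for this sketch.)
data Element = Element String [Element]
  deriving (Eq, Show)

-- Every path from the root down to each element of the claim tree.
paths :: Element -> [[String]]
paths (Element name subs) = [name] : map (name :) (concatMap paths subs)

-- Paths that occur in the original but are absent from the translation.
missingFrom :: Element -> Element -> [[String]]
missingFrom orig trans = filter (`notElem` paths trans) (paths orig)

-- The contrived claim from this post, reduced to its "comprising" structure.
original :: Element
original =
  Element "remote-control device"
    [ Element "mounting unit" []
    , Element "transmitter unit"
        [ Element "neurotransmitter unit" []
        , Element "neuro-digital conversion unit" []
        ]
    , Element "receiver unit"
        [ Element "neuroreceptor unit" []
        , Element "digital-neuro conversion unit" []
        ]
    , Element "power source unit" []
    ]

-- An invented faulty translation that drops one sub-unit of the transmitter.
translation :: Element
translation =
  Element "remote-control device"
    [ Element "mounting unit" []
    , Element "transmitter unit"
        [ Element "neurotransmitter unit" []
        ]
    , Element "receiver unit"
        [ Element "neuroreceptor unit" []
        , Element "digital-neuro conversion unit" []
        ]
    , Element "power source unit" []
    ]
```

Under these assumptions, evaluating missingFrom original translation in GHCi lists the single path that ends in "neuro-digital conversion unit", which is the kind of omission that the nested reading of the claim makes visible.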

[1] Russell, Benjamin L. “Re: [semi-OT] possible benefits from training in Scheme programming in patent translation.” Online posting. 20 Aug. 2009. 24 Aug. 2009. <news://comp.lang.scheme>. Also available at <http://groups.google.co.jp/group/comp.lang.scheme/msg/871965c4090e127f>.

August 19, 2009

How Scheme Can Train the Mind: One Reason that MIT Should Reinstate Scheme and 6.001

Filed under: Programming Language Theory, Scheme — Benjamin L. Russell @ 9:32 pm

(The content of this post is substantially identical to the content of my post [1] entitled “[semi-OT] possible benefits from training in Scheme programming in patent translation” on the USENET newsgroup comp.lang.scheme.)

Today, I came across an interesting phenomenon in which exposure to Scheme programming helped with technical translation of part of a patent specification.

Since the material is classified, I can only reveal the structure, and not the content, but basically, there was a document containing a “claim” (a sentence in a specification which specifies what is being claimed in the patent being applied for) which somebody had slightly mis-translated from English to Japanese, and which was being amended.

The original English clause in the claim had the following structure:

“… an A in communication with a plurality of B, said A configured to generate a C signal, configured to cause at least one of said plurality of B to output a said D, said C signal based at least in part on said E signal.”

Unfortunately, whoever translated that clause from English to Japanese apparently left out the “said A configured to generate a C signal” portion.

Then this mis-translated Japanese translation of the original English clause was amended, but was never translated back to English.

Then this amended Japanese clause was re-amended, and I was asked to “apply” the re-amendment to the English original. The re-amended Japanese clause then had the following structure (after I finally figured out the structure):

“… an A in communication with a plurality of B, said A configured to generate a C signal, configured to cause the C coupled to said F so that a positional relationship, for the E which has sent the E signal, corresponding similarly to a positional relationship between the E which has sent the E signal and the G of said plurality of B to output a said force associated with the strength detected by the E, said B signal based on said E signal.”

The first aspect that I noticed was that the previous amendment had never been translated, requiring me to fill in the details.

However, then I noticed that this previous amendment had itself been based on a mis-translated original.

In order to figure out which portion was missing from the translation of the original clause, I needed to map portions of the original English clause to their Japanese equivalents, but since the structure itself was not written to reflect the structure of the original English clause, I then needed to break up the original English clause into its structural components.

At first, this process seemed very tedious and difficult, until I noticed that treating these structural components in the clause as if they were S-expressions in a Scheme program, and then mapping equivalent components of the English clause to semi-corresponding components of the Japanese (mis-)translation speeded up and simplified this process greatly, even though the correspondence was not exact.

For some reason, I have discovered that this kind of mental equivalence seems to proceed much more smoothly between S-expressions in Scheme programs and claims in patent documents than between other kinds of expressions in other functional programming languages and the same claims in patent documents. For example, I have not had similar experiences with finding equivalences between expressions in even Haskell programs and the claims in patent documents; Haskell expressions seem to be more equivalent to mathematical equations than to claims in patent documents.

Therefore, it seems that exposure to the Scheme programming language, in particular, can help in training non-programmers to think structurally in analyzing expressions in natural language, which can have benefits in translating claims in patent documents in such a manner that they can be more easily and clearly amended.

Perhaps MIT should reinstate Scheme and 6.001, and get rid of Python and the new C1. Somehow I feel that MIT is risking creating a new generation of idiots by getting rid of Scheme and SICP from their curriculum just for the ostensible reason that the recursive style of programming does not reflect the way that programming is actually conducted in industry. Students do not learn programming just to program; learning programming also has important ramifications for the structural thought processes underlying other technical fields, even those that do not seem superficially related (such as patent translation), and it seems that watering down a core programming course for such ostensible reasons undermines the crucial patterns of thinking which are cross-applicable to such other technical fields as well.

[1] Russell, Benjamin L. “[semi-OT] possible benefits from training in Scheme programming in patent translation.” Online posting. 19 Aug. 2009. 19 Aug. 2009. <news://comp.lang.scheme>. Also available at <http://groups.google.com/group/comp.lang.scheme/browse_thread/thread/4590474ce458597c#>.

April 13, 2009

Climbing the Towers of Hanoi with Haskell-style Curry from a Monadic Container (While Sparing the Sugar!)

Filed under: Haskell, Monads, Programming Language Theory — Benjamin L. Russell @ 7:59 pm

Have you ever felt like eating some Haskellian curry on your way to the top of the Towers of Hanoi, but could do without the sugar in the container? Well, here’s how….

Previously, I had posted the following solution to the Towers of Hanoi problem:

hanoi :: a -> a -> a -> Int -> [(a, a)]
hanoi source using dest n
    | n == 0 = []
    | n == 1 = [(source, dest)]
    | otherwise = hanoi source dest using (n-1)
                  ++ hanoi source using dest 1
                  ++ hanoi using source dest (n-1)

hanoi_shower :: Show a => [(a, a)] -> String
hanoi_shower [] = ""
hanoi_shower ((a, b):moves) = unlines ["Move " ++ show a ++ " to "++ show b ++ "."] ++ hanoi_shower moves

However, this function seemed somewhat difficult to understand at a glance, and had the problem that it needed to be called as follows:

putStr (hanoi_shower (hanoi 'a' 'b' 'c' 3))

Thereupon, the function would reward the user with the following results:

Move 'a' to 'c'.
Move 'a' to 'b'.
Move 'c' to 'b'.
Move 'a' to 'c'.
Move 'b' to 'a'.
Move 'b' to 'c'.
Move 'a' to 'c'.

While technically a solution, this function had never seemed an entirely satisfactory counterpart to my earlier Scheme solution (part of the text in the output statement thereof below has been slightly reworded from “… from disc” to “… disc from” here for readability) to the same problem:

(define (hanoi n)
  (hanoi-helper 'A 'B 'C n))

(define (hanoi-helper source using destination n)
  (cond ((= n 1)
         (printf "Moving disc from ~a to ~a.\n" source destination))
        (else
         (hanoi-helper source destination using (- n 1))
         (hanoi-helper source using destination 1)
         (hanoi-helper using source destination (- n 1)))))

In particular, the Scheme version could be invoked quite simply as follows:

(hanoi 3)

Thereupon, it would reward the user with the following results:

Moving disc from A to C.
Moving disc from A to B.
Moving disc from C to B.
Moving disc from A to C.
Moving disc from B to A.
Moving disc from B to C.
Moving disc from A to C.

Why not the Haskell version?

At first, not understanding how to use monads correctly, I tried to write the following (incorrect) version, thinking that I could just concatenate the results of the recursive calls into one long string:

hanoi :: Integer -> IO ()
hanoi n = hanoiHelper n "A" "B" "C"

hanoiHelper :: Integer -> String -> String -> String -> IO ()
hanoiHelper n source dest using
                | n == 1 = putStrLn ("Move disc from " ++ source ++ " to " ++ dest ++ ".")
                | otherwise = hanoiHelper (n-1) source using dest
                              ++ hanoiHelper 1 source dest using
                              ++ hanoiHelper (n-1) using dest source

However, attempting to load this function into WinGhci resulted in the following error message:

    Couldn't match expected type `[a]' against inferred type `IO ()'
    In the first argument of `(++)', namely
        `hanoiHelper (n - 1) source using dest'
    In the expression:
          hanoiHelper (n - 1) source using dest
        ++  hanoiHelper 1 source dest using
          ++
            hanoiHelper (n - 1) using dest source
    In the definition of `hanoiHelper':
        hanoiHelper n source dest using
                      | n == 1
                      = putStrLn ("Move disc from " ++ source ++ " to " ++ dest ++ ".")
                      | otherwise
                      = hanoiHelper (n - 1) source using dest
                      ++  hanoiHelper 1 source dest using
                        ++
                          hanoiHelper (n - 1) using dest source

Since the error message reported that the expected type of hanoiHelper was `[a]', I then tried modifying the type signature for hanoiHelper accordingly, as follows:

hanoiHelper :: Integer -> String -> String -> String -> [a]

Unfortunately, this only resulted in the converse error, thus:

[1 of 1] Compiling Main             ( hanoiIncorrect.hs, interpreted )

hanoiIncorrect.hs:9:10:
    Couldn't match expected type `IO ()' against inferred type `[a]'
    In the expression: hanoiHelper n "A" "B" "C"
    In the definition of `hanoi': hanoi n = hanoiHelper n "A" "B" "C"

hanoiIncorrect.hs:13:27:
    Couldn't match expected type `[a]' against inferred type `IO ()'
    In the expression:
        putStrLn ("Move disc from " ++ source ++ " to " ++ dest ++ ".")
    In the definition of `hanoiHelper':
        hanoiHelper n source dest using
                      | n == 1
                      = putStrLn ("Move disc from " ++ source ++ " to " ++ dest ++ ".")
                      | otherwise
                      = hanoiHelper (n - 1) source using dest
                      ++  hanoiHelper 1 source dest using
                        ++
                          hanoiHelper (n - 1) using dest source
Failed, modules loaded: none.

Stumped, I came across the following example of code for solving this problem:

 hanoiM :: Integer -> IO ()
 hanoiM n = hanoiM' n 1 2 3 where
   hanoiM' 0 a b c = return ()
   hanoiM' n a b c = do
     hanoiM' (n-1) a c b
     putStrLn $ "Move " ++ show a ++ " to " ++ show b
     hanoiM' (n-1) c b a

However, I wasn’t satisfied with this solution, because the syntactic sugar of the do-notation obscured the semantics of what was happening. I liked to have my Haskell curry without sugar.

Then I came across the following explanation of an unsugared monad:

Let’s examine how to desugar a ‘do’ with multiple statements in the following example:

main = do putStr "What is your name?"
          putStr "How old are you?"
          putStr "Nice day!"

The ‘do’ statement here just joins several IO actions that should be performed sequentially. It’s translated to sequential applications of one of the so-called “binding operators”, namely ‘>>’:

main = (putStr "What is your name?")
       >> ( (putStr "How old are you?")
            >> (putStr "Nice day!")
          )

After comparing the two examples and looking at the nesting of parentheses for a while, I suddenly realized that monads must be some kind of container. Apparently, the monads served to contain something inside that was undesirable in a purely functional environment outside, and the bind (“>>”) notation could be nested to enforce sequencing on the execution order.
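To make the comparison concrete, here is a minimal sketch of the same three actions sequenced in three equivalent ways. The names sugared, desugared, folded, and greetings are hypothetical, and putStrLn is swapped in for putStr so that each message lands on its own line. Note that ‘>>’ discards the result of the action on its left; its sibling ‘>>=’ is the operator that passes that result on to the next action.

```haskell
-- Three equivalent ways of sequencing the same IO actions.
-- (sugared, desugared, folded, and greetings are names made up for this sketch.)

greetings :: [String]
greetings = ["What is your name?", "How old are you?", "Nice day!"]

-- With do-notation (the sugared form).
sugared :: IO ()
sugared = do
  putStrLn "What is your name?"
  putStrLn "How old are you?"
  putStrLn "Nice day!"

-- With explicit (>>), nested to the right as in the explanation above.
desugared :: IO ()
desugared =
  putStrLn "What is your name?"
    >> ( putStrLn "How old are you?"
           >> putStrLn "Nice day!"
       )

-- Folding (>>) over a list of actions; return () ends the chain.
folded :: IO ()
folded = foldr (>>) (return ()) (map putStrLn greetings)
```

Running any of the three in GHCi should print the same three lines in the same order, since each form builds the same chain of ‘>>’ applications (the folded one with a trailing return () that does nothing).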

After some thought, I came up with the corresponding Haskell solution; viz.:

hanoi :: Integer -> IO ()
hanoi n = hanoiHelper n "A" "B" "C" where
    hanoiHelper 1 source dest using = putStrLn ("Move disc from " ++ source ++ " to " ++ dest ++ ".")
    hanoiHelper n source dest using = (hanoiHelper (n-1) source using dest)
                                      >> ( (hanoiHelper 1 source dest using)
                                           >> (hanoiHelper (n-1) using dest source)
                                         )

This function type-checked correctly, and could be summoned as follows:

hanoi 3

Thereupon, similarly to its Scheme counterpart, it would now reward the user as follows:

Move disc from A to B.
Move disc from A to C.
Move disc from B to C.
Move disc from A to B.
Move disc from C to A.
Move disc from C to B.
Move disc from A to B.

For those who prefer do-notation, here is the same function sweetened with syntactic sugar:

hanoi :: Integer -> IO ()
hanoi n = hanoiHelper n "A" "B" "C" where
    hanoiHelper 1 source dest using = putStrLn ("Move disc from " ++ source ++ " to " ++ dest ++ ".")
    hanoiHelper n source dest using = do
                                      hanoiHelper (n-1) source using dest
                                      hanoiHelper 1 source dest using
                                      hanoiHelper (n-1) using dest source

Although almost a direct counterpart to the corresponding Scheme version, it was still not structured similarly as a main function and a helper function; I wanted to divide it correspondingly.

Eventually, I came up with the following corresponding solution, which also used a helper function:

hanoi :: Integer -> IO ()
hanoi n = hanoiHelper n "A" "B" "C"

hanoiHelper :: Integer -> String -> String -> String -> IO ()
hanoiHelper n source dest using
                | n == 1 = putStrLn ("Move disc from " ++ source ++ " to " ++ dest ++ ".")
                | otherwise = (hanoiHelper (n-1) source using dest)
                                      >> ( (hanoiHelper 1 source dest using)
                                           >> (hanoiHelper (n-1) using dest source)
                                         )

As a bonus, during my research, I chanced upon the following explanation of the ‘$’ operator:

$ operator

This is hopefully something worth sharing about Haskell. The $ operator.


 simpleHTTP $ buildRequest req_text

 simpleHTTP ( buildRequest req_text )

It is an application operator, it takes a function and an argument, and … applies the function to the argument. It’s purpose is to save typing parentheses. It is all about operator precedence.


 head . words $ config_file_contents

 ( head . words ) config_file_contents

Application, f a (f applies to a), binds stronger than any operator. If it was an operator, think about multiplication operator which people often omit, it would have precedence set to 10. $ has precedence set to 0, which is the lowest value of precedence possible.

The . is my another favourite. (f . g) a == f (g a) it set to 9, and therefore binds almost as strong as application.


 listActions :: String -> [Action]
 listActions = filter notDone . map actionFromMap . parseJSON
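To see the precedence rules quoted above in action, here is a minimal sketch. The sample string and all of the names below are hypothetical stand-ins (they do not come from the quoted post); the point is only that the three spellings of the same expression agree, and that a chain of '.' needs no argument at all.

```haskell
import Data.Char (toUpper)

-- Hypothetical sample input standing in for config_file_contents.
configFileContents :: String
configFileContents = "verbose debug quiet"

-- Explicit parentheses around the composed function.
firstWordParens :: String
firstWordParens = (head . words) configFileContents

-- '$' has precedence 0, so everything to its left is grouped first.
firstWordDollar :: String
firstWordDollar = head . words $ configFileContents

-- Plain nested application, with no operators at all.
firstWordNested :: String
firstWordNested = head (words configFileContents)

-- A point-free pipeline in the style of listActions: '.' (precedence 9)
-- chains the functions from right to left.
shout :: String -> String
shout = map toUpper . head . words
```

All three firstWord definitions evaluate to "verbose", and shout configFileContents evaluates to "VERBOSE".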

Feeling richer with the ‘$’, I rewrote my function, listed below complete with comments and a copyleft disclaimer, to suit my mood accordingly:

-- hanoiHelperRich.hs
-- Haskell function to compute the Towers of Hanoi problem recursively, 
-- using a helper function and the '$' operator to take a function and 
-- an argument, and accordingly save typing parentheses
--
-- Copyright(C) April 13, 2009, at 19:56, 
-- by Benjamin L. Russell
-- 
-- This program is free software; you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation; either version 2 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program; if not, write to the Free Software
-- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

hanoi :: Integer -> IO ()
hanoi n = hanoiHelper n "A" "B" "C"

hanoiHelper :: Integer -> String -> String -> String -> IO ()
hanoiHelper n source dest using
                | n == 1 = putStrLn $ "Move disc from " ++ source ++ " to " ++ dest ++ "."
                | otherwise = (hanoiHelper (n-1) source using dest)
                                      >> ( (hanoiHelper 1 source dest using)
                                           >> (hanoiHelper (n-1) using dest source)
                                         )

Now I am on a diet with fewer parentheses, but with more ‘$’ in my pocket.

P.S.

One commentator, augustss, has been kind enough to point out that my version of hanoi can be rewritten to be less IO-based, and more functional. Per augustss’s suggestion, I have rewritten hanoi.hs as follows:

-- hanoiMover.hs
-- Originally based on hanoi_v1.1.hs, a Haskell function to compute
-- the Towers of Hanoi problem recursively, by Benjamin L. Russell,
-- dated April 16, 2008, at 14:17; 
-- revised based on a comment from augustss, dated April 18, 2009, at
-- 17:38, (see
-- http://dekudekuplex.wordpress.com/2009/04/13/climbing-the-towers-of-hanoi-with-haskell-style-curry-from-a-monadic-container-while-sparing-the-sugar/#comment-58),
-- in response to my blog post "Climbing the Towers of Hanoi with
-- Haskell-style Curry from a Monadic Container (While Sparing the 
-- Sugar!)," dated April 13, 2009, at 19:59 (see
-- http://dekudekuplex.wordpress.com/2009/04/13/climbing-the-towers-of-hanoi-with-haskell-style-curry-from-a-monadic-container-while-sparing-the-sugar/)
--
-- Copyright(C) April 20, 2009, at 18:38, 
-- by Benjamin L. Russell
-- 
-- This program is free software; you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation; either version 2 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program; if not, write to the Free Software
-- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

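hanoi :: Integer -> IO ()
hanoi = putStr . hanoiShower . hanoiMover 'a' 'b' 'c'

-- NOTE: hanoiShower is used above but is not defined in this listing;
-- the definition below is an assumed reconstruction that renders each
-- move on its own line.
hanoiShower :: Show a => [(a, a)] -> String
hanoiShower moves = unlines ["Move " ++ show a ++ " to " ++ show b ++ "." | (a, b) <- moves]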

hanoiMover :: a -> a -> a -> Integer -> [(a, a)]
hanoiMover source using dest n
    | n == 0 = []
    | n == 1 = [(source, dest)]
    | otherwise = hanoiMover source dest using (n-1)
                  ++ hanoiMover source using dest 1
                         ++ hanoiMover using source dest (n-1)

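This version is essentially a composition of pure functions: hanoiMover computes the list of moves without performing any IO, and hanoi composes it (with the “.” operator) with a function that renders the list and with putStr, which prints it. In contrast, my original hanoi.hs, mentioned at the beginning of this blog post, performed IO at every step of the recursion.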

Thanks to augustss for pointing out this observation; I’ll try to write my functions in a more functional style hereinafter.
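One practical consequence of returning a list of moves, rather than printing them, is that the result can be inspected like any other value. The following is a minimal sketch under that assumption: hanoiMover is copied from the listing above, while moveCount is a hypothetical helper of my own, not part of augustss’s suggestion.

```haskell
-- hanoiMover, copied from the listing above: a pure function that
-- returns the moves as (from, to) pairs instead of printing them.
hanoiMover :: a -> a -> a -> Integer -> [(a, a)]
hanoiMover source using dest n
    | n == 0 = []
    | n == 1 = [(source, dest)]
    | otherwise = hanoiMover source dest using (n-1)
                  ++ hanoiMover source using dest 1
                  ++ hanoiMover using source dest (n-1)

-- Hypothetical helper: the number of moves needed for n discs.
moveCount :: Integer -> Int
moveCount = length . hanoiMover 'a' 'b' 'c'
```

For example, hanoiMover 'a' 'b' 'c' 2 evaluates to [('a','b'),('a','c'),('b','c')], and moveCount 3 evaluates to 7, in agreement with the familiar 2^n - 1 count of moves.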

P.P.S.

Another commentator, Jedai, has been generous enough to indicate that my version of hanoi can be rewritten without the parentheses around the sequenced actions (only those around the arithmetic expression n-1 remain). Per Jedai’s suggestion, I have rewritten hanoi.hs as follows:

-- hanoiHelperRichSansParens.hs
-- Haskell function to compute the Towers of Hanoi problem recursively, 
-- using a helper function and the '$' operator to take a function and 
-- an argument, and accordingly save typing parentheses, rewritten
-- without parentheses
-- 
-- Originally based on hanoiHelperRich.hs, a Haskell function to compute
-- the Towers of Hanoi problem recursively, by Benjamin L. Russell,
-- dated April 13, 2009, at 20:13; 
-- revised based on a comment from Jedai, dated April 18, 2009, at
-- 03:21, (see http://dekudekuplex.wordpress.com/2009/04/13/climbing-the-towers-of-hanoi-with-haskell-style-curry-from-a-monadic-container-while-sparing-the-sugar/#comment-57),
-- in response to my blog post "Climbing the Towers of Hanoi with
-- Haskell-style Curry from a Monadic Container (While Sparing the 
-- Sugar!)," dated April 13, 2009, at 19:59 (see
-- http://dekudekuplex.wordpress.com/2009/04/13/climbing-the-towers-of-hanoi-with-haskell-style-curry-from-a-monadic-container-while-sparing-the-sugar/)
--
-- Copyright(C) April 20, 2009, at 19:25, 
-- by Benjamin L. Russell
-- 
-- This program is free software; you can redistribute it and/or modify
-- it under the terms of the GNU General Public License as published by
-- the Free Software Foundation; either version 2 of the License, or
-- (at your option) any later version.
--
-- This program is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-- GNU General Public License for more details.
--
-- You should have received a copy of the GNU General Public License
-- along with this program; if not, write to the Free Software
-- Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

hanoi :: Integer -> IO ()
hanoi n = hanoiHelper n "A" "B" "C"

hanoiHelper :: Integer -> String -> String -> String -> IO ()
hanoiHelper n source dest using
                | n == 1 = putStrLn $ "Move disc from " ++ source ++ " to " ++ dest ++ "."
                | otherwise = hanoiHelper (n-1) source using dest
                                      >> hanoiHelper 1 source dest using
                                           >> hanoiHelper (n-1) using dest source

March 26, 2009

A Correction to “Too Much is Not Enough: The Revised^6 Report on the Algorithmic Language Scheme (R6RS)”

Filed under: Programming Language Theory, Scheme — Benjamin L. Russell @ 10:20 pm

In my previous blog entry, I had listed Jacob Matthews and Robert Bruce Findler as having been on the list of authors for the Revised^6 Report on the Algorithmic Language Scheme, but last month, Robby Findler pointed out in a comment that neither of these two authors had been responsible for decision-making; rather, they had only been authors of the formal semantics thereof.

Therefore, I am removing them from the list of people responsible for decision-making for R6RS.

My apologies for the late correction.

February 26, 2009

Too Much is Not Enough: The Revised^6 Report on the Algorithmic Language Scheme (R6RS)

Filed under: Programming Language Theory, Scheme — Benjamin L. Russell @ 9:00 pm

On September 26, 2007, the Revised^6 Report on the Algorithmic Language Scheme was released. Normally, a new standard for Scheme is greeted with applause, but this time, the revision sparked a great controversy. The Scheme community became divided between pro-R6RS and anti-R6RS factions, and some members even decided to leave programming language theory completely, at least partly because of this change. As Nils M. Holm has mentioned,

A lot has happened since the release of the previous edition of Sketchy LISP. The Six’th Revised Report on the Algorithmic Language Scheme (R6RS) was ratified and Scheme is no longer the language it used to be.

Why not? Consider this graph provided by Grant Rettke. Looking at the graph, it is remarkable how there is a sudden break in continuity between most of the members of the list of authors for R2RS – R5RS, and those for R6RS. Specifically, the following authors were common to R2RS – R5RS, but suddenly disappeared in R6RS:

Chris Hanson
Chris Haynes
Dan Friedman
David Bartley
Don Oxley
Eugene Kohlbecker
Gary Brooks
Hal Abelson
Jonathan Rees
Kent Pitman
Mitchell Wand
Norman Adams (or Norman I Adams IV, assuming this name references the same author)
Robert Halstead
William Clinger

Instead, we suddenly see the appearance of the following authors for R6RS, who had never appeared in any of R2RS – R5RS:

Anton Van Straaten
Jacob Matthews
Matthew Flatt
Michael Sperber
Robert Bruce Findler

In fact, the only person common to the lists for R3RS – R5RS (according to the chart, he was not an author for R2RS) and R6RS is Kent Dybvig.

What happened to all the previous authors, and who are these new authors?

Apparently, at least some of the previous authors quit the board, possibly because of a disagreement over the new standard. As for the new authors, I know of both Matthew Flatt and Robert Bruce Findler from PLT Scheme. They are members of the PLT Research Group, and they regularly post on the plt-scheme mailing list. (They have both been very helpful on many occasions, so I have nothing against them personally; I am just trying to analyze their motivations for R6RS.) They are both also among the authors of HtDP, and among the co-authors of the influential paper “The Structure and Interpretation of the Computer Science Curriculum.” In that paper (see pp. 6-7, in section “3.1 Functional and object-oriented programming”), they write as follows:

Functional programming and object-oriented programming differ with respect to the syntax and semantics of the underlying languages. The core of a functional language is small. All a beginning programmer needs are function definition, function application, variables, constants, a conditional form, and possibly a construct for defining algebraic types. In contrast, using an object-oriented language for the same purposes requires classes, fields, methods, inheritance in addition to everything that a functional language needs….

Using a functional language followed by object-oriented language is thus the natural choice. The functional language allows students to gain confidence with program design principles. They learn to think about values and operations on values. They can easily comprehend how the functions and operations work with values. Better still, they can use the same rules to figure out why a program produces the wrong values, which it often will. Teaching an object-oriented language in the second course is then a small shift of focus….

I.e., Flatt and Findler believe that teaching functional programming should be followed by teaching object-oriented programming. This view, however, is not shared by all educators. In particular, Paul Hudak, one of the designers of the Haskell programming language, writes on the Haskell-Cafe mailing list as follows on a related issue (see “[Haskell-cafe] a regressive view of support for imperative programming in Haskell”):

All of the recent talk of support for imperative programming in Haskell
makes me really nervous. To be honest, I’ve always been a bit
uncomfortable even with monad syntax. Instead of:

do x <- cmd1
   y <- cmd2

   return e

I was always perfectly happy with:

cmd1 >>= \x->
cmd2 >>= \y->

return e

Functions are in my comfort zone; syntax that hides them takes me out of
my comfort zone.

In my opinion one of the key principles in the design of Haskell has
been the insistence on purity. It is arguably what led the Haskell
designers to “discover” the monadic solution to IO, and is more
generally what inspired many researchers to “discover” purely functional
solutions to many seemingly imperative problems. With references and
mutable data structures and IO and who-knows-what-else to support the
Imperative Way, this discovery process becomes stunted.

Well, you could argue, monad syntax is what really made Haskell become
more accepted by the masses, and you may be right (although perhaps
Simon’s extraordinary performance at OSCOM is more of what we need). On
the other hand, if we give imperative programmers the tools to do all
the things they are used to doing in C++, then we will be depriving them
of the joys of programming in the Functional Way. How many times have
we seen responses to newbie posts along the lines of, “That’s how you’d
do it in C++, but in Haskell here’s a better way…”.
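The two notations that Hudak compares can be tried side by side. The sketch below is only an illustration under assumptions of my own: cmd1 and cmd2 are hypothetical stand-ins in the Maybe monad (chosen so that the results can be compared directly), and the expression x + y stands in for his e.

```haskell
-- Hypothetical stand-ins for Hudak's cmd1 and cmd2, in the Maybe monad.
cmd1 :: Maybe Int
cmd1 = Just 2

cmd2 :: Maybe Int
cmd2 = Just 3

-- The version written with do-notation.
withDo :: Maybe Int
withDo = do
  x <- cmd1
  y <- cmd2
  return (x + y)

-- The same computation written with explicit binds and lambdas;
-- the do-block above is shorthand for exactly this chain.
withBind :: Maybe Int
withBind =
  cmd1 >>= \x ->
  cmd2 >>= \y ->
  return (x + y)
```

Both withDo and withBind evaluate to Just 5; the do-notation merely hides the functions that the second version spells out.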

While Hudak contrasts imperative vs. functional, as opposed to object-oriented vs. functional, the basic issue is the same: Should functional programming be treated as a separate paradigm by itself (as “the Functional Way” vs. “the Imperative Way” (or maybe even “the Object-oriented Way”)), or should it be combined with a different paradigm (be it imperative or object-oriented, the issue is the same)? Or even (according to some educators), should there be an issue of “paradigm” at all?

This is an issue of basic teaching philosophy. As an example of the third viewpoint, Shriram Krishnamurthi writes on the plt-scheme mailing list as follows (see “[plt-scheme] Re: More pedagogic stuff”):

Besides the simplistic
reasoning, I am opposed to the whole idea of programming languages (or
even much of programming) being organized around “paradigms”. Here is
a short and intentionally somewhat provocative article that I recently
wrote about this:

http://www.cs.brown.edu/~sk/Publications/Papers/Published/sk-teach-pl-post-linnaean/

In the referenced page, Krishnamurthi writes as follows:

Programming language “paradigms” are a moribund and tedious legacy of a
bygone age. Modern language designers pay them no respect, so why do our courses
slavishly adhere to them? This paper argues that we should abandon this method of
teaching languages, offers an alternative, reconciles an important split in programming
language education, and describes a textbook that explores these matters.

In the referenced paper (see “Teaching Programming Languages in a Post-Linnaean Age”), Krishnamurthi writes (see p. 1, third paragraph):

Most books
rigorously adhere to the sacred division of languages into “functional”, “imperative”, “object-oriented”,
and “logic” camps. I conjecture that this desire for taxonomy is an artifact of our science-envy from the
early days of our discipline: a misguided attempt to follow the practice of science rather than its spirit.
We are, however, a science of the artificial. What else to make of a language like Python, Ruby, or Perl?
Their designers have no patience for the niceties of these Linnaean hierarchies; they borrow features as they
wish, creating melanges that utterly defy characterization.

Ultimately, it seems that this issue boils down to a matter of taste.

This is usually not a problem, so long as this taste is confined to a particular implementation. The problem with R6RS is that it is not just an implementation; it is a standard that can affect all implementations. R6RS, compared to previous revisions, makes a dramatic number of changes to the Scheme standard. Each one, individually, is not such a big problem; it is the entirety of all these changes taken together without sufficient discussion that is the problem.

The members of the board had to vote for all the changes taken together, rather than each one separately. A standard usually should require a consensus; however, this consensus was not really achieved in the case of R6RS (read through the reasoning behind both the “yes” and “no” votes in the “R6RS Ratification Vote,” and you will see the scope of the division).

Some of the main points of this division seem to be the following (apologies to those whose votes are not listed below; I have listed only the votes of names that sounded familiar to me, but there were many other significant votes, which were omitted not because of lack of significance, but because of lack of space and time):

the module system (see votes by Chris Hanson and Taylor R Campbell)

SYNTAX-CASE (see the vote by Chris Hanson above)

the switch to case sensitivity (see the vote by Jonathan Rees)

defining too many features as part of the language, rather than on top of the language (see the vote by Nils M Holm)

lack of time to change the draft to meet the deadline of the ratification draft (see the vote by Shiro Kawai)

an overly long specification (thrice the length of the previous one); restrictions on implementation design that impede future development of the language in the areas of numbers, text manipulation & Unicode, and the module system; overreach of the module system’s definition (see the vote by Taylor R Campbell)

Even some of the “yes” votes list “imperfections” that “troubled” them:

apparent inflexibility of the library system, and absence of a facility analogous to Common Lisp’s reader macros (see the vote by Alexey Radul)

In sum, it seems that R6RS was at least partly motivated by a perceived need by at least some of the new authors for a standard of Scheme that would help ease the transition between the functional and object-oriented paradigms. Apparently, at least some of the authors perceived that adding more features to the core language, instead of defining a small core language and adding features on top of the language, would help with this transition.

Another problem was lack of time. R6RS seems rather rushed compared to previous specifications. There probably should have been more discussion before finalizing the specification. In particular, there should have been some means of voting on each potential feature separately, instead of on all of them together. It is quite possible for each feature alone to be quite useful, but for the combination of all of them put together suddenly at once to be quite problematic.

(This entry is an adaptation of my post “Re: History of Scheme People,” dated “Thu, 26 Feb 2009 15:45:33 +0900,” on the USENET newsgroup comp.lang.scheme.)

January 19, 2009

Motivating Category Theory for Haskell for Non-mathematicians

Filed under: Category Theory, Haskell — Benjamin L. Russell @ 8:50 pm

The Issue:  Why Study Category Theory?

Two days ago, there was another interesting post by Andrzej Jaworski, entitled “[Haskell-cafe] Mathematics for Uninterested,” dated “Sat, 17 Jan 2009 05:15:25 +0100,” on the Haskell-cafe mailing list on the Haskell programming language, offering an explanation of why mathematics is important for learning Haskell. In particular, he wrote:

MATHEMATICS_IS_ABOUT_SIMPLIFYING_THINGS.

MATHEMATICS = MATHEMATICAL_KNOWLEDGE + FORMAL_METHODS_of_THINKING

You need only the second part for Haskell.

He then continued:

… Category Theory generalizes our mathematical experience along the line that we know
more that we can prove.

Nevertheless, many non-mathematicians continue to have much difficulty in finding the motivation to study category theory.  Some regard it as too theoretical.  Others regard it as too abstract.  And still others regard it as too dry.

While I cannot argue too much against the second point (indeed, category theory has been nicknamed “abstract nonsense,” although not necessarily pejoratively), and cannot truly deny the first for those who do not prefer theory, I have something to say from personal experience, as a student who overcame math phobia to appreciate beauty in mathematics, about the third.

Each student learns differently.  Some learn best by seeing examples.  Some learn best by studying theory.  Still others learn best by actually solving problems.

Case Study:  My Personal Experience in Approaching Mathematics

Several years ago, when I was still living in Manhattan, I read an article in a local newsletter about how different students had different learning styles in primary (and possibly secondary) school, and how putting different students into the wrong type of class inhibited their learning.  In particular, the article discussed how most students in the classes could be grouped into either of two different learning styles:  those who learned by example, and those who learned by theory.  Students who learned by example needed to see concrete examples of the mathematics in action in order to infer the reasoning; conversely, students who learned by theory learned best by studying the general theory from which to deduce specific applications.  Neither style was correct; they were simply different, and suited different types of students.

The same can probably be said for learning category theory.  Personally, I learn best from a combination of first studying theory, and actually applying the theory to incrementally more sophisticated problems.  Without the theory, I cannot understand the examples, but without actually applying the theory, I tend to forget about it very quickly.  Applying the theory, especially in a creative manner, makes it stick in my brain.

However, although I eventually was able to major in computer science, getting there was a long, arduous, grueling odyssey.  The most difficult part was conquering a required course in the design and analysis of algorithms.  I had come to college from a self-educated background, studying on my own since fifth grade in elementary school without any teacher, using only books, because of family circumstances.  In the end, I wound up studying throughout my secondary school years at home.

However, my interests had been in writing and multimedia applications.  As a result, prior to coming to college, while I had had extensive practice in writing, I had had minimal exposure to mathematics.  My first exposure to college-level mathematics resulted in an outbreak of math phobia.

Nevertheless, I still wished to study something related to computers in college.  Because the school happened to be a research-oriented school with a theoretically-oriented faculty, the only computer-related topic available happened to be computer science, and the focus was on theory.

Theoretical computer science happens to have a heavy focus on mathematics.  However, I happened to be struggling with math phobia.  I needed to find something interesting about mathematics if I was to remain in computer science.

Fortunately, one very generous mathematics student offered to tutor me gratis once per week over one summer.  He gave me what he termed “an elementary course from an advanced viewpoint.”  He started out from axiomatic set theory, focusing on lemmas and proofs of theorems, and gave me an intuitive grasp of such elementary concepts as set comprehension, power sets, ordered pairs, and Cartesian products.  I later learned about Russell’s paradox, and found it rather interesting.  At the end of the summer, I was able to take and pass an undergraduate course at my college in axiomatic set theory.  Then I discovered Gödel, Escher, Bach: an Eternal Golden Braid, was fascinated, and the next semester, I proceeded on to pass recursive function theory.  Then I took a course in automata theory.  Then I took a course (albeit in the philosophy department) in meta-logic.  Then I took a graduate-level course in recursion equations (domain theory).

After then taking a year’s leave of absence to study discrete mathematics for computer science outside of my official curriculum, I was able to return and eventually pass the required course in the design and analysis of algorithms.  The next semester, I then took an additional course in advanced algorithms, which turned out to be significantly less of a hassle than the first course in algorithms (the exercises were more abstract, but the workload was considerably less, so I went to the library and did some research for each problem set).  One of my favorite courses later wound up being one in formal semantics of programming languages, where I was exposed to this fascinating creature, the lambda calculus.

Yet somehow along the way … I forgot how to program well.  I had to struggle through another required course in systems programming in C, and this time acquired programming phobia, which I had not had before.  Oh dear, I had forgotten how to use pointers in coding linked lists!  I only recovered after auditing a course using Scheme to teach computer science under a professor who also taught Haskell.  Then I audited another course under this professor in Haskell.  Thank goodness!  No more pointers, no more memory management!  Instead, higher-order functions!

A Short, Thin, Elementary, Yet Influential Book and Its Approach

One of the most influential books in my struggle to conquer algorithms was a most unassuming, very thin, informal book called Naive Set Theory, by Paul R. Halmos.  This book had no explicit exercises marked as such, and the treatment was almost conversational in tone, yet almost every sentence was an implicit exercise in abstract thinking.  Each chapter was only three to five pages long, with almost no equations, and the entire book was only about a hundred pages long.  The text was barely thicker than the two covers combined.  Yet it still covered the following topics:

the Axiom of Extension
the Axiom of Specification
unordered pairs
unions and intersections
complements and powers
ordered pairs
relations
functions
families
inverses and composites
numbers
the Peano Axioms
arithmetic
order
the Axiom of Choice
Zorn’s Lemma
well ordering
transfinite recursion
ordinal numbers
sets of ordinal numbers
ordinal arithmetic
the Schröder-Bernstein Theorem
countable sets
cardinal arithmetic
cardinal numbers

Imagine covering an entire topic in mathematics with almost no equations, and no explicit exercises, and only three to five pages per chapter!  This was easily one of the shortest books I ever read in college, yet one of the most influential: Not only did it resolve the math phobia, but it also imbued me with an appreciation for elegant proofs, and hence for beauty in mathematics.

A Related Topic:  Computability Theory

Another experience that helped in my endeavor was an independent seminar with one researcher in automata theory (it was actually this experience which eventually led to my auditing a course in this subject).  My employer for one summer was a programmer who had graduated from the same university.  He helped me set up an independent once-a-week seminar (actually a private tutorial) covering the book Elements of the Theory of Computation (first hardcover edition), by Harry R. Lewis and Christos H. Papadimitriou.  I would read the book, collect questions, and bring them to the seminar every week.  This book covered such topics as computability, the diagonalization lemma, unsolvability, finite state automata, Turing machines, mathematical logic, and other related topics.  What I liked about this book was that, although it gave a very formal, abstract treatment of computability theory, it did not assume any specific knowledge of any area of mathematics.

I think that a book similar to the above two for category theory would be highly useful, at least for students with the same learning pattern as myself.  For such students, I believe that what is most important is the intuition; the formalism can come later.  Mathematics is not about formalism; it is about insight: insight into patterns and structure, and the beauty that comes with such patterns.  While a book with many exercises would probably be necessary at some point, for a first course in category theory for a non-mathematician, an approach similar to Halmos’ would seem most effective.

Set Theory vs. Category Theory:  The Conflict in Approach

The only problem with recommending a book on set theory to a student of category theory is that the approaches do not overlap. Set theory studies what is inside sets; category theory studies what is outside them.

Specifically, set theory starts out by defining the empty set, then defining basic axioms, and then defining certain operations on sets and functions operating on sets in leading up to lemmas and theorems about properties of sets.  Throughout, the focus is essentially on what is inside the sets.

In contrast, category theory ignores what is inside, and instead focuses on structure-preserving mappings (known as “morphisms,” of which the “homomorphisms” of algebra are the classic example) between different sets having a specific structure.  These mappings lead to theorems about the structure of the sets.  Thus, it is these mappings in category theory that are the focus, not explicitly what is inside the sets.  This is the origin of the nickname of “abstract nonsense” in reference to category theory.
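For a Haskell programmer, one small illustration of a structure-preserving mapping may help here. The sketch below is my own example with hypothetical names: it treats lists under (++) and integers under (+) as two structures, and checks that length carries the operation and the identity of the first onto those of the second, without ever looking at what the list elements are.

```haskell
-- length maps lists (with ++ and []) to Ints (with + and 0).

-- length sends the list operation (++) to addition.
preservesAppend :: [Int] -> [Int] -> Bool
preservesAppend xs ys = length (xs ++ ys) == length xs + length ys

-- length sends the identity element [] to the identity element 0.
preservesIdentity :: Bool
preservesIdentity = length ([] :: [Int]) == 0
```

Both checks hold for any choice of lists; for instance, preservesAppend [1,2,3] [4,5] evaluates to True. What matters is how length relates the two operations, not the contents of the lists.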

Therefore, there is a risk that exposing a new student of Haskell first to set theory could actually confuse the student upon subsequent exposure to category theory. Accordingly, I might substitute instead an elementary introduction to category theory with an informal, somewhat conversational tone, such as the following:

Conceptual Mathematics: A First Introduction to Categories
by F. William Lawvere and Stephen Hoel Schanuel
Cambridge: Cambridge University Press, 1997
As discussed in my blog entry “Learning Haskell through Category Theory, and Adventuring in Category Land: Like Flatterland, Only About Categories,” this is an elementary introduction to category theory, with a conversational tone, using such examples as Galileo and the flight of a bird for illustration.

Advanced Topics for Further Study

Once the student has had a taste of category theory, there are two possible alternatives:  Either broaden the topics, or deepen them.  The latter has already been discussed in my above-mentioned blog post “Learning Haskell through Category Theory, and Adventuring in Category Land: Like Flatterland, Only About Categories”; here I discuss the former.

As mentioned above, automata theory is one possibility for advanced study for students who have acquired a taste for theory and wish to explore more:

Elements of the Theory of Computation (first hardcover edition)
by Harry R. Lewis and Christos H. Papadimitriou
New Jersey: Prentice Hall, Inc., June 1981
As discussed above, a formal, rigorous treatment of the theory of computation. The book assumes the ability to reason abstractly, but does not assume familiarity with any specific areas of mathematics. Most readers either love or hate this book, depending on whether they are theoretically inclined.

Another possibility is recursive function theory:

Computability, Complexity, and Languages, Second Edition: Fundamentals of Theoretical Computer Science (Computer Science and Scientific Computing)
by Martin Davis, Ron Sigal, and Elaine J. Weyuker
San Francisco, California: Morgan Kaufmann Publishers, 1994
This book, like Elements of the Theory of Computation, is formal, but does not assume any specific background in mathematics.  One of the authors of this book, Ron Sigal, used to be my professor for set theory, recursive function theory, and recursion equations (domain theory) in college. He was friendly and approachable, had an exceptionally clear teaching style, and usually spent the first class in an interesting lecture discussing the historical background of the subject.

Still another possibility is my nemesis, the design and analysis of algorithms:

Compared to What?: An Introduction to the Analysis of Algorithms (Principles of Computer Science Series)
by Gregory J.E. Rawlins
New York: W. H. Freeman, November 15, 1991
This was my favorite book on algorithms in college, and played an instrumental role in allowing me to pass this subject. Without this book, I probably would have had a much more difficult time in clearing this subject. This book approaches each algorithm from the perspective of how the algorithm is arrived at, rather than just handing the solution ready-made, interspersing numerous entertaining illustrations and quotes from works by Lewis Carroll. I further wrapped this book in kaleidoscopic bookwrap to strengthen the feeling of being in algorithm wonderland while reading this book, and the effect worked like a charm: I learned the algorithms, finally passed the course, and completed the major.

Introduction to Algorithms, Second Edition
by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
Cambridge, Massachusetts: The MIT Press, September 1, 2001
This is the second edition of a classic work in the subject. It provides a rigorous theoretical treatment of algorithms. The first edition of this book forced the author of this blog to undertake a seemingly never-ending journey through set theory, discrete mathematics, recursive function theory, automata theory, meta-logic, and recursion equations (domain theory) in order to clear the topics covered by this book. I haven’t read the second edition yet, but the first edition was very complete and thorough, and was basically a mathematical encyclopedia on algorithms, complete with a brief introduction to prerequisite topics, detailed development of proofs, numerous exercises that led to other parts of the main text, numerous starred exercises of special difficulty (my professor hired a student who did all the starred exercises on his problem sets as a summer research intern), detailed pseudocode, analysis of the pseudocode, proofs of the analysis of the pseudocode, and highly detailed illustrations covering details of the algorithms and their proofs. Of course, this book has no solutions. Hey, who needs solutions? Nobody said you needed sleep, right?

The Art of Computer Programming (currently, Volumes 1 to 3 and Fascicles 0 to 4 of Volume 4, with Volume 5 planned for release in 2015, and Volumes 6 and 7 planned for later release)
by Donald E. Knuth
Reading, Massachusetts: Addison-Wesley, 1997 to 2006 and pending
The original classic work on the design and analysis of algorithms, begun in 1962 by compiler expert Donald E. Knuth and still in progress. Volume 1 covers fundamental algorithms, volume 2 covers seminumerical algorithms, volume 3 covers sorting and searching, volume 4 covers combinatorial algorithms, volume 5 is scheduled to cover syntactic algorithms, volume 6 is scheduled to cover the theory of context-free languages, and volume 7 is scheduled to cover compiler techniques. Exercises range in difficulty from “warm-up” exercises to research problems.

One last topic is algebra, for which there is reputedly a very approachable work focusing on a limited scope of topics and exploring them to great depth:

Topics in Algebra
by I. N. Herstein
Xerox Corporation, 1975
Although I haven’t read this book, it has earned very high reviews for exposition and flow. The book is reputedly designed to focus on a few basic ideas and explore problems in them, so that each new topic flows seamlessly from the previous one.

Conclusion

Let students take the book on the subway, train, or bus, and have them think about the categories in their sleep!  Gradually, their curiosity will be piqued, and they will understand categories.  Then they can read a second book with many exercises.  But while they still have curiosity, let them at least visualize and juggle the categories in many ways in their heads before forcing them off!  The more easily they can digest their first book, the more likely their curiosity will stay.

January 16, 2009

Learning Haskell through Category Theory, and Adventuring in Category Land: Like Flatterland, Only About Categories

Filed under: Category Theory, Haskell — Benjamin L. Russell @ 8:55 pm

Two days ago, there was an interesting post by Andrzej Jaworski on the Haskell mailing list, entitled “[Haskell] Teach theory then Haskell as example” and dated “Wed, 14 Jan 2009 04:37:33 +0100,” recommending that Haskell be taught “on most abstract terms in a framework of higher order logic, types and CT right from the start.” While I agreed with the gist of his post, I hadn’t found an appropriate publication on category theory that addressed the subject at the proper pace, level, and perspective. Most publications did not explain in enough detail, assumed too many topics that they did not cover, or did not explain the concepts in a manner which would allow me to form a visual framework of reference in my mind. In my response, I listed the following publications on category theory, a branch of mathematics which forms a theoretical framework for Haskell (author names and dates have been added, explicit URLs replaced by hyperlinks in titles, and descriptions expanded, for referential convenience; in addition, I have added one book entry for cross-referential purposes):

Category Theory Books:

Conceptual Mathematics: A First Introduction to Categories (Paperback)
by F. William Lawvere (author) and Stephen Hoel Schanuel (author)
Cambridge: Cambridge University Press, 1997
An elementary introduction to category theory

An introduction to category theory in four easy movements
by A. Schalk (author) and H. Simmons (author)
Manchester: A. Schalk and H. Simmons, October – December, 2005
A somewhat informal, reportedly in-depth introduction to category theory, which some students have described as being intricately layered. Personally, I like Harold Simmons’s style, because it seems to have more personality than that of many other authors on category theory. Sometimes he is actually funny; e.g., he writes (see p. 6),

“In the original examples of categories the arrows were morphisms which were then called homomorphism, and it wasn’t realized that this family could be very large. (Some out and out category theorists still don’t realize the significance of this. On the other hand, some off the wall set theorists don’t realize the significance of category theory.)”

I’ve heard of the UNIX wars and the editor wars, but now we have the category theory vs. set theory wars! The more some things change, the more they stay the same….

Category Theory by Magic: A short introduction to the basics of category theory
by Harold Simmons
Manchester: Harold Simmons, November 29, 2008
A somewhat more advanced text than An introduction to category theory in four easy movements, intended specifically for “postgraduate students,” but written in an exceptionally clear style.

Toposes, Triples and Theories
by Michael Barr and Charles Wells
Michael Barr and Charles Wells, 2000
An often-referenced introduction to category theory, intended for graduate students, reportedly discussing monads as “triples.” This book is a sequel to Category Theory for Computing Science, by the same authors. This is one of the publications listed in the bibliography by Simmons in Category Theory by Magic: A short introduction to the basics of category theory.

Category Theory for Computing Science (Third Edition)
by Michael Barr and Charles Wells
Hertfordshire, U.K.: Prentice Hall International (UK) Ltd., 1999
Prequel to Toposes, Triples and Theories, “written specifically to be read by researchers and students in computing science.” Apparently, only available in dead-tree format. Can be ordered from Centre de recherches mathématiques, or by e-mail to crmbooks@crm.umontreal.ca. Often cited in the literature.

Categories and Computer Science
by R. F. C. Walters
Cambridge: Cambridge University Press, August 1992
Reportedly a straightforward introduction to category theory, with many examples from computer science

Arrows, Structures and Functors – The Categorical Imperative (no hyperlink available)
According to the HaskellWiki page on category theory, an out-of-print book covering monads and the Yoneda lemma, with very little prerequisite knowledge

Category Theory
by Steve Awodey
Oxford: Oxford University Press, 2006
This text aims to minimize mathematical prerequisites in providing an introduction to category theory for students in such topics as computer science, logic, linguistics, cognitive science, or philosophy, who may not necessarily be working (or aspiring) mathematicians. Students are not assumed to have much background in mathematics beyond a course in discrete mathematics and some calculus or linear algebra or logic.

The book starts out with a few basic examples of posets and monoids, and then develops them in further detail. While the book assumes little in mathematical prerequisites, it does not sacrifice rigor, and covers categories, functors, natural transformations, equivalence, limits and colimits, functor categories, representables, Yoneda’s lemma, adjoints, and monads. Optional topics covered include cartesian closed categories and the lambda calculus. However, 2-categories and monoidal categories are purposely excluded, and toposes are not covered in any depth.

Two aspects that I like about this book are the author’s style of starting out with examples from and references to set theory (although it is true that arrows, not objects, are the focus in category theory, many students come from a set-theoretical background, and references to set theory can ease the transition), and offering occasional examples from computer science. For instance, Example 10 on page 11 provides a case in which a functional programming language L is given and an associated category is shown where the objects are the data types of L, the arrows are the computable functions of L (“processes,” “procedures,” and “programs”), the composition of two programs, f mapping X to Y, and g mapping Y to Z, “is given by applying g to the output of f,” and the identity is the “do nothing” program. Such examples help tie the subject to computer science (in particular, to functional programming), and help render the subject more appealing to students of functional programming.
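To make that example a little more concrete, here is a minimal sketch in Haskell, assuming that we take Haskell itself as the language L. The functions f and g and the helper names gAfterF and lawsHoldAt are hypothetical stand-ins of my own and do not appear in Awodey’s text; only (.) and id come from the Prelude.

```haskell
-- Objects: Haskell types (here Int and String).
-- Arrows: functions between them.
-- f and g are hypothetical sample "programs" chosen for illustration.

f :: Int -> String        -- an arrow from X = Int to Y = String
f = show

g :: String -> Int        -- an arrow from Y = String to Z = Int
g = length

-- Composition: apply g to the output of f.
gAfterF :: Int -> Int
gAfterF = g . f

-- The "do nothing" program is id. Function equality cannot be tested
-- directly, so the identity laws are only spot-checked at one input.
lawsHoldAt :: Int -> Bool
lawsHoldAt x = (id . f) x == f x && (f . id) x == f x

main :: IO ()
main = do
  print (gAfterF 12345)   -- show gives "12345", whose length is 5
  print (lawsHoldAt 42)
```

The spot check is of course no proof; it merely illustrates what the identity laws say about composing with the “do nothing” program.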

Category Theory Lecture Notes:

Category Theory Lecture Notes for ESSLLI
by Michael Barr and Charles Wells
Michael Barr and Charles Wells, 1999
Condensed version of Category Theory for Computing Science, discussing category theory from a computer science perspective

A Gentle Introduction to Category Theory – the calculational approach
by Maarten M. Fokkinga
Amsterdam: M. M. Fokkinga, 1992 (version of June 6, 1994)
Another set of lecture notes referenced on the HaskellWiki page on category theory

Categorical Programming with Examples in Haskell:

Categorical Programming with Inductive and Coinductive Types
by Varmo Vene
Tartu, Estonia: Varmo Vene, 2000 (Ph.D. dissertation)
A thesis on categorical programming, exploring inductive and coinductive types, and several programming constructs related to them in Haskell

In conclusion, I stated as follows:

I would believe that having specific Haskell code to help interact
with the categorical examples would help to motivate study of the
abstract theory for many programmers.  One problem that many people
have with studying abstract theory in isolation is that they often
tend to lose motivation unless they can see how the theory directly
relates to and influences the semantics and data structures in the
code.  Having specific examples of Haskell code to tie together
immediately with the abstract theory would most likely help to
motivate and maintain interest.

Then I discovered a similar thread in the USENET newsgroup comp.lang.haskell which had originated two days earlier, entitled “Book recommendations on underlying theory,” dated “Sun, 11 Jan 2009 15:11:40 -0800 (PST),” in which a reader nicknamed “grimey” was asking for recommended readings on the theory behind Haskell.  Grimey described himself as a programmer who had “studied physics at the undergrad level,” and whose mathematical background included “Diff & Integ. Calc, Linear Algebra, Diff Eq, Advanced (Vector) Calc, [and] Complex Analysis.”  In particular, he added,

Although the Vector Calc class was particularly difficult
for me and sort of “took the wind out of my sails” and leaving me
adrift on the sea of mathematics for a long time.  I always wanted to
continue on with more abstract math, like algebra and group theory, to
gain understanding of the beauty behind quantum physics; but for
various reasons, that didn’t happen.  Anyway years later I still use
Linear Algebra and Diff Eq quite frequently for work, and have a good
practical engineer’s grasp of those topics.  I use advanced calculus
less frequently and I use quaternions for very practical, non-
theoretical way for rotation sequences in 3D.  I am an expert C/C++
programmer.

He then mentioned the following books as two that had “piqued [his] interest,” before asking for further suggestions:

Topics In Algebra
by I.N. Herstein
Xerox Corporation: 1975

Lambda-Calculus and Combinators
by J. Roger Hindley and Jonathan P. Seldin
Cambridge: Cambridge University Press, 2008

In my response linking the two threads, I mentioned Andrzej Jaworski’s response to my earlier posting in the first thread, in which he had replied in part that:

[T]here are many very good
articles addressing specific issues of Haskell's theoretical foundations (e.g.
http://www.cs.ut.ee/~varmo/papers/thesis.pdf).  They however always assume more than they target to
explain making student turn around them like a dog not knowing which ball to catch first.

I.e., the problem was that many such papers depended on parts of other papers, which either depended on other parts of the first paper, or on parts of still other papers.  Then I mentioned a response to this issue that I had received in a private e-mail message from a reader of the first thread, in which that reader had written that:

… the problem “comes from trying to use academic papers as textbooks, a
purpose for which they’re not usually designed.”

I then added that this was the gist of the problem, and, as a possible solution, added a dead-tree-based book that I had read about earlier (already mentioned above):

Conceptual Mathematics: A First Introduction to Categories (Paperback)
by F. William Lawvere and Stephen Hoel Schanuel
Cambridge: Cambridge University Press, 1997
An elementary introduction

I added:

This book is reputed to be interesting even to non-mathematicians at a
philosophical level.  I myself have purchased a copy.

Then, as an afterthought, in an additional post immediately afterwards, I added that what would be even better would be a mathematical storybook (separate links to the storybooks hyperlinked in the following quoted portion):

What would be especially interesting would be an introductory storybook on category theory:  a book similar to, say, Flatland, by Edwin Abbott Abbott, or even Flatterland, by Ian Stewart.  These books would elevate category theory out of the realm of dry equations into a story which would keep me awake with suspense reading through the night.

Yes! What we need is Category Land: Like Flatterland, Only About Categories. If only such a book existed. Any suggestions?
