[SICP 7] Notes on SICP Chapter 1.1 “The Elements of Programming” - Sections 1.1.6, 1.1.7, 1.1.8

1.1.6 Conditional Expressions and Predicates

Enter conditionals

This section is about branching in program flow. The data type that stores whether a condition is satisfied or not is boolean. Scheme has boolean values #t and #f and built-in names true and false. It’s up to the user whether to use the names or their values directly.

This chapter only introduces the numerical comparison procedures <, = and >. Note that = works on numbers only; the values given should satisfy the number? predicate. Procedures for other kinds of equivalence will be introduced later.
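
A quick sketch of this restriction in a REPL (my own example):

(= 2 2.0)  ; #t  -- numeric equality, exactness is ignored
(= #t #t)  ; error -- = expects values satisfying number?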

Case analysis is done via the cond (short for conditional) special form, whose syntax is as follows:

(cond (<p1> <e1>)
      (<p2> <e2>)
      ...
      (<pn> <en>)
     [(else <e_z>)]
    ; here I use square brackets to indicate that the else clause is optional.
    ; (In Racket square brackets are actually valid parentheses with identical
    ; functionality, used to increase the readability of Scheme code.)

cond is followed by clauses in parentheses, (<p_i> <e_i>), where <p_i> are called predicates, expressions whose value is a boolean, and <e_i> is the consequent expression that is evaluated and returned if <p_i> is true. A predicate can be either a procedure call that returns a boolean or any expression that evaluates to a boolean.

cond is a special form, like define, because it does not evaluate all of its “arguments”. It evaluates the predicates sequentially and evaluates a consequent expression only if its predicate is #t; otherwise cond ignores it. If none of the predicates is #t and there is no else clause, the returned value is undefined. If there is an optional else clause, its consequent expression is evaluated and returned.

According to Scheme’s philosophy of minimalism, a special form should only be introduced if it is absolutely necessary, such as when the general evaluation rule does not work. There are exercises that demonstrate well why the general rule cannot produce the right behavior here.

A cond usage example:

(define (sign num) (cond ((> num 0) 1)
                         ((= num 0) 0)
                         (else -1)))

For flows that can only split into two branches there is a simpler special form, if, with the following syntax:

(if <predicate> <consequent> <alternative>)

Again, it’s a special form, because only the <consequent> or the <alternative> expression will be evaluated, according to the <predicate>’s value, which is different from the general combination evaluation rule.

Example: (define (abs num) (if (< num 0) (- num) num)) (here - has only one argument; it’s the unary negation procedure)

Scheme also has the standard “logical composition operations” to construct compound predicates and compute boolean logic: and, or and not, with the following syntax

(and <e_1> ... <e_n>)
(or <e_1> ... <e_n>)
(not <e>)

and and or are special forms, i.e. they do not necessarily evaluate all of their expressions. Both evaluate from left to right: and stops as soon as some e_i is #f, and or stops as soon as one of them is true; there is no need to go further. and and or not being procedures but special forms prevents them from being given as arguments to higher-order procedures such as accumulate. 😞 not, on the other hand, is just an ordinary procedure.
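
A small sketch of the short-circuiting behavior (the procedure name is my own):

; and returns #f as soon as an expression is false,
; so the division below is never attempted when x is 0.
(define (reciprocal-or-false x)
  (and (not (= x 0)) (/ 1 x)))

(reciprocal-or-false 4) ; 1/4
(reciprocal-or-false 0) ; #f -- (/ 1 x) is never evaluated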

Selected exercises

Exercise 1.3

“Define a procedure that takes three numbers as arguments and returns the sum of the squares of the two larger numbers.”

I think this is a troll exercise, in the sense that it definitely requires abstractions the reader hasn’t seen yet for a satisfactory solution. The student’s mind immediately goes to the general problem of summing the squares of the M largest numbers out of N, which requires lists, sorting and filtering. 3 is the perfect number for this problem: it’s not as trivial as 2 and not so bloated as 4, which would make you say “I won’t do this without having sorting functionality first”.

But it made me lose focus and start thinking about the required number of comparisons to sort N items, to find the top M items without sorting, etc.

I guess the expected solution was something like this, a conditional for each argument potentially being the minimum:

(define (sum-larger-squares a b c)
  (define (ss x y) (+ (square x) (square y)))
  (cond ((and (<= a b) (<= a c)) (ss b c))
        ((and (<= b a) (<= b c)) (ss a c))
        (else (ss a b))))

Against this blatant attempt at trolling, I responded with a counter-trollish, performance-ignoring solution that does 4 square computations:

(define (ss2a x y z)
  (define min-arg (min (min x y) z))
  (define total-squares (+ (square x) (square y) (square z)))
  (- total-squares (square min-arg)))

Exercise 1.4

“Observe that our model of evaluation allows for combinations whose operators are compound expressions. Use this observation to describe the behavior of the following procedure:” (define (a-plus-abs-b a b) ((if (> b 0) + -) a b))

This one is an early show-off demonstrating that one can conditionally choose the operator of a combination. An if does not necessarily have to return a data value; it can return a procedure. If b is positive, add b to a; otherwise subtract b from a. Nice way to put it.
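
A quick substitution trace for a negative b (my own worked example):

(a-plus-abs-b 5 -3)
((if (> -3 0) + -) 5 -3)  ; the operator itself is computed first
(- 5 -3)                  ; the if returned the subtraction procedure
8                         ; i.e. a + |b|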

Exercise 1.5

Ben Bitdiddle has invented a test to determine whether the interpreter he is faced with is using applicative-order evaluation or normal-order evaluation. He defines the following two procedures:

(define (p) (p))
(define (test x y)
  (if (= x 0)
      0
      y))

Then he evaluates the expression (test 0 (p)). What behavior will Ben observe with an interpreter that uses applicative-order evaluation? What behavior will he observe with an interpreter that uses normal-order evaluation? …

This is a question to grasp the difference between the two alternative evaluation orders. It’s initially a little bit confusing, but a mechanical approach of applying the evaluation rules helps. p is an extremely recursive procedure. The only thing it does is call itself, nothing else. It could better have been called all-your-stack-belong-to-us.

In applicative-order evaluation operands are evaluated first; (p) evaluates to (p), and the interpreter gets stuck in a loop of

(test 0 (p))
(test 0 (p))
(test 0 (p))
...

whereas in the normal-order evaluation

(test 0 (p))
(if (= 0 0) 0 (p))
; (= 0 0) evaluates to #t
; no need to evaluate (p)
0

1.1.7 Example: Square Roots by Newton's Method

This is the first of SICP’s famous maths-based examples. The goal is to solve a non-trivial problem that requires combining every lesson learned so far.

It starts with a discussion of the distinction between mathematical description and actual computation, in other words, the difference between mathematical functions and procedures. There is no correlation between how easy it is to state a function and how easy it is to compute it.

For example, it’s easy to say that y being the square root of x means that the square of y is x (and y is non-negative). The relationship is very clear in the description: $$y = \sqrt{x} \rightarrow y^2 = x$$. However, this does not help with how to compute the square root of a given number. It’s possible to tell, given two numbers, whether one of them is the square root of the other, or to derive further facts about square-root relations in general.
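
Checking the relation is indeed a one-liner (a sketch; the predicate name is my own):

; Easy to *verify* the square-root relation between two given numbers,
; even though *computing* y from x is the hard part.
(define (square-root-of? y x)
  (and (>= y 0) (= (* y y) x)))

(square-root-of? 3 9)  ; #t
(square-root-of? 3 10) ; #f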

In math, definitions are cheap, computations are expensive! (Also, computing the square is easy, but going in reverse is difficult, similar to how the derivative is easy but the integral is hard. But that’s a discussion for another time.)

Pseudo-code for a declarative description of the square-root problem could be as follows. It’d be a miracle if it were possible to describe a problem in such a concise manner and let the interpreter/compiler produce an efficient solution.

(define (sqrt x)
  (get-y-such-that (and (= (square y) x)
                        (>= y 0))))

The authors also touch on how declarative and imperative descriptions are related, just like mathematics and computer science. The correctness of a program is a declarative statement, and proving it requires a transition between the two. They also mention the programming-language design area of “very high-level languages”, where “one actually programs in terms of declarative statements. The idea is to make interpreters sophisticated enough so that, given “what is” knowledge specified by the programmer, they can generate “how to” knowledge automatically.”

In her 2019 "Type-Driven Program Synthesis" presentation, Nadia Polikarpova gives examples from this research line of compiling descriptions into imperative code. She uses Haskell types to describe invariants of entities in detail. For example, she defines an ordered-list type and lets the compiler synthesize the logic that adds a new element to it while keeping the order. Definitely a very exciting research project! Looking forward to software developers becoming obsolete. :-P

The contrast between function and procedure is a reflection of the general distinction between describing properties of things and describing how to do things, …, the distinction between declarative knowledge and imperative knowledge.

Given that we don’t have that much declarative programming power yet, how can we compute square roots? The book uses the Babylonian method (a.k.a. Heron’s method), an iterative solution that starts with a guess and, at each iteration, moves the guess closer to the actual value.

The idea is based on the fact that the arithmetic mean of two numbers is greater than (or equal to) their geometric mean: $$\frac{u + v}{2} \geq \sqrt{uv}$$. (This can be derived from $$(\sqrt{u}-\sqrt{v})^2 \geq 0$$; Wikipedia has a nice visual explanation.) According to this, if u is our current guess for $$\sqrt{x}$$ and v = x / u, then our next guess is u' = (u+v)/2. We know that u' is between u and v (thanks to it being their mean), and that it is at least the root (thanks to the arithmetic mean being at least the geometric mean, which here is exactly the root: $$\sqrt{u \cdot x/u} = \sqrt{x}$$). So, from the first iteration onward, the estimate is always at least the goal and always decreasing, which means the longer we iterate the closer we will be to the actual value.
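
For a concrete feel of the convergence, here is the iteration for $$x = 2$$ starting from the guess 1 (my own worked numbers, rounded to five decimals): $$u_0 = 1$$, $$u_1 = \frac{1 + 2/1}{2} = 1.5$$, $$u_2 = \frac{1.5 + 2/1.5}{2} \approx 1.41667$$, $$u_3 \approx 1.41422$$, already within $$10^{-4}$$ of $$\sqrt{2} \approx 1.41421$$.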

The book does not study how fast this procedure converges. Later we’ll see that this method is a special case of Newton’s method, and the general case will make things easier to understand. So there is no need to spend extra energy on the feasibility of this method; just see that it works pretty fast on a modern computer. :-)

Because this iterative computation is always approximate, we cannot wait to reach the exact value before stopping. We have to impose a stopping condition manually; the authors call it the condition of being “good enough”. We’ll exploit the fact that squaring is a much cheaper operation: just square the current guess and see how far it is from the input number. The goodness condition can be a threshold on the error between the squared guess and the input.

With this reasoning, we decompose the procedure into the following parts: improve takes a number and a guess and improves the guess; good-enough? tells, given a number and a guess, whether the guess is good enough. For improve we need to be able to compute the average of u and v, which can be abstracted away too.

The authors start with the top-level expression, assuming the sub-parts exist, and then fill in the details.

(define (sqrt-iter guess x)
  (if (good-enough? guess x)            ; if the guess is good enough
      guess                             ; return the guess
      (sqrt-iter (improve guess x) x))) ; otherwise iterate with the improved guess

This is the first recursive expression (to iterate over guesses), and the authors don’t go into the details of a procedure calling itself here. They elaborate on it in Chapter 1.2, hence I’ll skip it too. It’s enough to be able to read the code and understand the logic of the iteration.

Next is to fill the gaps. Good enough was defined via the error of the estimate: $$|\epsilon| < T$$ where $$\epsilon = u^2 - x$$, u is the current guess and T is the threshold (0.001 below).

(define (good-enough? guess x)
  (< (abs (- (square guess) x)) 0.001)) ; squaring guess is cheap

And improvement logic is based on averaging

(define (improve guess x)
  (average guess (/ x guess)))

(define (average x y)
  (/ (+ x y) 2))

We can always start with an initial guess of 1 without worrying about divergence.

(define (sqrt x)
  (sqrt-iter 1.0 x))

We chose the decimal number 1.0 instead of the exact number 1 because Scheme is capable of exact rational arithmetic. Without the decimal point, the results stay exact rational numbers.
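
A rough sketch of what happens with an exact starting guess (values computed by hand from the improve rule):

; With exact 1 the whole iteration stays in exact rational arithmetic:
(improve 1 2)     ; 3/2
(improve 3/2 2)   ; 17/12
(improve 17/12 2) ; 577/408
; numerators and denominators keep growing, while 1.0 keeps everything
; in floating point.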

This solution to a non-trivial problem demonstrates that this simple procedural language, Scheme, is totally capable of doing any numerical computation that other big-shot imperative languages can do, without any special syntax/constructs for looping, only the ability to call a procedure.

Exercise 1.6 is another example of the importance of special forms. If we implement the if conditional as an ordinary procedure, it can cause infinite loops in recursive expressions that previously worked fine, because even when we hit the “stopping case” the interpreter will still evaluate the recursive step.
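
The exercise’s new-if, reconstructed from the book, and the reason it loops:

(define (new-if predicate then-clause else-clause)
  (cond (predicate then-clause)
        (else else-clause)))

; new-if is an ordinary procedure, so applicative-order evaluation
; evaluates ALL its operands first -- including the recursive call --
; before new-if ever gets to pick a branch.
(define (sqrt-iter guess x)
  (new-if (good-enough? guess x)
          guess
          (sqrt-iter (improve guess x) x))) ; evaluated unconditionally -> infinite recursion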

1.1.8 Procedures as Black-Box Abstractions

Enter scopes

This section is about information hiding and nested scopes for simplifying sub-procedures: how to decompose logic so that it is easier to develop by multiple developers, without cluttering their minds with lots of global definitions.

It starts with a note that sqrt-iter is our first recursive definition, i.e. a procedure defined in terms of itself, which feels like circular reasoning and requires a “leap of faith”. That will be addressed later.

The decomposition of the square-root computation into procedures reflects one-to-one the “decomposition of a problem into subproblems”. That’s the natural way to decompose code. Expressing a process as multiple procedures is not a virtue by itself; as the authors say, one could split a procedure trivially and randomly by grouping successive lines, but that won’t serve any purpose other than making it more confusing.

The goal is to split the logic into meaningful, identifiable tasks that can be used to construct higher-level procedures, which regard them as black boxes: the users of these reusable components need not be aware of the implementation details in order to use them. The authors call these black boxes procedural abstractions. Roll credits! That’s the title of Chapter 1.

The art of this decomposition is the main topic of Stanford Professor John Ousterhout’s “Software Design Studio” course. I wish I could take it! Luckily he shares some of his wisdom in his book “A Philosophy of Software Design”. Definitely worth checking out if you want to learn more about this hard-to-master but important and delicate methodology.

The authors say that “at this level of abstraction, any procedure that computes the square is equally good.” However, I think pure black-box-ness is an impossible ideal to achieve. For performance-aware programs, the space and time complexity of these “box”es, which depend on their implementation, are important.

Their remark on complexity being machine-dependent makes it look like the authors lived in a time when hardware architecture had more variety. They mention the possibility of a machine with extensive and fast embedded lookup tables for logarithms, which would compute the square with the following implementation: (define (square x) (exp (double (log x)))) (here, double is implemented via addition) instead of relying on multiplication. I’d say it’d be a bit of a stretch to have a machine without fast multiplication but with fast logarithms. :-)

Another deviation from the ideal is side effects: mutating objects, persisting to storage, throwing exceptions, etc. Users definitely should know about those.

The authors then introduce the concepts of bound variables and binding. Similar to how (define <name> <value>) binds <value> to <name>, a “procedure definition binds its formal parameters”. The name of a formal parameter can be consistently renamed: (define (square x) (* x x)) and (define (square y) (* y y)) are the same procedure. Names like x and y here are bound variables, in contrast to variables that are not bound, which are called free. A free variable cannot be renamed without changing the procedure’s behavior, because its meaning is defined outside of the procedure.
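
The book’s own good-enough? makes the distinction concrete:

; guess and x are bound (they are formal parameters); <, -, abs and square
; are free. Renaming guess consistently changes nothing; renaming abs to,
; say, cos would change what the procedure computes.
(define (good-enough? guess x)
  (< (abs (- (square guess) x)) 0.001))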

This leads to the concept of scope, which indicates the set of expressions (the part of the code) for which a binding defines a name. In a procedure definition, the body of the procedure is the scope of its formal parameters.

This feels intuitive, but in JavaScript, for example, a variable declared with var is scoped to the whole enclosing function (and assigning to an undeclared name creates a global), regardless of the block it appears in. The block-scoped let and const declarations were introduced later to give developers from other backgrounds a more familiar, easier-to-grasp scoping model.

One source of bugs due to human error is capturing a variable by giving a formal parameter a name that is already bound in an outer scope. Try naming the parameter of square as *. :-)
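
A sketch of the capture the authors hint at:

; The parameter * shadows the multiplication procedure inside the body:
(define (square *) (* * *))
; (square 3) now tries to apply the number 3 as a procedure and errors out.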

This can be a sneaky bug that is only detected while following the stack trace in a debugger. IDEs should have a visual indicator (that can be turned on/off on demand) that asks “you are shadowing an existing binding, is that intentional?”.

Using scopes for program decomposition

Next is to utilize the scope concept for structuring code. If these procedures defined for computing square roots were part of a computation library, only sqrt would be of any use to its users. Even though the rest are units of computation, they don’t have any reusability value: good-enough? and improve are very specific to the square-root computation. (Whereas average could be included in any mathematical package.)

Anything that does not need to be known by the user should be hidden from them. The amount of things that have to be kept in mind while developing is an important factor in development velocity. Any time developers see the names of these “detail procedures”, they’ll quickly consider where they belong, what they mean and whether they have any use for them, which eats into their available mind space. Be mindful of your users’ minds!

It’s also very plausible that, since we are considering a library for computation, there would be other algorithms for computing other mathematical functions, and they might also have components for iterative improvement and checks for good-enough-ness. There must be a mechanism that allows procedures with the same names to exist simultaneously in the code base.

Internal definitions to the rescue! “Such nesting of definitions, called block structure, is basically the right solution to the simplest name-packaging problem” in Scheme. Below is the authors’ restructuring of the procedure definitions; with it, only sqrt is exposed to users. (I stole the ASCII art that indicates the scopes of names from the Instructor’s Manual.)

  [global environment]
    ----------------------------------------------------------------------
(define (sqrt x)                                                       abs
                    ---------------------------------------------        .
  (define (good-enough? guess x)                                x        .
                              ----------------       good-enough?        .
    (< (abs (- (square guess) x)) 0.001)     x            improve     sqrt
                                         guess          sqrt-iter        |
  )                           ----------------                  |        |
  (define (improve guess x)                                     |        |
                         ---------------------                  |        |
    (average guess (/ x guess))              x                  |        |
                                         guess                  |        |
  )                      ---------------------                  |        |
  (define (sqrt-iter guess x)                                   |        |
                          --------------------                  |        |
    (if (good-enough? guess x)               x                  |        |
        guess                            guess                  |        |
        (sqrt-iter (improve guess x) x))     |                  |        |
  )                       --------------------                  |        |
  (sqrt-iter 1.0 x)                                             |        |
                  -----------------------------------------------        |
)  -----------------------------------------------------------------------

Scheme took the idea of block structure from Algol 60. Python allows function definitions inside a function too, but also has its modules and import system for more advanced packaging needs. Hack has a namespace system too; however, unfortunately, at work namespaces are barely used (due to some best practices), which results in very long class names where developers prefix their classes, e.g. ProductSurfaceTeamEntityClass.

Enter Lexical Scoping

Combining block structure with the notion of scope provides an additional capability: the ability to simplify the auxiliary procedures by shrinking their parameter lists. The body of an inner definition is in the scope of the free variables of the outer definition, hence one does not have to explicitly pass arguments of the outer procedure into inner procedure calls via their parameters! In technical terms, “lexical scoping dictates that free variables in a procedure are taken to refer to bindings made by enclosing procedure definitions.”

Just omit from the inner definitions the parameters that correspond to parameters of the outer definition, and use the same names in the inner bodies. Below is the authors’ sqrt with lexical scoping; note that good-enough? does not have x in its parameter list anymore!

(define (sqrt x)
  (define (good-enough? guess)
    (< (abs (- (square guess) x)) 0.001))
  (define (improve guess)
    (average guess (/ x guess)))
  (define (sqrt-iter guess)
    (if (good-enough? guess)
        guess
        (sqrt-iter (improve guess))))
  (sqrt-iter 1.0))

One constraint on internal definitions is that they have to be the first expressions in a procedure body. The authors warn that such usage might let students fall back into their imperative programming habits. As a problem student, I immediately tried definitions in the middle of a body, as follows, and got the expected result.

(define (f)
  (define x 1)
  (display x)
  (define y 2)
  (display y))
(f)
> 1 2

It could be that how to handle those cases is not part of the language specification and they are implementation specific. However, it looks like there are unspoken agreements that surround the binding documents (APIs and language specifications), due to cultural inclinations, expectations, familiarity and historical contingency. Both MIT Scheme and Racket behaving the same way under unspecified conditions is an example.

This feels similar to how Windows does not want developers to use its internal APIs, but once they are discovered and used by popular software such as Photoshop, Microsoft rarely changes them, in order not to irritate and scare away its operating-system users.

Ideally a developer should never rely on an internal API, which can by definition change at any moment without warning. However, it’s historically well known that MS won’t change those APIs, so they become a de facto public interface. In the MS world, developers are king. (Apparently, a company can also get access to Windows source code [see Microsoft | Shared Source Initiative] if it is rich enough and signs an NDA, gaining access to all of these secrets of low-level magic.)

Inspired by definitions in the middle of the body, I also tried code that uses a name before its definition, and definitions inside a loop, and both worked as expected. But the authors give a disclaimer: “the management is not responsible for the consequences of running programs that intertwine definition and use.”

(define (redef)
  (define x 2)
  (define x 3)
  (display x))
(redef)

The re-definition above threw ;duplicate internal definitions for (#[uninterned-symbol 12 size]) in redef in MIT Scheme and letrec-values: duplicate binding name in: x in Racket. However, the following “re-definition” inside a loop worked like imperative assignment; I’d like to learn more about what’s happening underneath.

(map 
    (lambda (x) (define a x) (display a))
    (list 1 2 3))
> 1 2 3

The code above definitely feels like assigning the values of the items of the given list to the variable a, one at a time.

Note that the substitution model (a.k.a. “evaluate the procedure body after substituting the arguments for the parameters”) discussed earlier in the chapter cannot by itself explain computation with lexical scoping. It doesn’t say anything about the locality of internal definitions; it has no mechanism to prevent those names from being bound globally.

Nested scoping vs. unit testing

Is the above method of putting procedure definitions inside procedure definitions a good thing to do? A big code base usually requires unit tests so that developers can change code without fear (or with a manageable amount of fear).

In a project with unit tests we need a way to expose these internal functions to the test environment. I guess, in a language that has reflection capabilities, it should be possible to flag inner functions somehow (Python has decorators, Hack has attributes) to make them visible to the testing system. The only thing the testing framework needs is the code, and it can evaluate the code with the provided test data. Pseudo-code:

def outer_func(x, z):
    @visible_in_tests
    def inner_func(x, y):
        # some logic on x and y
        result = ...
        return result
    return inner_func(x, z)

class MyTest(UnitTest):
    def test_inner_func(self):
        func = self.get_func('inner_func')  # which was made visible to test system
        assert 5 == func(1, 2)

At my noob level in Scheme I can’t think of a way to decorate an inner function but I hope that at some point through this class I’ll gain enough knowledge.

Now comes the harder question: what happens when we go further, from nested definitions to lexical scoping?

def outer_func(x, z):
    def inner_func(y):
        return f(x, y)
    return inner_func(z)

In this case we are not providing x to inner_func as an argument. Instead, inner_func is in the scope of x, hence its value is known while evaluating inner_func. But how can we provide this scope to inner_func in unit testing? Assign a value to every variable that is not a function parameter and mock up a scope? Pseudo-code:

self.get_func('inner_func', scope={'x': 1})

Imagine that this scope can grow arbitrarily complex, which makes the job of the unit tester harder. Feels like good food for thought while finishing this chapter.

published at: 2020-09-08 09:02 UTC-5
tags: