**11/15/20**

The first time you encounter a theorem concluding with “an arrow that makes the diagram commute,” it can be quite confusing. But with a little thought you will typically find that the idea expressed by the theorem is obvious. The provided arrow is simply the one thing that could possibly fit. It may be tedious to construct, but understanding the theorem clarifies why it must always be there, revealing patterns hidden in the construction.

Let’s see how this works by studying the universal property of quotients, which was the first example of a commutative diagram I encountered. If you are familiar with topology, this property applies to quotient maps. But we will focus on quotients induced by equivalence relations on sets and ignore additional structure.

An equivalence relation `~` is a binary relation satisfying the following properties:

- reflexive: `x ~ x`
- symmetric: `x ~ y <=> y ~ x`
- transitive: `x ~ y and y ~ z implies x ~ z`

Examples include equality of real numbers, having the same parity (both even or both odd), matrix similarity, isomorphism, etc.
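The three properties are easy to check mechanically on a small finite set. A quick Python sketch using the parity relation (the helper name `related` is mine):

```python
def related(x, y):
    """Parity relation: x ~ y iff x and y are both even or both odd."""
    return x % 2 == y % 2

S = range(-5, 6)  # a small finite set to test on

# reflexive: x ~ x
assert all(related(x, x) for x in S)
# symmetric: x ~ y implies y ~ x
assert all(related(y, x) for x in S for y in S if related(x, y))
# transitive: x ~ y and y ~ z implies x ~ z
assert all(related(x, z)
           for x in S for y in S for z in S
           if related(x, y) and related(y, z))
```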

**Exercise:** Prove matrix similarity is an equivalence relation.

Given an equivalence relation `~` and an element `x`, we can form its equivalence class `[x]`, the set of all elements equivalent to it. The set of equivalence classes forms a new set `X/~` with an analogous structure to the original, but with portions “grouped up” or “collapsed”.

**Exercise:** Prove distinct equivalence classes are disjoint.

Let `q(x) = [x]` be the map sending an element to its equivalence class. Given a map `f` from `X` to `Y` which is constant on equivalence classes (`q(x) = q(y) => f(x) = f(y)`), we obtain a unique map from `X/~` to `Y` making the diagram commute.
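To make this concrete, here is a small Python sketch of the whole setup (the names `quotient`, `q`, and `induced_map` are my own, not standard functions). The induced map simply evaluates `f` at any representative of a class, which is well defined precisely because `f` is constant on classes:

```python
def quotient(X, related):
    """Partition X into equivalence classes under the relation."""
    classes = []
    for x in X:
        for c in classes:
            if related(x, next(iter(c))):
                c.add(x)
                break
        else:
            classes.append({x})
    return classes

def q(x, classes):
    """The quotient map: send an element to its equivalence class [x]."""
    return next(c for c in classes if x in c)

def induced_map(f, c):
    """The unique map on classes: evaluate f at any representative.
    Well defined only because f is constant on classes."""
    return f(next(iter(c)))

X = range(10)
parity = lambda a, b: a % 2 == b % 2
classes = quotient(X, parity)  # two classes: evens and odds
f = lambda n: n % 2            # constant on parity classes
# commutativity: induced_map(f, q(x)) == f(x) for every x
assert all(induced_map(f, q(x, classes)) == f(x) for x in X)
```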

What is going on here? `f` isn’t just any map: it’s constant on equivalent elements of `X`. So a lot of the information in `X` isn’t really needed to compute the image of `f`. Since equivalent elements get sent to the same place, we could imagine picking just one element from each class and seeing where it goes.

In other words, we could define a function on the equivalence classes. This map would have the same image as `f`, and this is precisely what the universal property tells us is possible.

Let’s apply this theorem to a particular example and attempt to fill in the diagram. Suppose our domain is the solid disc.

To construct an equivalence relation on the disc, think of properties that make points in the disc similar to one another. One such property is distance from the center. Define `x ~ y` whenever `||x|| = ||y||`. What does the quotient space `X/~` look like under this relation? Each class is a ring at a particular radius `L`, so denote it `[L]`. The radii of course vary continuously, so we get a set of classes isomorphic to the closed interval `[0, 1]`.

**Exercise:** Find an equivalence relation on `D^2` without 0 (the punctured disc) whose classes form the circle.

Now define the function `f(x, y) = x^2 + y^2`, so our codomain is the real numbers `R`. The graph is a multivariable-calculus-style paraboloid living in `R^3` above the unit disc:

Can we apply the universal property? Almost: we need to confirm `f` is constant on equivalence classes. Suppose `(x, y) ~ (a, b)`. Then `sqrt(x^2 + y^2) = sqrt(a^2 + b^2)`. Squaring both sides, we get `f(x, y) = f(a, b)`.

Ah! We can clearly see `f` only depends on the radius. No matter what angle you are at, `f` does the same thing. So the disc is more information than we need; we can define a similar function on the space of equivalence classes `X/~ = [0, 1]`. Can you figure out what it is?

…

The missing function is of course `h([L]) = L^2`. Its graph is a parabola in 2D which carves out the same range as `f` in the real numbers.

To check commutativity, take a point `(x, y)` and apply `f(x, y) = x^2 + y^2`. Now send it to its equivalence class `q(x, y) = [L]` where `L = sqrt(x^2 + y^2)`. Then `h([L]) = L^2 = x^2 + y^2 = f(x, y)`.
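The check is also easy to run numerically. A small Python sketch sampling a few points of the disc:

```python
import math

def f(x, y):
    """Square of the distance from the origin."""
    return x**2 + y**2

def q(x, y):
    """Quotient map: send (x, y) to its class, labeled by the radius L."""
    return math.sqrt(x**2 + y**2)

def h(L):
    """The induced map on classes [L]."""
    return L**2

# h(q(x, y)) == f(x, y) on sample points of the disc
for x, y in [(0.0, 0.0), (0.3, 0.4), (-0.6, 0.8), (1.0, 0.0)]:
    assert math.isclose(h(q(x, y)), f(x, y))
```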

Easy, right? You probably saw right away that `f` was just the square of the distance. In other examples, constructing such a function `h` might be less obvious, but the universal property tells us it is always there.

Next time you encounter a commutative diagram proof, try a few examples to figure out what basic idea it is telling you.

**Exercise:** Prove that `h` is unique: no other function could make the diagram commute.

Commutative diagrams are the central focus of category theory, which attempts to understand such properties at a higher level of abstraction than set theory. From a category theory perspective, the quotient set `X/~` is the coequalizer (a colimit) of the diagram projecting an equivalent pair to its parts.

You can read more about category theory from Topoi by Robert Goldblatt.

Free groups provide another elementary example of universal properties. You can read about them in chapter 6.3 of Abstract Algebra by Dummit and Foote.

Quotient spaces are studied in depth in Topology by Munkres.

The later chapters on algebraic topology have more elaborate constructions. These gave me a lot of practice with commutative diagrams, such as the chapter on the Seifert–van Kampen theorem.

**09/26/20**

C# programmers are typically familiar with `Select`, `Where`, and `Aggregate`, the LINQ equivalents of the core functional programming operations `map`, `filter`, and `reduce`. But the equally important `GroupBy` isn’t as well understood or as often used. As you may guess from the name, the problem `GroupBy` solves is the following: given a list of items, arrange the items so that equal elements are together.

For example, given a list of `Student` objects, we may want to group them by `student.teacher`, and then obtain a list of `student.id` corresponding to each teacher.

`GroupBy` is a beautiful piece of functional programming which allows work to be specified in terms of “what” should happen as opposed to “how” to do it. It combines harmoniously with the other LINQ operations, leading to cleaner code that can be combined in modular ways.

With the brief description above in mind, let’s examine the method signature, which is a little hairy and intimidating:

```
IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult> (
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    Func<TKey, IEnumerable<TElement>, TResult> resultSelector,
    IEqualityComparer<TKey> comparer
);
```

Ignore the generic types and focus on the 4 functions that must be provided:

- `TKey keySelector(TSource item)` — given an `item`, return the key which decides which group this `item` is placed in. Usually this simply returns a field on `item`, but it can also generate a key, as we will see in a later example.
- `TElement elementSelector(TSource item)` — given an `item`, return the value that will actually be stored in the group. Often this returns the item itself or a field on the item.
- `TResult resultSelector(TKey key, IEnumerable<TElement> contents)` — after elements are grouped together, this function is called for each group. The `key` is the identifier for the group and `contents` is an enumerable containing its elements. Most often you will just return `contents`, but you may want to store the result in a new class.
- `IEqualityComparer<TKey> comparer` — provides a way for keys to be compared. This can be left off for key types like integers and strings, which are already comparable.

With an overview of the functions, the meaning of the 4 generic types is now clear:

- `TSource` — the type of the items in the collection we will be grouping.
- `TKey` — from each item a key will be extracted to identify which group it belongs to.
- `TElement` — each item has the opportunity to be transformed before being put into a group. This is the type of that transformation.
- `TResult` — the type representing each group. Typically a collection, but it could be a custom class.

Once again, given a list of `Student` objects, we may want to group them by `student.teacher`, and then obtain a list of `student.id` corresponding to each teacher.

```
List<Student> students = ...;
students.GroupBy(
    s => s.teacher,        // keySelector
    s => s.id,             // elementSelector
    (teacher, ids) => ids  // resultSelector
);
// Result: [ [id1, id2, ..], [id1, id2] ]
```

Let’s imagine we are building a personal finance application. One report which might be useful is a histogram showing how many purchases fall within each price range, so $10.75 and $12.37 both fall within the $10–20 range. We want to find the number of purchases and the total value of the purchases in each range. We will use a tuple to store these calculations for each resulting bucket.

```
struct Purchase {
    double amount;
    ...
}

List<Purchase> purchases = ...;
purchases.GroupBy(
    p => (int)Math.Floor(p.amount / 10.0), // keySelector
    p => p.amount,                         // elementSelector
    (bucket, amounts) => {                 // resultSelector
        return Tuple.Create(
            bucket,
            amounts.Count(),
            amounts.Sum()
        );
    }
);
// Result:
// [ (bucket, number of purchases, total purchase amount), ... ]
```

Assuming the range of buckets was known up front, one could simply allocate an array with the proper number of buckets and place each item directly. But the `GroupBy` version is more robust, especially for sparse data sets, and forgoes the need to translate bucket values to indices.

That may be all you want to know about `GroupBy` for now. Keep reading if you are curious about how it actually works or wonder about its performance. At first, grouping elements appears to be an inherently complex task. An initial idea might be to loop through each item, and then loop through all the other items to find matches. This works, but it is an `O(n^2)` algorithm, so we want to try a little harder.

The next natural implementation is to create a dictionary in which each key maps to the contents of a group. Iterate over the list once, putting each item in its proper group. This works just fine, but it seems a little kludgy: for each key you must check whether a group already exists and create it if not, and at the end you have to iterate over the dictionary to put the final groups into an appropriate structure.
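A sketch of this dictionary approach (in Python rather than C#, for brevity):

```python
def group_by(source, key_selector):
    """Group items in one pass using a dictionary of key -> list."""
    groups = {}
    for item in source:
        key = key_selector(item)
        if key not in groups:   # the kludgy check-and-create step
            groups[key] = []
        groups[key].append(item)
    # a final pass turns the dictionary into the output structure
    return list(groups.values())
```

Both warts are visible: the check-and-create step inside the loop, and the final pass converting the dictionary into the output.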

We can actually do a bit better in elegance, and often in performance (though not time complexity), as well. The idea is to sort the elements by their key (`O(n log n)`). After sorting, all the equal elements that should be grouped together will be next to each other. We can then apply a reduce-like operation to merge adjacent neighbors.

```
public static IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult> (
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    Func<TKey, IEnumerable<TElement>, TResult> resultSelector
) {
    // Sort (key, element) pairs so equal keys become adjacent.
    var sorted = source
        .Select(item => ValueTuple.Create(keySelector(item), elementSelector(item)))
        .OrderBy(tup => tup.Item1)
        .ToList();
    if (sorted.Count == 0) {
        yield break; // no groups for an empty source
    }
    var groupKey = sorted[0].Item1;
    var currentGroup = new List<TElement> { sorted[0].Item2 };
    foreach (var x in sorted.Skip(1)) {
        if (x.Item1.Equals(groupKey)) {
            currentGroup.Add(x.Item2);
        } else {
            // Key changed: the previous group is complete.
            yield return resultSelector(groupKey, currentGroup);
            currentGroup = new List<TElement> { x.Item2 };
            groupKey = x.Item1;
        }
    }
    yield return resultSelector(groupKey, currentGroup);
}
```

This pattern of “sorting and merging” is a powerful one. See Stepanov’s bigrams example for an additional application. Note that this is almost certainly not how LINQ is implemented (LINQ is given only an equality test for keys, not an ordering, so it cannot sort), but it is instructive.
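Incidentally, Python’s standard library bakes this exact pattern in: `itertools.groupby` merges only *adjacent* equal keys, so you must sort first. Applied to the purchase buckets from earlier (amounts made up):

```python
from itertools import groupby

purchases = [12.37, 3.10, 10.75, 25.00, 7.99]
bucket_of = lambda p: int(p // 10)  # $10-wide price buckets

# Sort by bucket so equal keys are adjacent, then merge neighbors.
keyed = sorted(purchases, key=bucket_of)
buckets = [(bucket, list(amounts))
           for bucket, amounts in groupby(keyed, key=bucket_of)]
# -> [(0, [3.1, 7.99]), (1, [12.37, 10.75]), (2, [25.0])]
```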

LINQ is a fantastic example of the usefulness of domain specific languages (DSL). It’s a mini language designed for manipulating and transforming data in collections, taking advantage of the strong theoretical foundations of functional programming. But the benefits of pure functions come with serious constraints. With LINQ you can take advantage of them when they make sense and avoid them when they don’t. (Common Lisp actually does the reverse, the default style is somewhat functional and a “goto style” DSL is available for writing performance code.)

Although languages like Lisp are built around the idea of constructing DSLs, it’s difficult to see how a language with roots in C could be made similarly malleable. In fact, a great deal of runtime and compiler infrastructure needed to be added to C# in order to support LINQ, including lambdas, anonymous types, etc. Jon Skeet suspects the C# team had an idea to completely generalize data access, and then built up a chain of dependent features to reach this goal. To learn more about this process and the ideas behind LINQ, I recommend Jon Skeet’s book C# in Depth. I think you will enjoy it even if you are not interested in C# itself, as it is a joyful deep dive into practical language design and theory.

**10/07/20**

Lisp (both Common Lisp and Scheme) advocates are famous for having extravagant reasons why Lisp is their favorite language. You might have heard claims that it’s the most powerful language, due to features like macros or homoiconicity. Certainly, Lisp has no shortage of beautiful and thought-provoking ideas, but due to its influence, most of its benefits have now been included in modern languages. Fancy language abstractions don’t appeal to me, so the whole meta-programming thing isn’t compelling. I am now interested in Lisp for very simple and practical reasons:

- Common Lisp and Scheme are both fully standardized languages with specifications. Consequently, they are well understood, cross-platform, and have multiple implementations, including several with free licenses.
- Lisp has great documentation, books, and learning resources. SICP is “the book” for Scheme. Common Lisp has several good ones.
- Lisp is mature and extremely stable. Code can be written once and run again years later, without modification.
- Lisp implementations are reasonably fast. The true believers claim Common Lisp is as fast as C. In general it’s not even close, but for a high-level, dynamic language, it’s pretty fast.
- Lisp is well designed and follows solid computer science principles. It has a focused selection of features and an elegant evaluation model which make it easy to write and compose functionality.

You might think this list of features isn’t remarkable or unique to Lisp, and you would be correct. In fact, this pattern was actually established by ANSI C first, and later adopted by Lisp. But the surprising thing is how few languages since have followed its lead. Of course, each item in this list is somewhat of a spectrum; some languages do more than others, but very few follow it to a degree that can be asserted with confidence. Usually, you can say a language somewhat satisfies it, followed by a list of ugly qualifications.

Take Python, for example. It’s well designed, has great resources, and is fairly stable, with an elegant design inspired by Lisp. But is it standardized? Kind of: there is a spec, but one de facto implementation. The CPython project does whatever it wants, and this defines what the language is, forcing others to follow along. Alternative implementations exist, but they all have compatibility compromises. Python is also not very fast. You can use a JIT implementation, which makes it tolerable, but that has its own quirks, and you can’t use many libraries with it.

So are Lisp and C the only languages built on performance, robust standards, and free software? No, but there aren’t as many as you might think. Lisp just happens to be one of them, and it’s a design that I enjoy using and learning about.

**09/26/20**

How do you ensure that programmers are using time effectively? What can you do to help them be productive? Managers think a lot about these questions. But too often they get the answers wrong despite being very earnest and smart. This is because they have an unexamined mental model of how programming work is actually done. With an inaccurate model, managers optimize the wrong things and make bad decisions which would otherwise be good decisions in other work environments.

To understand the idea, let’s look at a few examples of how productivity varies between jobs. Simple manual labor, like digging holes, is easy to understand. Productivity comes down to moving as much dirt per minute as possible. Progress is very measurable, and it is easy to think about how to improve it. As a manager, you might ensure there is a rotation of workers digging at all times, and that everyone has the best tools possible. That looks very different from the working model for a professional baseball scout. A scout can take a few years of evaluating players just to find one great pro. There aren’t simple metrics you can use to measure their progress. It’s hard to tell exactly what productive work should look like. Exposure to a lot of players can help, but ultimately it comes down to a few big choices.

What kind of work is programming most like? In my experience (especially in agencies), the typical model of a programmer is an expensive factory machine. This is the kind of machine that occupies a lot of space and attention in the factory. Its operation is fragile, so it needs to be constantly checked on and maintained. But most importantly, every minute that it is not working on jobs is expensive money wasted. The primary metric for ensuring it stays productive is utilization time, even if that means spending time on throwaway jobs.

Managers understand this work model very well; perhaps they read “The Goal” in school. They look for metrics to identify throughput and bottlenecks and do lots of shuffling of programmers around to ensure maximum utilization. In its most exaggerated form, this led managers to obsess over metrics like “lines of code per week”. Although we have moved beyond that, the underlying thinking about what’s going on hasn’t changed much. We just think about programmers as feature producers: to finish the task, the programmer needs to sit at the computer and type the project in.

Here is a common example of this thinking: Bob is busy finishing task 1 and then will start project 2. Alice has a bit of downtime before she starts project 3. Let’s have Alice get started on project 2, then we can have her hand off the incomplete work to Bob. In a few weeks when he is ready, Alice can move on to project 3 and leave Bob with a head start.

This seems like a smart move. We made sure that Alice and Bob’s valuable time was well used, right?

To see why this decision may actually waste time, we need to better understand what programming is like. The model I will describe is not perfect; it’s a great simplification, but it’s a rough estimation that’s far more accurate than the factory machine one. That better model is the following: **each project consists of making 5-15 difficult software decisions**. All of these decisions are of course somewhat dependent on one another. Each decision made well leads to shipping the project; each decision made wrong causes delays and wasted time, now or down the road. That’s all. That’s the description of the job programmers need to do, and how they occupy their time. If programmers knew what to write beforehand, they could type it in no time! The time they spend fiddling at the keyboard is actually spent gathering information through experiment and observation, in order to make decisions.
As a manager, the way to improve productivity is to simply think about what would help you make hard decisions. Here are some simple ideas that become immediately obvious, given this clearer model:

Give programmers as much information about the project as early as possible. It’s difficult to make decisions with incomplete information. When ideas arrive early, they can sit in the back of your mind and benefit from some sleep time. Organize information so it’s easy to process.

Protect programmer time from interruptions. Design large blocks of uninterrupted time. Have quiet working environments. It’s hard to make good decisions without focus. Other writers have written extensively about the cost of “context switching” when you have to forget about what you were thinking and start thinking about something new.

Ensure programmers have a clear understanding of project priorities and goals. Clearly communicate expectations and requirements. Making good decisions requires making trade-offs. Unclear priorities will lead to programmers making bad decisions. Changing priorities will invalidate previously good decisions.

It’s OK if a programmer is not at their keyboard writing code. Taking a walk or a rest is often productive time. Distracting activities like watching a lot of YouTube are not.

Don’t treat programmers interchangeably. Give them ownership of specific project pieces so they can make long term decisions and build on previous ones. Would you hire a separate architect to design each room of a house?

Ensure your incentive structure rewards good decision making. Do you reward band-aid approaches? Are programmers just trying to get work “over the fence” so someone else has to deal with it? Are programmers on the receiving side of the fence?

Going through a crunch? Don’t add people to a late project. Throw more money and effort at building a focused environment for making decisions. Can you pay someone to run errands at home so the programmer doesn’t have to think about them? Can you bring their family to the office for a lunch? (1)

This is just a short list of suggestions, and you can probably come up with even better ideas. What’s important is that as we think deeper about this model, common productivity mistakes become apparent. (2)

Let’s analyze the Bob and Alice example again using this model. Did we help Alice and Bob make decisions, or hurt them? Recall, Alice starts working on task 2 for a few weeks to get it started for Bob. Then Bob picks it up, and Alice moves elsewhere. In our rough model, we can assume that when Alice finishes, she has made 2 of the 15 decisions and has some vague ideas about 5 others. The manager arranges a meeting for Alice to inform Bob of her work, and she is swept off to task 3. Now Bob has to start thinking about task 2 from scratch, but not only that, he has to figure out what Alice was up to, whether the decisions she made were right, and what direction she was heading in to address the other 13 decisions. This added information causes Bob to take more time than if he had started on his own. In practice, it’s even worse, because Alice and Bob will need to continue to take time from each other to coordinate about what Alice had in mind. (3)

You might say, “That makes sense, but what am I supposed to do? Let Alice sit around for a week, wasting that time, with an urgent project waiting?” Let’s use the model to work through better options. In this scenario, Alice is scheduled to eventually own project 3. A better option would be to let task 2 sit until Bob is ready, and get Alice started on her own project, even though it may be less urgent. Alice will be able to spend more time thinking about her project, and leave Bob with a fresh slate.

“But this is really urgent; I don’t even want to think about project 3 yet.” The next alternative would be to identify decisions Alice can make which have the least impact on the work Bob needs to do. One would be having Alice better define the requirements and organize resources for the project, which will only help Bob make his decisions. Another idea is to have Alice work on “black boxes” whose inner workings Bob doesn’t need to know. Maybe Alice has some special knowledge and can write a hard function he expects to use. What you absolutely don’t want to do is interrupt Bob and make him think about what decisions Alice could be making while he is making his own decisions. That doesn’t make any sense!

(1) I don’t think programmers *deserve* any of these special treatments over other jobs. I am not suggesting that programmers should have special privileges or are somehow more important than other kinds of work. It’s all about the productivity model for the kind of work they do.

(2) In video game production, art tends to be easy to outsource, while programming is notoriously difficult and often unsuccessful to outsource. Perhaps this is because 3D art production can be better fit into a factory production model.

(3) One reason “just write it from scratch” is so appealing to programmers is that they get to make their own decisions, instead of figuring out why others made (often bad) decisions. Who wants to live with someone else’s choices? A large existing codebase represents a lot of invested knowledge, so rewriting it is usually bad; but for a new task, why would you rob a programmer of this opportunity by playing musical chairs?

**1/23/2020**

The software industry is synonymous with rapid innovation. New discoveries require changing to better ways of doing things. For programmers to stay relevant, they need to constantly keep up with the latest technologies, otherwise their skills will go out of date. At least, that’s what I’ve been told, but the more I study computer science from 50 years ago, the more I find that “new” ideas are actually old. Perhaps the changes we see are only permutations of underlying ideas.

In many obvious ways software does change. Languages, libraries, and tools get replaced over time. Nobody is writing desktop applications in assembly anymore. Many developers who wrote Windows applications moved on to mobile or web. But rarely are these changes due to a technological advancement or a dramatic change in how things are done; mostly they are new forms of popular products, or improved hardware that gives us some wiggle room.

A good engineer can adapt to these changes, in the same way they can switch to a company which uses different tools and processes. Learning Python after C# should be easy, because you understand programs, not the syntax. Moving to a language with significant design differences like Haskell requires a broader understanding of functions and computation. Now imagine there was similar knowledge that helped you approach every computer problem and perhaps all of nature. The knowledge I am describing is math and science! For software it consists of computer science subjects like OS theory, algorithms, software design, logic, and calculus. Because many programmers lack this, they struggle with change, and feel pressure to keep up with trends. (1)

Don’t believe me? Take a look at the hottest “new” areas of tech: AI/machine learning, blockchain, and big data/data science. What barriers make it difficult for programmers to get into them and be successful? For machine learning, the answer is a bit of linear algebra and multivariable calculus, evidenced by the many blog posts promising to get readers up to speed. This math is (or should be) covered by every computer science degree and has remained the same for at least 50 years, down to the presentations and illustrations used to teach it. Neural networks themselves have been around for a long time, although less mainstream.

For blockchain, the essentials are an understanding of cryptography and peer-to-peer networking. That doesn’t even mean fancy understandings of elliptic curves or number theory, just a solid grasp of hashes, signatures, and asymmetric key encryption. Distributed systems is also a mature field of computer science.

Data science might be the most accessible. A solid understanding of introductory probability and statistics and databases, combined with a few tools such as linear regression and polynomial interpolation, might be enough for 90% of applications. But it’s going to be really hard if you can’t understand a Wikipedia page about polynomials.

I don’t mean to suggest that these skills make an expert. General theory is a long shot from research or novel contributions to the field. Nor is it sufficient to be a good programmer; learning linear algebra does not immediately make you good at writing machine learning programs (and scientists write some terrible code!). Rather, this is the “hard stuff” that prevents programmers from getting into these fields. Once you know it, the other details are approachable. (2)

If you analyze other software advances from the past, from database theory to graphics, you will find similar applications of rather unextraordinary math and science. Many programmers ask: how can I predict what skills will be important in the future? What do I need to learn to have a successful career? Few can predict which specific trends will take off, but I bet whatever is important in the future is going to require understanding those broad areas of computer science. Keep practicing and specializing in the area you work in, but if you regularly refresh and broaden your base of fundamentals, you will be prepared to learn anything new that starts to look interesting. (3)

Understanding fundamentals gives programmers another significant advantage: they know what problems have been solved before. It’s unlikely you will remember every detail, but you can say “I’ve heard of this before” and know where to learn more. Consider how many new tools are bad solutions to problems already solved by simple bash scripts, of which the author was ignorant. This is just a tiny representative of how much duplication and complexity programmers are adding because they don’t know what’s already there.

It may sound as though I am advocating a kind of tech hipsterism: everything interesting has already been done, so we might as well stop looking for new things. Rather, I am arguing that we will be able to make more advancements if we better understand the big ideas behind what has come before.

Properly understood, this flips the progress narrative of technology on its head. We don’t have to chase headlines and blog posts about the latest frameworks and build tools. Nor do we have to guess which technologies might suddenly become useful, like day trading stocks. That’s for evangelists and IT consultants. The foundations for our next ideas have already been built by generations of smart people. It’s all written down, waiting to be dusted off and rediscovered. Rather than building an encyclopedic knowledge of novelties and press releases, we continuously revisit the same core subjects, over and over, in pursuit of mastery.

(1) It doesn’t have to come from a university, and a majority of those who study STEM seem to lose it after going through the motions.

(2) A great example of this is the popular book SICP. In just a few hundred pages, it covers several major areas of CS, more than most programmers understand in their whole career. It’s able to do that because it’s not written for a general audience: it relies on the reader having a solid undergraduate understanding of another area of math or science, as an MIT student would.

(3) Experts in a very particular system (for example, OpenCL or V8 internals) can be extremely valuable. But most of them only get to that level of depth with a solid understanding of fundamental computer science. Expertise is also a subject for another day.

**6/8/19**

Programmers love to discuss programming languages. We not only debate their technical merits and aesthetic qualities, but they become integrated into our personal identities, along with the values and traits that we associate with them. Some even defend a form of linguistic determinism: that thinking is confined to what the language makes typable.

Since we spend so much time writing code, a keen interest in language design is justified.
However, the character of these discussions suggests that we think of them as much more,
and have perhaps forgotten their primary role.
Programming languages are *implementation tools* for instructing machines, not *thinking tools*
for expressing ideas.
They are strict formal systems riddled with design compromises and practical limitations.
At the end of the day, we hope they make controlling computers bearable for humans.
In contrast, thoughts are best expressed through a medium which is free and flexible.

The natural language which has been used effectively for thinking about computation, for thousands of years, is mathematics. Most people don’t think of math as free or flexible. They think of scary symbols and memorizing steps to regurgitate on tests. Others hear math and think category theory, lambda calculus, or other methods of formalizing computation itself, but these are hardly necessary for programming.

I hope readers of this article have had a better experience regarding what math is about, such as a graph theory, algorithms, or linear algebra course; the kind that involves logic and theorems, and is written in prose with a mix of symbols (most symbols weren’t even invented until the 16th century). This kind of math is about creating logical models to understand real world problems, through careful definitions and deductions. If you don’t have a clear idea of what this looks like I recommend Trudeau, Stepanov, or Manber.

Math allows you to reason about logical structures, free from other constraints. This is also what programming requires: creating logical systems to solve problems. Take a look at the basic pattern for programming:

1. Identify a problem
2. Design algorithms and data structures to solve it
3. Implement and test them

In practice, work is not so well organized as there is interplay between steps. You may write code to inform the design. Even so, the basic pattern is followed over and over.

Notice that steps 1 and 2 are the ones that take most of our time, ability, and effort.
At the same time, these steps don’t lend themselves to programming languages.
That doesn’t stop programmers from attempting to solve them in their editor, but they end up with code that is muddled, slow, or that solves the wrong problem.
It’s not that programming languages aren’t good enough yet.
It’s that *no formal language* could be good at it.
Our brains just don’t think that way.
When problems get hard, we draw diagrams and discuss them with collaborators.

Ideally, steps 1 and 2 are solved first, and only then will a programming language be used to solve step 3. This has an added benefit of transforming the implementation process. With a mathematical solution in hand, you can then focus on choosing the best representation and implementation, and writing better code, knowing what the end goal will be.

Why are programming languages burdensome thinking tools? One reason is that writing code is inseparably connected with implementation concerns. A computer is a device that must manage all kinds of tasks while being bound by physical and economic constraints. Think about all the considerations for writing a simple function:

- What inputs should I provide?
- What should they be named?
- What types should they be? (Even dynamically typed languages must consider types; they’re just implicit.)
- Should I pass them by value or by reference?
- What file should I put the function in?
- Should the result be reused, or is it fast enough to recalculate it every time?

The list can go on. The point is that these considerations have nothing to do with what the function does. They distract from the problem the function is trying to solve.

Many languages aim to hide details such as these, which is helpful, especially for mundane tasks.
However, they cannot transcend their role as an implementation tool.
SQL is one of the most successful examples of this, but it is still ultimately concerned with implementation details such as tables, rows, indices, and types.
Because of this, programmers still design complicated queries in informal terms, like what they want to “get,” before writing a bunch of `JOIN`s.

Another limitation of programming languages is that they are poor abstraction tools. Typically, when we discuss abstraction in engineering, we mean hiding implementation details. A complex operation or process is packaged into a “black box” with its contents hidden and well-defined inputs and outputs exposed. Accompanying the box is a fictional story that explains what it does, in a greatly simplified way.

Black boxes are essential for engineering large systems since the details are too overwhelming to hold in your head. They also have many well-known limitations. A black box leaks because its brief description cannot completely determine its behavior. The opaque interfaces introduce inefficiencies, like duplication and fragmented design.

Most importantly for problem-solving, black boxes are rigid. They must explicitly reveal some dials and knobs, and hide others, committing to a particular view about what is essential to expose to the user, and what is noise. In doing so, they present a fixed level of abstraction which may be too high-level or too low-level for the problem. As an example, a high-level web server may provide a terrific interface for serving JSON, but be useless if one wants an interface for serving incomplete data streams, such as output from a program. In theory, you can always look inside the box, but in code, the abstraction level at any one time is fixed.

In contrast, the word abstraction in math means nothing like hiding information. Here, abstraction means extracting the essential features or characteristics of something, in relation to a particular context. Unlike black boxes, no information is hidden, so these abstractions don’t leak in the same way. You are encouraged to adjust to the right level of abstraction and quickly jump between perspectives. You might ask:

- Is this problem best represented as a table? Or, a function?
- Can I look at the whole system as a function?
- Can I treat this collection of things as a single unit?
- Should I look at the whole system or a single part?
- What assumptions should I make? Should I make them stronger or weaker?

Just look at the many ways of looking at a function: a formula, a graph, a table of values, a mapping between sets, a machine that transforms inputs into outputs.

Thinking in math allows one to use whichever brings the most clarity at any moment.

It turns out most abstract concepts can be understood from many perspectives, just like functions. Studying math provides one with a versatile toolbox of perspectives for studying all kinds of problems. You might first describe a problem with a formula, and then switch to understanding it geometrically, then recognize some group theory (abstract algebra) is at play, and all of this combines to give insight and understanding.

To summarize, programming languages are great engineering tools for assembling black boxes; they provide functions, classes, and modules, all of which help wrap up code into nice interfaces. However, when trying to solve problems and design solutions, what you actually want is the math kind of abstraction. If you try to think at the keyboard, the black boxes available to you will warp your view.

Just as programming languages are rigid in their ability to abstract, they are also rigid in how they represent data.
The very act of implementing an algorithm or data structure is picking *just one* of the many possible ways to represent
something, along with all the trade-offs that come with it.
It is always easier to make trade-offs when one has use cases in mind and understands the problem well.

For example, graphs (sets of vertices and edges) appear in many programming problems such as internet networks, pathfinding, and social networks. Despite their simple definition, choosing how to represent them is hard and varies greatly depending on use case:

- The one which most closely matches the definition: `vertices: vector<NodeData>; edges: vector<pair<Int, Int>>`. (The vertices can be removed if you only care about connectivity.)
- If you want to traverse a node’s neighbors quickly, then you probably want a node structure: `Node { id: Int, neighbors: vector<Node*> }`.
- You could use an adjacency list, where each row stores the neighbors of a particular node and the nodes themselves are implicit: `connectivity: vector<vector<int>>`.
- Pathfinding algorithms often work on graphs implicitly, derived from a board of cells: `walls: vector<vector<bool>>`.
- In a peer-to-peer network, each computer is a vertex and each socket is an edge. The entire graph isn’t even accessible from one machine!

Math allows you to reason about the graph itself, solve the problem, and then choose an appropriate representation. If you think in a programming language, you cannot delay this decision as your first line of code commits to a particular representation.

Note that the graph representations are too diverse to be wrapped up in a single polymorphic interface.
(Consider again the graph representing a computer network, like the entire internet.)
So creating a completely reusable graph library is impractical.
It can only work on a few types, or force all graphs into an inappropriate representation.
That doesn’t mean libraries or interfaces aren’t useful.
Similar representations are needed again and again (like `std::vector`),
but you cannot write a library which encapsulates the concept of “graph” once and for all.
A simple generic or interface with a few types in mind is appropriate.

As a corollary, programming languages should focus primarily on being useful implementation tools, rather than theoretical tools. A good example of a modern language feature which does this is async/await. It doesn’t hide away complex details or introduce new conceptual theory. It takes a common practical problem and makes the solution easier to write.

Thinking in math also makes the “C style” of programming more appealing. When you understand a problem well, you don’t have to build up layers of framework and abstraction in anticipation of “what if”. You can write a program tailor made to the problem, with carefully chosen trade-offs.

So what does thinking in math look like? For this section you may have to read a bit more slowly and carefully. Recently, I worked on an API for pricing cryptocurrency for merchants. It takes into account recent price changes and recommends that merchants charge a higher price during volatile times.

Although we did some homework on the theory, we wanted to test it empirically under various market conditions. To do so, I designed a bot that simulates a merchant doing business with our API.

**BTC/USD (1 day)**

**Definition:** The **exchange rate** `r(t)` is the market rate of `fiat/crypto`.

**Definition:** The **merchant rate** `r'(t)` is the modified exchange rate which the merchant is advised to charge customers.

**Definition:** When a customer buys an item, we call that event a **purchase**.
A purchase consists of a price in fiat and a time: `p = (f, t)`.

**Theorem:** The amount of crypto for a purchase is found by applying the modified exchange rate: `t(p) = p(1) / r'(p(2))`.

**Proof:** `p(1) / r'(p(2)) = fiat / (fiat/crypto) = fiat * crypto/fiat = crypto`

**Definition:** When the merchant sells their crypto holdings, we call that event a **sale**.
A sale consists of an amount in crypto and a timestamp: `s = (c, t)`.

**Theorem:** The amount of fiat the merchant obtains from a sale is found by applying the exchange rate to the sale: `g(s) = s(1) * r(s(2))`.

**Proof:** `s(1) * r(s(2)) = crypto * (fiat/crypto) = fiat`

**Definition:** The **balance** of a set of purchases and sales is the difference between all purchase crypto amounts and all sale crypto amounts:
`b(P, S) = sum from i to N of t(p_i) - sum from j to M of s_j(1)`.
Note that `b(P, S) >= 0` must always hold.

**Definition:** The **earnings** of a set of purchases and sales is the difference between sale fiat amounts and purchase fiat amounts:
`e(P, S) = sum from j to M of g(s_j) - sum from i to N of p_i(1)`.

**Definition:** We say that the merchant rate is **favorable** iff the earnings are non-negative (`e(P, S) >= 0`) for *most* sets of *typical* purchases and sales.

In a favorable case, the merchant didn’t lose any fiat by accepting crypto.

*most* and *typical* will not be rigorously defined.

As part of *typical*, we can assume that merchants will sell their crypto in a timely manner.
So assume `s_i(2) - s_j(2) < W` for `i, j in {1..M}` and some bound `W`.
Purchase amounts should be randomly distributed within a reasonable range for commerce, perhaps $10–$100.

**The goal of the bot is to verify that `r'(t)` is favorable.**

Note that this definition is only one measure of quality. Perhaps protecting against the worst case is more important than being favorable. In that case, we would be concerned about the ability to construct a set of purchases with very negative earnings.

Repeat many times:

- Randomly choose a time range `[t0, t1]`.
- Generate a set of **purchases** at random times within `[t0, t1]`. The price should fall within a range `[p0, p1]` of *typical* prices.
- Generate a set of **sales** at evenly spaced times (perhaps with slight random noise) within `[t0, t1]`. Each sale should be for the full **balance** at that time.
- Calculate the **earnings** for these sets.
- Record the earnings.

After:

- Report how many earnings were negative and non-negative. Show a percentage for each.
- Identify the minimum and maximum earnings and report them.

As you read this example, I think your tendency may be to find its statements obvious.
Certainly, none of these steps are hard.
However, it was surprising to me how many of my assumptions were corrected and how difficult it was to choose an objective definition of a **favorable** outcome.
This process helped me become aware of assumptions I would not have even considered if I had started by simply writing code.
Perhaps the greatest benefit was that after writing it, I was able to quickly review it with a co-worker and make corrections which were easy on paper, but would have been difficult to change in code.

I hope that thinking in the language of math will bring similar benefits to your projects! Note that this example is only one style of utilizing mathematical thinking.

**05/12/2019**

A good vector math library is essential for graphics and simulation programming. However, implementing one that is flexible, efficient, and easy to use is difficult. With so many choices to make, experienced programmers tend to write their own to accommodate their preferences.

In this article I will survey a few of the most popular techniques and offer some design advice. I will specifically focus on math theory and C implementations.

Before diving into the code, it’s helpful to review some of the math to understand what we are aiming for. One thing to watch for is operations that can be defined in terms of each other; rarely do I see libraries take advantage of this.

**Vector Operations**

On vectors in `R^N`:

- addition: `v + w`
- subtraction: `v - w`. Defined by addition: `a - b = a + (-b)`
- multiplication: `v * w`
- scaling: `a * v`
- normalization. Defined by length and scaling: `1/|v| * v`

From `R^N -> R`:

- dot product: `<v, w>`
- length: `|v|`. Defined by dot product: `sqrt(<v, v>)`
- angle. Defined by dot product and length: `acos(<a, b> / (|a| |b|))`

Only in specific dimensions, such as `R^2` or `R^3`:

- cross product: `a X b`
- angle (in the plane)

**Matrix Operations**

On all matrices `M(n x m)`:

- addition: `A + B`
- subtraction: `A - B`. Defined by addition: `A - B = A + (-B)`
- scaling: `bA`
- multiplication: `AB`

On square matrices `M(n x n)`:

- determinant: `det(A)`
- inverse: `A^-1`

Between vectors and matrices:

- multiplication: `Av`

Most programs use 2, 3, and 4 element vectors, and only a few operations are specific to a given dimension. So a lot of code can be condensed by writing algorithms on N dimensional vectors.

Matrix operations are also very general. But a few should be kept to a specific dimension (usually 3x3 or 4x4). You do not want to implement a general inverse or determinant function.

```
typedef struct
{
    float x, y, z;
} vec3;

vec3 vec3_add(vec3 a, vec3 b)
{
    vec3 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    return r;
}
```

This works well for smaller programs.
The best part is that expressions look nice (`a + 2.0*(b - d)`):

```
vec3_add(a, vec3_scale(2.0, vec3_sub(b, d)));
```

But, we have to copy this definition for every dimension. We also have to avoid any algorithms that use index or iteration. Matrix vector multiplication gets ugly.

If you only have a few functions that need indexing, you can index into a pointer to the first member:

```
vec3 a;
float* v = &a.x;
v[0];
```

**Examples**

For `N` dimensional vectors we might try to write functions which operate on arrays of floats. This is nice because it does not introduce another data structure, so other functions and vector libraries play nicely with each other.

Unfortunately, C does not allow you to return an array from the stack. You can only return a pointer, which must point to some valid region. So either we do something horrible like `malloc` in each operation, or pass in arrays for the return value. Passing in arrays works, but it destroys the ability to comfortably write simple expressions such as `a + 2.0*(b - d)`:

```
void vecn_add(int n, float* a, float* b, float* ret);
// intermediate results everywhere
float temp[3];
vecn_sub(3, b, d, temp);
float temp2[3];
vecn_scale(3, 2.0, temp, temp2);
float final[3];
vecn_add(3, a, temp2, final);
```

Plain arrays may be appropriate for matrices, since they are not typically involved in complex expressions. Matrices and large vectors, which would be inefficient to copy around, are also a good use case.

Depending on the application, you may not want to sacrifice performance by introducing loops and branching into every operation. As long as the dimensions are passed as literals or macros, the small loops should be unrolled at compile time.

**Examples:**

A workaround to return an array from a function is to include it in a struct. The tradeoff is that the size must be fixed and element access is a bit uglier as it requires at least an extra letter.

```
typedef struct
{
float e[3];
} vec3;
vec3 v;
v.e[0] = 1;
```

The access syntax can be cleaned up with a union, but anonymous structs/unions require C11 (before that, they were a compiler extension).

```
typedef union
{
float v[3];
struct
{
float x;
float y;
float z;
};
} vec3;
```

This gives you safe iterative access and nice named members, but it is hard to combine with generic functions. Either you use functions which operate on the internal arrays and deal with the intermediate results, or you define fixed dimension functions which wrap the generic ones:

```
vec3 vec3_add(vec3 a, vec3 b)
{
    vec3 temp;
    vecn_add(3, a.v, b.v, temp.v);
    return temp;
}
```

I don’t love this option. If I need to write wrapper functions I might as well go back to method 1 and copy implementations around.

**Examples:**

- Math 3D (See Matrices)

Some clever macros can help you get the best of both worlds, and parameterize the scalar types. This can be combined with an array or union data structure. But, writing multi-line macros isn’t very fun.

```
#define DEFINE_VEC(T, N, SUF) \
void vec##N##SUF##_add(const T *a, const T *b, T *r) \
{ \
    for (int i = 0; i < N; ++i) \
        r[i] = a[i] + b[i]; \
}
```

Then define the types you need:

```
DEFINE_VEC(float, 2, f)
DEFINE_VEC(float, 3, f)
DEFINE_VEC(float, 4, f)
```

Usage:

```
vec3f_add(a, b, r);
```

Functions which only apply to a specific dimension can be defined outside of the macro:

```
void vec3f_cross(const float* a, const float* b, float* r)
{
// ...
}
```

**Examples:**

In typical C fashion, I believe it is misguided to try to write
the *one true* vector library to serve all purposes.
Such libraries are bloated and must choose tradeoffs which
may not fit your use case. Instead, use the examples
above to tailor make vector functions as needed.

For further reading, see *On Vector Math Libraries*. It focuses on C++ and has a few other handy tips. You can also read a discussion which led to these notes.

**04/13/2019**

One of my favorite courses in college was philosophy of language. Along with interesting philosophy, it introduced me to the foundations of math project, which has since become one of my favorite subjects to learn about.

Some of my favorite books of all time come out of this interest. I wanted to organize a few of these into a list for others who are interested in the topic. Note that many good books were left out in favor of the very best.

You will notice a theme in my commentary. The books I like most are those that don’t shy away from hard technical knowledge, but also explore the philosophical ideas behind them.

The list is arranged in a progressive sequence that will help prepare you for the next one.

Logicomix is actually a comic book! It tells an engaging historical narrative about the search for the foundations of math and the birth of analytic philosophy, in the early 20th century.

It introduces you to all the major characters such as Gödel, Russell, Frege, and Wittgenstein and motivates the kinds of problems they were trying to solve. The book also explores how these ideas connect to modern computer science.

It is an absolute joy to read and will give you a taste of whether this is an interesting subject for you.

You probably have seen this work recommended elsewhere. Gödel, Escher, Bach really deserves all the praise that it gets.

Hofstadter covers an enormous range of topics including formal systems, Gödel’s proof, theory of computation, programming, molecular biology, and artificial intelligence. Every topic is presented beautifully and with a lot of philosophical discussion. In many ways it is an introduction to the big ideas in modern science. It is written for a general audience and assumes no mathematical background.

By: James Newman & Ernest Nagel

Gödel, Escher, Bach does a good job of introducing the incompleteness theorem and discussing its ramifications, but if you are like me you probably still won’t completely understand it after a first reading.

This concise book offers another perspective and a clear explanation of the mathematics of the proof, its general strategy, and the historical context surrounding the incompleteness problem.

By: Phillip Davis & Reuben Hersch

In the 17th century Descartes had a dream in which he saw a future world driven by mathematical calculations and logical systems. The theme of this book is how this dream has become a reality.

The book explores how mathematics and computer science work together, surveys several interesting fields, and examines ethical issues in the technological world. This book is not technical, and should be appropriate for anyone interested in science or technology.

If you like this book, the authors wrote another called *The Mathematical Experience*
which is focused more on pure mathematics.

By: Marvin Minsky

Marvin Minsky is an incredibly clear and deep writer. In this work he provides a mathematical framework for thinking about mechanical machines and develops the theory of computation.

This book teaches you all you need to know about Turing machines, finite state machines, and neural networks. My project, the McCulloch & Pitts Neural Net Simulator, is based on this book.

To read this book, you should understand some logic and basic set theory such as that taught in an introductory proofs course.

Unfortunately, it is out of print and may be difficult to obtain (for a reasonable
amount of money).
I read it from my university’s library. If that is not an option, I recommend you *find it online*.
If anyone knows of a place where I can reasonably purchase this book, let me know.

By: Gerald Sussman & Hal Abelson

This classic text is designed to teach programming to MIT students who have some technical background in other areas of math and science. It is a hard read, but it assumes no programming knowledge and teaches Scheme (a Lisp dialect) and its full inner workings from the ground up.

If you want to be a professional programmer, this may be the only book you need to study. What other book teaches you to write a symbolic differentiator, interpreter, circuit simulator, and compiler?

Most of the material is mixed into the exercises, so don’t skip them!

But, this is not just a programming book. It belongs in this list because it teaches the fundamental concepts of computation. See the section data as programs for an example.

By: Pavel Pudlak

This book is a massive and dense survey of topics including formal systems, set theory, abstract algebra, computability theory, analysis of algorithms, and quantum computing. The first chapter covered almost everything I had learned about algebra and meta-mathematics in my entire undergraduate degree!

Pudlak does a fantastic job of balancing technical information with philosophical discussion. I can’t recommend this book enough.

Reading this book definitely requires some mathematical maturity. The author does his best to explain every concept in the book, but it would be hard to read about a “group” for the first time here and really understand what he means.

If you read through the other books and have some technical knowledge, you should be well prepared.

**02/22/2019**

Updated: **04/27/2020**

A friend and I had a discussion about the basic skills that are often lacking in experienced programmers. How can a programmer work for ten or twenty years and never learn to write good code? Too often they need close supervision to ensure they go down the right path, and can never be trusted to take technical leadership on larger tasks. It seems they are just good enough to get by in their job, but they never become *effective*.

We thought about our experiences and came up with three fundamental skills that we find are most often missing. Note that these are not skills which take a considerable amount of talent or unique insight. Nor are they “trends” or “frameworks” to help you get a new job. They are basic fundamentals which are prerequisites to being a successful programmer.

Programmers cannot write good code unless they understand what they are typing. At the most basic level, this means they need to understand the rules of their programming language well. It is obvious when a programmer doesn’t, because they solve problems in indirect ways and litter the code with unnecessary statements whose purpose they cannot explain. Their mental model of the program does not match the actual behavior of the code.

You may have seen code which misunderstands how expressions work: (1)

```
if isDelivered and isNotified:
    isDone = True
else:
    isDone = False
```

Instead of:

```
isDone = isDelivered and isNotified
```

In JavaScript, this is often indicated by `new Promise` inside a `.then()`. In C++, it is attaching `virtual` to every method and destructor and creating every object with `new`.

Debugging is also extremely difficult if you don’t understand the language. You may add a line of code because it fixes a bug for reasons you don’t understand. Bugs are mysteries that seem to appear organically, like dust on the shelves. The code has a mind of its own.

Understand the code you write. Know what every line does and why you put it there.

Once you understand the language well, it’s important to know about implementation: what goes on *inside* the computer or library? Do you know how the code gets to assembly? Do you know how a closure captures variables? Do you know how the garbage collector works? Do you know how a map/dictionary works? Do you know how an HTTP request is made?

Modern software and computers are too complex to know everything (2).
The idea is not to be clever, but to avoid doing silly things.
Silly mistakes result in sluggish code that wastes system resources, or just does unexpected things.
Removing an item from the front of a C++ `vector` requires copying the entire vector, which is worth thinking about in large cases.
Writing `map<string, map<string, ...>>` in C++ (a dictionary of dictionaries) creates a self-balancing tree in which every node is a self-balancing tree, a data structure nobody would intentionally design. (3)

A muddy understanding of how things work is typical of beginners, but it is all too often a problem with experienced programmers who are not curious and do not take time to learn how things work beyond their immediate job’s needs. Learn just a bit more about how the stuff you use most works.

To write reliable code, you must be able to anticipate problems, not just patch individual use cases. I am shocked by the number of times I see code that puts the program in a broken state when a very likely error happens.

I recently reviewed some code that made an HTTP request to notify a server of a state change; the programmer assumed the request would always succeed. If it failed (and we know how often HTTP requests fail), a database record was put into an invalid state. The questions they should have asked when writing this code are: What happens if this fails? Is there another opportunity to send the notification? When is the correct time to record the state change? Careful programmers think through the possible states and transitions of their program.

Using `sleep()`, cron jobs, or `setTimeout` is almost always wrong, because it typically means you are waiting for a task to finish without knowing how long it will take. What if it takes longer than you expect? What if the scheduler gives resources to another program? Will that break your program? It may take a little bit of effort to rig a proper event, but it is always worth it.

Another common mistake I see is generating a random file or identifier and hoping that collisions will never happen (4). It is reasonable for an unlikely event to cause an error, but it is not ok if that puts your program in an unusable state. For example, if a successful login generates a session token and it collides with another token, you could reject the login and have the user try again. It is a freak accident that slightly inconveniences the user. On the other hand, what if you generate storage files with random names and you have a collision? You just lost someone’s data! “This probably won’t happen” is not a strategy for writing reliable code.

Unit testing can’t solve this problem either. It can help you stop and think about some inputs to write a test, but more than likely, the cases you write tests for are the ones you anticipated when you wrote the code! Unit testing cannot transform fragile code into reliable code.

Fragile code is often caused by a lack of experience regarding things that can go wrong, but it can also be the result of a long career of maintaining existing codebases. When working on a large existing system, you typically fix individual bugs and aren’t rewarded by your bosses for improving the system as a whole. You learn that programming is a never-ending patch. Increasing the `sleep()` time may fix the bug today, but it never solves the underlying issue.

Even when armed with the other two skills, it’s hard to be effective unless you can organize code into a system that makes sense. I believe OOP and relational databases get a lot of flak because programmers tend to be bad at design, not because they are broken paradigms. You simply can’t create rigid classes, schemas, and hierarchies without thinking them through. Design itself is too broad a topic to explore in this article (read Fred Brooks), so I want to focus on a few specific attributes that well-designed software tends to have.

You may have heard a rule like “don’t make functions or classes too long”. However, the real problem is writing code that mixes unrelated ideas. Poorly designed software lacks conceptual integrity. Its concepts and division of responsibilities are not well defined. It usually looks like a giant Rube Goldberg machine that haphazardly sets state and triggers events.

Accordingly, good software is built from well-defined concepts with clear responsibilities. Mathematicians and philosophers spend a lot of time discussing definitions because a good definition allows them to capture and understand some truth about the world. Programmers should think similarly and spend a comparable amount of effort grappling with ideas before writing code.

Good programmers ask questions like:

- “What is this function’s purpose?”
- “What does this data structure represent?”
- “Does this function actually represent two separate tasks?”
- “What is the responsibility of this portion of code? What shouldn’t it ‘know about’?”
- “What actually needs to be in the public interface?”

Luckily, the field is rife with strategies to help you design code. Design patterns and SOLID can give you guidelines for designing classes. Functional programming encourages writing pure functions (input -> output and no side effects) and maintaining as little state as possible. Model-view-controller aims to separate UI and storage concerns from program logic. On the other hand, React components form conceptual units by combining the HTML, CSS, and JS into a single component. Unix rejects categories and says everything is a file. All of these seemingly contradictory ideas are valid. The important thing is that the concepts make sense and map closely to the problem you are solving.
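As a small illustration of the functional style mentioned above, here is a sketch in Python (the names `CartTotaler` and `cart_total` are hypothetical, invented for this example):

```python
# Stateful version: the result depends on hidden mutable state,
# which makes it harder to test and reason about.
class CartTotaler:
    def __init__(self):
        self.total = 0.0

    def add_item(self, price):
        self.total += price  # side effect: mutates internal state

# Pure version: the output depends only on the inputs, no side effects.
# The same inputs always produce the same result, so it is trivially testable.
def cart_total(prices, tax_rate):
    subtotal = sum(prices)
    return subtotal * (1 + tax_rate)

print(round(cart_total([10.0, 20.0], 0.1), 2))  # -> 33.0
```

The pure version can be tested with a one-line assertion, while the stateful version requires setting up and inspecting an object.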

Software that is well-designed is also software that is easy to change. Of course, it’s too much to ask it to satisfy requirements that contradict its original intent. But it should accommodate changes that are natural evolutions. A common mistake I see is solving a problem for a few cases instead of N cases. (If you have a variable called `button3`, you’ve probably made this mistake.) Another is treating everything as a special case with `switch` statements instead of using polymorphism. (5)
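To make the `switch`-versus-polymorphism contrast concrete, here is a hedged sketch in Python (the `Circle` and `Rect` classes are invented for illustration, not from the original article):

```python
from dataclasses import dataclass
import math

# Switch-style: every new shape forces an edit to this central function.
def area_switch(shape):
    if shape["kind"] == "circle":
        return math.pi * shape["r"] ** 2
    elif shape["kind"] == "rect":
        return shape["w"] * shape["h"]
    raise ValueError("unknown shape")

# Polymorphic style: each class owns its own logic. Adding a shape
# means adding a class, not editing a growing switch.
@dataclass
class Circle:
    r: float
    def area(self):
        return math.pi * self.r ** 2

@dataclass
class Rect:
    w: float
    h: float
    def area(self):
        return self.w * self.h

shapes = [Circle(1.0), Rect(2.0, 3.0)]
print(sum(s.area() for s in shapes))
```

As the footnote below notes, switching on type can be perfectly appropriate; the point is to notice when one concept’s logic is scattered across many `switch` statements.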

I think the best way to learn about design is to write and study a lot of programs. Programmers who work only on old programs never learn to write new ones. The studying part is key too. Programmers who only work on small, temporary projects (like at an agency) may get by without ever improving how they design programs. Good design comes gradually with experience, but only if you think about it and try to improve.

There are no tricks or rules that you can follow to guarantee you will write good software. As Alex Stepanov said, “think, and then the code will be good.”

There may be some cases where the previous style is preferred. This example is only an illustration.

I was shocked when I wrote some multithreaded code and first faced bugs due to cache coherence and instruction reordering!

A reader corrected me about the performance of modern `map` implementations. A map of maps is just as good as alternatives, like a map of pairs. I think it is a good illustration of code being surprising, but perhaps I am doing a bit too much early optimization :) See his benchmark.

There are ways to do this sort of thing correctly using cryptographic hashes.
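For concreteness, here is a rough Python analogue of the two layouts that footnote compares (in C++ terms, roughly `map<K, map<K, V>>` versus `map<pair<K, K>, V>`; this sketch only shows the two shapes, not their relative performance):

```python
# "Map of map": an outer key selects an inner dict.
nested = {}
nested.setdefault("a", {})["b"] = 1

# "Map of pair": a single dict keyed by a composite tuple.
flat = {}
flat[("a", "b")] = 1

# Both layouts store the same association.
assert nested["a"]["b"] == 1
assert flat[("a", "b")] == 1
```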

Once again, this is just an example. Switching on type may be perfectly appropriate.