Tuesday, December 27, 2022

The goals of code quality

Code should work

This one is so obvious that it probably doesn’t need stating, but I’ll go ahead and say it anyway. When we write code, we are trying to solve a problem, such as implementing a feature, fixing a bug, or performing a task. The primary aim of our code is that it should work: it should solve the problem that we intend it to solve. This also implies that the code is bug free, because the presence of bugs will likely prevent it from working properly and fully solving the problem.

When defining what code “working” means, we need to be sure to actually capture all the requirements. For example, if the problem we are solving is particularly sensitive to performance (such as latency, or CPU usage), then ensuring that our code is adequately performant comes under “code should work,” because it’s part of the requirements. The same applies to other important considerations such as user privacy and security.

Code should keep working

Code “working” can be a very transient thing; it might work today, but how do we make sure that it will still be working tomorrow, or in a year’s time? This might seem like an odd question: “if nothing changes, then why would it stop working?”, but the point is that stuff changes all the time:

  • Code likely depends on other code that will get modified, updated, and changed.
  • Any new functionality required may mean that modifications are required to the code.
  • The problem we’re trying to solve might evolve over time: consumer preferences, business needs, and technology considerations can all change.

Code that works today but breaks tomorrow when one of these things changes is not very useful. It’s often easy to create code that works, but a lot harder to create code that keeps working. Ensuring that code keeps working is one of the biggest challenges that software engineers face, and it needs to be considered at all stages of coding. Treating it as an afterthought, or just assuming that adding some tests later will take care of it, is rarely an effective approach.
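Tests alone are not the whole answer, but a small regression test illustrates one concrete way code is kept working as things around it change. This is an illustrative sketch, not from the text above; the function and its format are invented for the example.

```python
# Illustrative sketch: a tiny function plus a regression test that pins
# down its current behavior, so a future change that breaks it is
# caught before it reaches the code base.

def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse a version tag like 'v1.4.2' into (major, minor, patch)."""
    major, minor, patch = tag.lstrip("v").split(".")
    return (int(major), int(minor), int(patch))

def test_parse_version():
    # If a later refactor changes the accepted format, this fails
    # loudly instead of silently breaking callers.
    assert parse_version("v1.4.2") == (1, 4, 2)
    assert parse_version("2.0.10") == (2, 0, 10)

test_parse_version()
```

The value of a test like this comes from it running automatically whenever the surrounding code changes, not just once when the code is first written.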

Code should be adaptable to changing requirements

It’s actually quite rare that a piece of code is written once and then never modified again. Continued development on a piece of software can span months, often years, and sometimes even decades. Throughout this process, requirements change:

  • business realities shift
  • consumer preferences change
  • assumptions get invalidated
  • new features are continually added

Deciding how much effort to put into making code adaptable can be a tricky balancing act. On the one hand, we pretty much know that the requirements for a piece of software will evolve over time (it’s extremely rare that they don’t). But on the other hand, we often have no certainty about exactly how they will evolve. It’s impossible to make perfectly accurate predictions about how a piece of code or software will change over time. But just because we don’t know exactly how something will evolve, it doesn’t mean that we should completely ignore the fact that it will evolve. To illustrate this, let’s consider two extreme scenarios:

  • Scenario A — We try to predict exactly how the requirements might evolve in the future and engineer our code to support all of these potential changes. We will likely spend days or weeks mapping out all the ways that we think the code and software might evolve. We’ll then have to carefully deliberate over every detail of the code we write to ensure that it supports all of these potential future requirements. This will slow us down enormously; a piece of software that might have taken three months to complete might now take a year or more. And at the end of it, it will probably have been a waste of time: a competitor will likely have beaten us to the market by several months, and our predictions about the future will probably turn out to be wrong anyway.
  • Scenario B — We completely ignore the fact that the requirements might evolve. We write code to exactly meet the requirements as they are now and put no effort into making any of the code adaptable. Brittle assumptions get baked all over the place and solutions to subproblems are all bundled together into large inseparable chunks of code. We get the first version of the software launched within three months, but the feedback from the initial set of users makes it clear that we need to modify some of the features and add some new ones if we want the software to be successful. The changes to the requirements are not massive, but because we didn’t consider adaptability when writing the code, our only option is to throw everything away and start again. We then have to spend another three months rewriting the software, and if the requirements change again, we’ll have to spend another three months rewriting it again after that. By the time we’ve created a piece of software that actually meets the users’ needs, a competitor has once again beaten us to it.

Scenario A and scenario B represent two opposing extremes. The outcome in both scenarios is quite bad and neither is an effective way to create software. Instead, we need to find an approach somewhere in the middle of these two extremes. There’s no single answer for which point on the spectrum between scenario A and scenario B is optimal. It will depend on the kind of project we’re working on and on the culture of the organization we work for.
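A hypothetical sketch of what the middle ground can look like in practice: we don’t try to anticipate every future requirement, we just avoid hard-wiring one assumption that seems likely to change. The price-formatting scenario and the function names here are invented for illustration.

```python
# Hypothetical illustration of a cheap middle ground between
# Scenario A and Scenario B.

# Scenario B style: the currency symbol is baked into the logic.
def format_price_brittle(amount_cents: int) -> str:
    return f"${amount_cents / 100:.2f}"

# Middle ground: identical behavior today, but the assumption is a
# single default parameter rather than a fact scattered through the
# code. If requirements change, one call site changes, not the logic.
def format_price(amount_cents: int, symbol: str = "$") -> str:
    return f"{symbol}{amount_cents / 100:.2f}"

print(format_price(1999))        # $19.99
print(format_price(1999, "€"))   # €19.99
```

The point is not that every value should become a parameter; it’s that a small, cheap piece of flexibility in a place likely to change costs almost nothing now and avoids a rewrite later.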

Code should not reinvent the wheel

When we write code to solve a problem, we generally take a big problem and break it down into many subproblems. For example, if we were writing some code to load an image file, turn it into a grayscale image, and then save it again, the subproblems we need to solve are:

  • Load some bytes of data from a file
  • Parse the bytes of data into an image format
  • Transform the image to grayscale
  • Convert the image back into bytes
  • Save those bytes back to the file
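As an illustrative sketch (not part of the original list), the decomposition above could be structured as one function per subproblem. To keep it self-contained, the in-memory image format (rows of RGB tuples) and the helper names are assumptions, and the parsing/serializing steps are placeholders; in practice an existing image library would handle those.

```python
# One function per subproblem from the list above. The image format
# here (rows of RGB tuples) is a deliberate simplification.

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

def load_bytes(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()

def parse_image(data: bytes) -> Image:
    ...  # placeholder: a real format (PNG, JPEG) needs a proper decoder

def to_grayscale(image: Image) -> Image:
    # Standard luminance weights; all three channels get the gray value.
    def gray(p: Pixel) -> Pixel:
        y = round(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2])
        return (y, y, y)
    return [[gray(p) for p in row] for row in image]

def serialize_image(image: Image) -> bytes:
    ...  # placeholder: the inverse of parse_image

def save_bytes(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)
```

Breaking the problem up this way also makes the next point concrete: each of these functions is exactly the kind of subproblem someone else has likely already solved.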

Many of these subproblems have already been solved by others. For example, loading some bytes of data from a file is likely something that the programming language has built-in support for. We wouldn’t go and write our own code to do low-level communication with the file system.

Similarly, there is probably an existing library that we can pull in to parse the bytes into an image. If we do write our own code to do low-level communication with the file system or to parse some bytes into an image, then we are effectively reinventing the wheel. There are several reasons why it’s best to make use of an existing solution rather than reinventing it:

  • It saves a lot of time — If we made use of the built-in support for loading a file, it’d probably take only a few lines of code and a few minutes of our time. In contrast, writing our own code to do this would likely require reading numerous standard documents about file systems and writing many thousands of lines of code. It would probably take us many days if not weeks.
  • It decreases the chance of bugs — If there is existing code somewhere to solve a given problem, then it should already have been thoroughly tested. It’s also likely that it’s already being used in the wild, so the chance of the code containing bugs is lowered, because if there were any, they’ve likely been discovered and fixed already.
  • It utilizes existing expertise — The team maintaining the code that parses some bytes into an image are likely experts on image encoding. If a new version of JPEG encoding comes out, then they’ll likely know about it and update their code. By reusing their code, we benefit from their expertise and future updates.
  • It makes code easier to understand — If there is a standardized way of doing something then there’s a reasonable chance that another engineer will have already seen it before. Most engineers have probably had to read a file at some point, so they will instantly recognize the built-in way of doing that and understand how it functions. If we write our own custom logic for doing this, then other engineers will not be familiar with it and won’t instantly know how it functions.

The concept of not reinventing the wheel applies in both directions. If another engineer has already written code to solve a subproblem, then we should call their code rather than writing our own to solve it. But similarly, if we write code to solve a subproblem, then we should structure our code in a way that makes it easy for other engineers to reuse, so they don’t need to reinvent the wheel.

The same classes of subproblems often crop up again and again, so the benefits of sharing code between different engineers and teams are often realized very quickly.
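A hypothetical sketch of the “both directions” point: the same logic, written once in a way nobody else can reuse, and once structured so each subproblem is separately callable. The config-file scenario and function names are invented for illustration.

```python
# Hypothetical illustration of structuring code for reuse.

# Bundled: another engineer who already has the text in memory (say,
# from a network response) can't reuse the parsing logic without also
# taking on the file I/O.
def load_config_bundled(path: str) -> dict:
    with open(path) as f:
        text = f.read()
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

# Split: each subproblem is its own function, so each is reusable.
def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()

def parse_config(text: str) -> dict:
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

def load_config(path: str) -> dict:
    return parse_config(read_text(path))
```

Both versions behave identically for the original caller; the split version simply leaves the door open for the next engineer who needs only one of the two pieces.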

Monday, December 26, 2022

Bad code, good code

Types of Code

  • Code base — the repository of code from which pieces of software can be built. This will typically be managed by a version control system such as git, subversion, perforce, etc.
  • Submitting code — sometimes called “committing code”, or “merging a pull request”. A programmer will typically make changes to the code in a local copy of the code base. Once they are happy with the change, they will submit it to the main code base. Note: in some setups, a designated maintainer has to pull the changes into the code base, rather than the author submitting them.
  • Code review — many organizations require code to be reviewed by another engineer before it can be submitted to the code base. This is a bit like having code proofread, a second pair of eyes will often spot issues that the author of the code missed.
  • Pre-submit checks — sometimes called “pre-merge hooks”, “pre-merge checks”, or “pre-commit checks”. These will block a change from being submitted to the code base if tests fail, or if the code does not compile.
  • A release — a piece of software is built from a snapshot of the code base. After various quality assurance checks, this is then released “into the wild”. You will often hear the phrase “cutting a release” to refer to the process of taking a certain revision of the code base and making a release from it.
  • Production — this is the proper term for “in the wild” when software is deployed to a server or a system (rather than shipped to customers). Once software is released and performing business-related tasks, it is said to be “running in production.”
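To make the pre-submit check idea concrete: in git, a hook is any executable placed at .git/hooks/pre-commit, and a nonzero exit status blocks the commit. Below is a minimal Python sketch of such a script; the specific commands shown in the comment are assumptions, and a real setup would run the project’s own build and test commands.

```python
# Sketch of a pre-submit (pre-commit) check. Installed as an executable
# script at .git/hooks/pre-commit, a nonzero exit status blocks the
# commit from being submitted.
import subprocess
import sys

def run_checks(commands: list[list[str]]) -> int:
    """Run each command; return 1 (blocking) on the first failure."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"pre-submit check failed: {' '.join(cmd)}")
            return 1  # nonzero exit blocks the submission
    return 0

# In the hook script itself, one would end with something like
# (commands are illustrative assumptions):
#   sys.exit(run_checks([
#       ["python", "-m", "compileall", "-q", "."],  # does it compile?
#       ["python", "-m", "pytest", "-q"],           # do tests pass?
#   ]))
```

Centralized setups typically run the same kind of checks on a server when a change is submitted, so the check can’t be skipped by an individual engineer’s local configuration.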

High-quality code maximizes the chance that the software will be reliable, maintainable, and meet its requirements. Low-quality code tends to do the opposite.