Software Maintenance Matters

The shiny new piece of software has been designed, written, tested and deployed. The users are (hopefully!) happy and any last bugs have been mopped up. The core development team has moved on to other projects, the ship is sailing smoothly in the direction we want, and the users have no pressing needs for new features. Seems glorious!

Unfortunately, that isn’t the end of the story. We still need to maintain the software, an ongoing exercise that has a significant financial cost. So what do we actually mean by maintenance, and why is it so crucial that we do it?

For the purposes of this post, we are going to assume that we want to keep this piece of software in use for a few years. It isn’t just a throwaway app that we want for a short period (perhaps for marketing a product launch or migrating data from an old system to a new one).

What actually is modern software?

Well, usually our systems are some layers of code that we have written to sit on top of (or otherwise use) many other layers of code written by other people, often in other organisations. It is essentially never the case that our system is made only from software that we wrote. As each year goes by, the problems that we are trying to solve stretch further, and our reliance on components developed by other parties grows. AI is a fantastic example of an area where we use libraries and models created by others (usually with very deep pockets) rather than trying to “roll our own”. The complexity of rolling our own would be too high.

If we take a technical look at one of our recent applications that manages data about groundwater wells, we see that the server side uses a PostgreSQL database (running on Amazon Relational Database Service, RDS), AWS Lambda functions for virus scanning, Django and Django REST Framework to provide the APIs and business logic, React to provide the client-side component framework (with Node.js underlying), Puppeteer for PDF generation, and so on. Each of these larger frameworks or components that we use directly also uses other components, creating what we call a dependency tree. The more things we use at the top (we will talk about the roots of this particular tree, but you can flip it upside down and talk about leaves), the more it spreads out below, likely having thousands of small things at the base of the roots.

I just took a quick look at the client side of a consenting application that we built a few years ago, and it has 1,106 small libraries as part of its dependency tree. That’s an application where we specifically minimised the number of libraries we were using directly to just a handful. And remember, that’s just the client; the server side has its own tree.

And herein lies our problem. When one of these libraries is updated by its developer to fix a bug, close a security vulnerability or add a feature, then the bit of our tree further up might also need changing to be able to use it (though in many cases it won’t). Sometimes we might have had to implement a work-around to avoid some kind of issue in a library and when that issue is fixed, we will need to remove our work-around for the system to work as designed.

Does everything just fall over when a dependency is changed?

Fortunately not. As part of our process, we can pin the versions of our dependencies (with some flexibility as to whether they are pinned to a specific major or minor version, greater than a specific version, etc.). We can also lock a specific set of dependencies (both direct and transitive, or child, dependencies) so our build is repeatable on the versions that we have tested to work.
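To make the difference between pinning and locking concrete, here is a minimal sketch in Python. It is not how any particular package manager implements its rules: the pin kinds ("exact", "minor", "major", "at_least"), the function names and the package versions are all made up for illustration.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Turn '4.2.11' into (4, 2, 11); missing parts default to 0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def satisfies(version: str, kind: str, pinned: str) -> bool:
    """Check a candidate version against a simplified, hypothetical pin."""
    actual, wanted = parse(version), parse(pinned)
    if kind == "exact":      # only this exact version
        return actual == wanted
    if kind == "minor":      # same major.minor, patch updates allowed
        return actual[:2] == wanted[:2] and actual >= wanted
    if kind == "major":      # same major, minor and patch updates allowed
        return actual[0] == wanted[0] and actual >= wanted
    if kind == "at_least":   # anything from this version upwards
        return actual >= wanted
    raise ValueError(f"unknown pin kind: {kind}")


def lock_mismatches(lock: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """Return the packages whose installed version differs from the lock.

    The lock covers direct and transitive dependencies alike, which is
    what makes the build repeatable.
    """
    return sorted(name for name, ver in lock.items() if installed.get(name) != ver)


# Illustrative data only: a direct dependency and one of its children.
lock = {"django": "4.2.11", "sqlparse": "0.4.4"}
installed = {"django": "4.2.11", "sqlparse": "0.5.0"}
drifted = lock_mismatches(lock, installed)
```

A pin describes the range of versions we are willing to accept, while the lock records the exact set we actually tested. In this example `drifted` names the one transitive dependency that has moved away from the tested set.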

So if dependencies can be locked, why the need for maintenance at all?

Vulnerabilities - Security researchers and component developers are always looking for vulnerabilities in the software they write. When vulnerabilities are found, they ideally need to be patched before they are exploited in the wild (or as soon as practically possible). The source control tools we use are good at telling us that there are reported vulnerabilities in our dependencies, and rating them by severity. However, it is often up to us as developers to determine whether a vulnerability is actually exploitable in the context we are using it, so we don’t always *need* to upgrade right away. Generally speaking, a lot of small and regular updates to keep on top of these changes is much easier to handle than leaving things for a long time and then facing a large, complex and difficult set of changes. Prompt attention also reduces the surface of our application that might be vulnerable to an attack at any given time.

Support - We often use larger components from other vendors that we need vendor (or project) support for. MadeCurious makes a lot of use of the PostgreSQL database. This is an incredibly stable and secure open-source database, even when using engine versions that are several years old. However, versions of the engine do reach end of support, after which there are no more security updates and no more support from the hosting provider (usually Amazon RDS in our case). These end-of-support dates are flagged well in advance, and in the case of RDS, we also get a three-year window where the version remains available, but becomes VERY expensive to host. Obviously, we don’t want to be wasting money in this way unless the application itself is end-of-life and very close to being decommissioned.

Performance - Later versions of components often have performance optimisations. PostgreSQL is, again, a great example of this. Each new release has optimisations that we can essentially use “for free” (or sometimes with just a bit of effort to tune our queries), which make our system run faster or potentially use fewer resources. Particularly at times of peak load, this might be the difference between a mediocre user experience and a genuinely great one.

Features - All software remains relevant by providing the features that the consumers of the software want and need, and the libraries we use underneath are no exception. Often new features let us simplify our own code (reducing bugs) or provide a better user experience without a lot of new development (a good example would be a client-side table library moving from only supporting paginated display to also providing support for infinite-scroll).

The ecosystem your software lives in - Most “useful” business systems don’t live in isolation. They rely on integrations with other systems to provide end-to-end processes for the users. Typical integrations include user authentication, email, document management, spatial databases, ERP/Finance systems and so on. As these systems are upgraded or replaced, there can be a knock-on effect on anything that talks to them (though good systems architecture practice does aim to minimise direct system-to-system communication, especially as older systems are replaced).

Abandonware - The harsh reality is that some components (or components of components) that underpin our system will be abandoned by the developers that created them. They might be abandoned simply because a better way comes along or fashions/patterns change. We don’t want to keep using abandonware if we can avoid it. It may no longer be monitored for new vulnerabilities, and even if it is, there may be no one to fix the issues. We have had cases where we “fork” the abandoned code and maintain it ourselves, but each time we do this it is one more ongoing cost for us to carry.
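Going back to the vulnerabilities point above, the judgement call about what needs attention first can be sketched in a few lines of Python. This is only an illustration of the idea: the `Advisory` structure, the severity labels and the package names are hypothetical and do not reflect the output of any specific tool.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower rank means more urgent. The labels are assumptions for this sketch.
SEVERITY_RANK = {"critical": 0, "high": 1, "moderate": 2, "low": 3}


@dataclass
class Advisory:
    package: str
    severity: str                # one of the SEVERITY_RANK keys
    exploitable: Optional[bool]  # None means nobody has assessed it yet


def triage(advisories: List[Advisory]) -> List[Advisory]:
    """Order advisories for urgent attention.

    Advisories assessed as not exploitable in our context are left out
    of the urgent list. The rest are sorted most severe first, with
    confirmed-exploitable ones ahead of unassessed ones at equal severity.
    """
    urgent = [a for a in advisories if a.exploitable is not False]
    return sorted(
        urgent,
        key=lambda a: (SEVERITY_RANK[a.severity], a.exploitable is not True),
    )


reported = [
    Advisory("lib-a", "low", True),
    Advisory("lib-b", "critical", None),
    Advisory("lib-c", "high", False),
    Advisory("lib-d", "critical", True),
]
urgent_order = [a.package for a in triage(reported)]
```

Anything left out of the urgent list still gets picked up in the next round of small, regular updates; it just doesn’t need to jump the queue.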

The importance of automated testing

In this modern world, where the foundations that we build our software on undergo regular change, more than ever we need a rapid way of determining that those changes haven’t broken our system. The best way of doing this is to build a full suite of unit, integration and end-to-end tests that can be run automatically by our build tools. This gives us the confidence that updating our dependencies hasn’t impacted our application or, if it has, helps us find out exactly where something has broken, so we can investigate and mitigate accordingly.

Automated testing really is the key that unlocks regular updates for us. Without it, we are reliant on expensive humans doing time-consuming and unfulfilling work to make sure the system still works after an update. This creates a cycle of fear and reluctance: the longer updates are put off, the bigger the cliff we eventually have to scale.
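As a small illustration of the kind of test that catches a dependency change, here is a sketch using Python’s built-in `unittest` module. The `well_summary` function is hypothetical, and the standard-library `json` module stands in for a third-party dependency whose behaviour we rely on.

```python
import json
import unittest


def well_summary(record: dict) -> str:
    """Hypothetical application code that leans on a dependency (json here)."""
    return json.dumps(record, sort_keys=True)


class WellSummaryTests(unittest.TestCase):
    def test_output_is_stable(self):
        # If an upgrade changed key ordering or formatting, this would fail
        # and point us straight at the affected code.
        self.assertEqual(
            well_summary({"id": 7, "depth_m": 23.25}),
            '{"depth_m": 23.25, "id": 7}',
        )

    def test_round_trip(self):
        record = {"id": 7, "depth_m": 23.25}
        self.assertEqual(json.loads(well_summary(record)), record)


def run_suite() -> bool:
    """Run the tests the way a build tool might, and report pass or fail."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WellSummaryTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Run after every dependency update, a suite like this turns "did the upgrade break anything?" into a question the build can answer on its own.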

Making the right choices in the first place

The technical choices made when selecting the underlying products and components for your system will have a big impact on future maintainability (and ongoing cost).

One of the reasons why we use Django as our API platform in many of our projects is that it tends to be stable and relatively unchanging. The downside is that it may take longer to get new features, but the upside is that it is quite easy to stay on top of updates and dependencies. Most importantly, there is essentially no chance that Django itself will suddenly become abandonware, so we have strong confidence that we will be able to keep using it several years into the future.

Of course, if the design life of your system is short, a bleeding-edge technology that you know is likely to be shifting sand may be the right choice for you. It is up to your Software Architects to provide good advice on the platforms and technologies you use.

In all likelihood, you will make some choices that prove to be great and some where you will wish you had chosen differently. None of us have a crystal ball, but experience and common sense go a long way. Think of it as throwing a dart. We might not be able to throw it perfectly, but we can do a good job of hitting the board.
