Tuesday 11 June 2019

Installation Testing @ Neo4j

At work we produce a database product which we deliver to customers in a variety of formats: tarballs and zips you can just extract and use on your home computer, be it Linux, Mac or Windows; a Docker image for the hipsters; and old school Debian and RPM packages for the slow movers^H^H^H^H^H^H^H^H^H^H^Hstable enterprises. Oh and shameless, out of context plug.

We had all those packages sans Docker when I joined years ago, and they weren't great. Testing was manual and not thorough at all. They have improved, just by having skilled people work on them - so far, so regular everyday software. But these are actually critical components: they are a core mechanism for getting our software to customers, and customers in turn rely on them for their critical business operations. So no pressure.

Anyway, one day I had an epiphany: why don't we just write automated tests around these packages, so that we can have confidence they work as expected when making changes and adding features? Duh!

We do that for everything else, and on multiple levels. When you pare it back this is deterministic stuff about file locations and permissions, really. Turns out we already had all the building blocks and blueprints, but I still feel this was a tiny local game changer.

Asking around the place it was clear this was not something anyone had seen before, and I do think we are quite a testing-forward place too. Some spitballing by the water cooler with my mate Steven Baker helped clarify things - he had the actual package knowledge, I was just the ideas man - and we were ready to go. And so the very first Installation Testing framework was birthed.

So what is it?


Well what is a package? It's a thing that runs on a computer, it puts files in places when you install, in our case it starts a service that you can poke with a few ancillary tools, and you can uninstall and even purge it when you lose interest.

So there are the contours of the framework: exercising the package installer is a case of running it and observing files appear where expected, with expected permissions, and verifying a service is started automatically and becomes available. That the right dependencies are installed too. Uninstalling and purging is the reverse. A bit of poking at the ancillary tools to see they are also working. Easy peasy!
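To make the "files appear where expected, with expected permissions" part concrete, here is a minimal sketch in Java. It is not our framework's code: the class name, the paths and the idea of parsing the output of `stat -c '%a %U %n'` (GNU coreutils) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InstalledFileCheck {

    /**
     * Compares the output of `stat -c '%a %U %n' <paths...>` with expectations.
     * Expectations map a path to "<octal mode> <owner>", e.g. "644 root".
     * Returns human-readable failures; an empty list means all is well.
     */
    static List<String> verify(String statOutput, Map<String, String> expected) {
        Map<String, String> actual = new LinkedHashMap<>();
        for (String line : statOutput.split("\n")) {
            String[] parts = line.trim().split(" ", 3); // mode, owner, path
            if (parts.length == 3) {
                actual.put(parts[2], parts[0] + " " + parts[1]);
            }
        }
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, String> want : expected.entrySet()) {
            String got = actual.get(want.getKey());
            if (got == null) {
                failures.add("missing: " + want.getKey());
            } else if (!got.equals(want.getValue())) {
                failures.add(want.getKey() + ": expected " + want.getValue() + " but was " + got);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Hypothetical stat output as it might come back over SSH after an install.
        String statOutput = "644 root /etc/example/app.conf\n700 root /usr/bin/example\n";

        // Illustrative expectations, not the real package layout.
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("/etc/example/app.conf", "644 root");
        expected.put("/usr/bin/example", "755 root");
        expected.put("/var/lib/example", "755 example");

        verify(statOutput, expected).forEach(System.out::println);
    }
}
```

After an uninstall or purge the same check flips around: you assert that the paths are reported missing instead.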

Once you get into it, a first challenge is sandboxing and isolating: you want a clean slate for every run, so that you have reproducibility and an aura of science about it. We already do this in other areas, using throw-away AWS instances. Spot priced too because economics, and these days you are charged by the minute so very little waste. Indeed the catalogue of AWS instance OSes helps us reach different OSes (Debian, Ubuntu, RHEL, Amazon Linux, ...) at different versions, spanning the space our customer base lives in. So much winning there.

The tests are really a series of commands sent via SSH: there are calls and waits and assertions like you would use in similar system testing, classic stuff. It gets frameworky once you realise the same high level script applies to each platform, yay reuse. But really it is basic stuff when you think about it.
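The reuse can be sketched roughly like this. Everything here is hypothetical: `RemoteShell` stands in for whatever wraps the SSH session, `PackageFamily` holds the only platform-specific bits, and the package name and the `systemctl is-active` check are assumptions for the example rather than what our tests actually run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

public class InstallScriptSketch {

    /** Hypothetical abstraction over an SSH session: run a command, get its stdout back. */
    interface RemoteShell {
        String run(String command);
    }

    /** The platform-specific part; the script below is shared by all families. */
    enum PackageFamily {
        DEBIAN("sudo apt-get install -y ", "sudo apt-get purge -y "),
        RPM("sudo yum install -y ", "sudo yum remove -y ");

        final String install;
        final String remove;

        PackageFamily(String install, String remove) {
            this.install = install;
            this.remove = remove;
        }
    }

    /** Polls the condition until it holds or the attempts run out. */
    static boolean waitFor(BooleanSupplier condition, int attempts, long sleepMillis) {
        for (int i = 0; i < attempts; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    /** The shared script: install, wait for the service to come up, then clean up. */
    static boolean installAndCheck(RemoteShell shell, PackageFamily family, String pkg) {
        shell.run(family.install + pkg);
        boolean serviceUp = waitFor(
                () -> shell.run("systemctl is-active " + pkg).trim().equals("active"), 5, 10);
        shell.run(family.remove + pkg);
        return serviceUp;
    }

    public static void main(String[] args) {
        // A fake shell that records commands, so the sketch runs without any SSH.
        List<String> sent = new ArrayList<>();
        RemoteShell fake = command -> {
            sent.add(command);
            return command.startsWith("systemctl") ? "active\n" : "";
        };
        System.out.println("service came up: " + installAndCheck(fake, PackageFamily.DEBIAN, "example-db"));
        sent.forEach(System.out::println);
    }
}
```

Swapping the fake for a real SSH-backed implementation leaves `installAndCheck` untouched, which is the whole point.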

Now, there were stumbles. Internal ones like picking technology: we tried and discarded Vagrant+VirtualBox as the container for the different platforms because we couldn't make it work reliably for us. We also discarded Cucumber for the high level scripting, because really, we don't have non-coders looking at this and Ruby isn't a core skill here (we're a 4j shop remember!), so a bunch of hassle with no payoff.

External stumbles currently include Zypper, which is a PITA to work with, and of course all the little niggles and bugs found+fixed. But external stumbles are exactly what we are hunting for here, so that's just grand.

Status


The current incarnation of Installation Testing was implemented by my good colleague Jenny Owen as Java + JUnit with Maven, using the standard AWS client libraries and JSch. It works great: we have readable code, and running this in CI gives us clarity and confidence we never had before. There are thoughts about open sourcing it as it might be useful to others. Do you think it is? I'd love to hear experience reports if you faced similar problems.

So, you launch AWS instances, commandeer them via SSH, exercise and evaluate the software, terminate the instance. In parallel across the 12 and growing platforms multiplied by currently 2 editions of Neo4j we are interested in, for that low latency feedback. Textbook. What's not to like?
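The parallel fan-out across platforms and editions could look something like the sketch below. It is only an illustration using the JDK's `ExecutorService`: the `runMatrix` name, the platform and edition labels, and the stand-in test function are assumptions, and the real thing would launch an instance and run the installation script where the lambda is.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiPredicate;

public class ParallelMatrixSketch {

    /** Runs testRun once per platform/edition pair, all in parallel, and collects pass/fail. */
    static Map<String, Boolean> runMatrix(List<String> platforms, List<String> editions,
                                          BiPredicate<String, String> testRun) {
        int combinations = Math.max(1, platforms.size() * editions.size());
        ExecutorService pool = Executors.newFixedThreadPool(combinations);
        try {
            Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
            for (String platform : platforms) {
                for (String edition : editions) {
                    Callable<Boolean> task = () -> testRun.test(platform, edition);
                    pending.put(platform + "/" + edition, pool.submit(task));
                }
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> entry : pending.entrySet()) {
                try {
                    results.put(entry.getKey(), entry.getValue().get());
                } catch (Exception e) {
                    // A crashed run counts as a failure for that combination.
                    results.put(entry.getKey(), false);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> platforms = List.of("debian", "ubuntu", "rhel", "amazon-linux");
        List<String> editions = List.of("community", "enterprise");
        // Stand-in for "launch instance, run the script over SSH, terminate instance".
        Map<String, Boolean> results = runMatrix(platforms, editions, (platform, edition) -> true);
        results.forEach((combo, passed) -> System.out.println(combo + " -> " + (passed ? "ok" : "FAILED")));
    }
}
```

One thread per combination is fine here because each run spends nearly all its time waiting on a remote machine.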