On software versions and the brittleness of tools

I happened to stumble into a troubleshooting session where one person was trying to set up a container-based development environment in order to do some work with Python 2.7 and Ansible. It was stubbornly refusing to work and, while discussing various avenues of investigation (is it the Ansible version, the virtual environment, is Poetry doing something, etc.), I also decided to ask why - more specifically, "Why Python 2.7?". The resulting discussion and tweets are the main catalyst for this post.

On using end-of-life software

The immediate answers might not surprise you, as using Python 2.7 was not the engineer's choice, but a "hard requirement" from the customer whose project it was. It had to be done, but, I hoped, with enough warning bells and some understanding of why it's not a great choice in the long run.

Note: For those of you not exactly familiar with Python versioning, Python 2.7 has been End-of-Life for almost two years at the time of writing, with a transition period (to Python 3.x) of almost a decade. Not as record-breaking as the never-ending IPv4 to IPv6 transition, but I digress. While the Python Software Foundation stopped all development and long-term maintenance of the Python 2.7 releases in early 2020, various enterprise-focused operating systems are still keeping it on some sort of extended life support.

While using Python 2.7 on its own (a supremely mature version, refined over that prolonged transition period) is very likely safe enough, it's rarely the case - especially in network automation - that only the standard library will suffice. Indeed, in this case, Ansible was also used on top (with plenty of dependencies), and the goal was to also develop some additional custom plugins for it.

That is why I decided to send out my initial tweet. My horror stemmed from knowing how complex a dependency tree software like Ansible has in the Python ecosystem, and how hard it is to ensure that a combination of versions from even a few years ago still works reliably today.

Let's say you need to write a new plugin for Ansible that connects to some arcane device type supported only by the Netmiko library (so text CLI over SSH). In this example, since Netmiko dropped support for Python 2, you need to find a version that still works. After a bit of digging, you find out that Netmiko became Python 3 only in version 3.0.0 (Jan 2020), so you end up pulling Netmiko 2.4.2 from Sep 2019. That is a version more than two years old, which does not benefit from any of the bugfixes and improvements contributed to the library since then. Most importantly, it is no longer widely used or tested. Should you encounter issues with it, you're on your own, having to develop your own patches, test them, and keep the whole machine going.
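To make the Netmiko example concrete, here is a minimal sketch of how such a pin could be chosen per interpreter. The version numbers come from the paragraph above; the helper name netmiko_requirement is hypothetical and exists only for illustration.

```python
import sys

# Versions taken from the example above; everything else is illustrative.
LAST_PY2_NETMIKO = "2.4.2"        # last release usable on Python 2.7
FIRST_PY3_ONLY_NETMIKO = "3.0.0"  # first Python 3 only release


def netmiko_requirement(version_info=sys.version_info):
    """Return a pip-style requirement string suited to the interpreter.

    Hypothetical helper: on Python 2 we are stuck with an exact pin to
    the last compatible release, on Python 3 we can follow new releases.
    """
    if version_info[0] < 3:
        return "netmiko==" + LAST_PY2_NETMIKO
    return "netmiko>=" + FIRST_PY3_ONLY_NETMIKO


if __name__ == "__main__":
    print(netmiko_requirement())
```

The same intent can usually be written declaratively in a requirements file with an environment marker, for example netmiko==2.4.2; python_version < "3". Either way, the Python 2.7 branch is frozen in time, which is exactly the problem.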

Many such Python libraries have dropped support for Python 2.7 in the last couple of years. Therefore, any program that wants to use them has to either leverage a compatibility layer (if it even works) or just stick to the last release that explicitly supported Python 2.7. Complexity and brittleness go up, reliability goes down.
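For a sense of what a compatibility layer looks like at its smallest, here is a sketch of the common try/except import pattern, using a standard library module that moved between Python 2 and Python 3. The device_host function and the example URL are assumptions made up for this illustration; real projects often reach for a dedicated library such as six instead.

```python
try:
    # Python 3 location of the URL parsing helpers
    from urllib.parse import urlparse
except ImportError:
    # Python 2 kept them in a top-level module
    from urlparse import urlparse


def device_host(url):
    """Extract the hostname from a device URL on either Python line.

    Hypothetical helper, only here to exercise the shimmed import.
    """
    return urlparse(url).hostname


if __name__ == "__main__":
    print(device_host("ssh://router1.example.net:22"))
```

One shim like this is harmless. The trouble starts when every module needs a handful of them, and each one is a code path that almost nobody exercises on Python 2.7 anymore.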

Back to those requirements

At the beginning of the post, I mentioned that the client had a set of hard requirements for the project. There might be plenty of legitimate reasons for such requirements, and lifting them is never quick. In the meantime, the world keeps turning and problems need to be solved.

A couple of examples that come to mind:

The runtime environment is locked to a corporate gold standard image of an enterprise Linux distribution that only has Python 2.7 (supported by the vendor well beyond its normal EoL).
The libraries/SDKs used to connect to a particular network OS (or other equipment) have not been updated by the vendor to Python 3.

The solution to the first one might be using containers, so that one could have a more modern set of tools and runtimes packaged together, while still running on the same Linux hosts. For the second one, the solution might be choosing the network vendor more carefully next time and including requirements for automation functionality in the RFPs.

There could also be other, more unsavoury, reasons for these requirements, such as the corporate IT standards not keeping up to date with developments in the industry, or good old security theater gone wrong ("only Python 2.7 is approved for use as per the list built by the security team").

No matter what the background is, hopefully it's pretty clear that it's far from an ideal situation. Even if one has to fix a particularly thorny issue with Python 2.7 today, the same high priority should be given to fixing the cause of that Python 2.7 requirement, so it stops piling on more technical debt in future projects.

Final thoughts

Someone jokingly (or not) suggested that supporting such end-of-life versions should come with a premium... and, leaving aside the debate about whether this should even be an option, the costs will be higher even if nobody bills for them explicitly. At a minimum, there will be hidden costs all over, such as time spent setting up outdated environments, or fixing the new-old problems arising from taking something that works and walking it backwards a few years. Things will break more often, there will be fewer people capable of untangling and fixing problems, and even more time will be spent doing so.

A friend almost compared Python 2.7 to COBOL - and, hmm, you know what, there absolutely is a non-zero chance in the immediate future for such a consulting gold mine, given Python's popularity. But, perhaps, we can better avoid such situations by giving more thought to the ramifications of seemingly clear "hard requirements".

And, as always, thanks for reading.
