
Why developers doing DevOps without supervision ends in disaster

Introduction

Here's a fun little thing that happens between the more ops-oriented system administrators and developers. Considering the crap colleges produce every year, there are more and more people with fancy degrees that don't mean anything who manage to wrestle the permission to publish releases away from the Ops crowd. Of course, the reasoning goes, these Ops people don't know anything, and are demented, burned-out wrecks who block every attempt at progress out of laziness, lack of knowledge, worries about being made obsolete, and so on.

After all, if they're the illustrious lead developer, why shouldn't they be allowed to publish packages without restrictions?

Congratulations, your pipeline is now a vector for malware distribution. Or you caused an outage.

Why DevOps keeps getting renamed

"The computer industry is more fashion-driven than women's fashion, and cloud computing is simply the latest fashion." - Larry Ellison

Silicon Valley likes to rename things in an effort to stay relevant. It's similar to Ron Swanson's mortal enemy in Parks and Recreation: Annabel Porter. Given enough time, SV is going to reinvent cow milk. Somehow.

The fact of the matter is that most ideas are older than SV. The Silicon Valley product is a cheap knock-off of something useful, but it's the one that appears when you google it, and it's the one developers blindly implement.

Here's a tip: You can't google your way to a good DevOps system.

TL;DR

For those who can't be bothered to read:

1) All software components that are part of the pipeline should verify signatures

2) Sign your commits and reject unsigned commits to the git repository.

3) Artifacts should be signed using a key stored on a physical device (smartcard, HSM, etc.)

4) There need to be firebreaks and manual controls

5) Most software developers don't care about security in the face of failure (but sysadmins do)

6) If there isn't a QA step in your pipeline, you will publish something broken to customers

7) Most software lacks meaningful security controls

8) Most software is not configured for auditing

The above is nothing new, and it's what every system administrator has been yelling about for years. Sadly, automation porn is the new reality against which every DevOps engineer will be benchmarked.

DevSecOps is the new(ish) paradigm

Turns out, if you remove the Ops veto from shipping things, you end up in a situation where security complains. Shipping unsigned dependencies you downloaded from the internet is going to end in disaster.

While your Ops people are agonizing over the xz backdoor attempt, the developers are googling GitLab automation. Guess which one looks more productive?

DevOps (and platform engineering) is about QA: Shipping things safely, not fast

One of the key parts of the Toyota system isn't the ability to sling cars as fast as possible through the assembly process; it is the ability (and willingness) to stop the assembly line to ensure quality. This saves an enormous amount of resources. Because the process is well designed, it's possible to focus on quality, since most of the toil has been eliminated through automation.

1) All software components that are part of the pipeline need to verify signatures

This is easier said than done. Things usually fail immediately, since the container image being used is some unsigned thing downloaded off Docker Hub. Then a bunch of stuff gets downloaded from the internet from third-party repositories, which require GPG keys (yay), but the infrastructure containing the GPG keys might not really be separate (boo).
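
For illustration, here is a minimal sketch of what "verify before use" looks like for a downloaded dependency, assuming a detached GPG signature is published alongside it. The paths and keyring location are hypothetical; the point is that the trusted keys are pinned in the pipeline, not fetched at build time.

```python
#!/usr/bin/env python3
"""Refuse to use a downloaded artifact unless its detached GPG
signature verifies against a keyring we already trust.
Paths and keyring location are hypothetical."""
import subprocess
import sys

ARTIFACT = "vendor/libfoo-1.2.3.tar.gz"     # hypothetical dependency
SIGNATURE = ARTIFACT + ".asc"               # detached signature shipped alongside
KEYRING = "/etc/pipeline/trusted-keys.gpg"  # pinned keys, NOT fetched at build time

# gpgv exits non-zero when the signature is missing, bad, or from an unknown key
result = subprocess.run(
    ["gpgv", "--keyring", KEYRING, SIGNATURE, ARTIFACT],
    capture_output=True, text=True,
)
if result.returncode != 0:
    sys.stderr.write(result.stderr)
    sys.exit("refusing to build: signature verification failed")
print(f"{ARTIFACT}: signature OK")
```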

2) Sign your commits and reject unsigned commits

Personally, the only platform where I could sign a commit using a physical smartcard is Linux. Wrangling the other two (MacOS, Windows) into working has been on my todo list for the past 5 years. The situation might be better today, but it certainly remains a challenge.
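
Server-side, the rejection half can be done with a git pre-receive hook. Below is a minimal sketch, assuming the committers' public keys are already imported into the server's GPG keyring; a production hook would need better error reporting than this.

```python
#!/usr/bin/env python3
"""pre-receive hook: reject pushes containing unsigned commits.
Assumes committer public keys are already in the server's keyring."""
import subprocess
import sys

ZERO = "0" * 40  # git's null SHA, used for ref creation/deletion

def commits(old, new):
    # For a brand-new ref, list only commits not already reachable elsewhere.
    rev_range = [new, "--not", "--all"] if old == ZERO else [f"{old}..{new}"]
    out = subprocess.run(["git", "rev-list", *rev_range],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

# The hook receives "old-sha new-sha refname" lines on stdin.
for line in sys.stdin:
    old, new, ref = line.split()
    if new == ZERO:  # ref deletion, nothing to verify
        continue
    for sha in commits(old, new):
        # verify-commit exits non-zero when the signature is missing or bad
        if subprocess.run(["git", "verify-commit", sha],
                          capture_output=True).returncode != 0:
            sys.exit(f"rejected: commit {sha} on {ref} has no valid signature")
```

Hosted platforms expose similar push rules, but they tend to live behind the paid tiers (see point 7).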

3) Artifacts should be signed using a key stored on a physical device

This one is hard to implement, since the same platform problems from 2) work against you here too. MacOS and Windows are not operating systems for professional use; they're born out of the bottomless pit of mass consumerism. Linux is no better in that regard, but at least Linux has ported some of the tools from the UNIX systems (which were systems for computing professionals). MacOS has a chance of having those tools (thanks to Homebrew), but Windows requires prayers to the old gods for cygwin to work.

Now, 3) requires an investment in hardware (smartcards, FIDO tokens, etc.). In other words, changes to the developer workflow and how people work. And if there is one sacred cow, it's a developer's IDE and development environment.
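
The signing itself is the easy half: with a smartcard-resident key, gpg talks to the card and the private key never touches the build host's disk. A sketch of a post-build signing step (the key ID and artifact path are made up):

```python
#!/usr/bin/env python3
"""Post-build step: produce a detached signature for the release
artifact using a smartcard-resident key. gpg itself handles the
card and PIN prompt. Key ID and artifact path are hypothetical."""
import subprocess

ARTIFACT = "dist/release-2.4.0.tar.gz"  # hypothetical build output
SIGNING_KEY = "0xDEADBEEFCAFEF00D"      # stub key ID; private part lives on the card

subprocess.run(
    ["gpg", "--local-user", SIGNING_KEY,
     "--armor", "--detach-sign", "--output", ARTIFACT + ".asc", ARTIFACT],
    check=True,
)
print(f"wrote {ARTIFACT}.asc")
```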

4) There need to be firebreaks and manual controls

What's the point of automation if you have all of these manual controls? The fact of the matter is that the tooling is not mature enough to survive a compromised system. If one of the components in the pipeline gets compromised, anything downstream of that point is a high-speed highway for malware distribution.

Ideally, the pipeline should only be available when a release is imminent. This means that the secrets required to build the image are locked away during weekends and other periods when there is no need for the pipeline to be available. In other words, a pipeline does not go in the asset column; it goes in the liability column, requiring constant maintenance.
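
What a firebreak can look like in code: a gate step that refuses to proceed unless a human has explicitly opened a release window out of band. A toy sketch; the window file, its location, and its format are invented for illustration.

```python
#!/usr/bin/env python3
"""Pipeline gate: fail the job unless a release window is open.
The window file is written by a human (out of band) and contains an
ISO-8601 expiry timestamp with a UTC offset, e.g.
"2024-06-01T12:00:00+00:00". Path and format are hypothetical."""
from datetime import datetime, timezone
from pathlib import Path
import sys

WINDOW_FILE = Path("/etc/pipeline/release-window")

try:
    expires = datetime.fromisoformat(WINDOW_FILE.read_text().strip())
except (FileNotFoundError, ValueError):
    sys.exit("no release window open: refusing to unlock build secrets")

if datetime.now(timezone.utc) > expires:
    sys.exit("release window expired: refusing to unlock build secrets")

print("release window open, proceeding")
```

The point of the design is that the default state is "closed": when nobody has done anything, the pipeline cannot ship.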

5) Most software developers don't care about security in the face of failure

The way modern organisations are laid out, when software developers ship something successfully (even if it's a bug-ridden, insecure mess) they get the credit, but once there is a problem, it's the Ops side that's at fault. If this were Russian roulette, it would be like taking the bullet out of the revolver for the software developer's turn and putting it back in for the Ops side's turn.

6) If there isn't a QA step in your pipeline, you will publish something broken to customers

I think the fundamental difference between professional DevOps and the usual kind is the attitude to automated testing and alerting. These are required for a pipeline to work autonomously. Yet every time there's a pipeline built by developers, it's some end-to-end testing solution that happens to have no relation to the binary used by the customer. It's not an end-to-end testing solution if it's not testing what the customer has in production. It is, at best, a decent attempt at subcomponent testing.
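
Testing what the customer gets means running the released artifact, not a rebuild of the source tree. A hypothetical smoke test against the exact image that would be published (image name, port, and health endpoint are all made up):

```python
#!/usr/bin/env python3
"""QA step: boot the exact image that would be shipped and poke its
health endpoint. Image name, port, and endpoint are hypothetical."""
import subprocess
import sys
import time
import urllib.request

IMAGE = "registry.example.com/app:2.4.0-rc1"  # the artifact under test, not a rebuild

cid = subprocess.run(["docker", "run", "-d", "-p", "8080:8080", IMAGE],
                     capture_output=True, text=True, check=True).stdout.strip()
try:
    time.sleep(5)  # crude; a real test would poll with a deadline
    with urllib.request.urlopen("http://localhost:8080/healthz", timeout=10) as r:
        if r.status != 200:
            sys.exit(f"smoke test failed: HTTP {r.status}")
    print("smoke test passed")
finally:
    subprocess.run(["docker", "rm", "-f", cid], capture_output=True)
```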

7) Most software lacks meaningful security controls

This is why 4) is important. A pipeline where the artifacts are not attested and signed cannot be left unsupervised, and a pipeline where the inputs are not checked cannot be left unsupervised either. Most APIs don't seem to have the capability to reject an uploaded image or artifact that isn't signed by a known-good cryptographic key. In many cases, you are forced to perform these steps manually (by logging in through a multi-factor authentication interface) if you wish to have any semblance of security.
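
When the registry will accept anything, the only place left to enforce the rule is the client side: a wrapper that refuses to upload an artifact unless its signature verifies first. A sketch under that assumption; the keyring path and the publish command are placeholders.

```python
#!/usr/bin/env python3
"""Upload wrapper: since the registry won't reject unsigned artifacts,
refuse to push one ourselves unless the detached signature verifies.
Keyring path and the publish command are placeholders."""
import subprocess
import sys

ARTIFACT = "dist/release-2.4.0.tar.gz"
KEYRING = "/etc/pipeline/trusted-keys.gpg"

check = subprocess.run(
    ["gpgv", "--keyring", KEYRING, ARTIFACT + ".asc", ARTIFACT],
    capture_output=True,
)
if check.returncode != 0:
    sys.exit("refusing to upload: artifact signature missing or invalid")

# Placeholder for the actual publish step (registry CLI, curl, etc.)
subprocess.run(["./publish.sh", ARTIFACT], check=True)
```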

A lot of security features are also locked behind paywalls (i.e. enterprise versions).

8) Most software is not configured for auditing

Build reports, generated artifact checksums, and their signatures would ideally be handled automatically. If an artifact gets replaced by a compromised version, how will you know without checksums stored (ideally) in cold storage? So even if you automate everything, there needs to be a separate pull mechanism that collects this information.
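
A sketch of that separate pull mechanism: an independent job, run from outside the pipeline, that hashes every published artifact and appends the results to a log the pipeline cannot write to. The directory and log paths are illustrative.

```python
#!/usr/bin/env python3
"""Audit collector: run from a separate host, hash every published
artifact and append to a log the pipeline cannot touch. If an
artifact is later swapped out, the recorded digest won't match.
Directory and log paths are illustrative."""
import hashlib
from datetime import datetime, timezone
from pathlib import Path

RELEASE_DIR = Path("/srv/releases")           # what the pipeline published
AUDIT_LOG = Path("/mnt/coldstore/audit.log")  # ideally write-once media

with AUDIT_LOG.open("a") as log:
    for artifact in sorted(RELEASE_DIR.glob("*.tar.gz")):
        digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
        stamp = datetime.now(timezone.utc).isoformat()
        log.write(f"{stamp} sha256 {artifact.name} {digest}\n")
```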

Nobody cares

The depressing thing is that the security of most systems is largely dependent on draconian retaliation (from corporate or government institutions) and restrictive network security policies. The current systems work well enough: every few years there's a high-profile compromise of some system, the FBI puts some people in jail, and there's a quiet period afterward.

All the language-specific package repositories lack meaningful signing mechanisms. PyPI removed theirs (https://blog.pypi.org/posts/2023-05-23-removing-pgp/).

Rust is adding some support for signing (https://foundation.rust-lang.org/news/2023-12-21-improving-supply-chain-security/).

I'm sure that in a few years Rust is going to remove support for signing, citing 'lack of use'.

DevOps is going to suck for a long while

Considering the likelihood of a recession, which limits funding for open source and will push existing maintainers toward more profitable careers, a few things are likely to happen: the quality of existing software will drop, new maintainers who are more likely to be plants or otherwise nefarious will be recruited, and the increased paranoia will impede even honest attempts at maintainership.

All the while, the older generation gets to snicker (why try new tooling?), the devs get to snicker, and things remain the same.
