Until recently[1], my job was to synthesize a deep understanding of operating systems, networking, system administration, and my company's application and to use that synthesis to fix our existing systems and design better ones. A lot of folks in the technology industry (particularly in the bubble of Greater San Francisco) use the word "DevOps" when putting out job postings for roughly those tasks, and I just wanted to briefly write about why this word is somewhere between inaccurate and offensive and why you shouldn't use it.
At its core, the notion of "DevOps" is that software development and operations should be run as one organizational unit instead of two: that they should have the same goals and collaborate using the same means. This core has merit, but when put into practice, it tends to result in bullshit. At the two major companies where I've had a non-trivial role (Yelp and Uber), development's primary goal was releasing products, and the primary goal of operations (teams named "systems" and "infra" respectively; both eventually gutted) was keeping the production environment up and secure. The customers of the engineering teams were their product managers; the customers of the operations teams were the actual customers of the business (with the exception of operations teams primarily focusing on corporate systems, of course).
Integrating these two, almost without fail, means that operations' primary goals shift to releasing products and supporting the goals of the existing product teams. One could imagine that you get out some new and harmonious universe in which the engineering team develops maturity and respect for the production environment, but what's happened at both of those companies (and what I've seen at dozens of SF and Valley companies that my friends work at) is that a DevOps team becomes myopically focused on "empowering developers" to build their systems and stops caring about the end user. Integration doesn't result in better outcomes or build better teams; it just results in the domination and subjugation of an organization with real and discernible goals into the monster that is engineering at large.
Don't get me wrong – there are also lots of good things that claim to have come out of the DevOps movement. For example, I'm told that there was a time in the dark ages where peer code review was never done by operations teams, and that code review only came over during the "DevOps revolution". I'm pretty sure I remember doing peer reviews before 2009, but I will say that code review is unquestionably good. Folks also claim that automation and configuration management tools are a DevOps product that "legacy" operations organizations never would've used. I don't know about that[2], but configuration management is also a good thing.
This post isn't about any of that, though. For the last couple of years, when people use the word DevOps, I've found them to have one of a few very specific concepts in mind; a few things rather more specific than the general notion of integrating operations and development:
**People with an operations/system administration background are hard to hire and hard to work with, so we should have operations be done by regular developers, because it's just a matter of giving them a few days of perfunctory training.** (I've heard this one from executives from multiple major tech companies recently.) As one prominent Yelp VP once said, "It's not that hard, it's just configuration." Obviously, this is offensive to people who actually have experience in system administration and operations (which, I'll note, is a somewhat older profession than programming, and far older than anything that would be called "software engineering"). This also tends to lead to worse outcomes: developers don't want to have to understand and manage those systems, so they tend to build towering, custom architectural spaceships[3] that ignore what's been learned over the last four or five decades of system and network administration. These architectures and, indeed, companies can succeed, but they do so at great expense to the debuggability and maintainability of their systems, and, in my opinion, are awful to work for. See: Twitter, Uber, Airbnb. If you can't hire good operations people, it's probably because you haven't built the kind of company that they want to be at.
**Traditional ops people don't "get" software engineering, so we need to find hybrid unicorn people that can "get it".** This is a common perspective for reasons that escape me. It's true, I've only been doing operations/system administration professionally for 10 years; maybe things were different in the '90s. But I don't think I've worked with a single competent person in the last 10 years who wasn't able to build software. If anything, software development is a significantly easier skill to learn than system administration. Ops people have always written software and designed systems. Trying to portray ops people as Neanderthals who only know how to reboot something when it's broken is offensive to them and just makes the speaker come off as a prick. I still hear this at work and at other companies on an almost-daily basis, though.
**Having people who are focused on operations is bad, and operations should just be an aspect of regular software developers' jobs.** There was a time when this philosophy went by the name "NoOps" (example), but this seems to be getting folded into the larger "DevOps" umbrella. Here are my issues with this approach:
- There are no systems that don't need operations people. There are systems that you can build that are more automated and require fewer people to understand and debug them, but the kinds of complexities that you put into those systems mean that when they do break, you need much stronger operations people to fix them.
- Operations is harder to teach than development.
- Specialization is generally good in a company; you don't require that all of your engineers learn tax law, but you're going to require that they learn how to operate systems and build and debug networks?
- Operations and product teams have fundamentally incompatible goals. If you try to force people who are building products to care about uptime too, well, one of those things is going to suffer. And when SLA breakage is measured in a few cents' drop in the stock price but a late product is measured by a PM yelling at you all afternoon, I can guess which one is going to suffer.
It's certainly hard to balance the competing needs of product growth and production operations, especially at a high-growth startup. I do sympathize with Management's issue here. And I also see why product engineers (who produce visible product when they succeed) are a far more attractive group than operations people (who, when they succeed, are invisible). It just pains me that, as an industry, so many companies are buying into this bullshit. Even if they don't go so far down the "NoOps" path, just about every interesting technology-focused company has adopted the wrong terminology, and once you concede the terms of discussion, it's just a matter of time until you concede everything.[4]
Just bear in mind, dear readers, that should you decide to hire me at some point and should you use the word "DevOps" in your recruitment spam, that's not going to start us off on the right foot. Not at all.
[1] Due to some truly impressive political maneuverings, I no longer have authority to do much of anything at $DAYJOB any more. I'll probably update this when I figure out what I am doing.
[2] This 1988 book suggests that automated software configuration management systems far predate the "DevOps revolution".
[3] An old boss liked to describe a particular kind of software engineer as an "architecture astronaut", which is an image I greatly enjoy and have somewhat embellished upon.
[4] Yes, I subscribe to the Sapir-Whorf hypothesis, at least a little bit.