What DevOps Actually Is Beyond the Tools

DevOps isn't a pipeline or a job title. It's shared ownership between the people who write code and the people who run it in production. Here's what that means, and why most teams get it wrong.

In an infrastructure kickoff meeting, the platform engineer said the team needed to "adopt DevOps." Three weeks later, they had Jenkins configured, a GitLab pipeline, and a metrics dashboard nobody opened. The same problems remained: Friday 5pm deploys, bugs that only appeared in production, and dev and ops who only talked through tickets.

New tools. Old culture.


DevOps is not a job title and not a pipeline

The most common mistake I see in teams is treating DevOps as a role or a stack. Some companies created a "DevOps Engineer" position and dumped all responsibility for CI/CD, containers, monitoring, and cloud onto one person. The development team kept doing what they always did. The operations team did too. There was just a Jenkins in the middle.

The original premise of DevOps is that the bottleneck in software delivery isn't technology. It's the organizational separation between the people who write code and the people who put it into production. The movement took shape around 2009, when Patrick Debois organized the first DevOpsDays. Gene Kim and Jez Humble later popularized it through books such as The Phoenix Project.

When dev and ops are silos, their incentives are structurally opposed: dev wants to ship fast, ops wants stability. The result is the classic cycle: dev throws code over the wall, ops tries not to break production, and everyone blames each other when something fails.

DevOps is the attempt to solve that, not with tools but with shared ownership. "You build it, you run it," as Amazon CTO Werner Vogels put it in a 2006 interview.


What actually changes when DevOps works

It's not the absence of downtime. It's the speed of recovery.

Teams that genuinely practice DevOps aren't infallible — they fail fast, detect fast, and recover fast. DORA (DevOps Research and Assessment), which measures engineering team health across companies worldwide, uses four metrics as a proxy for maturity:

  • Deployment frequency — how often the team ships to production
  • Lead time for changes — time from commit to production
  • Change failure rate — percentage of deploys that cause an incident
  • Mean time to restore (MTTR) — how long to recover from an incident

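The DORA research has shown the same thing consistently across years of State of DevOps reports: high-performing teams don't trade speed for stability. They have both. Their lead times are measured in hours rather than weeks, they deploy multiple times per day, and they recover from incidents in under an hour.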

This doesn't happen because they have better tools. It happens because the feedback loop between writing code and seeing the result in production is short.
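
To make the four definitions concrete, here is a minimal Python sketch that computes them from a list of deploy records. The Deploy record and its fields are hypothetical. The sketch assumes each deploy already knows whether it caused an incident, and it measures time to restore from the start of that incident to its resolution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:  # hypothetical record shape
    commit_at: datetime                         # when the change was committed
    deployed_at: datetime                       # when it reached production
    incident_start: Optional[datetime] = None   # set only if this deploy caused an incident
    restored_at: Optional[datetime] = None      # when service was restored after that incident


def dora_metrics(deploys: list[Deploy], window_days: int) -> dict:
    """Compute the four DORA metrics for the deploys made during a window of N days."""
    if not deploys:
        raise ValueError("need at least one deploy in the window")
    lead_times = [d.deployed_at - d.commit_at for d in deploys]
    failed = [d for d in deploys if d.incident_start is not None]
    restores = [d.restored_at - d.incident_start for d in failed if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failed) / len(deploys),
        "mean_time_to_restore": sum(restores, timedelta()) / len(restores) if restores else None,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 5, 6, 9, 0)
    history = [
        Deploy(t0, t0 + timedelta(hours=2)),
        Deploy(t0, t0 + timedelta(hours=4),
               incident_start=t0 + timedelta(hours=5),
               restored_at=t0 + timedelta(hours=5, minutes=30)),
        Deploy(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=1)),
        Deploy(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=2)),
    ]
    print(dora_metrics(history, window_days=2))
```

For this sample history it reports 2 deploys per day, a median lead time of 2 hours, a 25% change failure rate, and a 30-minute time to restore. I use the median for lead time because one stuck change would otherwise dominate an average.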


The myths that persist

"DevOps means devs also do operations"

Partially true, but not literally. A developer doesn't need to be a network engineer or kernel specialist. What changes is that developers have visibility into what happens in production and ownership over the operation of the services they wrote — observability, alerts, a basic runbook. That's not the same as full infrastructure on-call.
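
Visibility starts with logs a machine can query. Below is a minimal sketch of structured logging with Python's standard logging module, assuming a JSON-lines format. The logger name and the order_id, provider, and latency_ms fields are hypothetical, and in practice your log shipper or a logging library may dictate a different format.

```python
import json
import logging
import sys
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured context passed as logger.info(..., extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str, stream: TextIO = sys.stdout) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # avoid duplicate handlers if called twice
    logger.setLevel(logging.INFO)
    logger.propagate = False     # don't also emit through the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")  # hypothetical service name
    log.info("payment failed",
             extra={"fields": {"order_id": "A-1042", "provider": "acme", "latency_ms": 812}})
```

Each line can then be filtered by order_id or alerted on by level without regex parsing. That is the difference between "the logs are somewhere" and "the developer can see what happened".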

"CI/CD is DevOps"

CI/CD is a practice that enables DevOps. You can have a flawless pipeline and still have a deploy approval process that takes five days and requires sign-off from three managers. Technically, CI/CD. Culturally, the opposite of DevOps.

"You need Kubernetes to do DevOps"

A two-person team on a VPS with git push and a bash deploy script practices DevOps better than a twenty-person team on managed Kubernetes that is run by a separate team holding weekly change approval meetings. Infrastructure complexity is not a proxy for maturity.

"DevOps fixes communication between teams"

Not directly. DevOps solves the specific problem of dev vs. ops. If the problem is communication between product and engineering, or between engineering and security, or between platform teams and product teams — that's an organizational and process problem DevOps doesn't address.


What blocks the cultural shift

In the teams where I've seen the change work, the turning point was always the same: someone with authority decided that the development team is responsible for what runs in production. Not partially — entirely.

What blocks that shift in practice:

Inherited ITIL change management processes. Change advisory boards that manually approve deploys, monthly maintenance windows, rollback that requires a ticket. It's not pointless bureaucracy — it has a legitimate origin in regulated environments. But applied indiscriminately to modern web services, it becomes process overhead that exceeds the risk it's supposed to mitigate.

Environment promotion without automation. If the process of moving code from staging to production is manual and involves different people with different approvals, you've created a human bottleneck in the critical path. Environment automation is a prerequisite, not a feature.

Team metrics that incentivize silos. If the ops team is evaluated on availability and the dev team on delivery speed, the incentive conflict is in the performance system — not in the people. Changing the culture without changing the metrics is wishful thinking.


Automation as a consequence, not the goal

A mistake I made on a previous team was selling DevOps as "let's automate everything." It's an easy pitch — it has tangible ROI, clear deliverables, everyone understands what a green pipeline looks like.

The problem is that automation without observability is a faster black box. You ship to production faster and discover problems faster too — but if the team has no culture of monitoring, alerts, and postmortems, the higher velocity just accelerates the incident cycle.

The order that works:

  1. Observability first — structured logs, application metrics, alerts that reach someone who can act
  2. Reliable deployments — a pipeline that builds and tests against an environment that matches production
  3. Short feedback loop — from commit to production in minutes, not hours
  4. Simple rollback — that any developer can execute without ceremony
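
On a plain VPS, a rollback that any developer can run can be as small as repointing a symlink. Here is a minimal Python sketch. It assumes a Capistrano-style layout that the article doesn't specify: each release lives in releases/<sortable timestamp>, and the running app is served from a current symlink. The function names are hypothetical.

```python
import os
from pathlib import Path


def activate(app_root: Path, release: str) -> None:
    """Point app_root/current at app_root/releases/<release> in one atomic step."""
    target = app_root / "releases" / release
    if not target.is_dir():
        raise FileNotFoundError(f"no such release: {target}")
    tmp_link = app_root / "current.tmp"
    tmp_link.unlink(missing_ok=True)  # clean up a leftover from an interrupted run
    tmp_link.symlink_to(target.resolve())
    # rename(2) replaces the old link atomically on POSIX: no moment without `current`
    os.replace(tmp_link, app_root / "current")


def rollback(app_root: Path) -> str:
    """Switch `current` to the release just before the active one; return its name."""
    releases = sorted(p.name for p in (app_root / "releases").iterdir() if p.is_dir())
    active = (app_root / "current").resolve().name
    position = releases.index(active)  # ValueError if `current` is missing or unknown
    if position == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = releases[position - 1]
    activate(app_root, previous)
    return previous
```

The new link is created under a temporary name and then renamed over current. That way, requests never hit a missing path mid-switch, and rolling back is a single command rather than a ticket.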

None of this requires Kubernetes. It requires a decision.


The role of recurring automations

One of the first things that changes in a team maturing in DevOps is how it handles recurring operational tasks. Database backups, log rotation, report generation, queue cleanup — all of it becomes documented, monitored cron jobs with alerts when they fail.
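
Here is a minimal sketch of what "monitored, with alerts when they fail" can mean in practice, assuming each job is launched through a small Python wrapper. The alert function is a hypothetical placeholder for your paging or chat integration. The heartbeat files are also an assumed convention: a separate check flags any job whose last success is older than expected, which also catches jobs that silently never ran.

```python
import subprocess
import sys
import time
from pathlib import Path


def alert(message: str) -> None:
    """Hypothetical placeholder: send this to your pager or chat channel instead."""
    print(f"ALERT: {message}", file=sys.stderr)


def run_monitored(name: str, cmd: list[str], heartbeat_dir: Path) -> int:
    """Run one scheduled job; alert on failure, record a heartbeat on success."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Include the tail of stderr so whoever gets paged has context.
        alert(f"{name} exited with {result.returncode}: {result.stderr.strip()[-500:]}")
    else:
        heartbeat_dir.mkdir(parents=True, exist_ok=True)
        (heartbeat_dir / name).write_text(str(time.time()))
    return result.returncode


def is_overdue(name: str, heartbeat_dir: Path, max_age_seconds: float) -> bool:
    """True if the job never succeeded or its last success is older than allowed.

    Run this from a different scheduler than the job itself, so a job that
    silently never runs (for example, a wrong cron expression) is still caught.
    """
    heartbeat = heartbeat_dir / name
    if not heartbeat.exists():
        return True
    return time.time() - float(heartbeat.read_text()) > max_age_seconds
```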

The problem is that cron expressions are one of those things every developer writes from memory and assumes is correct. The error only surfaces when the job fails to run in the expected window. A poorly configured weekly job can skip executions for months before anyone notices. To validate the expression before committing, I use the Cron Expression Generator. It shows the next scheduled executions with full date and time, which resolves any uncertainty about timezone or ambiguous fields before going live.
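
To show what such a preview computes, here is a minimal Python sketch that lists the next matching times for a five-field expression (minute, hour, day of month, month, day of week). It is not the generator's implementation. It supports *, single values, ranges, lists, and */n or a-b/n steps. It does not support names like MON, @weekly shortcuts, or the a/n form. It treats times as naive local time with no DST handling. It also assumes Vixie-style day matching: a day field starting with * counts as unrestricted, and two restricted day fields are combined with OR, a classic source of surprise.

```python
from datetime import datetime, timedelta


def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field such as '*/15', '1-5' or '0,30' into the allowed values."""
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        elif step_text:
            raise ValueError(f"'a/n' form {part!r} is not supported in this sketch")
        else:
            start = end = int(rng)
        if not (lo <= start <= end <= hi) or step < 1:
            raise ValueError(f"invalid cron field {field!r}")
        values.update(range(start, end + 1, step))
    return values


def next_runs(expr: str, after: datetime, count: int = 5) -> list[datetime]:
    """Return the next `count` times (naive local time) matching a 5-field expression."""
    minute_f, hour_f, dom_f, month_f, dow_f = expr.split()
    minutes = parse_field(minute_f, 0, 59)
    hours = parse_field(hour_f, 0, 23)
    doms = parse_field(dom_f, 1, 31)
    months = parse_field(month_f, 1, 12)
    dows = {d % 7 for d in parse_field(dow_f, 0, 7)}  # 0 and 7 both mean Sunday
    # Vixie-style rule: a day field starting with '*' counts as unrestricted.
    dom_star, dow_star = dom_f.startswith("*"), dow_f.startswith("*")

    def day_matches(t: datetime) -> bool:
        cron_dow = (t.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        dom_ok, dow_ok = t.day in doms, cron_dow in dows
        if dom_star or dow_star:
            return dom_ok and dow_ok
        return dom_ok or dow_ok  # both restricted: matching either one is enough

    runs: list[datetime] = []
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = after + timedelta(days=5 * 366)  # long enough for Feb 29 schedules
    while len(runs) < count and t < limit:
        if t.month not in months or not day_matches(t):
            t = (t + timedelta(days=1)).replace(hour=0, minute=0)  # skip to next day
        elif t.hour not in hours:
            t = (t + timedelta(hours=1)).replace(minute=0)  # skip to next hour
        elif t.minute not in minutes:
            t += timedelta(minutes=1)
        else:
            runs.append(t)
            t += timedelta(minutes=1)
    return runs


print(next_runs("0 0 1 * 1", datetime(2024, 5, 1), 3))
```

The final line is the kind of case a preview exposes. Someone who meant "the 1st of the month, if it's a Monday" gets midnight on May 6, May 13, and May 20, because a day only has to match one of the two day fields.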


Frequently asked questions

Are DevOps and SRE the same thing?

Not exactly. SRE (Site Reliability Engineering) is a specific implementation of DevOps created at Google — it uses software engineers to solve problems that would traditionally be handled by operations. An SRE writes code to automate ops tasks. The focus is measurable reliability via SLOs and error budgets. DevOps is the broader concept; SRE is one concrete way to implement it at scale.
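
The error budget arithmetic is simple enough to show directly. The budget is whatever the SLO leaves over: 1 - SLO of the requests (or minutes) in the window. Here is a minimal sketch, assuming an availability SLO expressed as a fraction (0.999 for 99.9%). The function names are hypothetical.

```python
def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of total outage the SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Share of the request-based error budget still unspent (negative means overspent)."""
    allowed_failures = (1 - slo) * total_requests
    if allowed_failures <= 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return 1 - failed_requests / allowed_failures


print(round(allowed_downtime_minutes(0.999), 1))          # 43.2 (99.9% over 30 days)
print(round(budget_remaining(0.999, 1_000_000, 250), 2))  # 0.75
```

Once the budget is a number, a team can agree on a policy in advance, for example pausing risky releases while budget_remaining is at or below zero. That is the kind of trade-off the SRE approach makes explicit.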

Do small companies need DevOps?

Yes, and it's easier there. In small teams, dev and ops naturally overlap — a startup developer who deploys their own service and watches the logs is already practicing DevOps. The DevOps challenge is maintaining that proximity as the team grows and the temptation to create specialized silos increases.

How do you measure whether the team is improving?

Use the four DORA metrics as a baseline. Lead time and deployment frequency are the easiest to instrument — a pipeline with a commit timestamp and a deploy timestamp already gives you lead time. Change failure rate requires that incidents be recorded and correlated with deploys. MTTR requires that incidents have detection and resolution times recorded. If you have none of this data today, that's the first step: measure before you optimize.
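
Correlating incidents with deploys is the step teams most often skip. Here is a minimal sketch, assuming you can export deploy timestamps from the pipeline and incident detection timestamps from your incident tracker. It blames each incident on the most recent deploy before it, within a lookback window. The 24-hour window is an arbitrary assumption, and the attribution is a heuristic. An incident caused by something other than the latest deploy is still counted against that deploy, so read the result as a trend rather than a verdict.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def attribute_incidents(deploy_times: list[datetime], incident_times: list[datetime],
                        lookback: timedelta = timedelta(hours=24)) -> dict[datetime, int]:
    """Count incidents per deploy, blaming each incident on the latest earlier deploy.

    Incidents with no deploy in the preceding `lookback` window are left unattributed.
    Deploy timestamps are assumed to be unique.
    """
    deploys = sorted(deploy_times)
    counts = {d: 0 for d in deploys}
    for incident in incident_times:
        i = bisect_right(deploys, incident) - 1  # index of latest deploy at or before it
        if i >= 0 and incident - deploys[i] <= lookback:
            counts[deploys[i]] += 1
    return counts


def change_failure_rate(counts: dict[datetime, int]) -> float:
    """Share of deploys that were followed by at least one attributed incident."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 0) / len(counts)
```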

Does DevOps work in companies with heavy compliance requirements (SOC 2, PCI, HIPAA)?

Yes, but it requires intentional design. Compliance isn't incompatible with fast delivery — it's incompatible with manual approval as a control mechanism. The solution is compliance as code: security policies verified automatically in the pipeline, separation of duties implemented via access control, audit trails via immutable logs. More upfront work, but the result is more robust compliance that depends less on manual process.
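
Here is a minimal sketch of a compliance-as-code gate that a pipeline step could run before deploying. The Change record and both configuration rules are hypothetical examples, not requirements quoted from SOC 2, PCI, or HIPAA. Real setups often use a dedicated policy engine such as Open Policy Agent instead of hand-written checks. The point is the shape: the rules are versioned code, they run on every change, and a non-zero exit code fails the pipeline.

```python
import sys
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    approvers: list[str]
    config: dict = field(default_factory=dict)  # hypothetical deploy settings


def policy_violations(change: Change) -> list[str]:
    """Return readable violations; an empty list means the change may ship."""
    violations = []
    # Separation of duties: at least one approver must be someone other than the author.
    if not any(a != change.author for a in change.approvers):
        violations.append("separation of duties: needs an approver other than the author")
    # Example configuration rules (hypothetical, adapt to your actual controls).
    if not change.config.get("encryption_at_rest", False):
        violations.append("encryption_at_rest must be enabled")
    if change.config.get("public_access", False):
        violations.append("public_access must be disabled")
    return violations


def gate(change: Change) -> int:
    """Print violations and return a process exit code for the pipeline step."""
    problems = policy_violations(change)
    for problem in problems:
        print(f"POLICY VIOLATION: {problem}", file=sys.stderr)
    return 1 if problems else 0  # non-zero fails the step
```

In CI, the step would end with sys.exit(gate(change)). The printed violations then land in the job log next to the commit they refer to, which is where an auditor will look.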


"You build it, you run it" is the phrase that matters

DevOps won't be solved with more tools. The next observability platform, the next container orchestrator, the next CI/CD SaaS — all of them are multipliers. They multiply what already exists. If what exists is a team with opposing incentives and a long feedback loop, the tools multiply the problem.

The change that matters is in ownership. The team that wrote the code is the team that gets paged when it goes down. That changes how the code is written, how it's tested, how it's documented, how much care goes into the deploy. Not because someone mandated it — because the cost of not doing it becomes visible quickly.

Tools help when ownership is already in the right place.

Author
Rafael Duarte
Backend developer with a background in fintech and B2B SaaS, who has worked on teams that scaled APIs from zero to millions of requests. Carries enough production scars to have strong opinions about tools, patterns, and architecture decisions. Not an academic: he read the UUID RFC when he needed to choose between v4 and v7 for a high-write table.