DORA metrics in 2023: 5 ways to measure DevOps performance
In the world of software engineering, there is no shortage of ambiguity about the most productive way to build and deploy your software.
It can sometimes be difficult to sift through all the noise. Should you use microservices or a monolith? Should all your teams use the same programming language? Should you migrate your application to a more modern tech stack?
The debate over questions like these is valid, but it may not matter as much as what you measure.
To get a better grasp on what productive software delivery looks like, it’s better to focus on satisfying specific measures of productivity as identified by the DevOps Research and Assessment team (DORA).
What are DORA metrics?
Before diving into the metrics, we need a bit of a history lesson to make sense of why they are important. You may be thinking – why should I pay attention to these metrics? How were these metrics chosen?
Back in 2013, these metrics grew out of the curiosity of a small team of technology professionals who wanted to answer one question: what makes high-performing technology organizations great?
Past attempts at answering this question had relied on proxy metrics, such as lines of code, velocity, and utilization. Those measures weren't backed by rigorous research, and a more scientific approach was needed to answer the question.
With a survey in hand, the team distributed it to more than 2,000 companies and collected 23,000 responses from around the world.
The results were first presented in the 2014 State of DevOps Report, one of the largest scientific studies in the industry on what makes a performant software organization. The report has been released every year since, and the DORA team's research and questions have gone through many iterations along the way.
The true constant of this work has been the four metrics the DORA team identified to measure software delivery performance:
- Deployment frequency
- Lead time for changes
- Time to restore service
- Change failure rate
Honorable mention goes to a fifth metric, introduced in 2021 to measure operational performance: reliability.
Each year, the DORA team analyzes the trends in its survey results and clusters performance on each metric into high-, medium-, and low-performing organizations.
Let’s take a look at each metric, why it was chosen, and what a high performer looks like according to the 2022 State of DevOps Report.
Deployment frequency
A deployment is defined as releasing your code to production or to an app store. The DORA team chose deployment frequency because it is analogous to reducing batch size in traditional manufacturing, a central element of the Lean paradigm.
With a high deployment frequency, software teams push out smaller changes more often and get continuous feedback from their customers, which means they can iterate on improvements much faster.
Performance in this metric was gauged by how many times a team deployed over a period of time:
- High: on-demand (multiple times a day)
- Medium: between once per week and once per month
- Low: between once per month and once every six months
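If you want a rough baseline, counting deploy events per week is usually enough to start. Here's a minimal Python sketch, under the assumption that you can export deployment timestamps from your CI/CD tool; the `deployments` list below is made-up sample data, not any specific vendor's API.

```python
from collections import Counter
from datetime import datetime

# Hypothetical deployment timestamps exported from your CI/CD tool.
deployments = [
    "2023-05-01T09:14:00", "2023-05-01T15:40:00",
    "2023-05-03T11:02:00", "2023-05-10T16:25:00",
]

# Bucket deployments by ISO calendar week to see how often you ship.
per_week = Counter(
    datetime.fromisoformat(ts).isocalendar()[:2]  # (year, week)
    for ts in deployments
)

for (year, week), count in sorted(per_week.items()):
    print(f"{year}-W{week:02d}: {count} deployment(s)")
```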
Lead time for changes
Customers don’t like to be kept waiting when they make a request, and the wait isn’t good for the company either. Much like deployment frequency, a short lead time for changes means faster feedback from customers, because less time passes between the initial request and that change running in production.
This metric is also important for fixing situations like defects, bugs, or outages in a timely manner.
This metric’s performance was determined by the number of days, weeks, or months it takes to satisfy a customer request:
- High: between one day and one week
- Medium: between one week and one month
- Low: between one month and six months
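One common way to approximate lead time is to measure from commit to deploy. Assuming you can pair each change's commit time with the time it reached production (say, from Git metadata plus deploy logs), the median commit-to-deploy duration gives a first approximation. A hedged sketch with made-up timestamps:

```python
from datetime import datetime
from statistics import median

# Hypothetical (commit_time, deployed_time) pairs per change.
changes = [
    ("2023-05-01T10:00:00", "2023-05-02T09:30:00"),
    ("2023-05-02T14:00:00", "2023-05-04T11:00:00"),
    ("2023-05-05T08:15:00", "2023-05-05T17:45:00"),
]

# Lead time per change, in hours, from commit to running in production.
lead_times_hours = [
    (datetime.fromisoformat(deployed) - datetime.fromisoformat(committed)).total_seconds() / 3600
    for committed, deployed in changes
]

print(f"Median lead time: {median(lead_times_hours):.1f} hours")
```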
Time to restore service
This metric was introduced as a counterweight: are teams sacrificing stability in favor of faster delivery? Software has grown much larger and more complex over time, and there is always potential for unplanned, lengthy downtime if care isn’t taken throughout the software delivery lifecycle.
Performance here is gauged using the same ranges as lead time:
- High: between one day and one week
- Medium: between one week and one month
- Low: between one month and six months
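A simple starting point is to record, for each incident, when it was detected and when service was restored, then average the durations. The sketch below assumes those timestamps already exist somewhere (even a spreadsheet export works):

```python
from datetime import datetime

# Hypothetical incident records: (detected_at, restored_at).
incidents = [
    ("2023-05-03T02:10:00", "2023-05-03T03:05:00"),
    ("2023-05-12T14:20:00", "2023-05-12T18:50:00"),
]

durations_hours = [
    (datetime.fromisoformat(restored) - datetime.fromisoformat(detected)).total_seconds() / 3600
    for detected, restored in incidents
]

# Mean time to restore across the tracked incidents.
mttr = sum(durations_hours) / len(durations_hours)
print(f"Time to restore (mean): {mttr:.1f} hours")
```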
Change failure rate
Changes are typically the number one source of instability in a software application. As much as we like to test, we can never find all the bugs, and change just ramps up the potential to introduce more of them.
Therefore, when bugs or defects take down or degrade your service, it’s a sign that your CI/CD systems might not be optimal and could use some tweaking: perhaps in the form of better testing, delivery practices (canary, blue-green deployments), or smaller changes entirely.
This one is measured as the percentage of software changes that degrade service and require immediate remediation (for example, a rollback or hotfix):
- High: 0%-15%
- Medium: 16%-30%
- Low: 46%-60%
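If you can mark each production deployment with whether it needed remediation (a rollback, hotfix, or patch), the rate itself is a simple ratio. A sketch over made-up deployment records:

```python
# Hypothetical deployment records: True means the deploy degraded service
# and required remediation (rollback, hotfix, patch).
deploys_needed_remediation = [False, False, True, False, False, False, True, False]

failure_rate = sum(deploys_needed_remediation) / len(deploys_needed_remediation) * 100
print(f"Change failure rate: {failure_rate:.0f}%")
```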
Reliability
The new kid on the block, reliability was brought in to reflect the importance of operational excellence to a high-performing software organization. This metric is based upon how well you meet your users’ expectations, such as availability and performance.
This metric doesn’t have a defined high, medium, or low clustering as the way teams measure reliability can vary widely depending on the service-level indicators or service-level objectives (SLI/SLO). Instead, teams were asked to rate their ability to meet their own reliability targets.
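As one illustration, if an availability SLO is part of your reliability targets, you could compare the fraction of successful requests over a window against that target. The request counts and the 99.9% SLO below are assumptions for the sketch, not recommendations:

```python
# Hypothetical counters pulled from your monitoring system over a 30-day window.
total_requests = 12_500_000
successful_requests = 12_489_500

slo_target = 0.999  # assumed availability SLO
availability = successful_requests / total_requests

print(f"Availability: {availability:.4%} (target {slo_target:.1%})")
print("SLO met" if availability >= slo_target else "SLO missed")
```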
How to implement DORA metrics?
All of these measurements sound great. But how do you actually measure them at a software company? And how can you make changes based on your measurements?
Some advice: start small and work with what you have. If it’s all manually driven at the beginning, that’s fine. At least you’re gathering information about these metrics and can track it over time to see whether you’re improving.
For example, you could start recording your time to restore for every incident the company experiences. That’s a simple metric that doesn’t require any new tooling or automation.
From there, you can add automation for the other metrics. Can you track your deployment frequency through your CI/CD tooling? Is it possible to audit your Git history monthly for rollbacks to estimate your change failure rate?
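As a rough illustration of that last idea, the sketch below shells out to plain `git log` to count revert commits against all commits on the current branch over the past month. It assumes reverts follow Git's default "Revert \"...\"" commit-message convention, which is a team-specific heuristic rather than a reliable signal.

```python
import subprocess

def count_commits(extra_args=()):
    # Count commits on the current branch from the last month.
    out = subprocess.run(
        ["git", "log", "--since=1 month ago", "--oneline", *extra_args],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())

total = count_commits()
reverts = count_commits(["--grep=^Revert"])  # matches Git's default revert messages

if total:
    print(f"Reverts last month: {reverts}/{total} ({reverts / total:.0%})")
```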
There’s no shortage of ways to do it. What you choose will likely be unique to your organization.
Like anything else, gathering DORA metrics is not exempt from the build vs. buy debate. There is a variety of tooling out there designed to solve this problem, too.
Conclusion
The questions posed at the beginning of this blog can now be framed in a new light with DORA metrics. For example, if we use microservices, will we be able to deploy smaller changes more frequently? Will it make debugging more complicated and increase our time to restore?
With those questions in mind, debates such as monolith vs. microservices can be addressed under this structure, and the metrics can support whether your decision is actually improving your software delivery performance.
These are important questions to pose, but without the metrics, it’s hard to know whether your decisions are yielding the right results.