OKRs - not as Great as You Think!

Recently, I made a short LinkedIn post that was quite critical of OKRs, Objectives and Key Results. The post got a few likes, and comments that both agreed and disagreed with me. I found one of the dissenting comments particularly interesting, because it referenced an article that was very positive about OKRs.

That article contained pretty much everything that makes me doubt that OKRs are useful. I’ll walk through it, but first, here is my original LinkedIn post:

Here is why I do not trust OKRs, in three quotes:
"When a measure becomes a target, it ceases to be a good measure."
-Goodhart's Law, https://lnkd.in/dUgcBEMT
"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
-Campbell's Law, https://lnkd.in/dFyeu4K3
"Key Results benchmark and monitor how we get to the objective. Effective KRs are specific and time-bound, aggressive yet realistic. Most of all, they are measurable and verifiable. You either meet a key result’s requirements or you don’t; there is no gray area, no room for doubt."
-What is an OKR? Definition and examples, https://lnkd.in/dqtTqYp8
In other words, the OKR system is designed to implement the kind of target setting Goodhart and Campbell warn about.
Judging from what I have seen, OKRs are a great way to train people to fudge results.
Add in that OKRs oversimplify the dependencies between different objectives, and that OKR data is usually presented with too few data points to be useful in decision making, and I have a difficult time understanding the enthusiasm.
I do like metrics, but I doubt that successful companies that use OKRs became successful because they used OKRs.
#strategy #management #leadership

Let’s look at the article used as evidence that OKRs work, and see whether it holds up. The article is named Implementing OKRs. A tale from the trenches.

The case of the misleading case

First of all, the article describes a single case of a company that implemented OKRs, set ambitious targets, and then saw the company meet those targets. That is problematic, because one case, or even a few, does not prove or disprove anything.

Look at the picture below.

Which of the three cases in the picture is closest to the truth? The author of the article clearly believes it is Case 1, but there is no evidence of that. The truth may be closer to Case 2, or Case 3.

Selection bias and lack of scientific research

When in doubt, turn to science! I googled “okr scientific research”, and a few other sets of search terms, but found no statistical studies, or metastudies.

The closest I got was a question on Quora:

“Is there any peer-reviewed research showing that the Objectives and Key Results (OKR) model actually improves efficiency, productivity, or overall company performance?”

Unfortunately, because of the selection bias inherent in the way the question was phrased, any answers won’t be of much use. Studies that do not show that OKRs work will never be included in the answers. What you need is studies that present statistically significant results, whichever way they point. Preferably, we should have metastudies, because they produce better results than any single study can.

As far as I can tell, the research on OKRs is so inadequate that we cannot turn to direct scientific research to either prove or disprove whether it works.

People burning bright, but not for long…

We can, however, use indirect methods to evaluate whether OKRs are likely to work, or not. You create an OKR by setting an aggressive, relatively short-term goal, and a target value to reach.

The article author did this by setting nine BHAGs (Big, Hairy, Audacious Goals), figuring out what reasonable targets would be, and then doubling them. In the words of the article author:

We took those numbers and we doubled them. Why? Because a) why not? That way if we aimed for twice the volume of our plan and we missed by half, we’d still hit our plan. If we actually hit those goals, even better! And b) because you are only supposed to get 70% on your OKRs. These are stretch goals, so come on, stretch a bit. Yeaaaah, that’s it.

The article then continues, describing how quarterly OKRs were set, followed by individual OKRs. The author also asked for advice about OKRs on Facebook. That probably yields advice about as reliable as asking for Covid-19 advice on Facebook.

Did it work? The article author is convinced it did:

When Brew and I traveled on business, shit got done! Decisions were made! Targets were hit!

He also has a credible sounding explanation for why the targets were hit:

Once you see what other people are signing up for, and you see the plans they are putting in place to achieve them, you get religion. You believe they are going to hit their goals. And you don’t want to be the only one not hitting her goals. So you work. You work hard. Instead of watching what everyone else is doing and reacting, you start focusing much more on achieving your Objectives and just assume they’ll hit theirs. OKRs help everyone focus, and that drives results.

The author does not specify exactly what OKRs they used, but he does provide three examples of OKRs he believes are good ones:

Increase leads from 300 to 700
Improve conversion funnel efficiency by 30%
Increase sales volume by 70%

Sounds pretty clear cut: OKRs are a recipe for success…or is there something wrong with this picture?

Yes, there is something wrong! For one thing, the author claims success comes from everyone working harder. Note that working harder and working smarter are two very different things.

Leads increase from 300 to 700. That is an increase of 133%. If that increase in results came from working harder, it is reasonable to assume there was a corresponding increase in effort. If people worked 40 hour weeks before OKRs, with a 133% increase, they would now work more than 93 hours per week.
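The arithmetic behind that estimate can be checked with a few lines of Python. This is only a sketch: the lead figures and the 40-hour week come from the text above, while the idea that effort scales linearly with results is an assumption made purely for illustration.

```python
# Figures from the article's example OKR: leads go from 300 to 700.
leads_before = 300
leads_after = 700

# Relative increase in leads.
increase = (leads_after - leads_before) / leads_before

# Assumption for illustration only: effort scales linearly with results,
# and the baseline is a 40-hour work week.
baseline_hours = 40
hours_needed = baseline_hours * (1 + increase)

print(f"Increase in leads: {increase:.0%}")
print(f"Hours per week at unchanged productivity: {hours_needed:.1f}")
```

If productivity per hour had improved instead, the hours would not need to rise at all, which is exactly the difference between working harder and working smarter.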

Did they work 93 hours per week? We do not know, because there was no OKR for that, but it is quite possible that the OKR system pushed people to work way more than is sustainable. What happened to personnel turnover? The article did not mention that either, so we do not know.

Good metrics are not isolated from each other. They are linked into a system. Such a system would have metrics that prevent people from being overworked. A good metrics system should also be designed to be self-correcting. You have both leading and trailing metrics. You use the leading metrics to guide behavior changes, and trailing metrics to see if you actually got the results you thought you’d get.

OKRs do none of that. OKRs are treated as independent. That is why, with OKRs, it is easy to turn a workplace into a 93-hour-per-week sweatshop.

On the other hand, we do not know that people worked more hours. There might be some other explanation.

I do not have data from the article author’s company, but I do have data from the project teams I lead, so let’s use some of my project data, and see where it gets us. First let’s compare productivity data for two periods, similar to what you would get if you used OKRs. Let’s go:

Period 21: 4 value units
Period 22: 47 value units

Wow! From 4 units to 47 units! That is an increase of 1075%! What audacious target setters we must have been! What a Big Hairy Audacious Goal we must have had!

Let’s look at the productivity the quarter after that:

Period 23: 19 value units

That is a drop of 60%. How could the geniuses of the previous quarter turn into such bumbling idiots? Off with their heads!

Well, that last thing might be overreacting a bit. Let’s look at why we have such huge increases and decreases in results.

The picture above is a Process Behavior Chart, also known as a Process Control Chart. It shows that although we have a lot of variation in productivity, the variation always stays within the upper and lower process limits. These limits are calculated using a statistical method. I won’t go into details in this article. The important thing is that the system is stable within the process limits. All the variation is random.

With OKRs, this randomness cannot be distinguished from actual change, so it is entirely possible that the 133% increase in leads in the OKR article author’s example is due to nothing but random variation.

I don’t know, and neither does he! The difference is, Process Behavior Charts make it possible to distinguish real change from random variation. OKRs do not do that. Even worse, OKRs lead you to believe you understand a situation, and that you control it, while you really don’t.
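For readers who want to see what calculating process limits can look like, here is a minimal Python sketch. It assumes the common individuals-and-moving-range (XmR) form of the chart, where the limits sit at the mean plus or minus 2.66 times the average moving range. The data series is hypothetical: only the values 4, 47, and 19 come from the periods quoted earlier, the rest are made up. Clamping the lower limit at zero is also an assumption, made because output cannot be negative.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, centre, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    centre = mean(values)
    # Moving range: absolute difference between consecutive data points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is the standard XmR scaling constant (3 / 1.128).
    spread = 2.66 * mean(moving_ranges)
    lower = max(0.0, centre - spread)  # assumption: output cannot go below zero
    upper = centre + spread
    return lower, centre, upper


def points_outside_limits(values):
    """List (index, value) pairs that fall outside the process limits."""
    lower, _, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical series. Only 4, 47, and 19 are taken from the article.
data = [22, 9, 31, 4, 47, 19, 25, 12, 38, 16]

lower, centre, upper = xmr_limits(data)
print(f"limits: {lower:.1f} to {upper:.1f}, centre {centre:.1f}")
print("points outside limits:", points_outside_limits(data))
```

With this made-up series, the jump from 4 to 47 stays inside the limits, so it would count as ordinary variation rather than a real change. The sketch only checks for points outside the limits; full Process Behavior Chart practice uses additional detection rules, such as long runs on one side of the centre line.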

Targets corrupt!

What happens when you set a target? Let’s look at the same Process Behavior Chart, but this time I have marked some target levels. I have also color graded the area from zero to the upper process limit, in colors from green to red. The more green, the easier it is to reach a target. The more red, the more difficult it is to reach a target.

It is important to remember, that we are looking at the system as it actually is. To get the system to behave differently, we need to change it in some way. If we set targets and empower people, but do not teach how to change the system, they have only a few alternatives:

Hope they get lucky. Works once in a while, but the game tends to be stacked against the people doing the actual work, especially since OKRs mandate setting targets without providing any method of figuring out what a reasonable target is.
Work more hours. Common, and often works in a very short perspective, but not sustainable over time. Can have disastrous side effects, like increasing personnel turnover, which drains the organization of competence, and destroys organizational memory.
Cheat. Usually the safest option. Risk of discovery is small. Even if discovered, risk of punishment is not that high.
Teach themselves how to change the system, change the system, and then pray that they really were as empowered as they were told they were. If they were not, punishment for doing actual change is usually swift and unmerciful. This is difficult, time consuming, dangerous, and often provides little benefit to the people who do it, even if the change succeeds.

There are 25 data points in the series. If we want a 70% chance of success, that is, we want 17 data points at the target level or higher, we should set the target at around 5. That is well below the average, which is slightly above 20.

Setting the target to a 70% success rate based on future performance would require a time machine. If you don’t have one, trying will just be silly.

Let’s assume for a minute that you know the average productivity is 20. Note that with OKRs, you don’t bother with averages, so you need something else to provide that information. If you set the target at double the average, 40, you have a 20% chance of reaching it, even if you do not change anything about how you work. I’d say doubling the average sounds like an aggressive target.

That means, in our example, 20% of the times you set a target that is double the average, you will trick yourself into believing that OKRs work. People like writing about their successes much more than they like writing about their failures, so the people who write about OKRs are likely to come from a group that sets targets like this, or, since they usually do not know what the average is, has set the targets way into the green.
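The reasoning about target levels can be sketched in Python. The series below is not my actual project data. It is a hypothetical set of 25 values, made up so that it roughly matches the figures quoted above: an average slightly above 20, 17 points at 5 or higher, and 5 points at 40 or higher.

```python
from statistics import mean


def hit_rate(values, target):
    """Fraction of past data points at or above the target."""
    return sum(v >= target for v in values) / len(values)


# Hypothetical series of 25 values, constructed for illustration only.
data = [19, 3, 41, 12, 0, 28, 47, 5, 1, 22, 36, 4, 52,
        15, 2, 39, 8, 43, 1, 25, 3, 32, 45, 19, 4]

print(f"average: {mean(data):.2f}")
for target in (5, 20, 40):
    print(f"target {target}: reached in {hit_rate(data, target):.0%} of periods")
```

Note that this only looks backwards. It says how often an unchanged system would have reached a given target by chance, which is the share of "successes" that have nothing to do with the target itself.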

It is a popular belief that if you empower the people who do the work, they will start doing the systemic improvements necessary to meet audacious targets. Unfortunately, that requires that they also know how to improve the system. Sometimes people do, but most of the time, they don’t. Why not? One reason that stands out is that neither schools, nor workplaces, teach effective methods for improving systems. Universities sometimes do.

Plenty of such methods exist. There is a plethora of different improvement methods based on complexity thinking, systems thinking, and statistics. Some of these methods have tools that are fairly easy to use, like Five Whys in Lean/TPS, or Evaporating Clouds in The Logical Thinking Process (which is one of the tool sets in the Theory Of Constraints). Others, like Process Behavior Charts, are more difficult to use. Easy or hard, they all have one thing in common:

They are rarely taught!

Imagine that someone sets a target for you, and you have no idea how to achieve it. Imagine that whoever sets the target also tells you “you are empowered to do whatever it takes to reach the target, just don’t fail”.

If the target looks possible to achieve, you will almost certainly continue the way you are used to, and hope for the best.

If the target is way out there, and looks impossible to achieve, there is only one thing to do:

You cheat!

Some years ago, I worked for a company that set very challenging targets for its sales people, in several different markets. I was in close contact with people from one of the markets, and they told me how they had to cheat to meet their targets. They hated it! It’s just that they had to. They told me all the other markets were in a similar position, and did the same thing.

In their case, they did not manipulate the total number of units sold, but they did mess around with the dates when things were sold, and that was enough to ensure that they were on target often enough to avoid getting punished for missing them.

In software development, which I usually work with, management likes to set targets like “80% automated test coverage” for an application. That is particularly useless, because the need for tests varies between different parts of an application. In general, it is useless to write tests for something called accessor methods, but there are a lot of accessor methods, and they are easy to write automated tests for, so that is what often happens.

Another little trick: If management measures software development team velocity, usually in story points per sprint, or stories per sprint, and sets a target, story points and stories will magically shrink. As a result, velocity will go up, and the target is met, without anything real changing at all.
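Here is a small Python sketch of how that trick plays out. Everything in it is hypothetical: the story names, the point values, and the assumption that the team simply moves every estimate one step up a Fibonacci-style scale once a velocity target is announced.

```python
def velocity(stories):
    """Velocity as usually reported: total story points completed in a sprint."""
    return sum(points for _, points in stories)


# Hypothetical sprint, estimated before any target existed: (story, points).
sprint_before = [("login form", 3), ("password reset", 5), ("audit log", 8)]

# The same three stories after a velocity target is announced.
# Assumption: every estimate moves one step up the scale,
# while the actual work stays exactly the same.
next_step = {3: 5, 5: 8, 8: 13}
sprint_after = [(name, next_step[points]) for name, points in sprint_before]

before = velocity(sprint_before)
after = velocity(sprint_after)
print(f"velocity before: {before}, after: {after}")
print(f"reported improvement: {(after - before) / before:.1%}")
```

The list of delivered stories is identical in both sprints. Only the number reported to management has changed.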

If we look to research on target setting, it corroborates the stories above.

Here we found that participants who were given a specific performance goal for how much revenue they had to earn were almost twice as willing to use unethical methods to achieve it than those given a vague goal, irrespective of their moral justification.
— When Tough Performance Goals Lead to Cheating, by Colm Healy and Karen Niven

The above quote is from an HBR article about a scientific study with 106 participants.

We found that people with unmet goals were more likely to engage in unethical behavior than people attempting to do their best. This relationship held for goals both with and without economic incentives. We also found that the relationship between goal setting and unethical behavior was particularly strong when people fell just short of reaching their goals.
— Goal Setting as a Motivator of Unethical Behavior, June 2004, The Academy of Management Journal, by Maurice E Schweitzer (University of Pennsylvania), Lisa D. Ordóñez (The University of Arizona), and Bambi Douma (University of Montana)

Here is a quote from a third study:

Our study finds that cheating in goal-based systems occurs due to financial rewards to cheating and to social comparison framing. Assigning a goal on its own without increased pay or social comparison framing did not lead to an increase in cheating relative to a do-your-best counterfactual. However, the use of either financial rewards or social comparison framing led to cheating.
— Why do goal-based incentives cause cheating? Unpacking the confounding effects of goals, social comparisons and pay, by Matthew Chao (Williams College), and Ian Larkin (UCLA Anderson School of Management)

We do need a lot more research about the effects of setting goals and targets. As far as I can tell, setting goals, by itself, does not cause significant problems. However, when we offer financial rewards, or compare people’s performance, then we invite cheating.

If a target is tied to a reward, then people will be prone to cheat to get the reward. The same way, if a target is used for social comparison, as in “Helen reached the target, and you didn’t”, then we also invite cheating.

I haven’t found any scientific research about targets tied to punishment, but since we are geared to react even more strongly to punishment than to rewards, it is highly likely that fear of punishment makes us cheat too.

Goals blind us to creative solutions!

Goals do make us more focused. When we do something relatively simple, that only requires increased effort, that can indeed be effective. However, our working lives are often way more complex than that.

With goals, people narrow their focus. This intense focus can blind people to important issues that appear unrelated to their goal (as in the case of Ford employees who overlooked safety testing to rush the Pinto to market). The tendency to focus too narrowly on goals is compounded when managers chart the wrong course by setting the wrong goal…
— Goals Gone Wild: The Systematic Side Effects of Over-Prescribing Goal Setting, by Lisa D. Ordóñez, Maurice E. Schweitzer, Adam D. Galinsky, and Max H. Bazerman

Case studies and anecdotes are no substitute for scientific research, but if you have the research, they can be used to illustrate a point, so let’s do that:

Several years ago, I worked directly for the CTO of a company that had hired a sub-contractor to improve its IT infrastructure. The project was not going well, and failure would mean the sub-contractor would not get a new contract, and that would mean firing a lot of personnel. This was during a raging global financial crisis, and if there was one thing you did not want to do, it was lose your job.

The sub-contractor management set as a goal to minimize the damage to their company from missing the contract. Everything they did was focused on whom to fire, when to fire, what severance packages should look like, and other damage control measures.

I asked my CTO if I could get a shot at helping the sub-contractor make the project deadline, and he okayed it. There were benefits for both organizations if the sub-contractor made the deadline, and, of course, to the people who would keep their jobs.

The sub-contractor was less than enthusiastic about having me meddle in what everyone knew was a hopeless situation, but the CTO I worked for could be very persuasive when he wanted to be, so I got permission to make an attempt.

The development team was already working as hard as it could. There simply weren’t enough hours in a day to add more overtime.

I began with an information gathering and brainstorming workshop using a method called Crawford Slip. Then, I showed the team how to set up a kanban board. Kanban boards were less common back then than they are today. I would have preferred to stay with the team throughout the improvement project, but that proved impossible. Nobody wanted to spring for the cost of having me around full time.

Instead, I told the team to photograph the kanban board twice a week, and email me the photos. I spent several weeks doing nothing but recording the team velocity, and using the data from the brainstorming session to build a Current Reality Tree (CRT). The CRT is a problem solving tool from the Logical Thinking Process toolkit.

Then, I traveled back to the team, showed the project manager what I had found, and made a few suggestions. You can see the results in the graph below.

I met with the project manager in week 40. In week 41, productivity went up. In week 42, it went up again. After that, it held steady around a new baseline. In two weeks, the average output increased by 316%.

Remember, they were already working as much overtime as they could, so the productivity did not come from working harder. It came from working smarter.

What does it mean to work smarter?

So, what does it mean to work smarter? You need to understand the process, find problem areas, and then fix them. The picture above shows the situation before we made changes. The developers worked on a bunch of jobs, all with high priority, and all broken down into several tasks. Since everything had high priority, they tried to start each job as soon as possible.

When they had finished one task, they immediately began working on another. The problem was that the new task they started working on had nothing to do with the preceding task. They switched back and forth between working on different jobs. This led to tasks belonging to different jobs getting interleaved, and this pushed lead times to become longer.

What we did was reorganize the tasks, so that all tasks belonging to the highest priority job were finished first. Then all the tasks belonging to the second highest priority job were done, and so on.

One counter-intuitive thing is that we delayed starting work, in order to finish sooner.

In reality, the benefits are much larger than the diagram shows, because when you eliminate multitasking, you eliminate having to spend time on switching tasks. You do not need to change your tool setup as often, you do not need to backtrack to try to remember what you did the last time, and you do not need to figure out what the heck you were planning to do. That adds up to a lot of time saved.
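The effect of interleaving versus finishing one job at a time can be sketched in Python. The numbers are hypothetical and not taken from the project: three jobs of four tasks each, every task taking one time unit, and an assumed switching cost of a quarter of a time unit whenever the next task belongs to a different job.

```python
def finish_times(schedule, switch_cost=0.0):
    """Return the time at which each job's last task is completed.

    schedule is a list of job names, one entry per task, in working order.
    Each task takes one time unit. switch_cost is added whenever the next
    task belongs to a different job than the previous one.
    """
    clock = 0.0
    previous = None
    done = {}
    for job in schedule:
        if previous is not None and job != previous:
            clock += switch_cost
        clock += 1.0
        done[job] = clock  # later tasks overwrite earlier finish times
        previous = job
    return done


jobs = ["A", "B", "C"]
tasks_per_job = 4

# Interleaved: one task from each job in turn, because everything is urgent.
interleaved = jobs * tasks_per_job
# Sequential: finish all of job A, then all of B, then all of C.
sequential = [job for job in jobs for _ in range(tasks_per_job)]

for label, schedule in (("interleaved", interleaved), ("sequential", sequential)):
    for cost in (0.0, 0.25):
        times = finish_times(schedule, switch_cost=cost)
        average = sum(times.values()) / len(times)
        print(f"{label}, switch cost {cost}: {times}, average {average:.2f}")
```

Even with no switching cost at all, the two highest priority jobs finish much earlier when worked sequentially, and the last job finishes no later than before. Adding a switching cost penalizes the interleaved schedule far more, because it switches jobs after nearly every task.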

Of course, eliminating multitasking is only one of the things you can do. For practical purposes, there are nearly always dozens, or hundreds, of improvements you can make. Once you have made an improvement, that often opens up the way for other improvements.

With OKRs, there would have been no improvement, because working harder was not the answer. Understanding the process, and eliminating a major problem, that did the trick.

It’s worth noting that a change like this is a sustainable change. No one is pushed to work harder. The developers can actually deliver more value while working less. That leaves more time for thinking, which can be used to find new things to improve.

How did things turn out? Well, the company managed to deliver, it got the contract it was after, and when I checked a year later, they had managed to keep all the personnel.

Had they used OKRs to just push people to work harder, they would have had close to zero chance of finishing the work in time, and getting the contract.

Summing up

It’s time to sum this article up:

There is very little research on the efficacy of OKRs. The existing material consists mostly of testimonies and case studies.

Testimonies and case studies are notoriously unreliable. We should not base our beliefs about OKRs, or anything else, on them.

While we cannot use scientific research to tell whether OKRs work, there is research on the building blocks of OKRs.

Aggressive target setting: Setting aggressive targets increases cheating substantially.
Use of single data points: Using single data points can be, and very often is, incredibly misleading. Metrics systems should, by default, use long data series.

OKRs ignore the effects of variation, and that is a great way to trick people into making bad decisions. We need some way of separating random variation within the system from special cause variation. We also need to detect trends in the data. Process Behavior Charts do all of this, but there are other methods that can be used.
OKRs do not tell us why we reach a target, or why we fail. It may be that we have improved something, or it may be that we got lucky, or unlucky.
OKRs do not tell us whether a change, if there is one, is sustainable, or if it will cause detrimental side effects.
OKRs present dependent variables as if they are independent. This is bound to cause confusion, and bad decisions. To be useful, a metrics system must link things together, so that we do not inadvertently cause big harm when we make a small improvement.
OKRs make us focus on reaching targets, and this can make us blind to creative solutions. In other words, OKRs can prevent you from reaching the goals set by the OKRs.

There you have it: There is very little that speaks in favor of OKRs, and a lot of reasons to be suspicious.

In addition, we can easily create metrics systems that do not have the problems that OKRs have.

You can use Goal Maps, S&T trees, or strategy maps, to build a model of the organization’s goals, and how they interact with each other. (I think Goal Maps and S&T trees work better than Strategy Maps for this, but that is a matter open to debate. There are also other ways to do the mapping.)

Add Process Behavior Charts, or some other way to distinguish between system noise and special cause changes, and you have a good start.

Of course, any metrics system has limitations, so you should be aware that in some situations, metrics systems based on cause and effect cease to function. That happens when your organization dives into complexity. You don’t have cause and effect anymore, but you still have statistical dispositions, which means that while a Goal Map is less trustworthy, it is probably not totally useless. When you have true chaos, which happens only rarely, you have no causality, so you need to use other methods. (Check out Cynefin, if you are interested.)

Most of the time though, a good metrics system will help, but OKRs probably won’t.

Kallokain