Saturday, January 14, 2023

Are You Still Using the Wrong Control Levers in your Agile projects? Part 2: Business Value, its Use and Abuse

In the first part of this article series, I wrote about how using Cost and Capacity to control an agile software project can trap an organization in a hire and fire cycle that increases project duration and cost.

This time, we will take a closer look at an agile control lever that works very well, except when it doesn’t: Business Value.

When used right, the Business Value lever can be the most powerful tool you have to steer an agile project or program towards success. When used wrong, and it often is, the Business Value lever can be completely disabled, leaving management to pull a lever that no longer works, and no longer has the ability to steer the project.

How to Deliver Business Value the Agile Way

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
— Principles behind the Agile Manifesto

Let’s start by looking at how the Business Value lever is supposed to work:

Most of the original agile methods, often called lightweight methods, were designed for small systems development. There was usually a single team, and that team built something that could be sliced up in many small deliveries that had value to a customer. For example, a website, or a payroll system, can be built and delivered with minimal functionality. After that, new functionality can be delivered in small increments at relatively short intervals.

The illustration shows how many small, fast deliveries begin to generate business value early, giving agile projects a head start on waterfall projects. In small systems development, small deliveries are deliveries to end users.

In order to deliver in small increments, you need a way of slicing up the application you are building. Thus, the User Story was born! A User Story is a short, non-technical description of something a user wants to do, written from the user’s perspective. The User Story was not the only way, nor the first, to describe vertical slices, little pieces of functionality that worked end-to-end, but it became the most popular.

Originally, User Stories were written on index cards, by an end user, in a freeform format. That worked very well. Some User Stories had a lot of value to end users, others had less. What you did to maximize business value was to sort the index cards from most valuable to least valuable, and start working from the top of the stack of cards.

User Stories provided an alternative to Critical Path for organizing the work. If you read the first article in this series, you may remember that the Critical Path is defined as the longest stretch of dependent activities in a project. Critical Path made it possible, when it worked, to minimize the total duration and cost of a project. User Stories, and vertical slicing, made it possible to deliver value much sooner.

With Critical Path and Waterfall methods, if you had a twelve month project, you would not see any business value from the project until after 12 months. With an agile method, you would get something of value after one or two months, and then, with each delivery, the business value would increase.

Getting business value sooner meant making money sooner. If you have ever played around with a Profit & Loss statement, you know that shipping working software just a little bit faster can have enormous impact on the total business value over the product life cycle.
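To make the head start concrete, here is a minimal sketch of the arithmetic. All numbers are illustrative assumptions, not data from a real project: a twelve month project, a 24 month observation window, a finished product worth 12 value units per month, and an agile project that puts one twelfth of the functionality into production at the end of each month.

```python
# Illustrative assumptions only: none of these numbers come from a real project.
HORIZON_MONTHS = 24        # how long we observe the product
PROJECT_MONTHS = 12        # length of the development project
FULL_VALUE_PER_MONTH = 12  # value units per month once everything is delivered

# Waterfall: no value until the project is finished, then full value every month.
waterfall = [
    0 if month <= PROJECT_MONTHS else FULL_VALUE_PER_MONTH
    for month in range(1, HORIZON_MONTHS + 1)
]

# Agile: each completed month of work puts one more slice into production,
# and that slice earns its share of the value from the following month on.
agile = [
    min(month - 1, PROJECT_MONTHS) * FULL_VALUE_PER_MONTH // PROJECT_MONTHS
    for month in range(1, HORIZON_MONTHS + 1)
]

waterfall_total = sum(waterfall)
agile_total = sum(agile)
print(f"Waterfall: {waterfall_total} value units over {HORIZON_MONTHS} months")
print(f"Agile:     {agile_total} value units over {HORIZON_MONTHS} months")
```

Under these assumptions, both projects cost the same and finish at the same time, but the agile project has earned more by the end of the window, simply because value started flowing earlier. The sketch treats all slices as equally valuable; sorting the most valuable slices first would widen the gap.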

Critical Path became obsolete, replaced by User Stories, and the idea of shipping a minimal product as quickly as possible, followed by incremental increases in functionality.

The Decline and Fall of the User Story

…user stories are based on real customers who will use the product once it’s released. As such, the Product Owner may choose to talk to potential customers in order to get a feel for what they want. There could be focus group discussions, interviews and any other kind of research needed in order to gain the intelligence needed to create a viable user story.
— Cohn, Jeff. Scrum Fundamentals: A Beginner’s Guide to Mastery of The Scrum Project Management Methodology

In the beginning User Stories worked very well, but then something happened: Scrum became the dominant agile method! Scrum did away with the idea that an end user should write the User Stories, and inserted an intermediary, the Product Owner.

Note the phrase “the Product Owner may choose to talk to potential customers” in the quote above. The Product Owner should talk to end users, but isn’t required to. In many large organizations, it’s difficult for a Product Owner to even find an end user.

The illustration shows the difference between Scrum and most other agile methods when gathering requirements. Even in the best case, Scrum inserts an extra degree of separation.

With most original agile methods, like eXtreme Programming (Kent Beck), Crystal (Alistair Cockburn), and Lean Software Development (Mary and Tom Poppendieck), it was explicit that the organization would have to change in order to support agile teams. One such change would be tearing down organizational barriers, so the development team could meet and talk to users directly.

The illustration shows the Crystal approach: An expert user, a business expert, and a designer/developer collaborate creating lightweight use cases. The designer/developer is part of the development team. This maintains a single degree of separation, while adding a broad business perspective, a deep understanding of what users do, and how they work, while also capturing more initial information and being scalable to larger, multi-team projects.

It is worth noting that some agile methods, notably Crystal, took a more scalable approach to requirements management. Crystal used lightweight use cases instead of user stories. Use cases also represent vertical slices of functionality. However, they capture a lot more information than user stories, while still being reasonably quick and easy to create. The use cases were created by a troika, an expert user, a business expert, and a designer/developer. Crystal was explicit about the high degree of expertise needed to create good requirements.

Scrum took a different path. Instead of changing the organization, Scrum inserted the Product Owner as an intermediary, isolating the development team from the organization around it. This was attractive to management in many organizations, because they did not have to change how they worked. Instead, the team could be treated more or less like a black box.

Isolating development teams from users came at a cost:

Loss of information!

In some organizations, the Product Owner works in close contact with the users. In those organizations, inserting an extra degree of separation might not matter much. In other organizations, where there is less contact between Product Owner and end users, the information loss may be substantial, and significantly reduce business value.

Note that User Stories are not complete requirements. Originally, they were intended as placeholders for conversations between developers and end users. With Scrum, developers will have those conversations with the Product Owner instead. For this to work well, the Product Owner must have both deep and broad knowledge of the end user’s area of expertise! However, the Official Scrum Guide does not require Product Owners to have any such expertise. Instead, Scrum focused on accountability:

The Product Owner is accountable for maximizing the value of the product resulting from the work of the Scrum Team. How this is done may vary widely across organizations, Scrum Teams, and individuals.
— The Scrum Guide, p. 5

The Scrum Guide goes into more detail about the areas of accountability, but is silent on the corresponding skills needed to do the job well. This focus on areas of accountability in lieu of skills is not limited to the Product Owner role. All other roles in Scrum are defined the same way.

Eliminating skill requirements contributed to making Scrum an easy sell. Unfortunately, the lack of skill requirements led to an overall reduction in skill levels for all Scrum roles, compared to other agile methods.

Why would user stories being vertically sliced matter?
— Agile Way of Working responsible in a major SAFe program

Many, perhaps most, Product Owners did not get much training in capturing requirements. Nor did they learn why vertical slicing was important. At the same time, they had to do a difficult job. Instead of an end user writing a story, a Product Owner had to imagine what an end user would write, and then write that. To make it worse, the Product Owner often had to do it without expertise in the subject matter, and without expertise in the relevant business processes. The Product Owner job became very difficult indeed, and that could not help but reduce the quality of User Stories.

What happened was that Product Owners in many companies wrote task descriptions instead, because that was what they knew how to do. They still called them stories though.

The problem with this is that we do not have anything that structures the tasks into vertical slices of functionality anymore. Product Owners were supposed to organize the work order from the item with the highest business value to the item with the lowest, but tasks don’t have business value!

Business value is an emergent property of a vertical slice of functionality, or a set of vertical slices of functionality! That rarely stopped anyone from trying to do the impossible though. I have occasionally met Product Owners, and other people, who suspect that something fishy is going on with how we write requirements, but they are few and far between.

So, why does it matter whether a team uses User Stories or tasks?

It matters, because when the teams work with tasks, they do not usually complete all tasks in a vertical slice in one go. They may work on a front end task, and then move on to another front end task, or work on a business rule, and then move on to another business rule, without connecting anything end-to-end.

We have now disabled the Business Value control lever!

Whenever I see this phenomenon, teams still do deliveries, and still do demos. It’s just that they do internal deliveries, and the demos are Powerpoint presentations, or demos of some internal function of interest only to developers. End users cannot use the software in the internal delivery. End users are never invited to these demos.

Companies that do this can be even worse off than they were before agile, because they first got rid of the Critical Path based project management they had before, and then disabled the vertical slicing which was supposed to replace Critical Path management.

Yeah. And don't try to tell us there is no way to go but up because the truth is, there is always more down.
— Gunn, in the Angel TV-series episode Happy Anniversary

Confusing user stories with tasks causes big enough problems when doing small systems development. What happens when you scale up to large projects with many teams?

When we scale up, and teams become responsible for functional units in the system architecture, they also become dependent on each other. The teams have the same dependencies the system architecture has.

If front facing teams worked with real User Stories, then we could break those stories down into tasks that supporting teams worked on, and continue producing vertical slices of software.

We could also design the system architecture so that each team can handle a vertical slice of functionality.

In most cases, we would have to balance the two approaches.

What I have seen happening though, is that everyone keeps working on tasks. Architectural dependencies are not resolved. Team topology is ignored. Work consequently slows down, a lot!

I worked in one large project some years ago where deliveries were slowed down by a factor of 50! In other words, what should have been a two week user story took two years to implement instead.

Most other projects I have seen have had similar problems.

Mirror, Mirror on the Wall, Tell Me Which is the Most Fake User Story of Them All

As far as I know, no method has done more to confuse the issue of user stories versus tasks than the Scaled Agile Framework (SAFe).

SAFe has a requirements model with two kinds of stories: User Stories, and enabler stories.

User stories are, according to SAFe, vertical slices of functionality, but SAFe does not mention why that is important, or what happens to SAFe’s own requirements prioritization system if they are not. Then SAFe tops it off by introducing enabler stories, which are a kind of task.

Hierarchically, both user stories and enabler stories are the children of features. There are two kinds of features: business features, which are vertical slices of functionality, and enabler features, which are a kind of task created by architects. Both kinds of features may contain either kind of story.

Moving up the hierarchy to business capabilities and enabler capabilities, both of them may contain both business and enabler features.

The top level is the epic. There are three main types, portfolio epic, solution epic, and program epic. Any type of epic may be of one of two subtypes, business epic and enabler epic, which gives you a total of six different kinds of epic.

Any kind of epic may contain any kind of capability, and any kind of feature. Any of these work items may be standalone. You may, for example, have enabler stories and enabler features that are standalone and not connected to anything that has any business value.

It’s a mess! I don’t think anyone has a grip on it. I may be wrong, but I haven’t met anyone who has figured it out yet.

What’s Bad for the Geese, May Be Good for the Gander

All of the confusion about agile requirements is bad for the industry as a whole, but it is actually an opportunity for those who manage to fix their user stories. When user stories describe vertical slices of functionality, the Business Value lever starts working, and that means the basic mechanism for making small, fast deliveries is working.

That, in turn, makes it possible to use other control levers more effectively.

I’ll describe some of those control levers in Part 3: The Iron Triangle vs. The Gang of Four.

Monday, January 09, 2023

Are You Still Using the Wrong Control Levers in your Agile projects? Part One: Cost and Capacity - The Levers of Death

Which levers should you use? When should you use them? Which levers should you avoid using? There is a subtle hint in the illustration.

Agile methods brought us new ways of developing software, and new ways of managing software projects, programs, and product development. Unfortunately, I have seen very few, if any, organizations that make good use of the powerful new management tools they have at their disposal. Instead, they continue to use the same tools they used before agile, often with predictably bad results.

In this series of articles I’ll provide a walkthrough of high level controls, their pros, cons, and how they relate to each other.

The Levers of Death: Capacity and Cost

Let’s start with the Levers of Death, Capacity and Cost. These levers are the ones I see used most often. They are not necessarily bad in and of themselves (well, firing people is bad), but they are easy to misuse, and often poorly understood.

In most organizations I have worked in, it is assumed Capacity and Cost controls are rather straightforward:

Capacity - Increase capacity, i.e. hire more people, when you want a project to speed up. This of course also increases cost.

Cost - Cut Cost, i.e. fire people, when the project gets too expensive. What is too expensive or not is usually measured against a predetermined budget, usually a yearly budget. Cutting cost is expected to reduce capacity, but the implications of that are often quietly ignored, because, what else can you do, right?

As we will see, the problems with the hiring and firing approach outweigh the benefits. Fortunately, there are better alternatives, controls that both provide more effective economic control, and are more humane. I’ll explore both bad and good control levers in this article series, but we will start with two baddies: Cost and Capacity.

Problem 1: Vital Information is Missing

The critical path (or paths) is the longest path (in time) from Start to Finish; it indicates the minimum time necessary to complete the entire project.

— The ABCs of the Critical Path Method, by F. K. Levy, G. L. Thompson, and J. D. Wiest, Harvard Business Review, September 1963

When agile methods became popular, they were intended to replace older methods of management. That led to chucking older practices overboard, because they were not needed anymore. The capacity and cost levers were supposed to be, if not chucked overboard altogether, at least relegated to third-rate status. Unfortunately, managers in many companies held on to them as primary project controls.

To make it worse, when organizations use them now, they do it without the benefit of information they had 25 years ago. The reason is that some of the decision support needed to use Capacity and Cost effectively actually was chucked overboard.

One such missing piece of information is the Critical Path. The critical path is the longest path, in time, from start to finish in a project. The critical path is extremely important in old style waterfall projects, because it determines the duration of the entire project.
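As a rough sketch of what "longest path, in time" means in practice, the snippet below finds the critical path in a small network of dependent activities. The activity names and durations are hypothetical, invented purely for illustration, and the sketch assumes the dependencies contain no cycles.

```python
from functools import lru_cache

# Hypothetical activities: name -> (duration in days, list of predecessors).
activities = {
    "design":   (5, []),
    "backend":  (10, ["design"]),
    "frontend": (7, ["design"]),
    "testing":  (4, ["backend", "frontend"]),
}

def critical_path(activities):
    """Return (total duration, path) for the longest chain of dependent activities."""

    @lru_cache(maxsize=None)
    def longest_to(name):
        # Longest chain that ends with this activity, including its own duration.
        duration, predecessors = activities[name]
        best_length, best_path = 0, ()
        for predecessor in predecessors:
            length, path = longest_to(predecessor)
            if length > best_length:
                best_length, best_path = length, path
        return best_length + duration, best_path + (name,)

    # The critical path is the longest chain ending at any activity.
    return max((longest_to(name) for name in activities), key=lambda result: result[0])

length, path = critical_path(activities)
print(f"Critical path: {' -> '.join(path)} ({length} days)")
```

In this made-up network, the frontend work has slack: it can slip a few days without delaying the project, while any delay to design, backend, or testing delays everything.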

If you know the critical path, then you also know that along the critical path you have a capacity bottleneck.

When you wanted to add capacity, the trick was to locate the capacity bottleneck, and add capacity there, and nowhere else. This will sound very familiar to anyone who uses the Theory of Constraints (TOC) management paradigm, and the TOC project management method, Critical Chain.

Conversely, if you wanted to reduce cost, you made very sure to reduce it in places that were not the bottleneck, and preferably not on the critical path.

“Dr. Livingstone, I presume?”

Before we look closer at why we need to know the critical path and the project bottleneck before we should even think about hiring and firing, we need to look at a problem the critical path idea was not designed to handle:

In projects with a lot of variation, like software and product development, the critical path and the main bottleneck move around, a lot!

The reason for the critical path and the project bottleneck moving around, is simple: Random variation!

When you build something new, which is what software development is all about, you do not have well defined lists of activities. Instead, you are doing exploratory work, with very limited ability to predict the future. To make it worse, the more detailed your predictions, the more wrong they will be.

Imagine, for a moment, that you are Henry Morton Stanley, on March 21, 1871. It’s the first day of your attempt to find the explorer David Livingstone, who had vanished in central Africa several years earlier. You have prepared carefully for the rescue expedition, but it would be folly to make a predetermined plan of exactly where to go to find Livingstone, and exactly how long it will take.

Stanley’s expedition faced enormous dangers, or impediments, as Stanley would have called them if he had been a Scrummaster: Crocodiles ate the pack animals, and tse-tse flies gave them deadly diseases. Dozens of porters abandoned the expedition, or died from dysentery, smallpox, malaria, and other diseases. Livingstone had been spotted near Lake Tanganyika, so Stanley had a general idea of where to go, but he had to pick up more information along the way. He heard a rumor about a white man in the town Ujiji, and went there, not knowing whether he would find Livingstone, or not. By luck, he did!

Software development is like that. You can’t make detailed plans and schedules, but you can prepare.

Unfortunately, the whole critical-path-and-bottleneck idea requires that you can plan and schedule with a great deal of accuracy. If you can’t plan in detail, you can’t identify the critical path. If you can’t identify the critical path, it’s difficult to identify the bottleneck in the process. If the bottleneck and the critical path keep moving around, a good decision about where to hire and fire today will become a bad decision tomorrow.

There are things you can do to mitigate the problems with critical path, but I’ll leave that for another article series. In the more than 40 years I have worked in software development, I have yet to see a software project implement anything close to a useful solution.

Today, when companies have scrapped the whole idea of critical path management, fixing the problems with it is of little relevance.

Instead, we will look at what happens when an organization uses the Capacity and Cost levers without knowing what the critical path and the project bottleneck are.

Problem 2: Adding Capacity Adds Work-In-Process

When you do not know where the critical path bottleneck is, and you add more people to a project, you are more likely to add people in other locations than the bottleneck, than to actually hit the bottleneck itself. That means most of the people you add won’t contribute to speeding up the project. Instead, they will add to Work-In-Process (WIP), queues of unfinished work in the process.

The larger the queues you have, the longer your lead times will be. Unfinished work in queues also adds risk, because you won’t know whether the stuff will actually work with all the other stuff you build until you test it end-to-end. There are plenty of techniques for mitigating that risk, but you can’t eliminate it. Besides, most companies I have worked with are rather poor at this kind of risk mitigation.

Build up enough WIP, and your critical path will shift to the path where most of the added WIP is, which will increase project duration.

Thus, adding people won’t buy you the added capacity you think it will.
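The link between queue size and lead time described above is commonly expressed with Little's Law, which the text does not name: in a stable process, average lead time equals average WIP divided by average throughput. The sketch below is a minimal illustration; the item counts and the throughput figure are hypothetical.

```python
def average_lead_time(wip_items, throughput_per_week):
    """Little's Law: average lead time (weeks) = average WIP / average throughput.

    Only holds as a long-run average for a reasonably stable process.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_week

# Hypothetical numbers: the bottleneck finishes 5 items per week either way.
before = average_lead_time(wip_items=20, throughput_per_week=5)
after = average_lead_time(wip_items=60, throughput_per_week=5)
print(f"Lead time with 20 items in process: {before:.0f} weeks")
print(f"Lead time with 60 items in process: {after:.0f} weeks")
```

If the added people start more work without raising throughput at the bottleneck, WIP triples and so does the average wait, even though everyone is busy.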

Communications Overhead

The illustration shows how adding more nodes to a network, i.e. adding team members to a team, or adding teams to a project, causes quadratic growth in communications overhead.

On top of the WIP problem, adding more people will add communications overhead. The communications overhead can start out small for a small project, but it will grow quadratically while you add people linearly. This means when you add more people, productivity per person will go down.
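The quadratic growth can be sketched by counting the possible one-to-one communication channels in a group, which is n(n-1)/2 for n people. This is a simplification that assumes everyone may need to talk to everyone else; real projects reduce the number with team boundaries, but the trend is the same.

```python
def communication_channels(people):
    """Number of possible one-to-one channels between `people` individuals."""
    return people * (people - 1) // 2

for people in (5, 20, 30, 200):
    channels = communication_channels(people)
    print(f"{people:>3} people -> {channels:>5} channels "
          f"({channels / people:.1f} per person)")
```

Going from 20 to 200 people is a tenfold increase in headcount, but roughly a hundredfold increase in possible channels.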

I have 30 people in the project. The problem is I need only 5 people.

— Project manager, in a project I worked in around 2005

Worst case, you can actually reduce capacity when you add people! I have worked in 200 people projects that could have moved a lot faster if there had been only 20 people in the project.

The short of it is that adding more people will add cost, that we know for certain, but whether it will actually shorten project duration is a bit hit and miss. The larger your project, the higher the probability it will be a miss.

Problem 3: Cutting Cost Reduces Bottleneck Capacity

When management, often belatedly[*], discovers that adding people added a lot of cost, but did not shorten project duration as much as they had hoped, or at all, the natural reaction is to cut costs, in order to make the budget targets.

Unfortunately, if you have 100 people in your project, add 100 more, and then cut 100, you won’t be back where you started. You will be worse off than before!

Why is that? Because we do not know where the bottleneck is, and because the bottleneck often jumps around, it is highly likely that cost cuts affect the bottleneck, either permanently, or intermittently. When that happens, the entire project is delayed.

The illustration shows how reducing capacity at a bottleneck can have much greater effect on duration and cost than expected.

Here is an example from a project I worked in:

Several years ago I was the Scrummaster for a development team where higher level management from time to time pulled one person from the team to work outside the team.

The team had 7 members, so both the team and management expected the remaining capacity to be 86% (6/7 = 0.857 ≈ 86%). However, this is true only if all team members are full stack developers, and if they all are equally productive.

The team had 5 developers and 2 testers. The testers were the bottleneck. Unfortunately, the person pulled from the team was one of the testers. That reduced the total team capacity to 50% (1/2 = 0.5 = 50%).

If that team had been the bottleneck in the entire software development project, removing the tester for 1 week would mean adding 1 week to the duration of the entire project. That also adds 1 week of cost for the entire project, way more money than management had expected.
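The difference between the expected 86% and the actual 50% can be sketched with a simple model in which work must pass through every role, so team throughput is limited by the slowest role. The per-person rates below are assumptions chosen so that testing is the bottleneck, as it was in the example; they are not measurements from the team.

```python
# Assumed work rates, in stories per person per week (illustrative only).
STORIES_PER_PERSON_WEEK = {"developer": 1.0, "tester": 1.0}

def throughput(headcount):
    """Team throughput is limited by the role with the least total capacity."""
    return min(
        people * STORIES_PER_PERSON_WEEK[role]
        for role, people in headcount.items()
    )

full_team = {"developer": 5, "tester": 2}
minus_tester = {"developer": 5, "tester": 1}
minus_developer = {"developer": 4, "tester": 2}

naive_estimate = 6 / 7  # what the team and management expected
tester_removed = throughput(minus_tester) / throughput(full_team)
developer_removed = throughput(minus_developer) / throughput(full_team)

print(f"Naive headcount estimate:   {naive_estimate:.0%}")
print(f"Capacity without 1 tester:  {tester_removed:.0%}")
print(f"Capacity without 1 developer: {developer_removed:.0%}")
```

In this model, pulling a developer would have cost nothing in throughput, while pulling a tester halves it. Headcount alone says nothing about which of the two happens.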

Note that if you have, for example, a large SAFe program, with several Agile Release Trains (ART), and 7-10 teams in each ART, you could double the duration of the project by firing that single tester…unless you figure the problem out and hire a new tester. If you do that, and the critical path and the bottleneck then jump to somewhere else, then the new hire will just contribute to creating more WIP, and you are back to Problem 2: Adding Capacity Adds Work-In-Process.

The point is that cutting just a few people from a project may have a disproportionate effect on project duration and cost, and you do not know where it is safe to cut cost! Sometimes it works, sometimes it makes the situation worse.

Problem 4: The Hire and Fire Death Spiral

Using the Capacity and Cost levers can easily drag a software development project into a kind of economic death spiral:

It starts with WIP going up due to statistical variation. More WIP means work will have to wait in queues, which means delays, which means cycle times go up for many teams. This means the project is delayed.

Because of the delays, deadlines are broken. Management tries to fix this by adding more people. This increases communications overhead. It also adds capacity off the critical path, which leads to more WIP accumulating. The net result is that the project does not speed up as much as expected, but it now costs more.

It is common that management keeps trying to speed the project up by adding even more people. This is less and less effective each time. This is partly because the communications overhead goes up quadratically when people are added. Another part is that the larger the project is, the higher the probability of missing the critical path altogether when adding new people.

Eventually management will notice that not only does the project not move forwards as expected, it also burns through money at an alarming rate. That is when the cost cuts come. Some of those cost cuts are likely to hit the critical path. When that happens, project duration goes up. Cost per day does go down, but the increased duration means there are many more days, so total cost goes up.
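The trade-off between a lower cost per day and a longer duration has a simple break-even point. The sketch below assumes that total cost is cost per day times remaining duration, and that cost per day falls in proportion to the cut; both are simplifying assumptions made for illustration.

```python
def break_even_duration_increase(cost_cut_fraction):
    """Duration increase at which a cost cut stops saving money.

    Total cost = cost per day * days. After cutting a fraction c of the daily
    cost, total cost is unchanged when duration grows by 1 / (1 - c) - 1.
    """
    if not 0 <= cost_cut_fraction < 1:
        raise ValueError("cost_cut_fraction must be in [0, 1)")
    return 1 / (1 - cost_cut_fraction) - 1

for cut in (0.05, 0.10, 0.25):
    limit = break_even_duration_increase(cut)
    print(f"A {cut:.0%} cost cut loses money if duration grows by more than {limit:.1%}")
```

Under these assumptions, a 10% cut only pays off if it delays the project by less than about 11%. A cut that hits the bottleneck can easily exceed that.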

Cost cuts continue until management notices that we now have even more delays, so management starts hiring more people, and the Hire and Fire cycle starts over again.

The whole thing continues until the project either stumbles over the finishing line, or the organization gives up and pulls the plug on the whole mess.

Very, very rarely, management stops, decides the whole depressing cycle is daft, and decides to find a better way. When that happens, management often goes for the sweet promise of increased Business Value.

Next: Part 2: Business Value, its Use and Abuse.

[*] Agile methods have a built-in early warning system, monitoring queues. Unfortunately, organizations that rely on the Capacity and Cost levers usually do not use queue monitoring, at least not very well.

Monday, November 01, 2021

OKRs - not as Great as You Think!

Recently, I made a short LinkedIn post that was quite critical of OKRs, Objectives and Key Results. The post got a few likes, and comments that both agreed and disagreed with me. I found one of the comments that disagreed with me particularly interesting, because it referenced an article that was very positive to OKRs.

That article contained pretty much everything that makes me doubt that OKRs are useful. I’ll walk through it, but first, here is my original LinkedIn post:

Here is why I do not trust OKRs, in three quotes:

"When a measure becomes a target, it ceases to be a good measure."

-Goodhart's Law

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

-Campbell's Law

"Key Results benchmark and monitor how we get to the objective. Effective KRs are specific and time-bound, aggressive yet realistic. Most of all, they are measurable and verifiable. You either meet a key result’s requirements or you don’t; there is no gray area, no room for doubt."

-What is an OKR? Definition and examples

In other words, the OKR system is designed to implement the kind of target setting Goodhart and Campbell warn about.

Judging from what I have seen, OKRs are a great way to train people to fudge results.

Add in that OKRs oversimplify the dependencies between different objectives, and that OKR data is usually presented with too few data points to be useful in decision making, and I have a difficult time understanding the enthusiasm.

I do like metrics, but I doubt that successful companies that use OKRs became successful because they used OKRs.

#strategy #management #leadership

Let’s look at the article used as evidence that OKRs work, and see whether it holds up. The article is named Implementing OKRs. A tale from the trenches.

The case of the misleading case

First of all, the article describes a single case of a company that implemented OKRs, set ambitious targets, and then saw the company meet those targets. That is problematic, because one, or even a few, cases do not prove, or disprove anything.

Look at the picture below.

Which of the three cases in the picture is closest to the truth? The author of the article clearly believes it is Case 1, but there is no evidence of that. The truth may be closer to Case 2, or Case 3.

Selection bias and lack of scientific research

When in doubt, turn to science! I googled “okr scientific research”, and a few other sets of search terms, but found no statistical studies, or metastudies.

The closest I got was a question on Quora:

“Is there any peer-reviewed research showing that the Objectives and Key Results (OKR) model actually improves efficiency, productivity, or overall company performance?”

Unfortunately, because of the selection bias inherent in the way the question was phrased, any answers won’t be of much use. Studies that do not show that OKRs work will never be included in the answers. What you need is studies that present statistically significant results. Preferably, we should have metastudies, because they produce better results than any single study can.

As far as I can tell, the research on OKRs is so inadequate that we cannot turn to direct scientific research to either prove or disprove whether it works.

People burning bright, but not for long…

We can, however, use indirect methods to evaluate whether OKRs are likely to work, or not. You create an OKR by setting an aggressive, relatively short term, goal, and a target value to reach.

The article author did this by setting nine BHAGs, Big, Hairy, Audacious Goals, figuring out what reasonable targets would be, and then doubling them. In the words of the article author:

We took those numbers and we doubled them. Why? Because a) why not? That way if we aimed for twice the volume of our plan and we missed by half, we’d still hit our plan. If we actually hit those goals, even better! And b) because you are only supposed to get 70% on your OKRs. These are stretch goals, so come on, stretch a bit. Yeaaaah, that’s it.

The article then continues, describing how quarterly OKRs were set, followed by individual OKRs. The author also asked for advice about OKRs on Facebook. That probably yields advice about as reliable as asking for Covid-19 advice on Facebook.

Did it work? The article author is convinced it did:

When Brew and I traveled on business, shit got done! Decisions were made! Targets were hit!

He also has a credible sounding explanation for why the targets were hit:

Once you see what other people are signing up for, and you see the plans they are putting in place to achieve them, you get religion. You believe they are going to hit their goals. And you don’t want to be the only one not hitting her goals. So you work. You work hard. Instead of watching what everyone else is doing and reacting, you start focusing much more on achieving your Objectives and just assume they’ll hit theirs. OKRs help everyone focus, and that drives results.

The author does not specify exactly what OKRs they used, but he does provide three examples of OKRs he believes are good ones:

  • Increase leads from 300 to 700
  • Improve conversion funnel efficiency by 30%
  • Increase sales volume by 70%

Sounds pretty clear cut: OKRs are a recipe for success…or is there something wrong with this picture?

Yes, there is something wrong! For one thing, the author claims success comes from everyone working harder. Note that working harder and working smarter are two very different things.

Leads increase from 300 to 700. That is an increase of 133%. If that increase in results came from working harder, it is reasonable to assume there was a corresponding increase in effort. If people worked 40 hour weeks before OKRs, with a 133% increase, they would now work more than 93 hours per week.

Did they work 93 hours per week? We do not know, because there was no OKR for that, but it is quite possible that the OKR system pushed people to work way more than is sustainable. What happened to personnel turnover? The article did not mention that either, so we do not know.
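The arithmetic behind the 93 hour figure is easy to check. Here is a minimal sketch, assuming, as the reasoning above does, that results scale linearly with hours worked. The function and variable names are only for illustration.

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Figures from the example OKR: leads go from 300 to 700.
leads_before, leads_after = 300, 700
increase = percent_increase(leads_before, leads_after)

# Assumption, same as in the text: results scale linearly with hours worked.
hours_before = 40
hours_after = hours_before * leads_after / leads_before

print(f"Increase in leads: {increase:.0f}%")
print(f"Implied work week: {hours_after:.1f} hours")
```

If the linear assumption does not hold, and it rarely does for long, the implied work week changes, but the question of where the extra results came from remains.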

Good metrics are not isolated from each other. They are linked into a system. Such a system would have metrics that prevent people from being overworked. A good metrics system should also be designed to be self-correcting. You have both leading and trailing metrics. You use the leading metrics to guide behavior changes, and trailing metrics to see if you actually got the results you thought you’d get.

OKRs do none of that. OKRs are treated as independent. That is why, with OKRs, it is easy to turn a workplace into a 93-hour-per-week sweatshop.

On the other hand, we do not know that people worked more hours. There might be some other explanation.

I do not have data from the article author’s company, but I do have data from the project teams I lead, so let’s use some of my project data, and see where it gets us. First let’s compare productivity data for two periods, similar to what you would get if you used OKRs. Let’s go:

  • Period 21: 4 value units
  • Period 22: 47 value units

Wow! From 4 units to 47 units! That is an increase of 1075%! What audacious target setters we must have been! What a Big Hairy Audacious Goal we must have had!

Let’s look at the productivity the quarter after that:

  • Period 23: 19 value units

That is a drop of 60%. How could the geniuses of the previous quarter turn into such bumbling idiots? Off with their heads!

Well, that last thing might be overreacting a bit. Let’s look at why we have such huge increases and decreases in results.

The picture above is a Process Behavior Chart, also known as a Process Control Chart. It shows that although we have a lot of variation in productivity, the variation always stays within the upper and lower process limits. These limits are calculated using a statistical method. I won’t go into details in this article. The important thing is that the system is stable within the process limits. All the variation is random.

With OKRs, this randomness cannot be distinguished from actual change, so it is entirely possible that the 133% increase in leads the OKR article author has in his example, is entirely due to random variation.

I don’t know, and neither does he! The difference is, Process Behavior Charts make it possible to distinguish real change from random variation. OKRs do not do that. Even worse, OKRs lead you to believe you understand a situation, and that you control it, while you really don’t.
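For readers who want a feel for how process limits can be calculated, here is a minimal sketch of one common approach, the XmR chart: the limits are the average plus or minus 2.66 times the average moving range. It is not necessarily the exact method behind the chart in the picture. The data series is made up for illustration (only 4, 47, and 19 come from the periods listed earlier), the function name xmr_limits is hypothetical, and flooring the lower limit at zero is an assumption based on productivity not being able to go negative.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower limit, centre line, upper limit) for an XmR chart."""
    # Moving range: absolute difference between consecutive data points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    # 2.66 is the standard scaling constant for XmR charts.
    spread = 2.66 * mean(moving_ranges)
    # Assumption: productivity cannot be negative, so floor the lower limit at 0.
    return max(0.0, centre - spread), centre, centre + spread


# Hypothetical productivity data. Only 4, 47, and 19 appear in the text.
values = [22, 15, 30, 4, 47, 19, 25, 12, 33, 18]

lower, centre, upper = xmr_limits(values)
# Points outside the limits would signal a real change in the system.
signals = [v for v in values if v < lower or v > upper]

print(f"Limits: {lower:.1f} to {upper:.1f}, average {centre:.1f}")
print(f"Points outside the limits: {signals}")
```

With this made-up series, the jump from 4 to 47 and the drop to 19 all stay inside the limits, so none of them counts as evidence of real change.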

Targets corrupt!

What happens when you set a target? Let’s look at the same Process Behavior Chart, but this time I have marked some target levels. I have also color graded the area from zero to the upper process limit, in colors from green to red. The more green, the easier it is to make a target. The more red, the more difficult it is to reach a target.

It is important to remember, that we are looking at the system as it actually is. To get the system to behave differently, we need to change it in some way. If we set targets and empower people, but do not teach how to change the system, they have only a few alternatives:

  • Hope they get lucky. Works once in a while, but the game tends to be stacked against the people doing the actual work, especially since OKRs mandate setting targets without providing any method of figuring out what a reasonable target is.
  • Work more hours. Common, and often works in a very short perspective, but not sustainable over time. Can have disastrous side effects, like increasing personnel turnover, which drains the organization of competence, and destroys organizational memory.
  • Cheat. Usually the safest option. Risk of discovery is small. Even if discovered, risk of punishment is not that high.
  • Teach themselves how to change the system, change the system, and then pray that they really were as empowered as they were told they were. If they were not, punishment for doing actual change is usually swift and unmerciful. This is difficult, time consuming, dangerous, and often provides little benefit to the people who do it, even if the change succeeds.

There are 25 data points in the series. If we want a 70% chance of success, that is, we want 17 data points at the target level or higher, we should set the target at around 5. That is well below the average, which is slightly above 20.

Setting the target to a 70% success rate based on future performance would require a time machine. If you don’t have one, trying will just be silly.

Let’s assume for a minute that you know the average productivity is 20. Note that with OKRs, you don’t bother with averages, so you need something else to provide that information. If you set the target at double the average, 40, you have a 20% chance of reaching it, even if you do not change anything about how you work. I’d say doubling the average sounds like an aggressive target.

That means, in our example, 20% of the times you set a target that is double the average, you will trick yourself into believing that OKRs work. People like writing about their successes much more than they like writing about their failures, so the people who write about OKRs are likely to come from a group that sets targets like this, or, since they usually do not know what the average is, have set the targets way into the green.
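The reasoning about target levels can be sketched in a few lines of code. This is an illustration only: the history is made up, so the numbers differ from the 25 point series in the chart, and both function names are hypothetical. It also assumes the system stays unchanged, so that past periods are a fair guide to the next one.

```python
import math


def chance_of_reaching(target, history):
    """Share of past periods that reached the target, with no change to the system."""
    return sum(1 for v in history if v >= target) / len(history)


def target_for_success_rate(percent, history):
    """Highest target that at least `percent` percent of past periods reached."""
    ranked = sorted(history, reverse=True)
    needed = math.ceil(percent * len(ranked) / 100)
    return ranked[needed - 1]


# Hypothetical productivity history, not the data behind the chart.
history = [22, 15, 30, 4, 47, 19, 25, 12, 33, 18]
average = sum(history) / len(history)

safe_target = target_for_success_rate(70, history)
stretch_chance = chance_of_reaching(2 * average, history)

print(f"Target with a 70% historical success rate: {safe_target}")
print(f"Chance of hitting double the average by luck: {stretch_chance:.0%}")
```

Note that the target with a 70% success rate lands below the average, and that a doubled target still gets hit now and then by pure luck.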

It is a popular belief that if you empower the people who do the work, they will start doing the systemic improvements necessary to meet audacious targets. Unfortunately, that requires that they also know how to improve the system. Sometimes people do, but most of the time, they don’t. Why not? One reason that stands out is that neither schools, nor workplaces, teach effective methods for improving systems. Universities sometimes do, 

Plenty of such methods exist. There is a plethora of different improvement methods based on complexity thinking, systems thinking, and statistics. Some of these methods have tools that are fairly easy to use, like Five Whys in Lean/TPS, or Evaporating Clouds in The Logical Thinking Process (which is one of the tool sets in the Theory Of Constraints). Others, like Process Behavior Charts, are more difficult to use. Easy or hard, they all have one thing in common:

They are rarely taught!

Imagine that someone sets a target for you, and you have no idea how to achieve it. Imagine that whoever sets the target also tells you “you are empowered to do whatever it takes to reach the target, just don’t fail”.

If the target looks possible to achieve, you will almost certainly continue the way you are used to, and hope for the best.

If the target is way out there, and looks impossible to achieve, there is only one thing to do:

You cheat!

Some years ago, I worked for a company that set very challenging targets for its sales people, in several different markets. I was in close contact with people from one of the markets, and they told me how they had to cheat to meet their targets. They hated it! It’s just that they had to. They told me all the other markets were in a similar position, and did the same thing.

In their case, they did not manipulate the total number of units sold, but they did mess around with the dates when things were sold, and that was enough to ensure that they were on target often enough to avoid getting punished for missing them.

In software development, which I usually work with, management likes to set targets like “80% automated test coverage” for an application. That is particularly useless, because the need for tests varies between different parts of an application. In general, it is useless to write tests for something called accessor methods, but there are a lot of accessor methods, and they are easy to write automated tests for, so that is what often happens.

Another little trick: If management measures software development team velocity, usually in story points per sprint, or stories per sprint, and sets a target, story points and stories will magically shrink. As a result, velocity will go up, and the target is met, without anything real changing at all.

If we look to research on target setting, it corroborates the stories above.

Here we found that participants who were given a specific performance goal for how much revenue they had to earn were almost twice as willing to use unethical methods to achieve it than those given a vague goal, irrespective of their moral justification.

— When Tough Performance Goals Lead to Cheating, by Colm Healy and Karen Niven

The above quote is from an HBR article about a scientific study with 106 participants.

We found that people with unmet goals were more likely to engage in unethical behavior than people attempting to do their best. This relationship held for goals both with and without economic incentives. We also found that the relationship between goal setting and unethical behavior was particularly strong when people fell just short of reaching their goals.

— Goal Setting as a Motivator of Unethical Behavior, June 2004, The Academy of Management Journal, by Maurice E Schweitzer (University of Pennsylvania), Lisa D. Ordóñez (The University of Arizona), and Bambi Douma (University of Montana)

Here is a quote from a third study:

Our study finds that cheating in goal-based systems occurs due to financial rewards to cheating and to social comparison framing. Assigning a goal on its own without increased pay or social comparison framing did not lead to an increase in cheating relative to a do-your-best counterfactual. However, the use of either financial rewards or social comparison framing led to cheating.

— Why do goal-based incentives cause cheating? Unpacking the confounding effects of goals, social comparisons and pay, by Matthew Chao (Williams College), and Ian Larkin (UCLA Anderson School of Management)

We do need a lot more research about the effects of setting goals and targets. As far as I can tell, setting goals, by itself, does not cause significant problems. However, when we offer financial rewards, or compare people’s performance, then we invite cheating.

If a target is tied to a reward, then people will be prone to cheat to get the reward. The same way, if a target is used for social comparison, as in “Helen reached the target, and you didn’t”, then we also invite cheating.

I haven’t found any scientific research about targets tied to punishment, but since we are geared to react even more strongly to punishment than to rewards, it is highly likely that fear of punishment makes us cheat too.

Goals blind us to creative solutions!

Goals do make us more focused. When we do something relatively simple, that only requires increased effort, that can indeed be effective. However, our working lives are often way more complex than that.

With goals, people narrow their focus. This intense focus can blind people to important issues that appear unrelated to their goal (as in the case of Ford employees who overlooked safety testing to rush the Pinto to market). The tendency to focus too narrowly on goals is compounded when managers chart the wrong course by setting the wrong goal…

— Goals Gone Wild: The Systematic Side Effects of Over-Prescribing Goal Setting, by Lisa D. Ordóñez, Maurice E. Schweitzer, Adam D. Galinsky, and Max H. Bazerman

Case studies and anecdotes are no substitute for scientific research, but if you have the research, they can be used to illustrate a point, so let’s do that:

Several years ago, I worked directly for the CTO of a company that had hired a sub-contractor to improve its IT infrastructure. The project was not going well, and failure would mean the sub-contractor would not get a new contract, and that would mean firing a lot of personnel. This was during a raging global financial crisis, and if there was one thing you did not want to do, it was lose your job.

The sub-contractor management set as a goal to minimize the damage to their company from missing the contract. Everything they did was focused on whom to fire, when to fire, what severance packages should look like, and other damage control measures.

I asked my CTO if I could get a shot at helping the sub-contractor make the project deadline, and he okayed it. There were benefits for both organizations if the sub-contractor made the deadline, and, of course, to the people who would keep their jobs.

The sub-contractor was less than enthusiastic about having me meddle in what everyone knew was a hopeless situation, but the CTO I worked for could be very persuasive when he wanted to be, so I got permission to make an attempt.

The development team was already working as hard as it could. There simply weren’t enough hours in a day to add more overtime.

I began with an information gathering and brainstorming workshop using a method called Crawford Slip. Then, I showed the team how to set up a kanban board. They were less common back then than they are today. I would have preferred to stay with the team throughout the improvement project, but that proved impossible. Nobody wanted to spring for the cost of having me around full time.

Instead, I told the team to photograph the kanban board twice a week, and email me the photos. I spent several weeks doing nothing but recording the team velocity, and using the data from the brainstorming session to build a Current Reality Tree (CRT). The CRT is a problem solving tool from the Logical Thinking Process toolkit.

Then, I traveled back to the team, showed the project manager what I had found, and made a few suggestions. You can see the results in the graph below.

I met with the project manager in week 40. In week 41, productivity went up. In week 42, it went up again. After that, it held steady around a new baseline. In two weeks, the average output increased by 316%.

Remember, they were already working as much overtime as they could, so the productivity did not come from working harder. It came from working smarter.

What does it mean to work smarter?

So, what does it mean to work smarter? You need to understand the process, find problem areas, and then fix them. The picture above shows the situation before we made changes. The developers worked on a bunch of jobs, all with high priority, and all broken down into several tasks. Since everything had high priority, they tried to start each job as soon as possible.

When they had finished one task, they immediately began working on another. The problem was that the new task they started working on had nothing to do with the preceding task. They switched back and forth between working on different jobs. This led to tasks belonging to different jobs getting interleaved, and this pushed lead times to become longer.

What we did was reorganize the tasks, so that all tasks belonging to the highest priority job were finished first. Then all the tasks belonging to the second highest priority job were done, and so on.

One counter-intuitive thing is that we delayed starting work, in order to finish sooner.

In reality, the benefits are much larger than the diagram shows, because when you eliminate multitasking, you eliminate having to spend time on switching tasks. You do not need to change your tool setup as often, you do not need to backtrack to try to remember what you did the last time, and you do not need to figure out what the heck you were planning to do. That adds up to a lot of time saved.
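The basic effect of interleaving, before counting any switching costs, can be shown with a small sketch. The setup is hypothetical: three jobs, each with three tasks that take one time unit, worked on by a single team. The function name is only for illustration.

```python
def completion_times(schedule):
    """Map each job to the time unit in which its last task finishes."""
    finished = {}
    for time, job in enumerate(schedule, start=1):
        finished[job] = time  # a later task of the same job overwrites the earlier time
    return finished


# Hypothetical setup: three jobs, three one-unit tasks each.
jobs = ["A", "B", "C"]
tasks_per_job = 3

# Multitasking: start every job as soon as possible, A B C A B C A B C.
interleaved = jobs * tasks_per_job
# One job at a time, in priority order: A A A B B B C C C.
sequential = [job for job in jobs for _ in range(tasks_per_job)]

for name, schedule in [("interleaved", interleaved), ("sequential", sequential)]:
    done = completion_times(schedule)
    average = sum(done.values()) / len(done)
    print(f"{name}: {done}, average lead time {average:.1f}")
```

In the sequential schedule, jobs B and C start later, yet no job finishes later than it did before, and A and B finish sooner. Any time lost to task switching would widen the gap further.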

Of course, eliminating multi-tasking is only one of the things you can do. For practical purposes, there are nearly always dozens, or hundreds, of improvements you can make. Once you have made an improvement, that often opens up the way for other improvements.

With OKRs, there would have been no improvement, because working harder was not the answer. Understanding the process, and eliminating a major problem, that did the trick.

It’s worth noting that a change like this is a sustainable change. No one is pushed to work harder. The developers can actually deliver more value while working less. That leaves more time for thinking, which can be used to find new things to improve.

How did things turn out? Well, the company managed to deliver, it got the contract it was after, and when I checked a year later, they had managed to keep all the personnel.

Had they used OKRs to just push people to work harder, they would have had close to zero chance of finishing the work in time, and getting the contract.

Summing up

It’s time to sum this article up:

  • There is very little research on the efficacy of OKRs. The existing material consists mostly of testimonies and case studies
    • Testimonies and case studies are notoriously unreliable. We should not base our beliefs about OKRs, or anything else, on them.
  • While we cannot use scientific research to tell whether OKRs work, there is research on the building blocks of OKRs.
    • Aggressive target setting: Setting aggressive targets increases cheating substantially
    • Use of single data points: Using single data points can be, and very often is, incredibly misleading. Metrics systems should, by default, use long data series.
  • OKRs ignore the effects of variation, and that is a great way to trick people into making bad decisions. We need some way of separating random variation within the system from special cause variation. We also need to detect trends in the data. Process Behavior Charts do all of this, but there are other methods that can be used.
  • OKRs do not tell us why we reach a target, or why we fail. It may be that we have improved something, or it may be that we got lucky, or unlucky.
  • OKRs do not tell us whether a change, if there is one, is sustainable, or if it will cause detrimental side effects.
  • OKRs present dependent variables as if they are independent. This is bound to cause confusion, and bad decisions. To be useful, a metrics system must link things together, so that we do not inadvertently cause big harm when we make a small improvement.
  • OKRs make us focus on reaching targets, and this can make us blind to creative solutions. In other words, OKRs can prevent you from reaching the goals set by the OKRs.

There you have it: There is very little that speaks in favor of OKRs, and a lot of reasons to be suspicious.

In addition, we can easily create metrics systems that do not have the problems that OKRs have.

You can use Goal Maps, S&T trees, or strategy maps, to build a model of the organization’s goals, and how they interact with each other. (I think Goal Maps and S&T trees work better than Strategy Maps for this, but that is a matter open to debate. There are also other ways to do the mapping.)

Add Process Behavior Charts, or some other way to distinguish between systems noise and special cause changes, and you have a good start.

Of course, any metrics system has limitations, so you should be aware that in some situations, cause-and-effect based metrics systems cease to function. That happens when your organization dives into complexity. You don’t have cause and effect anymore, but you still have statistical dispositions, which means that while a Goal Map is less trustworthy, it is probably not totally useless. When you have true chaos, which happens only rarely, you have no causality, so you need to use other methods. (Check out Cynefin, if you are interested.)

Most of the time though, a good metrics system will help, but OKRs probably won’t.

Sunday, January 24, 2021

Tempo 2.0 - Section 3.4 Value Stream Maps

Mapping a Value Stream is an adventure, a bit like exploring an uncharted landscape.

If you are serious about improving the way your organization works… Let me rephrase that: If you are serious about improving any process, for an organization, for yourself, or for a friend, you will sooner or later need information about what the relevant value streams look like. You also need a simple, yet useful, way to map the information. Having that enables you to figure out where to focus your efforts.

A value stream map is an easy to learn, yet very useful, tool for mapping value streams.

The purpose of a value stream map is to help you identify waste in a value stream. A value stream map tells you how much of the time a goal unit[1] spends in the value stream is value adding time, and how much is non-value adding time.

Waste, shock, and incredulity

Shock and incredulity are common reactions the first time someone sees a value stream map. Does it really take that much time to produce a single goal unit? Do we really have that much wasted time?

The harsh reality is that in most value streams, goal units spend a lot of time waiting. Often, the non-value adding time is 99% of the total time, or more.

How can you have 99% waste in a process without anyone complaining about it?

I once worked as an interface designer for a very large enterprise software system. Designing and constructing an interface took about a week, and each connection between two subsystems had its own custom-designed interface. There were many, many subsystems, and two subsystems could have several different connections.

Interfaces were deployed once per year. The project had been ongoing for about seven years.

That meant an interface spent on average six months either in a requirements queue before it was designed, or after design and construction, in another queue waiting to be deployed. Thus, for a week of value-adding time (design and construction), we had about 25 weeks of non-value adding time.

A ratio of 1:25 means we have roughly 4% value adding time, and thus about 96% non-value adding time.

But wait, there is more! I wasn’t the only interface designer. We produced about 60 interfaces per year. In seven years, that is about 420 interface designs.

How many interface designs did we need? One! We could have built the whole thing with one standardized interface, for example HTTPS[2], which is used for the vast majority of computer connections on the Internet. Instead of making a custom data format for each interface, we could have managed with about five standardized data formats.

Sticking to just the interfaces, we need to divide those 4% value-adding time by 420. That means we are down to 0.0095% value adding time, and 99.99% non-value adding time.
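The numbers in this story are simple enough to check in a few lines. This is a sketch of the arithmetic only, using the figures from the story. Measured against the full 26 weeks, the value adding share comes out at about 3.8%, which the text rounds to 4%.

```python
# Figures from the story.
value_adding_weeks = 1       # design and construction of one interface
waiting_weeks = 25           # time spent in queues before and after
interfaces_built = 60 * 7    # about 60 interfaces per year for seven years
interfaces_needed = 1        # one standardized interface would have done

# Share of the lead time that adds value, for a single interface.
share = value_adding_weeks / (value_adding_weeks + waiting_weeks)

# Only one of the designs was actually needed.
share_needed = share * interfaces_needed / interfaces_built

# The same calculation with the rounded 4% used in the text.
rounded_share_needed = 4 / interfaces_built

print(f"Value adding time per interface: {share:.1%}")
print(f"Value adding time, needed designs only: {share_needed:.4%}")
print(f"Same, starting from a rounded 4%: {rounded_share_needed:.4f}%")
```

Either way, the value adding time ends up just below 0.01%, so the rounding does not change the conclusion.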

Note that we had two different types of waste:

One type was caused by having partially finished work lying around, instead of finishing the work and deploying it, so it could be used. Assuming that there were good economic reasons for building each connection, there was an economic cost attached to not having the connection.

The other type of waste was building things that should not have been built in the first place.

Of course, if you have 420 times as many designs as you ought to have, then you will need quite a lot of people to build the stuff, which means you will need more administrative personnel. The costs do not stop there, because you will need extra people maintaining all the extra stuff the project built, and there will be delays in maintenance too, which will cost even more.

A company can easily create cost cascades that plague them for decades, and the funny thing is, identifying such costs, and stopping the bleeding, is usually a low priority.

Identifying and fixing problems like the ones in the story, requires a lot more than knowing how to make a value stream map, but value stream maps are a good place to start. Sooner or later, you will have to present your findings to other people, and value stream maps are very helpful there too.

Value Stream Map Example

Figure: Value Stream Map

The figure above shows a value stream map for a software development company. It shows how a single goal unit, a user story, travels from requirements gathering to the point where customers start paying for it.

The greatest delay, 3,200 hours, is between the initial Profit & Loss estimate, and the monthly meeting where work is prioritized. The queueing time is about two years. This means the greatest delay is in an administrative process that is executed before a development team even knows there is a requirement.

It is very common in software development that the greatest delays occur before, or after, the software development team is working on a user story. And yet, most companies would, in this situation, focus on improving the way developers work.

The mistake happens because no one examines what the value stream looks like before the improvement work begins.
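The calculation behind a value stream map is straightforward once you have the data. Here is a minimal sketch. Only the 3,200 hour delay comes from the example above. The step names are loosely based on it, and all other numbers, as well as the Step class and its field names, are hypothetical and only there to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float  # value adding time spent in the step
    wait_hours: float  # queueing time before the next step starts


# Hypothetical value stream. Only the 3,200 hour wait is taken from the text.
steps = [
    Step("Requirements gathering", 8, 40),
    Step("Profit & Loss estimate", 4, 3200),
    Step("Prioritization meeting", 1, 80),
    Step("Development", 40, 120),
    Step("Release", 2, 0),
]

total_work = sum(s.work_hours for s in steps)
total_wait = sum(s.wait_hours for s in steps)
lead_time = total_work + total_wait
efficiency = total_work / lead_time

# The step followed by the longest queue is where to look first.
worst = max(steps, key=lambda s: s.wait_hours)

print(f"Lead time: {lead_time} hours, of which {efficiency:.1%} is value adding")
print(f"Longest wait is after: {worst.name} ({worst.wait_hours} hours)")
```

Even in this made-up example, speeding up development would barely move the lead time, because almost all of it is spent waiting elsewhere.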

Reducing waste with a Value Stream Map

Some years ago, I worked with a software development team that had not released anything to production for 3-4 months. We set about fixing a number of things that kept the team from delivering, and got to the point where the team released software once every two weeks, and could release more often than that, if they were asked to.

After the improvements, how long was our lead time? It was still 3-4 months!

I mapped out what happened to a user story after the team was done implementing it. It turned out it went on to a team that both trained users, and tested the software.

The problem was that when a new piece of functionality had been implemented, they tested it, wrote training materials, and held a course in a single batch, before approving a user story for release. We very politely asked them if they could test and approve user stories before writing the training material and holding the courses.

They said, “No problem!”

From then on, we could release software as often as the business side wanted.

The reason it was so easy to make such a major improvement, was that the team’s Product Owner was an employee at the company, and knew the people in the training and testing department really well.

I contributed my theoretical knowledge about how processes and queues ought to work, he contributed both detailed knowledge about how the processes actually worked at his company, and his power to influence processes via his network of contacts, and friends, at the company.

Neither of us could have done it alone, and that illustrates an important point: When you try to fix something, you must work with people at the company, and you must do it all the way through the change. Handing off a change plan to someone who hasn’t been a part of developing it, and expecting them to execute it, does not work! It does not work if you are a consultant like me, and it does not work if you are a manager, or leader.

It is often difficult to get an overview of a process. Most companies are still divided into function-oriented parts, where each part is interested in its own little piece of the value stream, and little or nothing else. Most of the time, like in the story above, the greatest delays occur where there are organizational borders.

It is worth mentioning that “organizational borders” does not mean just departmental borders. For example, when you have multi-team projects or programs, borders between teams can be, and often are, a major source of delays.

If you are mapping a value stream across organizational borders, you will not always be able to find someone who is responsible for the whole value stream. If so, you will have to follow a goal unit as it travels through the organization, without having approval from a senior manager. Technically, this is easy to do, but you might want to have your CV prepared, in case you find out where someone has buried a skeleton or two.

If you find someone who is responsible for the whole process, it is likely that that person does not know in detail how the process works, or where the delays are.

If you are lucky, as I have been on several occasions, you will find a friend, someone who is as curious about what is going on as you are. You might get a bit of support, but do not expect that person to take personal risks for you, especially if you are a consultant. Even if it looks to you that they could just magically whisk you out of trouble when you have found something sensitive, that is probably not the case in reality.

By the way, do not mistake the organization chart for the reality. You are liable to find that goal units travel weird and mysterious ways before arriving at their destination. I have, on occasion, found that some work items do not actually have a destination. That is, they are still produced, but the intended recipient may have disappeared in a re-organization frenzy years before.

It is part of what makes value stream mapping such an adventure.


  The purpose of a value stream map is to help you identify waste in a value stream.

  In most value streams, goal units spend a lot of time waiting. Often, the non-value adding time is 99% of the total time, or more.

  There is more than one type of waste in processes. For example: Partially finished work waiting in queues, and building things you do not need.

  To find waste, you need to look at the whole value stream, across team and departmental borders.

  You need theoretical knowledge, a good understanding of actual processes, and a strong network of people willing to help you in the organization.

  Involve people from the start! If you hand over a plan to someone who has not been involved in creating it, execution will often fail.

  Get a sponsor within the organization if you can, but do not expect them to work miracles if you get in trouble investigating a value stream. At best, they can advise you where to be careful, and what to steer clear of.

  Value stream mapping is a lot of fun.

  There is a lot more to value stream mapping than this introduction. Be aware that there is a lot more to learn.

[1] A goal unit is the unit of measure you use to measure your goal. In agile software development, user stories, features, and epics are common goal units. For a car factory value stream, the unit might be a car. For an accounting system, the goal unit is money.

[2] HTTPS has been around since 1994, so, even though the project was many years ago, no cutting edge technology was required.