Saturday, January 14, 2023

Are You Still Using the Wrong Control Levers in your Agile projects? Part 2: Business Value, its Use and Abuse

In the first part of this article series, I wrote about how using Cost and Capacity to control an agile software project can trap an organization in a hire and fire cycle that increases project duration and cost.

This time, we will take a closer look at an agile control lever that works very well, except when it doesn’t: Business Value.

When used right, the Business Value lever can be the most powerful tool you have to steer an agile project or program towards success. When used wrong, and it often is, the Business Value lever can be completely disabled, leaving management to pull a lever that no longer works, and no longer has the ability to steer the project.

How to Deliver Business Value the Agile Way

Deliver working software frequently, from a
couple of weeks to a couple of months, with a
preference to the shorter timescale.
— Principles behind the Agile Manifesto, https://agilemanifesto.org/principles.html

Let’s start by looking at how the Business Value lever is supposed to work:

Most of the original agile methods, often called lightweight methods, were designed for small systems development. There was usually a single team, and that team built something that could be sliced up in many small deliveries that had value to a customer. For example, a website, or a payroll system, can be built and delivered with minimal functionality. After that, new functionality can be delivered in small increments at relatively short intervals.

The illustration shows how many small, fast deliveries begin to generate business value early, giving agile projects a head start on waterfall projects. In small systems development, small deliveries are deliveries to end users.

In order to deliver in small increments, you need a way of slicing up the application you are building. Thus, the User Story was born! A User Story is a short, non-technical description of something a user wants to do, written from the user’s perspective. The User Story was not the only way, nor the first, to describe vertical slices, little pieces of functionality that worked end-to-end, but it became the most popular.

Originally, User Stories were written on index cards, by an end user, in a freeform format. That worked very well. Some User Stories had a lot of value to end users, others had less. What you did to maximize business value was to sort the index cards from most valuable to least valuable, and start working from the top of the stack of cards.
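That prioritization scheme can be sketched in a few lines of Python. The story texts and value numbers below are invented for illustration; the point is simply that sorting the stack by value and working from the top maximizes the business value delivered early.

```python
# Hypothetical stack of index cards. Story texts and values are made up.
stories = [
    {"story": "As a customer, I can view my order history", "value": 40},
    {"story": "As a customer, I can pay by card", "value": 100},
    {"story": "As an admin, I can export reports", "value": 25},
]

# Sort the cards from most valuable to least valuable,
# then start working from the top of the stack.
backlog = sorted(stories, key=lambda s: s["value"], reverse=True)
for card in backlog:
    print(card["value"], card["story"])
```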

User Stories provided an alternative to Critical Path for organizing the work. If you read the first article in this series, you may remember that the Critical Path is defined as the longest stretch of dependent activities in a project. Critical Path made it possible, when it worked, to minimize the total duration and cost of a project. User Stories, and vertical slicing, made it possible to deliver value much sooner.

With Critical Path and Waterfall methods, if you had a twelve-month project, you would not see any business value from the project until after twelve months. With an agile method, you would get something of value after one or two months, and then, with each delivery, the business value would increase.

Getting business value sooner meant making money sooner. If you have ever played around with a Profit & Loss statement, you know that shipping working software just a little bit faster can have enormous impact on the total business value over the product life cycle.
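A toy model makes the effect concrete. All numbers here are invented: a twelve-month build, a 24-month product life, and a full product worth 12 value units per month. Compare a big-bang release at month twelve with shipping a twelfth of the functionality every month.

```python
# Toy model with made-up numbers: incremental vs big-bang delivery.
HORIZON = 24      # months in the product life cycle
BUILD = 12        # months of development
FULL_VALUE = 12   # value per month generated by the complete product

def cumulative_value(live_fraction):
    # live_fraction(m) -> fraction of the product live during month m
    return sum(FULL_VALUE * live_fraction(m) for m in range(1, HORIZON + 1))

# Big bang: nothing is live until the whole product ships at month 12.
big_bang = cumulative_value(lambda m: 1 if m > BUILD else 0)

# Incremental: one twelfth of the product goes live each month.
incremental = cumulative_value(lambda m: min(m, BUILD) / BUILD)

print(big_bang, incremental)  # incremental delivery wins by a wide margin
```

Even in this crude model, incremental delivery produces roughly half again as much total value over the same life cycle, purely because value starts accruing earlier.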

Critical Path became obsolete, replaced by User Stories, and the idea of shipping a minimal product as quickly as possible, followed by incremental increases in functionality.

The Decline and Fall of the User Story

…user stories are based on real customers who will use the product once it’s released. As such, the Product Owner may choose to talk to potential customers in order to get a feel for what they want. There could be focus group discussions, interviews and any other kind of research needed in order to gain the intelligence needed to create a viable user story.
— Cohn, Jeff. Scrum Fundamentals: A Beginner’s Guide to Mastery of The Scrum Project Management Methodology

In the beginning User Stories worked very well, but then something happened: Scrum became the dominant agile method! Scrum did away with the idea that an end user should write the User Stories, and inserted an intermediary, the Product Owner.

Note the phrase “the Product Owner may choose to talk to potential customers” in the quote above. The Product Owner should talk to end users, but isn’t required to. In many large organizations, it’s difficult for a Product Owner to even find an end user.

The illustration shows the difference between Scrum and most other agile methods when gathering requirements. Even in the best case, Scrum inserts an extra degree of separation.

With most original agile methods, like eXtreme Programming (Kent Beck), Crystal (Alistair Cockburn), and Lean Software Development (Mary and Tom Poppendieck), it was explicit that the organization would have to change in order to support agile teams. One such change would be tearing down organizational barriers, so the development team could meet and talk to users directly.

The illustration shows the Crystal approach: an expert user, a business expert, and a designer/developer collaborate on creating lightweight use cases. The designer/developer is part of the development team. This maintains a single degree of separation while adding a broad business perspective and a deep understanding of what users do and how they work. It also captures more initial information, and scales to larger, multi-team projects.

It is worth noting that some agile methods, notably Crystal, took a more scalable approach to requirements management. Crystal used lightweight use cases instead of user stories. Use cases also represent vertical slices of functionality. However, they capture a lot more information than user stories, while still being reasonably quick and easy to create. The use cases were created by a troika, an expert user, a business expert, and a designer/developer. Crystal was explicit about the high degree of expertise needed to create good requirements.

Scrum took a different path. Instead of changing the organization, Scrum inserted the Product Owner as an intermediary, isolating the development team from the organization around it. This was attractive to management in many organizations, because they did not have to change how they worked. Instead, the team could be treated more or less like a black box.

Isolating development teams from users came at a cost:

Loss of information!

In some organizations, the Product Owner works in close contact with the users. In those organizations, inserting an extra degree of separation might not matter much. In other organizations, where there is less contact between Product Owner and end users, the information loss may be substantial, and significantly reduce business value.

Note that User Stories are not complete requirements. Originally, they were intended as placeholders for conversations between developers and end users. With Scrum, developers will have those conversations with the Product Owner instead. For this to work well, the Product Owner must have both deep and broad knowledge of the end user’s area of expertise! However, the Official Scrum Guide does not require Product Owners to have any such expertise. Instead, Scrum focused on accountability:

The Product Owner is accountable for maximizing the value of the product resulting from the work of the Scrum Team. How this is done may vary widely across organizations, Scrum Teams, and individuals.
— The Scrum Guide, p. 5, https://scrumguides.org/download.html

The Scrum Guide goes into more detail about the areas of accountability, but is silent on the corresponding skills needed to do the job well. This focus on areas of accountability in lieu of skills is not limited to the Product Owner role. All other roles in Scrum are defined the same way.

Eliminating skill requirements contributed to making Scrum an easy sell. Unfortunately, the lack of skill requirements led to an overall reduction in skill levels for all Scrum roles, compared to other agile methods.

Why would user stories being vertically sliced matter?
— Agile Way of Working responsible in a major SAFe program

Many, perhaps most, Product Owners did not get much training in capturing requirements. Nor did they learn why vertical slicing was important. At the same time, they had to do a difficult job. Instead of an end user writing a story, a Product Owner had to imagine what an end user would write, and then write that. To make it worse, the Product Owner often had to do it without expertise in the subject matter, and without expertise in the relevant business processes. The Product Owner job became very difficult indeed, and that could not help but reduce the quality of User Stories.

What happened was that Product Owners in many companies wrote task descriptions instead, because that was what they knew how to do. They still called them stories though.

The problem with this is that we do not have anything that structures the tasks into vertical slices of functionality anymore. Product Owners were supposed to organize the work order from the item with the highest business value to the item with the lowest, but tasks don’t have business value!

Business value is an emergent property of a vertical slice of functionality, or a set of vertical slices of functionality! That rarely stopped anyone from trying to do the impossible though. I have occasionally met Product Owners, and other people, who suspect that something fishy is going on with how we write requirements, but they are few and far between.

So, why does it matter whether a team uses User Stories or tasks?

It matters, because when the teams work with tasks, they do not usually complete all tasks in a vertical slice in one go. They may work on a front end task, and then move on to another front end task, or work on a business rule, and then move on to another business rule, without connecting anything end-to-end.

We have now disabled the Business Value control lever!

Whenever I see this phenomenon, teams still do deliveries, and still do demos. It's just that they do internal deliveries, and the demos are PowerPoint presentations, or demos of some internal function of interest only to developers. End users cannot use the software in the internal delivery. End users are never invited to these demos.

Companies that do this can be even worse off than they were before agile, because they first got rid of the Critical Path based project management they had before, and then disabled the vertical slicing which was supposed to replace Critical Path management.

Yeah. And don't try to tell us there is no way to go but up because the truth is, there is always more down.
— Gunn, in the Angel TV-series episode Happy Anniversary

Confusing user stories with tasks causes big enough problems when doing small systems development. What happens when you scale up to large projects with many teams?

When we scale up, and teams become responsible for functional units in the system architecture, they also become dependent on each other. The teams have the same dependencies the system architecture has.

If front facing teams worked with real User Stories, then we could break those stories down into tasks that supporting teams worked on, and continue producing vertical slices of software.

We could also design the system architecture so that each team can handle a vertical slice of functionality.

In most cases, we would have to balance the two approaches.

What I have seen happening though, is that everyone keeps working on tasks. Architectural dependencies are not resolved. Team topology is ignored. Work consequently slows down, a lot!

I worked in one large project some years ago where deliveries were slowed down by a factor of 50! In other words, what should have been a two week user story took two years to implement instead.

Most other projects I have seen have had similar problems.

Mirror, Mirror on the Wall, Tell Me Which is the Most Fake User Story of Them All

As far as I know, no method has done more to confuse the issue of user stories versus tasks than the Scaled Agile Framework (SAFe).

SAFe has a requirements model with two kinds of stories: User Stories, and enabler stories.

User stories are, according to SAFe, vertical slices of functionality, but SAFe does not mention why that is important, or what happens to SAFe's own requirements prioritization system if they are not. Then SAFe tops it off by introducing enabler stories, which are a kind of task.

Hierarchically, both user stories and enabler stories are the children of features. There are two kinds of features: business features, which are vertical slices of functionality, and enabler features, which are a kind of task created by architects. Both kinds of features may contain either kind of story.

Moving up the hierarchy to business capabilities and enabler capabilities, both of them may contain both business and enabler features.

The top level is the epic. There are three main types, portfolio epic, solution epic, and program epic. Any type of epic may be of one of two subtypes, business epic and enabler epic, which gives you a total of six different kinds of epic.

Any kind of epic may contain any kind of capability, and any kind of feature. Any of these work items may be standalone. You may, for example, have enabler stories and enabler features that are standalone and not connected to anything that has any business value.

It’s a mess! I don’t think anyone has a grip on it. I may be wrong, but I haven’t met anyone who has figured it out yet.

What's Bad for the Goose May Be Good for the Gander

All of the confusion about agile requirements is bad for the industry as a whole, but it is actually an opportunity for those who manage to fix their user stories. When user stories describe vertical slices of functionality, the Business Value lever starts working, and that means the basic mechanism for making small, fast deliveries is working.

That, in turn, makes it possible to use other control levers more effectively.

I’ll describe some of those control levers in Part 3: The Iron Triangle vs. The Gang of Four.

Monday, January 09, 2023

Are You Still Using the Wrong Control Levers in your Agile projects? Part One: Cost and Capacity - The Levers of Death

Which levers should you use? When should you use them? Which levers should you avoid using? There is a subtle hint in the illustration.

Agile methods brought us new ways of developing software, and new ways of managing software projects, programs, and product development. Unfortunately, I have seen very few, if any, organizations that make good use of the powerful new management tools they have at their disposal. Instead, they continue to use the same tools they used before agile, often with predictably bad results.

In this series of articles I’ll provide a walk through of high level controls, their pros, cons, and how they relate to each other.

The Levers of Death: Capacity and Cost

Let’s start with the Levers of Death, Capacity and Cost. These levers are the ones I see used most often. They are not necessarily bad in and of themselves (well, firing people is bad), but they are easy to misuse, and often poorly understood.

In most organizations I have worked in, it is assumed Capacity and Cost controls are rather straightforward:

Capacity - Increase capacity, i.e. hire more people, when you want a project to speed up. This of course also increases cost.

Cost - Cut Cost, i.e. fire people, when the project gets too expensive. What counts as too expensive is usually measured against a predetermined budget, usually a yearly one. Cutting cost is expected to reduce capacity, but the implications of that are often quietly ignored, because, what else can you do, right?

As we will see, the problems with the hiring and firing approach outweigh the benefits. Fortunately, there are better alternatives, controls that both provide more effective economic control, and are more humane. I’ll explore both bad and good control levers in this article series, but we will start with two baddies: Cost and Capacity.

Problem 1: Vital Information is Missing

The critical path (or paths) is the longest path (in time) from Start to Finish; it indicates the minimum time necessary to complete the entire project.

— The ABCs of the Critical Path Method, by F. K. Levy, G. L. Thompson, and J. D. Wiest, Harvard Business Review, September 1963

When agile methods became popular, they were intended to replace older methods of management. That led to chucking older practices overboard, because they were not needed anymore. Unfortunately, the Capacity and Cost levers were supposed to be, if not chucked overboard altogether, at least relegated to third-rate status, but managers in many companies held on to them as primary project controls.

To make it worse, when organizations use them now, they do it without benefit of information they had 25 years ago. The reason is that some of the decision support needed to use Capacity and Cost effectively, actually was chucked overboard.

One such missing piece of information is the Critical Path. The critical path is the longest path, in time, from start to finish in a project. The critical path is extremely important in old style waterfall projects, because it determines the duration of the entire project.

If you know the critical path, then you also know that along the critical path you have a capacity bottleneck.

When you wanted to add capacity, the trick was to locate the capacity bottleneck, and add capacity there, and nowhere else. This will sound very familiar to anyone who uses the Theory of Constraints (TOC) management paradigm, and the TOC project management method, Critical Chain.
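The classic computation behind this can be sketched in a few lines: the critical path is the longest path, in time, through the dependency graph of activities. The activity names, durations, and dependencies below are invented for illustration.

```python
# Minimal critical-path sketch. Activities, durations (days), and
# dependencies are made-up examples.
durations = {"spec": 5, "backend": 20, "frontend": 15, "db": 10, "test": 8}
depends_on = {
    "spec": [],
    "backend": ["spec"],
    "frontend": ["spec"],
    "db": ["spec"],
    "test": ["backend", "frontend", "db"],
}

def critical_path(durations, depends_on):
    finish = {}     # memoized earliest finish time per activity
    best_pred = {}  # predecessor on the longest path into each activity

    def earliest_finish(act):
        if act not in finish:
            preds = depends_on[act]
            start = max((earliest_finish(p) for p in preds), default=0)
            best_pred[act] = max(preds, key=earliest_finish) if preds else None
            finish[act] = start + durations[act]
        return finish[act]

    last = max(durations, key=earliest_finish)  # activity that finishes last
    path, node = [], last
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return list(reversed(path)), finish[last]

path, total = critical_path(durations, depends_on)
print(path, total)  # the chain that determines total project duration
```

In this example the backend work sits on the critical path, so adding capacity to the backend could shorten the project, while adding front-end or database capacity could not.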

Conversely, if you wanted to reduce cost, you made very sure to reduce it in places that were not the bottleneck, and preferably not on the critical path.

“Dr. Livingstone, I presume?”

Before we look closer at why we need to know the critical path and the project bottleneck before we should even think about hiring and firing, we need to look at a problem the critical path idea was not designed to handle:

In projects with a lot of variation, like software and product development, the critical path and the main bottleneck move around, a lot!

The reason the critical path and the project bottleneck move around is simple: random variation!

When you build something new, which is what software development is all about, you do not have well defined lists of activities. Instead, you are doing exploratory work, with very limited ability to predict the future. To make it worse, the more detailed your predictions, the more wrong they will be.

Imagine, for a moment, that you are Henry Stanley, on March 21, 1871. It's the first day of your attempt to find the explorer David Livingstone, who vanished in central Africa several years earlier. You have prepared carefully for the rescue expedition, but it would be folly to make a predetermined plan of exactly where to go to find Livingstone, and exactly how long it will take.

Stanley's expedition faced enormous dangers, or impediments, as Stanley might have called them had he been a Scrum Master: crocodiles ate the pack animals, and tsetse flies spread deadly diseases. Dozens of porters abandoned the expedition, or died from dysentery, smallpox, malaria, and other diseases. Livingstone had been spotted near Lake Tanganyika, so Stanley had a general idea of where to go, but he had to pick up more information along the way. He heard a rumor about a white man in the town of Ujiji, and went there, not knowing whether he would find Livingstone or not. By luck, he did!

Software development is like that. You can’t make detailed plans and schedules, but you can prepare.

Unfortunately, the whole critical-path-and-bottleneck idea requires that you can plan and schedule with a great deal of accuracy. If you can’t plan in detail, you can’t identify the critical path. If you can’t identify the critical path, it’s difficult to identify the bottleneck in the process. If the bottleneck, and the critical path, keeps moving around, a good decision about where to hire and fire today, will become a bad decision tomorrow.

There are things you can do to mitigate the problems with critical path, but I’ll leave that for another article series. In the more than 40 years I have worked in software development, I have yet to see a software project implement anything close to a useful solution.

Today, when companies have scrapped the whole idea of critical path management, fixing the problems with it is of little relevance.

Instead, we will look at what happens when an organization uses the Capacity and Cost levers without knowing what the critical path and the project bottleneck is.

Problem 2: Adding Capacity Adds Work-In-Process

When you do not know where the critical path bottleneck is, and you add more people to a project, you are more likely to add people in other locations than the bottleneck, than to actually hit the bottleneck itself. That means most of the people you add won’t contribute to speeding up the project. Instead, they will add to Work-In-Process (WIP), queues of unfinished work in the process.

The larger your queues, the longer your lead times will be. Unfinished work in queues also adds risk, because you won't know whether the stuff will actually work with all the other stuff you build until you test it end-to-end. There are plenty of techniques for mitigating that risk, but you can't eliminate it. Besides, most companies I have worked with are rather poor at this kind of risk mitigation.

Build up enough WIP, and your critical path will shift to the path where most of the added WIP is, which will increase project duration.
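The relationship between queues and waiting is quantified by Little's Law: average lead time equals average WIP divided by average throughput. A tiny sketch, with made-up numbers, shows how piling up WIP stretches lead times even when throughput stays constant:

```python
# Little's Law: average lead time = average WIP / average throughput.
# The numbers below are invented for illustration.
def lead_time(wip_items, throughput_per_week):
    return wip_items / throughput_per_week

print(lead_time(30, 10))  # 30 items in process, 10 finished/week -> 3.0 weeks
print(lead_time(90, 10))  # tripling WIP at the same throughput -> 9.0 weeks
```

Adding people off the bottleneck raises the numerator without raising the denominator, which is exactly why the project slows down instead of speeding up.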

Thus, adding people won’t buy you the added capacity you think it will.

Communications Overhead

The illustration shows how adding more nodes to a network, i.e. adding team members to a team, or adding teams to a project, causes quadratic growth in communications overhead.

On top of the WIP problem, adding more people will add communications overhead. The communications overhead can start out small for a small project, but it will grow quadratically while you add people linearly. This means when you add more people, productivity per person will go down.
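The growth follows directly from counting pairwise communication channels: a fully connected group of n people has n(n-1)/2 channels. A few sample sizes make the quadratic growth obvious:

```python
def channels(n):
    # Pairwise communication channels in a fully connected group of n people.
    return n * (n - 1) // 2

for n in (5, 10, 20, 40):
    print(n, channels(n))
# Going from 5 people (10 channels) to 40 people (780 channels),
# headcount grows 8x while coordination overhead grows 78x.
```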

I have 30 people in the project. The problem is I need only 5 people.

— Project manager, in a project I worked in around 2005

Worst case, you can actually reduce capacity when you add people! I have worked in 200 people projects that could have moved a lot faster if there had been only 20 people in the project.

The short of it is that adding more people will add cost, that we know for certain, but whether it will actually shorten project duration is a bit hit and miss. The larger your project, the higher the probability it will be a miss.

Problem 3: Cutting Cost Reduces Bottleneck Capacity

When management discovers, often belatedly[*], that adding people added a lot of cost but did not shorten project duration as much as hoped, or at all, the natural reaction is to cut costs in order to meet budget targets.

Unfortunately, if you have 100 people in your project, add 100 more, and then cut 100, you won’t be back where you started. You will be worse off than before!

Why is that? Because we do not know where the bottleneck is, and because the bottleneck often jumps around, it is highly likely that the cost cuts hit the bottleneck, either permanently or intermittently. When that happens, the entire project is delayed.

The illustration shows how reducing capacity at a bottleneck can have much greater effect on duration and cost than expected.

Here is an example from a project I worked in:

Several years ago I was the Scrum Master for a development team where higher level management from time to time pulled one person from the team to work outside the team.

The team had 7 members, so both the team and management expected the remaining capacity to be 86% (6/7 = 0.857 ≈ 86%). However, this is true only if all team members are full stack developers, and if they all are equally productive.

The team had 5 developers and 2 testers. The testers were the bottleneck. Unfortunately, the person pulled from the team was one of the testers. That reduced the total team capacity to 50% (1/2 = 50%).

If that team had been the bottleneck in the entire software development project, removing the tester for one week would have added one week to the duration of the entire project. That also adds one week of cost for the entire project, way more money than management had expected.
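The arithmetic in this example can be sketched under a simple assumption: each person contributes one unit of capacity to their stage, and work must flow through every stage, so the slowest stage sets the team's throughput.

```python
# Team throughput is set by the bottleneck stage, not by headcount.
# Assumes (for illustration) one unit of capacity per person per stage.
def team_capacity(stages):
    return min(stages.values())  # the bottleneck stage limits flow

full         = {"develop": 5, "test": 2}  # 7 people total
minus_tester = {"develop": 5, "test": 1}  # 6 people, but a tester is gone
minus_dev    = {"develop": 4, "test": 2}  # 6 people, a developer is gone

# Losing 1 of 7 people halves throughput if that person is a tester...
print(team_capacity(minus_tester) / team_capacity(full))  # 0.5
# ...but changes nothing if that person is off the bottleneck.
print(team_capacity(minus_dev) / team_capacity(full))     # 1.0
```

The naive 6/7 = 86% estimate is only valid when everyone is interchangeable; with a bottleneck, who you remove matters far more than how many.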

Note that if you have, for example, a large SAFe program, with several Agile Release Trains (ART), and 7-10 teams in each ART, you could double the duration of the project by firing that single tester…unless you figure the problem out and hire a new tester. If you do that, and the critical path and the bottleneck then jump to somewhere else, the new hire will just contribute to creating more WIP, and you are back to Problem 2: Adding Capacity Adds Work-In-Process.

The point is that cutting just a few people from a project may have a disproportionate effect on project duration and cost, and you do not know where it is safe to cut cost! Sometimes it works, sometimes it makes the situation worse.

Problem 4: The Hire and Fire Death Spiral


Using the Capacity and Cost levers can easily drag a software development project into a kind of economic death spiral:

It starts with WIP going up due to statistical variation. More WIP means work will have to wait in queues, which means delays, which means cycle times go up for many teams. This means the project is delayed.

Because of the delays, deadlines are broken. Management tries to fix this by adding more people. This increases communications overhead. It also adds capacity off the critical path, which leads to more WIP accumulating. The net result is that the project does not speed up as much as expected, but it now costs more.

It is common that management keeps trying to speed the project up by adding even more people. This is less and less effective each time. This is partly because the communications overhead goes up quadratically when people are added. Another part is that the larger the project is, the higher the probability of missing the critical path altogether when adding new people.

Eventually management will notice that not only does the project not move forwards as expected, it also burns through money at an alarming rate. That is when the cost cuts come. Some of those cost cuts are likely to hit the critical path. When that happens, project duration goes up. Cost per day does go down, but the increased duration means there are many more days, so total cost goes up.

Cost cuts continue until management notices that we now have even more delays, so management starts hiring more people, and the Hire and Fire cycle starts over again.

The whole thing continues until the project either stumbles over the finishing line, or the organization gives up and pulls the plug on the whole mess.

Very, very rarely, management stops, decides the whole depressing cycle is daft, and decides to find a better way. When that happens, management often goes for the sweet promise of increased Business Value.

Next: Part 2: Business Value, its Use and Abuse.


[*] Agile methods have a built-in early warning system, monitoring queues. Unfortunately, organizations that rely on the Capacity and Cost levers usually do not use queue monitoring, at least not very well.