Simplified Goal Map showing three top level project/program goals, linked to a business goal.
Welcome to Part Four in the (as originally planned) three part series on high level controls for large software development projects.
First, a brief recapitulation of the first three parts of this series:
In the first part I wrote about how using Cost and Capacity to control an agile software project can trap an organization in a hire and fire cycle that increases project duration and cost.
In the second part, I wrote about how to optimize a project for delivering maximum business value. I also wrote about a common mistake companies make, which disables the Business Value lever, so it no longer works as intended.
In the third part, I described a very common set of levers called The Iron Triangle. I also described a set of four levers used in the original version of Extreme Programming. I showed that both of these models have their pros and cons. Both can be useful, but neither is very close to the Holy Grail of high level project control levers.
In the interlude, I went off-track a bit, discussing agile project economics, and the frequent lack thereof.
In this, the fourth and final part, we will put together a simple system that provides a solid basis for managing a large project at a high level. It is not very comprehensive, and by no means complete, but it provides a good starting point. That, in itself, is better than what I have seen in most large projects.
Agile or Not?
Before starting an agile project, make sure that you really can use the many small deliveries (A+B) economic model. If you cannot, then agile may reduce value, because agile projects do not optimize for the Critical Path. The figure is from Part 2.
Before we start picking the best control levers for our agile project, it’s a good idea to figure out whether we really have an agile project or not. If we don’t, agile methods may not be the best way to manage the project. At the very least, we need to adapt the steering mechanism to the kind of project we do have.
At worst, if upper management believes the project is of one type, but it really is of another type, the smartest thing to do may be to give the project a pass (if you are lucky enough to be able to do that).
From a very high level perspective, what is it that makes a project agile? Let’s have a look at the most important factors:
- The project can be set up to have many, small deliveries of working software to the end user.
- This is a prerequisite for the agile economic model to work as intended. See the section How to Deliver Business Value the Agile Way in Part 2.
- Requirements are likely to change a lot during the course of the project. The project can be set up to handle these quickly changing requirements:
- Requirements can be written so they describe small vertical slices of functionality. See section The Decline and Fall of the User Story in Part 2.
- Developers have the skills and training necessary to do rapid change and redesign.
- The codebase must be continuously and aggressively refactored in order to keep the code loosely coupled. (If you want more information about why refactoring is essential, read my rant Interlude: The Cost of Agile.)
- Developers need high levels of skill in Refactoring and Refactoring Patterns.
- Developers need software design skills, such as Domain-Driven Design, or equivalent.
- To enable refactoring, testing must be automated.
- Developers need skills such as Test-Driven Design (TDD), Behavior-Driven Design (BDD), or equivalent.
- Development teams are decoupled, with as few dependencies between them as possible. This allows teams to work as fast as possible, without waiting for other teams. It also prevents unfinished work from building up in queues between teams. This implies the teams mainly (but usually not exclusively) are organized around value streams, rather than functional areas. Note that value streams, by definition, can exist only if the project really is set up to make small deliveries of value to the end user.
If the project fits the criteria above, there is a reasonable chance it can be run as an agile project. If it does not fit these basic criteria, the agile economic model will not work as intended. You may easily end up increasing costs by using agile, instead of reducing them.
On the other hand, you can, in most cases, still get a lot of value out of agile practices and techniques, but you need to combine them with techniques that do not exist in the agile domain, such as Critical Path or Critical Chain. (Please do not interpret that as “mix with waterfall project management”. Never, under any circumstances, do waterfall projects! I’ll save the details about why that is a monumentally bad idea for another article.)
Ok, now that we have assured ourselves we have at least the most basic requirements for running an agile project covered, let’s pick some high level control levers.
First Pick: Business Value
My first pick is delivered business value! Business value cannot be measured using a single metric, but you can use a combination of metrics. Here are a few suggestions. A good economist or accountant may be able to come up with better ones:
- Revenue: This is revenue from the value stream created by the project. It’s the gross income, before subtracting expenses. An agile project delivers value to the end user long before the project is finished (a program doesn’t even have an end date). The end user pays for each delivery, and that creates a revenue stream. Combine that with rolling budgeting, and an agile project/program is at least partially self-funding.
If the revenue stream is zero, then you should start questioning the way your agile project or program works. It is either a dysfunctional agile project, or not an agile project at all.
- Profit & Loss: As calculated in P&L statements. This is straight out of the Lean Software Development book by Mary and Tom Poppendieck. P&L statements can be used to determine the current state, and to make economic forecasts.
- Customer turnover: The customer turnover rate. This is the percentage of lost customers over a specified time period.
- Throughput: Measured in goal units (User Stories, Features, Epics, or other vertically sliced requirement) per time unit (iteration, Sprint, Program Increment, or similar). Note that this is really a proxy for business value, and can be misleading under some circumstances. Interpreted correctly, it can also be very valuable. If this drops to zero, or just never rises above zero, the entire project/program is busy creating partially built software.
Zero Throughput is not good! Sometimes it is unavoidable, but agile methodology is not designed to deal with those cases. If it happens, the thing to do is to determine whether there is a blockage in the value stream that can be removed, or whether the project methodology must be adapted to work under non-agile circumstances.
The above are just a few suggestions. There are plenty of other metrics that may be more useful depending on your situation. The key thing is, I want to know how the many small deliveries the project makes affect the business.
I’ll give you a practical example:
I am currently working on a book. (Very, very slowly!) It is an art photography book, so I need to make text and illustrations work together on the page. Because of this, I am writing directly in a layout program, Affinity Publisher.
After I began writing the book, Serif, the company that makes Publisher, released a new version of the program. The new version had support for book files. That wasn’t mission critical for me, but it did make working on the book a lot easier. I could break each chapter into a separate file, and have a book file that kept the chapters together.
The update was free, so Serif did not get any money from me, but the probability of me staying with Affinity Publisher for future projects went up. Score one for reduced customer turnover!
As I write this, Serif has yet another new version of Publisher in beta test. This version has cross-reference support. Judging from the beta test reports, it is fairly good cross-reference support, not the sad dog’s breath version you will find in MS Word, and all the Word clones out there. If you have used a professional document processor, like Adobe FrameMaker, you have a fairly good idea of what I mean.
If the next release of Affinity Publisher has cross-reference support as good as is indicated in the beta test forum, then my customer loyalty will certainly go up a notch again. This upgrade is also free, but my willingness to shell out money for the version 3.0 upgrade, when that day comes, definitely increases.
In the case of an application like Publisher, the manufacturing company can of course measure the number of downloads of the upgrades, and monitor reactions in the user forums on their website, to get an idea of the business value of the features they implement.
Changing the Business Value Lever
There are lots of things that can, and should, be done to push the Business Value lever in the desired direction. Here are a few basic ones.
Weighted Shortest Job First (WSJF)
We can change the Business Value lever if we can figure out the relative value of different requirements. We can do that using a formula called Weighted Shortest Job First (WSJF). I’ll just outline the basic idea here. Please do read up on the relevant literature before using it. (I recommend you read both the SAFe article about WSJF, and Donald Reinertsen’s book The Principles of Product Development Flow: Second Generation Lean Product Development.) The WSJF formula looks like this:
WSJF = Cost of Delay / Duration
The Cost of Delay can be calculated a couple of ways. SAFe provides a formula, but very little information on how to use it in practice. The formula in SAFe was lifted from Dean Leffingwell’s book Agile Software Requirements, so you can find more specific information there. Mary and Tom Poppendieck also calculate Cost of Delay, but they use Profit & Loss statements in the book Lean Software Development. You may want to try both methods, and compare results. Reinertsen also shows how to calculate Cost of Delay using P&L statements in The Principles of Product Development Flow.
Most of the time WSJF is calculated for Features, Capabilities, or other kinds of collections of User Stories. Individual User Stories are often too small to fiddle around with when estimating business value. Doing it for a group of User Stories that will be developed together is usually more practical.
Next, we need to estimate how long it will take to develop a Feature, or whatever unit of Business Value we have chosen. The traditional way to do this is to torture developers until they come up with a credible sounding lie. Personally, I prefer using statistical methods, like Monte Carlo simulation. With Monte Carlo simulation, project management can choose how certain they want forecasts to be. The developers are not directly involved, so they can’t be blamed if a forecast is wrong. (Instead, you can blame an Excel sheet, a piece of software, or statistics.)
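To make the Monte Carlo idea a bit more concrete, here is a minimal sketch. It assumes you have a record of how many items were finished in each past iteration, and it simply resamples that history many times. The function names, the sample data, and the backlog size are all hypothetical, and real forecasting tools are usually more elaborate than this.

```python
import random


def forecast_iterations(throughput_history, backlog_size, runs=10_000, seed=None):
    """Simulate how many iterations it takes to finish `backlog_size` items.

    Each simulated iteration draws a throughput value at random from the
    historical samples (resampling with replacement).
    Returns the sorted list of simulated durations, one per run.
    """
    if not any(t > 0 for t in throughput_history):
        raise ValueError("history needs at least one iteration with throughput > 0")
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        remaining = backlog_size
        iterations = 0
        while remaining > 0:
            remaining -= rng.choice(throughput_history)
            iterations += 1
        results.append(iterations)
    results.sort()
    return results


def percentile(sorted_values, confidence):
    """Return a duration that roughly `confidence` (0-1) of the runs stay within."""
    index = min(len(sorted_values) - 1, int(confidence * len(sorted_values)))
    return sorted_values[index]


# Hypothetical data: items finished in each of the last eight iterations.
history = [3, 5, 2, 4, 4, 1, 6, 3]
simulated = forecast_iterations(history, backlog_size=40, seed=1)
print("50% confidence:", percentile(simulated, 0.50), "iterations")
print("85% confidence:", percentile(simulated, 0.85), "iterations")
```

The point of the percentile function is the choice mentioned above: management picks the confidence level, and the forecast follows from the data rather than from anyone's gut feeling.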
Third, you add a time factor that increases the longer the Feature has been waiting in a queue. This ensures that even low priority Features will eventually be implemented. Without the time factor, a low priority Feature can be pushed to the back of the queue over and over again, so that it is never implemented. Watch it a bit here, because a Feature can depreciate in value over time, to the point of having zero value by the time it is implemented.
One thing that should not be overlooked is that you can often increase business value by removing features! A lot of software has a lot of crap functionality that no one uses. The only thing such functionality does is make the user interface more complicated, and devalue the software. As I have mentioned elsewhere, I once worked with a company that, by its own estimates, lost about eighteen million dollars per year in sales due to implementing a lot of junk functionality and having a correspondingly cluttered User Interface.
So, basically, you want to dump crappy ideas, and order the rest according to Cost of Delay divided by development time, plus a time factor adjustment.
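The ordering rule, Cost of Delay divided by development time plus a time factor, can be sketched as follows. The article does not prescribe how the time factor is combined with the WSJF score, so the linear aging bonus here is purely an assumption, as are the class, the field names, and the numbers.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    cost_of_delay: float        # relative Cost of Delay estimate
    duration: float             # forecast development time, e.g. in weeks
    weeks_waiting: float = 0.0  # how long the Feature has sat in the queue


def priority(feature, aging_weight=0.1):
    """WSJF score plus a linear aging bonus (the aging form is an assumption)."""
    if feature.duration <= 0:
        raise ValueError("duration must be positive")
    wsjf = feature.cost_of_delay / feature.duration
    return wsjf + aging_weight * feature.weeks_waiting


def schedule(features, aging_weight=0.1):
    """Return the features ordered from highest to lowest priority."""
    return sorted(features, key=lambda f: priority(f, aging_weight), reverse=True)


# Hypothetical backlog with relative Cost of Delay values.
backlog = [
    Feature("Cross-references", cost_of_delay=8, duration=4),                # WSJF 2.0
    Feature("Book files", cost_of_delay=3, duration=1),                      # WSJF 3.0
    Feature("Export presets", cost_of_delay=1, duration=2, weeks_waiting=30),  # WSJF 0.5
]

for feature in schedule(backlog):
    print(feature.name)
```

With the aging bonus, the long-waiting low-value Feature moves to the front of the queue. Set `aging_weight=0` and it drops back to last place. Note that the sketch does not model a Feature losing value over time, so Cost of Delay estimates still need to be revisited by a human.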
Use Alternative Scheduling Methods
Over the past few years, the agile community has locked in on Weighted Shortest Job First (WSJF) as if it were the one and only planning method that is worth using. That is far from true.
WSJF requires that you can quantify both Cost of Delay and job duration with at least some degree of accuracy. That is not always the case. For example, a couple of years ago, I worked in a project where the business people flat out refused to provide estimates about business value. Instead, they tried to get the developers to do that. In such a situation, it is probably best to treat the value of all Features as equal, and do the Shortest Job First (SJF).
In a similar vein, if job duration cannot be determined at all, and yes, I’ve seen those projects too, it is best to use High Delay Cost First (HDCF).
If you know nothing about anything, and research doesn’t yield any improved results, well, you can do Features in random order. At least they get done.
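The choice between these scheduling rules can be sketched as a simple fallback chain. This is only an illustration: the dictionary keys and the function name are made up, and a real project would probably make this decision in a planning meeting rather than in code.

```python
import random


def order_features(features, rng=None):
    """Pick a scheduling rule based on which estimates are available.

    Each feature is a dict with a "name" and optional "cost_of_delay"
    and "duration" keys (None or missing means "unknown").
    Returns a tuple of (rule name, ordered list of features).
    """
    has_cod = all(f.get("cost_of_delay") is not None for f in features)
    has_duration = all(f.get("duration") is not None for f in features)

    if has_cod and has_duration:
        # WSJF: highest Cost of Delay per unit of duration first.
        key = lambda f: f["cost_of_delay"] / f["duration"]
        return "WSJF", sorted(features, key=key, reverse=True)
    if has_duration:
        # Shortest Job First: treat the value of all Features as equal.
        return "SJF", sorted(features, key=lambda f: f["duration"])
    if has_cod:
        # High Delay Cost First: durations are unknown.
        return "HDCF", sorted(features, key=lambda f: f["cost_of_delay"], reverse=True)
    # Nothing is known: do the Features in random order.
    shuffled = list(features)
    (rng or random).shuffle(shuffled)
    return "random", shuffled


# Hypothetical case: the business side refuses to estimate value.
rule, ordered = order_features([
    {"name": "A", "duration": 3},
    {"name": "B", "duration": 1},
    {"name": "C", "duration": 2},
])
print(rule, [f["name"] for f in ordered])
```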
Note that everything I’m writing about scheduling here applies to vertical slices of functionality! I’ve seen projects trying to apply WSJF scheduling to tasks, and that simply does not work. For one thing, a task does not have business value. Business value is what you get when you implement several related tasks, and the implemented pieces of software interact with each other!
By the way, I mentioned I’ve seen projects trying to apply WSJF scheduling to tasks. What do they do when they discover that does not work? Most Product Owners just fudge it and assign some arbitrary rank. One project found a really creative solution: They redefined the term “Business Value”. But I digress. Let’s move on…
Reduce the Cost of Change: Refactor Aggressively
As I mentioned in my rant Interlude: The Cost of Agile, a fundamental idea in agile software development is to keep the Cost of Change low by having all developers continuously refactoring the codebase, as part of the routine when writing code. This is usually done through practices like Test-Driven Design (TDD) or Behavior-Driven Design (BDD). To put it a bit harshly, developers that do not do this should not be trusted writing production code.
On a higher management level, you basically have two options:
- Hire only developers who know TDD, BDD, or similar techniques.
- Train the developers you do have, so they learn to keep the Cost of Change low.
I am much in favor of the train the developers option. Of course, to do that, you need to build a training organization within the company, and you need to hire and develop people with long term goals in mind.
While you are at it, do the same for managers. You might be positively surprised.
Parallelize Processes
If you can structure the project as multiple independent parallel processes, then you can reduce project lead time, and Business Value will go up.
Sounds easy, but most organizations I have worked for fail miserably at this. They talk the talk, Value Streams, blah, blah, Agile Release Trains, blah, blah, but when they organize the teams, they do it around functions, just like they have always done, with the same miserable results they have always had (or even worse). The only thing they actually use, is the new terminology. The way they organize stays the same.
The problem is that agile methods are not designed for parallelizing the work in large projects, because most of them are designed for small systems development. When you do small systems development, it is okay to simplify planning down to putting all work items in a single queue per team. This does not work when we scale up projects. We need to parallelize the work, and we cannot do that if everything is in a single queue.
There is an agile planning method that does parallelize work. It’s Blitz Planning, from the Crystal family of methods by Alistair Cockburn. You might want to check it out.
Parallelize for real, and you have a leg up on most other projects, agile or otherwise.
Second Pick: Queues
My second pick is queues, both in front of, and in, the development process. Why? Because queues predict lead times. By monitoring queue sizes, I can determine the overall state of the project:
- Queue size is constant over time: The teams are working at an overall stable rate of production. On average, each iteration, the teams deliver as much working functionality as they take onboard. This is ideal conditions for time estimates and statistical forecasts. (I still would not trust estimates though.)
- Queue size increases over time: Growing queues are a strong warning signal! They mean that delays, and thus lead times and costs, are growing. The delays in time to market mean reduced cash flow, a shorter life cycle, and heavily reduced total profitability. Teams are taking on more work each iteration than they can finish, so people will be stressed out. There is risk of people burning out, or leaving. Time estimates made by developers, and forecasts made by statistical methods, will underestimate how long it will take to implement requirements, often by a lot.
- Queue size decreases over time: This means lead time is shrinking. Projected total project cost is shrinking. There is less risk of developers being overloaded with work, though this can still occur locally, even if the overall picture looks good. Of course, if reducing queues is taken too far, the development teams will be starved for work, but this is a rare occurrence in practice. Another potential problem is a blockage in producing requirements. This is also fairly rare, but worth checking for. Forecasts are likely to overestimate how long it takes to implement functionality.
What I’ll aim for, is a stable sweet spot, where lead times are fairly short, forecast reliability high, and people have comfortable working conditions.
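The three queue states above can be told apart with very little machinery. The sketch below assumes you record the total queue size once per iteration. It fits a straight line through the observations and classifies the slope, using an arbitrary tolerance that you would have to tune. It also uses Little's Law (average lead time equals items in the system divided by throughput), which holds as a long-run average for a reasonably stable system. All names and numbers are hypothetical.

```python
def queue_trend(queue_sizes, tolerance=0.5):
    """Classify a series of queue sizes (one observation per iteration).

    Uses the least-squares slope. `tolerance` (items per iteration) is an
    arbitrary threshold below which the queue counts as stable.
    """
    n = len(queue_sizes)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(queue_sizes) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(queue_sizes))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope > tolerance:
        return "growing"
    if slope < -tolerance:
        return "shrinking"
    return "stable"


def expected_lead_time(queue_size, throughput_per_iteration):
    """Little's Law: average lead time = items in the system / throughput."""
    return queue_size / throughput_per_iteration


# Hypothetical observations of total queue size over five iterations.
print(queue_trend([20, 21, 19, 20, 20]))   # small fluctuations only
print(queue_trend([20, 23, 27, 30, 34]))   # steadily more work waiting
print(expected_lead_time(24, 8), "iterations")
```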
Note that if you look at aggregated values, things can look good overall, despite a disaster occurring locally. Looking at overall queue sizes, or using any other metrics, does not replace going to gemba, and talking to people who do the actual work. The metrics just supplement that.
Changing Queue Size
Set project/program level Work-In-Process (WIP) limits, and stick to them! Then, when you can see that you have a stable system (check the queue size), try reducing the WIP limit a little, and see what happens. If the results are good, repeat the process. If the results are bad, back off a bit and restore the WIP limit to what it was previously.
That is basically it. There are finer points to it that would fill a book. You can find many of those finer points in Reinertsen’s The Principles of Product Development Flow: Second Generation Lean Product Development.
In just about every large “agile” project or program I have seen, Scrum Masters do their best to hold WIP as low as possible by getting developers to pull tasks and stories through, and finish them, before taking on more work. Sprint backlogs are limited in size by the Sprint planning process.
This is all very good, but…
Nobody pays the slightest attention to half-finished material that builds up between collaborating teams! In most projects, all teams have their own separate Product Backlog, and that is usually where the half-finished stuff hides out. Nobody thinks about setting WIP limits on the Product Backlogs, or even whether it is a good idea for all teams to have separate Product Backlogs.
“If Scrum Teams become too large, they should consider reorganizing into multiple cohesive Scrum Teams, each focused on the same product. Therefore, they should share the same Product Goal, Product Backlog, and Product Owner.”
— The 2020 Scrum Guide
The purpose of the above passage in the Scrum Guide, is to avoid creating hiding places for unfinished work. Of course, most companies I have seen ignore this, and go right ahead and create them anyway.
Scrum Masters, who should know better, go along without a peep. Product Owners don’t say anything either, but they at least have a valid reason: If you do what the Scrum Guide says, most Product Owners will be out of their current jobs. For example, most SAFe projects I have seen, have way more Product Owners than they should have, sometimes 15-20 times as many.
Note that I am not advocating firing 90%-95% of all Product Owners. I am advocating for giving them constructive work, instead of destructive work, and changing their job titles to something appropriate.
There should, in most cases, be one Product Owner per product. Most organizations I have seen solve this back asswards, and start calling every subsystem that has a dedicated team working on it a “product”. It is easier to do that than to solve the actual problem with too many internal queues.
Setting strict high level WIP limits for the entire project or program helps solve the problem. Even if the internal structure of the project is bonkers, an overall WIP limit makes it more difficult to hide partially finished work in internal queues, whether they are called Product Backlogs or something else.
Of course, to make the project work well, you do need to set up a sane project structure, and it is much easier to do that early on, than late in the project. Also, the value of solving the problem is greater the earlier you do it.
While individual teams often use a method called kanban to set WIP limits, I would advise using Constrained Work in Process (CONWIP) at the project or program level.
CONWIP is simple. You set a WIP limit. Once the WIP limit for the entire group has been reached, you do not add another unit of work to the system of teams until a unit of work has been finished.
For example, suppose you have eight teams working on a product. The teams are set up to work in parallel, so they can work on one Feature each simultaneously. Try setting the WIP limit to eight Features initially, and see how that works out. Experiment by adjusting up, or down, until the teams find the sweet spot, where the lead times are short, and Cost of Delay is low, and people have enough to do, so capacity cost isn’t too high.
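The CONWIP rule is small enough to sketch in a few lines. The class below is a toy model with made-up names, not a description of any particular tool: it keeps one shared limit for the whole group of teams, and a new Feature is pulled in only when another one has been finished.

```python
from collections import deque


class ConwipSystem:
    """One shared WIP limit for the whole group of teams."""

    def __init__(self, wip_limit):
        self.wip_limit = wip_limit
        self.waiting = deque()     # Features queued in front of the system
        self.in_progress = set()   # Features currently being worked on

    def submit(self, feature):
        """Queue a Feature, and start it at once if there is a free slot."""
        self.waiting.append(feature)
        self._pull()

    def finish(self, feature):
        """Mark a Feature as done, which frees a slot for the next one."""
        self.in_progress.remove(feature)
        self._pull()

    def _pull(self):
        # Pull waiting Features, oldest first, until the limit is reached.
        while self.waiting and len(self.in_progress) < self.wip_limit:
            self.in_progress.add(self.waiting.popleft())


# Eight teams, so start with a WIP limit of eight Features.
system = ConwipSystem(wip_limit=8)
for number in range(1, 11):
    system.submit(f"Feature {number}")

print(len(system.in_progress), "in progress,", len(system.waiting), "waiting")
system.finish("Feature 1")   # a slot opens, so the next Feature is pulled in
print(sorted(system.waiting))
```

Experimenting with the limit, as described above, then amounts to changing `wip_limit` and watching what happens to lead times and queue sizes.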
Third Pick: Scope
My third pick is Scope. Agile projects and programs must have control of the scope! The reason is the inherent uncertainty built into even the best estimates and forecasts.
First of all, when a project starts, no one, not even the customers, knows what the customers want. This will be discovered during the project. Quite often, things customers want at the start of a project turn out to be completely unnecessary, or even undesirable, later on.
Second, forecasts are uncertain. Estimates are even more uncertain. To make delivery dates, projects need flexibility. The easiest way to achieve that, is to be flexible about scope.
Let’s revisit the Affinity Publisher cross-reference feature. Serif had planned to include cross-references in their 2.1 release. As it turned out, beta testers did not like the original design of the feature, and it was slaughtered in the online beta release forum. Serif decided to redesign the feature. To do that, they needed more time, so they reduced the scope of the 2.1 release, and will now include it in the 2.2 release instead.
What do the users think of this delay? They, and that includes me, are very positive about it. I would much rather have a great cross-reference feature a couple of months later, than a smells-like-a-dead-fish MS Word look-alike feature right now. The same goes for the other users who really need a great cross-reference feature.
One important caveat: When reducing scope, reduce the number of vertically sliced requirements! That way, you can track what you are doing, and you minimize the work of tracking down dependencies.
I have seen projects where management tries to reduce scope by reducing the number of functional units. That is, they try to reduce the number of large tasks. When they do that, they need to track all dependencies of all functional units they want to remove, and that ends in madness. Partially, of course, because they did not pay attention to refactoring and creating a loosely coupled emergent design in the first place.
Putting it Together
Let’s put what we have got together with a Goal Map from TOC’s The Logical Thinking Process. I am using a variant that shows not only goals and sub-goals (Critical Success Factors, if you want to use the correct terminology), but also maps behaviors and metrics to the goals.
As we have shown, there is a hierarchical relationship between our goals. In the project, our main concern is to maximize Business Value for the customer. We can do that by moving the Business Value lever directly, for example by using Weighted Shortest Job First, or by decoupling and parallelizing processes.
However, changing the scope lever also impacts the Business Value. So does optimizing queue sizes. Therefore, the levers are connected.
I have added an overarching business goal for the company, maximizing the Return On Investment now and in the future. Few companies make things for purely altruistic reasons. They want to make a profit, so we should connect the project/program to that.
Note that I drew the connecting arrow using a dashed line. That is because we, just about everyone involved with agile methods, make the assumption that if we build what the customer really needs, then the customer will buy our product, and be happy with it.
The only snag is, humans are not that logical! We often make bad choices, even though we really ought to know better. That applies to software development, and it applies to buying and using software products too.
Still, we will let it stand for now. Fixing that little problem is way outside the scope of this article series. Just keep it in mind, if you build a fantastic piece of software that nobody wants to buy or use. The problem may be psychological, rather than technical. Sometimes, you may even have to dumb a product down in order to get it accepted. Been there, done that!
Metrics Do Not Replace Talking to People!
I have mentioned it in the article, but it deserves to be mentioned again: Top level control levers, and metrics, are a complement to talking to people in the project. The levers and metrics are topics for conversation. They can be used as conversation starters and action triggers. They in no way replace talking to people!
The project management method you use, whether it be Scrum, SAFe, or something else, will define a bunch of meetings with various people. Pay attention to which meetings are really useful to the people participating in them. Keep the useful ones. Cut down on the rest, and if everyone is still happy, ditch them, and check if people are still happy.
Originally, agile methods cut meetings to a barely sufficient minimum, so that developers could get on with the work. Today, developers often see agile as nothing but a bunch of useless and boring meetings that block them from getting stuff done. Unfortunately, they can present a pretty good case for that. Watch out for meeting-creep. You may wish to go back to the original agile methods, like eXtreme Programming and Crystal, and see what kind of meetings were deemed worth having.
In meetings, people will tell you what they think you want to hear. Bad news has a tendency to disappear, or at least get delayed and softened. One fix for that is to encourage people to give you bad news. Commend and praise them for it!
Another is to Go to Gemba. (Also known as management-by-walking-around.) Go to the places where the work is actually done, and talk to the people actually doing the work. Bring a pen and a notebook. Note down any problem you may be able to help fix. This should be nearly all of them, since you are a high level manager. After a while, when people see that you are serious about helping out, they will tell you things that never come up in the meetings.
Coming Up: The Rejects - Why They Didn’t Make It, and What I Did With Them Instead
This part is about twice as long as the other parts in the article series, and as it turns out, there are still some things to say about control levers for large projects.
When I picked my favorite levers, I discarded a bunch of others:
- Capacity
- Cost
- Duration
- Quality
- Cycle Time
That I did not choose these as control levers does not mean they are useless. There are other things that can be done with them, and that is what I will write about in part five.
Also, I did put a few constraints on basic properties of agile projects and programs. A lot of projects that purport to be agile simply are not. Unfortunately, knowing that does not help, so I am considering writing a sixth part about what to do if you find yourself managing a pretend-to-be-agile project. Please let me know if that is of interest.
I do realize I am considering writing part five and six in a three part article series. Well, that is why agile projects need flexible scope. This article, and the previous one, Interlude: The Cost of Agile, constitute a good example:
I had intended to write one article, saw that it was getting really big, and would take a long time to finish, so I split it into two deliveries, Interlude: The Cost of Agile, and this article. Thus, I could deliver value to my readers, my end users, faster, by reducing the scope of the first delivery.
Assuming that at least one reader found something practically useful in that first delivery, they could act sooner, and get the benefits of said value for a longer time. Thus, the total value of the benefits of the advice in the article increased.
That is exactly how agile methods work! It’s a litmus test. If a project does not work like that, it’s not agile, no matter how well you follow the Scrum ceremonies, and regardless of whether you have RTEs and are organized into ARTs, or whatever.
Then, of course, while writing about the control levers I find useful for large projects, I stumbled on the idea that there is also value in writing about the discarded controls. They are far from useless, even if they are not the primary control levers. Thus, I increased the scope, in order to increase the value, and I did it while writing Part Four, not while planning the original three parts.
That is also how agile methods work!
Both reducing and increasing scope can increase business value. It is all a matter of context.
Got to stop writing now, so I can begin on the next part.