Wednesday, February 20, 2008

Agile Productivity Metrics Again

Ken Judy posted a thoughtful reply to my post commenting on his post about productivity metrics. Judy writes:
Just to be clear, my objection is not that agile should not be justified by hard numbers but that I haven't seen a metric for productivity gain specifically that both stood systematic scrutiny and was economically feasible for the average business to collect.
If you have an andon (a board with sticky notes representing units of work) set up, it is easy for the ScrumMaster (or the project manager, if you do not use Scrum) to record in a spreadsheet when each sticky note is moved. This takes the ScrumMaster a few minutes every day. (Or every other day. I would not recommend measuring less frequently, because if you do, you will miss information about inventory build-up, and respond more slowly to problems.)

From the raw data the spreadsheet can produce:
  • A burn-down graph. The usual way of visualizing progress in Scrum projects
  • A cumulative flow-chart, showing build up of inventory in each process stage. This is a very valuable tool for finding process bottlenecks
  • A Throughput chart, where Throughput is defined in terms of goal units per time unit. A goal unit may be a Story Point or Function Point, or even Story or Use Case. (Story Points and Function Points are a little bit more uniform in size, so they work better.) To be useful, the Throughput chart must have an upper and a lower statistical control limit. Without that, the chart is just garbage.
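All three charts can be derived from the same raw data. Here is a minimal sketch of that derivation in Python, assuming a hypothetical spreadsheet layout of one row per sticky-note move (story id, new stage, date moved); the stage names and sample data are invented for illustration:

```python
from collections import Counter
from datetime import date

# Hypothetical raw data, as the ScrumMaster might record it:
# one row per sticky-note move: (story_id, new_stage, day_moved).
STAGES = ["todo", "in_progress", "done"]
events = [
    ("A", "in_progress", date(2008, 2, 4)),
    ("A", "done",        date(2008, 2, 6)),
    ("B", "in_progress", date(2008, 2, 5)),
    ("C", "in_progress", date(2008, 2, 6)),
    ("B", "done",        date(2008, 2, 7)),
]
all_stories = {"A", "B", "C", "D"}  # every story starts in "todo"

def stage_counts(day):
    """Count how many stories sit in each stage at the end of the given day."""
    current = {s: "todo" for s in all_stories}
    for story, stage, moved in sorted(events, key=lambda e: e[2]):
        if moved <= day:
            current[story] = stage
    return Counter(current.values())

# One column of a cumulative-flow chart for February 6...
counts = stage_counts(date(2008, 2, 6))
# ...and the burn-down value for the same day:
remaining = len(all_stories) - counts["done"]
```

Computing `stage_counts` for each day of the project gives the columns of the cumulative-flow chart; the `remaining` series is the burn-down graph; the day-to-day change in the "done" count is the throughput series.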
If you have a truly agile development team, where every member is a generalist, and everyone works on everything, the Scrum burn-down graph tells you everything you need to know.

The more specialization, and the more process stages you have, the more important the cumulative-flow chart becomes. I won't go into details here, but see David Anderson's book and
Reinertsen's Managing the Design Factory. This chart is useful for pinpointing the Capacity Constrained Resource (CCR) in the project, which is a prerequisite for effective improvement efforts. It is also useful when judging the impact of events on the project, because project velocity is determined by CCR velocity. (Bear in mind the CCR can, and does, shift.)

Both of the charts discussed above measure Design-In-Process (Inventory in TOC terms), but velocity can be derived from them. There is a catch though: as Judy points out, there are unknown measurement errors. In addition, velocity varies a lot, for a multitude of reasons.

The throughput chart shows velocity. If that were all there was to it, it would be a less than useful tool. Fortunately, there is more: the statistical control limits. They show (if you are using 3-sigma limits) the upper and lower bounds of the velocity with about 99.7% probability.
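Computing the limits is straightforward. The sketch below uses plain mean ± 3 sigma over a sample of per-iteration throughput figures; the numbers are invented, and a textbook SPC individuals chart would estimate sigma from the moving range instead, but the idea is the same:

```python
import statistics

# Hypothetical throughput per iteration, in story points.
throughput = [21, 18, 25, 19, 23, 17, 22, 20]

mean = statistics.mean(throughput)
sigma = statistics.pstdev(throughput)   # std dev of the observed samples
ucl = mean + 3 * sigma                  # upper control limit
lcl = max(mean - 3 * sigma, 0)          # lower control limit (never negative)
```

Plot `mean`, `ucl`, and `lcl` as horizontal lines on the throughput chart; points falling between the limits are ordinary process variation.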

You can do a lot with this information:
  • If there are measurement points outside the upper and lower control limits, the development process is out of statistical control. That means you have a problem the company management, not the project team, is responsible for fixing.
  • When you take actions to reduce uncertainty, the distance between the upper and lower control limit will change. Thus, you can evaluate how effective your risk management is. A narrow band means the process is more predictable than if the band is wider. This is important when, for example, predicting the project end date.
  • You can prove productivity improvements. If you have a stable process, and then make a process change (shorter iterations for example), and productivity rises above the upper control limit, then you have a real productivity improvement. (Or someone is gaming the system. However, if someone is, it will most likely show up in other statistics.)
  • You can evaluate the effect of various measures, because you know how big a change must be to be statistically significant.
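The last two points can be sketched as code. Assuming the same invented throughput figures as before, with control limits computed from a stable baseline, a post-change measurement beyond the limits is a statistically significant signal rather than noise:

```python
import statistics

# Hypothetical data: a stable baseline, then throughput after a
# process change (e.g. switching to shorter iterations).
baseline = [21, 18, 25, 19, 23, 17, 22, 20]
after_change = [27, 30, 29]

mean = statistics.mean(baseline)
sigma = statistics.pstdev(baseline)
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma

# Points beyond the baseline limits are significant signals:
signals = [x for x in after_change if x > ucl or x < lcl]
```

Here the upper limit works out to about 28.1 story points, so 30 and 29 are real improvements (or gaming), while 27 is indistinguishable from ordinary variation.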
I have worked with project teams that have Throughput variations of more than ±50%. The Throughput chart is very useful. It would be useful even if the variation were considerably greater, because the amount of variation is in itself a measure of the size of the problem. (I won't delve into the intricacies of actually finding the root cause of the problem here. Let's just say the TOC Thinking Process comes in handy.)

So, the data is feasible to collect. There is no additional overhead compared to what ScrumMasters already do, because the new information is derived from the same data they use to create burn-down charts. It is just processed a little bit differently.

I would also say the information is extremely useful. However, I agree with Judy that productivity information on its own does not tell the whole story. For example, a feature may have negative business value, so producing it faster just means the customer loses money faster. Also, a set of features that are individually valuable may have negative business value when considered as a set. This is usually known as "featuritis".

Using a productivity measurement without understanding it is a recipe for disaster. I agree with Judy there. The position I am advocating is that using it with understanding can bring great benefit.

Judy also writes:
The problem with justifying an agile adoption based on revenue gains is there are so many other considerations that attempts to credit any single factor become dubious.
This is both true and false. It is true because that is the way it is in most companies. Nobody understands the system, so nobody can really tell which factors have an effect and which do not. Attributing success, or failure, to agile under such circumstances is bad politics, not good management.

On the other hand, the statement is false because it is quite possible to figure out which factor, or factors, limit the performance of a company. If the constraint is the software development process, then implementing agile will help. (Assuming it is done correctly, of course.) If the software development process is not the constraint, implementing agile will not help. Note that symptoms often arise far from the constraint itself. For example, a problem in development may show up in marketing, or vice versa. (Figuring out such causal connections is an important part of what I do for a living.)

The reason it is possible to figure out what the constraint is, is that companies are tightly coupled systems. In a tightly coupled system, the constraint can be determined. Much of the time it is even quite easy to do so. The real trouble begins after that, when you try to fix the problem.

The method I use to Find-and-Fix is primarily the Theory Of Constraints (TOC). There are other methods around.

Judy finishes with:
If someone can propose a relevant metric that is economical for a small to medium size business to collect, that can be measured over time in small enough units to show increased performance due to specific process changes, and doesn't create more problems than it solves, I will be happy to consider it.
I can do that. So can any decent TOC practitioner or systems thinker. There are a few catches though:
  • Measurements must be tailored to the system goal. Very few organizations are exactly alike in terms of goals, intermediate objectives, root problems, and constraints. Therefore, measurements must be tailored to fit each specific organization.
  • Organizations change over time. When objectives or internal constraints change, measurement systems must also change.
  • The environment changes over time. This means external constraints may appear, or disappear. For this reason too, measurement systems must change over time.
The lag between the change that makes a new measurement necessary, and the actual change in the measurement system, can be very long. Again, I won't go into details, but most companies use accounting practices that are about a hundred years out of date. (This is the reason for much of the friction between accountants and everyone else.)

There is no "best practice" set of measurements for software development. What you measure must be determined by your goals, and by the system under measurement. Once this is understood, measurements can be tailored to be what they are supposed to be: a tool set for problem solving.

Measuring is like anything else, it is very difficult if you haven't learned how to do it. A prerequisite for measuring complex systems, like software development teams and organizations, is understanding the system. To do that, you need to know a bit about systems thinking. You do not have to be the world's greatest expert, but you need to be well versed in the basics.

The first thing to do if you want to evaluate an effort to measure, is to ask for the systems map the measurements are derived from. The presence of such a map does not prove the presence of a good measurement system. However, the absence virtually guarantees the measurement system is dysfunctional in some way.

In 1992, Kaplan and Norton introduced the balanced scorecard system for creating measurements. It didn't work very well, precisely because there was no way to connect measurements to strategic objectives. In 2001, they rectified the problem by introducing strategy maps. I do not use this method myself, so I have not evaluated it. It seems to be on the right track, though. Unfortunately, most people who design balanced scorecards use the earlier, flawed method. Go figure...

I use Intermediate Objective Maps, which are part of The Logical Thinking Process, a TOC application for doing systems synthesis and analysis. An alternative is using Strategy&Tactics Trees. However, S&T is currently poorly documented, and there is only a handful of people who can do them well.

It is also possible to use a combination of Current Reality Trees and Future Reality Trees to figure out what to measure. That is what I did before learning to use IO Maps.

So, IO Maps, S&T Trees, CRT+FRT, and the revised version of balanced scorecards, can be used to figure out what to measure.

As far as I know, none of these tools are part of any agile method. Not even FDD uses them, despite strong ties to TOC. Consequently, few agile practitioners have come into contact with the tools and the knowledge base for creating good measurements.

Consequently, the difficulty of making useful measurements is perceived to be greater than it really is. Tailoring a measurement system to fit an organization is a skill that can be learned. It is just not part of the agile repertoire, yet. I hope it will be.

Oh, in closing, a good measurement system must be able to measure itself. That is, if a measure does not work as intended, it must show up as an inconsistency between different measures. Otherwise, mistakes in the measurement system are very hard to catch. Fortunately, this can usually be accomplished fairly easily.
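One cheap way to make a measurement system check itself is to derive the same quantity along two independent paths and compare. A minimal sketch, with invented numbers: velocity derived from the cumulative-flow "done" column should match the velocity reported separately by the team, and any gap flags a broken measure or broken data collection:

```python
# Hypothetical cross-check between two measures of the same thing.
done_cumulative = [0, 3, 7, 12, 18]   # cumulative stories done, per iteration
reported_velocity = [3, 4, 6, 6]      # velocity reported separately

# Velocity derived from the cumulative-flow data:
derived_velocity = [b - a for a, b in zip(done_cumulative, done_cumulative[1:])]

# Iterations where the two measures disagree:
discrepancies = [i for i, (d, r)
                 in enumerate(zip(derived_velocity, reported_velocity))
                 if d != r]
```

In this made-up example the third iteration disagrees (5 stories by one measure, 6 by the other), which is exactly the kind of inconsistency that would otherwise go unnoticed.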


Kshitij Agrawal said...

Can you clarify why company management has to fix the problem if any point is outside the control limits? Shouldn't it be the project team fixing the problem?

Henrik Mårtensson said...

Good point. It really should be the project team fixing the problem.

However, the team rarely has the authority to fix it, for two reasons:

* Agile teams are supposed to be self-organizing, but they rarely are. Management usually wants to be in control and hoards power. (Self-organization, by definition, gives teams power over just about anything that concerns the team, including who leads it, which working methods are used, how to spend money, etc. Management rarely approves of that.)

* When you do a root cause analysis, most of the time the root cause of the problem is outside the team. That means you often have to fix something in a different part of the organization in order to eliminate the problem.

The above applies to functional organizations. Networks have less of these problems. However, most companies are functional hierarchies.