Monday, March 06, 2006

The Variance Trap, Part 5

This is the fifth part in an ongoing series of articles. You might wish to read part 1, part 2, part 3, and part 4 before reading this one.

Until now every simulation I have described has used an extremely simplified model of a development process. It is time to try a more accurate model, and see where that gets us.

Here is the model I am going to use:


(Click on the image to see a full scale version.)

The numbers in the boxes at each stage in the process indicate how many goal units that can be produced in a single day. To this number, I add a random variation of +/- 2 units to represent fluktuations in production capacity. (Judging from experience, this is a lot less than in many projects. I want to err on the side of caution.

50% of all goal units fail at Unit Test. The effort needed to fix a defect at unit test is 10% of the original effort. Integration tests have a 20% failure rate, and the effort to fix is 10% of original effort. System tests have a 10% failure rate, but since fixing a failed system test requires a bit of redesign, the effort to fix is 20%. Acceptance tests also have a 10% failure rate, but a failure means the customer did not like the functionality at all, so fixing it is a complete rework, at 100% of original effort.

The thing I am going to focus on this time around is batch sizes. Most software development methodologies focus on iterations as a way of steering a project. As we shall see, the size of iterations also has a great effect on development speed.

I am going to show two simulated projects that differ only in the size of their batches. The first project uses large iterations: Integration tests are made once every 30 days, system tests comprise 60 days of new functionality, and the release cycle is 60 days.

The second project runs integration tests every 5 days, system and acceptance tests every 10 days, and has a 30 day delivery cycle.



As you can see, the project with the smaller iterations is more than twice as fast as the project with the large iterations. How can that be? The projects process the same number of goal units, and the processing stages in the two projects have the same capacity. Is it the statistical fluctuations that cause the difference? No, not this time. I get similar results every time I run the simulation. The project with the smaller iterations is faster every time, so the iteration size must affect the velocity of the project.

To solve the mystery, let's look at the problem from the point of view of a goal unit. A goal unit will spend time being processed, i.e. being transformed from an idea into a set of requirements, from requirements to large scale design, from design to code, etc. It will also spend time in queues, waiting to be processed. After processing, it may spend time waiting, for example waiting for other units in the same batch to be processed, before it can move on to the next stage. In many processes, there may also be a significant move time, when a goal unit is moved from one stage to another. In software development processes, the move time is usually rather short.

A simple diagram (not to scale) showing how a goal unit spends time in the system looks like this:

To create a batch consisting of 10 goal units, a processing stage has to perform its task 10 times. This means the first goal unit in the batch will have to wait for the other 9 tasks to be processed, the second task will have to wait for 8 tasks to be processed, and so on. The 10th task won't have any waiting time. (On the other hand, it may have spent a lot of time in a queue before being processed, but we will leave that out for now.)

It should be clear that on average, a goal unit spends a lot more time waiting to be processed, than actually being processed.

Look at what happens if we halve the batch size:

The average wait time is reduced considerably. This is why the development process with the smaller batch sizes is so much faster.

In real projects, there are many factors that may obscure the benefits of using smaller batches. Here are a few:
  • Management does not know about the effects, and therefore never sees them.
  • Requirements are changed in mid-iteration, which disrupts the iterations and leads to build-up of half finished work. Sometimes this effect is so severe that a project never manages to deliver anything.
  • A process stage is blocked for some reason. If the project manager keeps track of hours worked instead of goal units processed, it is easy to miss this problem. It's like trying to measure the speed of a car by looking at the tachometer. The engine is running, but the wheels don't necessarily turn.
Nevertheless, minimizing batch sizes is a sound strategy. Provided that a project is reasonably well run, the effects of having shorter iterations and release cycles can be dramatic.

From a strategic point of view, is there anything more we can do? Yes, there is. One thing should be pretty obvious: the batches in this example are still pretty large. A project that runs integration tests once per week still uses pretty large batches. With a build machine, that time can be reduced to once per day, or perhaps once per hour. With automated tests, the unit and system test cycles can be as short, or shorter. In many projects, even acceptance tests can be automated, except for tests that have to do with the look and feel of the user interface.

In a project that uses automated tests, build machines, and scripts that automate the deployment process, it is entirely reasonable to cut iteration and release cycle length to one or two weeks. (Well, not always. If users have to install their software themselves, they may not appreciate having to do so every Monday morning.)

Another thing to note is that both processes described in this installment are unregulated. That is, all process stages run at full speed all the time. This is how most companies work, and it is how most project managers run their project. However, looking at the animation above, we can see that this leads to a build-up of quite a lot of unfinished work in the development process.

It should be possible to feed the process slower, without reducing the speed of the system. This would reduce the build-up of unfinished work. This would be an advantage, because goal units that have not yet entered the process can be changed or removed without incurring extra cost to the development process.

Finally, it might be worth studying the effects of spending more effort on testing. If we automate testing, this will reduce the throughput at the Coding stage, but it will also reduce the rework that has to be done. Will more testing increase or reduce the total throughput of the system?

Plenty of fodder for thought, and material for at least a couple more installments in this series of articles.

2 comments:

Tobias Fors said...

Henrik: your variance trap series is absolutely excellent. Keep writing!

By the way: do you know of any open source software for doing this kind of system dynamics modeling? I've tried demo versions of iSee systems' Stella, but since I'm just learning about these things, I'm not about to buy something.

Henrik said...

Thanks!

I don't know of anything good. As far as I know, there is no open source systems simulation software around. (My own little prototype hardly counts.)

Something else that is missing is logic tree software. I recently started using the Theory Of Constraints Thinking Tools, and have found them to be very useful. I either use pen and paper, or Open Office Draw. Haven't managed to find anything but commercial software that supports the TOC Thinking Tools.

Considering how useful such tools can be, it is surprising that there aren't more of them around.