## Saturday, February 18, 2006

### The Variance Trap, Part 3

If you have read parts 1 and 2 of The Variance Trap, you know how statistical fluctuations can slow a project down to a velocity far below that of any individual stage in the process. I demonstrated the effects of statistical fluctuations with a simple game played with a die, a set of bowls that represented stages in a project, and a lot of colored glass beads. (This is a slight variation of a classic example by Eli Goldratt.)

I got tired of rolling dice by hand, so I started writing a Theory of Constraints-based process simulation engine. It is still very much a prototype, but today I wrote a computerized version of the beads and bowls game. I also managed to coax a simple animation out of the simulator. It shows what happens to the work buffers at the different stages during the course of the project.

The simulation is extremely simplified. For one thing, there are no feedback loops from the test stages (Init = Unit Test; Intgr = Integration Test; Sys = System Test; Acc = Acceptance Test). For another, the only source of variance is a six-sided die that sets the production capacity of each stage. There is no variance in the size of the goal units (Use Cases, Story Points, or Feature Descriptions, depending on your preferences). Still, the simulation suffices to show the effects of fluctuations on a process.
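To make the mechanics concrete, here is a minimal Python sketch of one beads and bowls run. It is not the simulation engine described above. The function name `run_game`, the choice of five stages, and the rule that every stage rolls once per step and moves at most that many units downstream are all assumptions made for illustration.

```python
import random


def run_game(total_units, n_stages, roll=None, seed=None):
    """Play one beads-and-bowls game (hypothetical sketch, not the author's engine).

    Returns (steps, peak_wip): the number of steps until every unit has left
    the last stage, and the largest number of units ever waiting between stages.
    """
    rng = random.Random(seed)
    if roll is None:
        roll = lambda: rng.randint(1, 6)  # one roll of a six-sided die

    # bowls[0] is the backlog, bowls[-1] holds finished units,
    # bowls[i] is the work buffer in front of stage i.
    bowls = [total_units] + [0] * n_stages
    steps = 0
    peak_wip = 0
    while bowls[-1] < total_units:
        # Walk the stages back to front so a unit moves at most one stage per step.
        for stage in range(n_stages - 1, -1, -1):
            moved = min(roll(), bowls[stage])  # capacity is capped by available work
            bowls[stage] -= moved
            bowls[stage + 1] += moved
        steps += 1
        peak_wip = max(peak_wip, sum(bowls[1:-1]))
    return steps, peak_wip


if __name__ == "__main__":
    # 188 goal units as in the run described here; 5 stages is an assumption.
    steps, peak_wip = run_game(total_units=188, n_stages=5, seed=1)
    print(f"finished after {steps} steps, peak inventory {peak_wip} units")
```

Passing a fixed `roll`, for example `roll=lambda: 4`, removes the fluctuations entirely and gives a baseline to compare the die-driven runs against.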

The simulation run used to create the animation needed 8 iterations (7.5, to be precise) to process the goal units. The average throughput per iteration was about 25. At the end of iteration 5, there were 96 goal units locked up in the process, out of a total of 188 goal units. This is quite a hefty inventory buildup. Still, it can be quite hard to even see the problem until it is too late. In many projects, work buffers are not even monitored.
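The figures above can be checked with a few lines of arithmetic. The variable names are only illustrative; the numbers are the ones quoted for this run.

```python
total_units = 188          # goal units in the project
iterations_used = 7.5      # the last unit left midway through iteration 8
in_process_after_5 = 96    # goal units still inside the process after iteration 5

throughput = total_units / iterations_used
inventory_share = in_process_after_5 / total_units

print(f"average throughput: {throughput:.1f} goal units per iteration")       # 25.1
print(f"inventory after iteration 5: {inventory_share:.0%} of all goal units")  # 51%
```

In other words, five iterations in, roughly half of all the goal units were sitting in work buffers somewhere between the stages.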

At the end of the first iteration there is only a slight buildup of inventory. Seemingly nothing to worry about.

If this had been a real project, and if we had estimated its length by calculating the production capacity at each stage, we would probably not suspect that anything was amiss. It is like the old story about how you can boil a live frog by putting it in cold water and increasing the temperature slowly. The frog won't notice that anything is amiss.

The image to the left is from the end of iteration 6. As you can see, there is a lot of inventory in the system. Even though Coding is done, the project is far from finished.

By this time, the frog is boiling. If we didn't know about the effects of statistical fluctuations, we might easily be led to believe that there is a problem with integration test here. There is certainly a backlog of work, but the integration test stage itself has the same capacity as the other stages.

The last goal unit leaves the system midway through iteration 8. If this had been a real project, there would have been quite a lot of frustration. What can we do? The first thing to do is to realize that there are two problems:

The first problem is how to make more reliable predictions about projects. The second problem is how to improve the development process itself.

Simulations like the one I've written about here can be a great help. In part 2, I showed a simple method of making estimates by extrapolating from the known throughput rate of the whole system. This method of estimating projects can be deduced from the simulation model. (It is not the only solution to the problem.)
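A minimal sketch of what such an extrapolation could look like, assuming the estimate is simply the remaining goal units divided by the throughput observed so far for the system as a whole. The function name and the example numbers are hypothetical, and part 2 may have handled details differently.

```python
def estimate_remaining_iterations(total_units, units_done, iterations_elapsed):
    """Extrapolate from whole-system throughput (hypothetical helper).

    Throughput is measured at the end of the process: goal units that have
    left the last stage, divided by the iterations it took to get them there.
    """
    if units_done <= 0 or iterations_elapsed <= 0:
        raise ValueError("need at least one finished unit and some elapsed time")
    throughput = units_done / iterations_elapsed
    return (total_units - units_done) / throughput


if __name__ == "__main__":
    # Invented example: 60 of 188 goal units finished after 3 iterations.
    remaining = estimate_remaining_iterations(188, 60, 3)
    print(f"about {remaining:.1f} more iterations")
```

The point of the design is that only finished goal units count. Work piled up in buffers between stages does not raise the measured throughput, so the estimate already reflects the slowdown caused by fluctuations.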

The model also shows us that it is possible to improve the capacity of the system without having to increase the capacity of the parts. How? By reducing the fluctuations. Expect a new simulation run, with the same average capacity, but less variability in the process steps, in the near future.
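As a rough sketch of how such a comparison could be set up, the Python below runs the same simplified game with two capacity distributions that share an average of 3.5 units per step: a six-sided die, and a stage that always produces either 3 or 4. The helper names, the five stages, and the 3-or-4 distribution are assumptions for illustration, not the promised simulation run.

```python
import random
from statistics import mean


def steps_to_finish(total_units, n_stages, capacities, rng):
    """Steps until all units have passed every stage (hypothetical sketch)."""
    bowls = [total_units] + [0] * n_stages  # backlog, buffers, finished units
    steps = 0
    while bowls[-1] < total_units:
        # Back to front, so a unit moves at most one stage per step.
        for stage in range(n_stages - 1, -1, -1):
            moved = min(rng.choice(capacities), bowls[stage])
            bowls[stage] -= moved
            bowls[stage + 1] += moved
        steps += 1
    return steps


def average_steps(capacities, runs=200, total_units=188, n_stages=5):
    """Average the finishing time over several seeded runs."""
    return mean(
        steps_to_finish(total_units, n_stages, capacities, random.Random(seed))
        for seed in range(runs)
    )


if __name__ == "__main__":
    die = [1, 2, 3, 4, 5, 6]  # mean 3.5, high variability
    steady = [3, 4]           # mean 3.5, low variability
    print("six-sided die:", average_steps(die))
    print("3 or 4 only:  ", average_steps(steady))
```

Because both distributions have the same mean, any difference in the average number of steps comes from variability alone. In this sketch the steadier stages can be expected to finish sooner on average, since downstream stages are starved of work less often.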

Until next time: good afternoon, good evening, and good night.
