Sunday, November 19, 2006

The Big DIP

Lately, I have had reason to study the effects that Design-In-Process, DIP, has on the profitability of software projects. It turns out DIP is a project killer, and potentially a corporate business killer too.

What is Design-In-Process? It is partially finished work. Most software development projects have too much of it. The manufacturing industry learned decades ago that too much inventory was a bad thing. In the software development business, the inventory is Design-In-Process. We still haven't learned the lesson.

Why is DIP bad? Let's have a look at the classic Return On Investment (ROI) equation:

ROI = (T - OE) / I

Where:
  • ROI is the Return On Investment, a measure of how profitable a business system (company, project, etc.) is
  • T is Throughput, the amount of money made
  • OE is Operating Expenses, the money the business system spends producing throughput. Wages, rent, and electricity are all OE.
  • I is Investment, the money tied up in the system. This includes computers and furniture, but most of the value tied up in a software development project is usually DIP.
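To make the arithmetic concrete, here is a minimal sketch of the ROI equation in Python. All the figures are invented for illustration:

```python
def roi(throughput, operating_expense, investment):
    """Return On Investment: ROI = (T - OE) / I."""
    return (throughput - operating_expense) / investment

# Hypothetical monthly figures for a project, in EUR.
T, OE = 120_000, 100_000

# Same throughput and expenses, but different amounts of money tied up in the system.
print(roi(T, OE, 50_000))   # little DIP tied up -> 0.4
print(roi(T, OE, 200_000))  # a lot of DIP tied up -> 0.1
```

The point: with T and OE fixed, every euro of DIP added to I drags the ROI down.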
The problem with DIP is that it is risky. If a requirement changes before it has entered the development process, for example while it is in a Scrum backlog, the change costs very little money.

If a requirement changes after it has passed through the process, well, since the customer has already accepted the implementation of the original requirement, the requirement change is grounds for asking for more money. No problem. (Except possibly for the customer, who wasted money by asking for the wrong thing in the first place.)

If the requirement change comes when the requirement is in process, it can get expensive. The partially implemented code must be removed, which takes time. Then the new requirement must be implemented.

Developers know that doing this is a real drag. It is boring and time consuming. Project managers don't like it, but since it is not their own time that is wasted, most don't think very hard about the effects. Most project managers don't even know how much DIP there is in their projects, have absolutely no idea what it is worth, and consequently do not know what changes to the DIP cost.

This, I am afraid, is probably true of most agile developers and managers too.

How much money are we talking about? Most project managers encourage developers to work at full capacity all the time. When there is a bottleneck somewhere else in the process, this means DIP will build up. My experience is that there almost always is such a bottleneck. Common culprits are testing, or delivery cycles that are too long (in which case Sales has goofed).

It is not uncommon for the developers to exceed the capacity at the bottleneck by a factor of two or three. That means that if the Operating Expense of a project is 100,000 EUR per month, only about 33,000 EUR worth of work gets through the process, and the remaining 67,000 EUR gets stuck in process. It is not uncommon for DIP to continue to build, until all requirements are DIP. All you need for that to be virtually guaranteed is a fixed price contract, payable on delivery...
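The build-up is easy to see in a simulation. The sketch below uses the made-up rates from the example: developers push 100,000 EUR worth of work into the process each month, while the bottleneck only passes 33,000 EUR worth through:

```python
dev_rate = 100_000        # EUR worth of work entering the process per month
bottleneck_rate = 33_000  # EUR worth of work leaving through the bottleneck per month

dip = 0
for month in range(1, 7):
    dip += dev_rate - bottleneck_rate  # everything the bottleneck can't pass piles up
    print(f"Month {month}: DIP = {dip:,} EUR")
```

After half a year, 402,000 EUR is stuck in process, all of it exposed to requirements changes.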

It is not uncommon for requirements changes to hit the DIP in a big way. Over the past few years, since I started paying attention, I have seen requirements changes that reduce the value of the DIP by 50% or more. One project I worked in had more than thirty man-months worth of DIP, and then there was a change that wiped out all of it. (There was a switch in development languages, so all of the code was thrown away. Please don't ask why...)

If you want to get control of the DIP, there are a few things you must do:

  • Measure it! What you do not measure, you cannot control. The first step is to start measuring DIP. To do this, you need to measure Throughput, and you must also measure the build-up of DIP in the development process.
  • Make it visible! Neither developers, nor managers, are used to thinking about DIP. The idea that it costs money is strange to most of them. You need to make the DIP visible in ways that are impossible for people to ignore. (Some people will continue to ignore facts, even when they are staring them in the face. Just hope that none of them are your bosses.)
  • Reduce it! You can do this with a variety of techniques, either based on Kanban (XP), or Drum-Buffer-Rope (Feature-Driven Development). Even though I really like XP, I have found Drum-Buffer-Rope to be the more flexible method.


Visible Measurements

An easy, and highly visible, way to measure DIP, is to make a Project Control Board. Use a whiteboard, or a large mylar sheet on a wall. Make a table with one column for each hand-off in your development process. The stages will be different depending on circumstances. For example, where I work, the software we produce will be used in hardware made by another company, so there is a stage where the software is tested in-house with a hardware simulator, and another step where the software is shipped to the other company for testing with their hardware.

Typical stages might be: Requirements analysis, Behavior Specification (if you use TDD or BDD), Coding, Integration Test (includes automated build, and running automated tests), Acceptance Test.

At the start of each iteration, write sticky notes for each story (use case, feature description, or whatever you use). Prioritize the stories, and put them on your whiteboard, to the left of the table you made. It is also important that you estimate the size of each story. I use Story Points, à la XP. Function Points are an alternative.

Whenever a developer begins to work on a new story, the developer moves the story from the stack of sticky notes to the first column in the table. The developer is then responsible for moving the story to a new column each time the story goes into a new phase of development.

A tip: use different colors to distinguish functional stories, defects, and any unplanned stuff management may throw at you.

Yet another tip: document the board by photographing it each day.

The sticky notes in the table are your DIP. Track it closely. If you have a process problem, you will soon see how sticky notes pile up at one of your process stages.
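If you also want to track the board in software, a minimal in-memory model might look like the sketch below. The stage names and helper functions are my own inventions, not a standard:

```python
STAGES = ["Requirements", "Specification", "Coding", "Integration Test", "Acceptance Test"]

board = {stage: [] for stage in STAGES}              # one column per hand-off
backlog = [("Login story", 3), ("Report story", 5)]  # (name, story points) sticky notes

def start(story):
    """A developer picks a story from the backlog stack."""
    board[STAGES[0]].append(story)

def move(story, to_stage):
    """The developer moves the story when it enters a new phase."""
    for stage in STAGES:
        if story in board[stage]:
            board[stage].remove(story)
    board[to_stage].append(story)

def dip_points():
    """Total Story Points currently stuck in process, i.e. your DIP."""
    return sum(points for stage in STAGES for _, points in board[stage])

story = backlog.pop(0)
start(story)
move(story, "Coding")
print(dip_points())  # -> 3
```

Watching `dip_points()` grow iteration after iteration is the software equivalent of sticky notes piling up in one column.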

To calculate the value of your DIP, sum up the Story Points you actually produced in one iteration. Let's say you produced 30 SP. Then sum up the DIP; let's say that is 45 SP. You also need to know your Operating Expenses for the iteration. Let's say those were 30,000 EUR.

You have spent 30,000 EUR to produce 30 SP. That means you need 1,000 EUR to produce one SP. The DIP is 45 SP, so it is worth 45,000 EUR. If a requirements change affects 20 SP in the DIP, your project should write off 20,000 EUR as a loss.
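The whole calculation is only a few lines of code (figures as in the example above):

```python
operating_expense = 30_000  # EUR spent during the iteration
produced_sp = 30            # Story Points actually finished
dip_sp = 45                 # Story Points stuck in process

cost_per_sp = operating_expense / produced_sp  # 1,000 EUR per SP
dip_value = dip_sp * cost_per_sp               # 45,000 EUR tied up as DIP

affected_sp = 20
write_off = affected_sp * cost_per_sp          # 20,000 EUR lost to the change
print(cost_per_sp, dip_value, write_off)
```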

Believe me, once you start measuring, you will see this happen over and over again.

Now, how do you cut your losses? By reducing the DIP as much as possible. That is what I'll blog about next time, probably.

Thursday, October 26, 2006

Traffic Simulator

While looking for material about queue theory, I came across a traffic simulator that is worth a look. With it, you can simulate how traffic moves, and doesn't move, in different kinds of situations.

Scrumming

I have begun working in a new project recently. It is the first Scrum project at Wireless Car, and they hired Tobias Fors from Citerus to get us started.

Tobias presented Scrum in a manner that was both entertaining and informative. The kick-off took two days. The first day was a Scrum presentation, the second day we did Sprint planning.

It is rare to meet someone that can talk about the same subject matter for two days straight without being boring even once. I usually make people's eyes glaze over within a few minutes, so I was very impressed with Tobias. Also, for two days I had someone new to talk about agile and TOC and queue theory with. Great!

Saturday, October 14, 2006

The Gothenburg Ruby User Group, 2nd Meeting


I missed the first Gothenburg Ruby User Group (GRUG) meeting, but not the second, last Thursday evening. 15 geeks (only a few of whom are shown in the picture), and Ruby. Doesn't get much better.

Niclas Nilsson and Karl-Johan Kihlbom talked about the recent RailsConf conference in London.


The main event, arranged by Emily Bache, was a code kata. You can read more about it in the GRUG Google Group at http://groups.google.com/group/grug.

The next meeting will be in a month or so. I'm looking forward to it already.

Sunday, September 17, 2006

Flow vs. Function

I have written a presentation comparing flow organizations and functional organizations. You can check it out here if you are interested in this sort of thing.

Wednesday, September 06, 2006

Systems Thinking: Five-Why

In the Systems Thinking series of posts (also including Systems Archetype: Shifting the Burden and Systems Archetype: Shifting the Burden to the Intervenor), I try to show the basics of how complex business systems interact. With the systems archetypes, it is possible to find standard solutions to common problems, but there is still a piece missing.

In a specific case, how do you find the root cause to your problem? Have you ever worked in a project, or workplace, where it feels as if you are spending all your time fighting little fires? Usually, all those little fires stem from a few root causes. Unless you want to spend your time fire-fighting, you had better find the root cause of your problems, and deal with that.

There are many methods of finding root causes. One of my favorites, because it is so simple, is Toyota's Five-Why method. The idea is that when you are faced with a problem, you ask why that problem occurred. When you find the thing that caused your problem, you ask why that thing is the way it is. Do this five times, and you will have found a root cause. Fix that problem, and the other problems will never occur again.

There is of course a snag (there always is): when you trace a problem, any problem, back to its source, you will very often find that the root cause is a management policy (or the lack of one). This means that solving the problem will involve some effort on the part of management, sometimes a lot of effort. This is a good opportunity for you to find out whether your managers are leaders, willing to instigate change in the organization by first changing themselves, or if they are just management drones, interested only in maintaining the status quo.

Here is a classic example, originally from a Toyota training manual (though I pinched it from The Toyota Way):

Level of Problem → Action

  • There is a puddle of oil on the shop floor. → Clean up the oil.
  • Why? Because the machine is leaking oil. → Fix the machine.
  • Why? Because the gasket has deteriorated. → Replace the gasket.
  • Why? Because we bought gaskets of inferior material. → Change the gasket specifications.
  • Why? Because we got a good deal (price) on those gaskets. → Change purchasing policies.
  • Why? Because the purchasing agent gets evaluated on short-term savings. → Change the evaluation policy for purchasing agents.


Seems easy, doesn't it? Now let's try it on a software development problem:

Level of Problem → Action

  • The code base smells. → Refactor the code.
  • Why? Most developers did not know how to write good code, and the project managers pushed them to hurry up. → Teach the developers Test-Driven Development, and the project managers about agile development methodology.
  • Why? The developers and project managers were hired from subcontractors, and nobody checked their level of competence. → Change hiring practices, and develop a core in-house staff for software development.
  • Why? Management viewed developers and managers as interchangeable and easily replaceable resources. → Teach management about knowledge workers.
  • Why? Management uses the Scientific Management paradigm. → Teach management a better paradigm for software development (agile, lean, TOC).
  • Why? Management has never heard of, or studied, anything else. → Set aside study time for managers. See to it that they practice what they learn.

Piece of cake, wasn't it?

Here's a suggestion: do your own Five-Why analysis of a problem you are facing, or have faced in the past. Publish it on your blog, or in a comment to this post. If you blog about it, tell me. I'll link to your post.

Tuesday, September 05, 2006

Going With the Flow

So, maybe it's not a full month. Then again, I'm not really back. I am writing this in an Internet cafe in Ho Chi Minh City, Vietnam.

If you want to study one-piece-flow, and want to see why moving small batches around is better than lugging large ones, HCMC is the perfect place. The streets here have more traffic than anything I've ever seen. There are very few traffic lights, and driving on the right side of the road is sort of a very loose agreement. Also, the most common vehicles are scooters and light motorcycles. There are 30 million scooters in Vietnam, and by the look of it, most of them drive by my hotel each day. (I'll upload some pictures when I get back.) One would expect the result to be total chaos and confusion. It isn't!

Actually, the traffic flows like nothing you've ever seen (unless you've been here, of course). There are two reasons for this. The first is that though the streets are packed with vehicles, each vehicle is small and maneuverable. The second reason is that the drivers drive very softly. They look ahead, and when they see any tendency to congestion they just slow down, or speed up, a bit, and steer gently to one side. The traffic flow rarely stops. There are two main causes of congestion: cars and tourists. Neither is nimble enough to move with the flow of the traffic.

You might think this has got nothing to do with software development, but it does. Speaking in general terms, the streets of HCMC are a business system, and the vehicles are goal units. In Gothenburg, where I live (well, the city closest to where I live), there are traffic lights at every street corner. Those lights regulate the traffic flow by creating batches of vehicles. The result is that when there are a lot of cars in the streets, queues build up. In contrast, here in HCMC, the traffic moves all the time. There are comparatively few queues.

In Theory Of Constraints terminology, there is little build-up of inventory in the HCMC traffic system. Value is added all the time (i.e. the vehicles are moving closer to their destination all the time). In agile development terms, the HCMC traffic flow with small vehicles and no traffic lights corresponds to using small stories instead of large use cases, and short iterations instead of long ones.
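A back-of-the-envelope calculation shows the same effect with invented numbers: ten goal units passing through three stations, one minute of work per unit per station:

```python
n_items = 10       # vehicles, or stories -- the goal units
stations = 3       # intersections, or process stages
minutes_each = 1   # processing time per item per station

# Batch transfer: no item moves on until the whole batch is done at a station.
batch_lead_time = stations * n_items * minutes_each

# One-piece flow: an item moves on as soon as it is done at a station.
flow_lead_time = stations * minutes_each + (n_items - 1) * minutes_each

print(batch_lead_time, flow_lead_time)  # -> 30 12
```

Same total work in both cases, but with one-piece flow each goal unit spends far less time waiting in queues.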

What it comes down to, is that it is all about the flow, always.

Friday, August 18, 2006

Sunday, August 13, 2006

Systems Archetype: Shifting the Burden to the Intervenor

There is a special case of Shifting the Burden which I am especially familiar with because I have worked as a consultant for many years: Shifting the Burden to the Intervenor.

Description



The problem may occur when an organization decides to rely on outside competence to solve an internal problem. The problem may be solved. The catch is that the competence to deal with the internal problem now resides outside the organization. This makes the organization vulnerable if the problem occurs again.

"The organization" may be a company that outsources development work to a consultant, or outsources a part of their infrastructure to a subcontractor. Departments within an organization are also vulnerable. They may (not always voluntarily) let another part of the organization take over an important function, only to find that the other part of the organization is not able to process requests quickly (usually because of queues), may not fully understand the requests (development department vs. IT department), or may see some political advantage in not fulfilling a request.

Example


A company needs to get a new product to market fast. There may be an economic or strategic need, or political pressure. The company outsources development to consultants, quite often from several different companies. The consultants are done on time (well, this part usually does not happen), but the knowledge about how the system works resides in the heads of people from other companies.

When the company needs to maintain the application, or develop a new version, it does not have the capability to do so in an efficient manner. The people who worked on the original project have dispersed, and now work on other projects, for other customers. Assembling the original team, or even a fraction of it, is not possible.

The total cost, over a few years, of outsourcing the work may be far greater than keeping the necessary expertise on staff.

Early Warning Symptoms

Quite often, management is aware that outsourcing may cause long term problems. However, the short term gains are over-estimated, and the long term costs are under-estimated.

Any situation where outsourcing is considered should raise an alarm. That does not mean that outsourcing is always bad, just that the consequences should be explored thoroughly. One way to do that is with a Future Reality Tree (FRT).

As with Shifting the Burden, I would like to stress the importance of making appropriate measurements (i.e. measuring the right thing is more important than measuring with great precision) once an outsourcing plan is under way. In addition to the measurements mentioned in Shifting the Burden, it is a good thing to keep a close eye on code quality. If the quality of the code drops, development costs, in current and future projects, will rise dramatically. I have seen many companies where management is not aware of the implications of poor code quality. Software projects at these companies are always extremely expensive, deadlines are often missed, and the return on investment of the projects is much less than it could be.

Shifting the Burden to the Intervenor is often built right into the organizational structure. The most common structure today is the functional organization. In functional organizations, work is divided according to function. For example, there may be specialized departments for IT services, methodology, and so on. This causes a number of problems, but the one we are concerned with here is the loss of competence in software development departments. I have worked in development projects where nobody knew how to set up a web server, a database server, or other important pieces of equipment. What happens is that the project organization is forced to make a request to the appropriate department, and then wait until the request is processed. In this way, even a relatively simple job can stretch out over several weeks or months, or may never get done at all.

Management Principles

Don't outsource your brains! While many managers in knowledge organizations know this, as a general principle, they are sometimes a bit confused about where the brainpower in their organization resides. They believe it is with them. Not so! Nor should it be. Knowledge organizations have most of their brain power at the base of the organizational structure, by the power of sheer numbers, if nothing else. (Besides, good managers want smart people to work for them. If I ever start a company again, I'm going to make damn sure I only hire people smarter than me.)

There is no "grunt work" in software development. Writing software is all about designing highly complex systems, and writing very detailed specifications describing those systems. The grunt work, building software according to design specifications, is something compilers and interpreters do, never humans. Every developer in a software development organization is a knowledge worker, must be regarded as such, must be expected to behave as such, and must be treated as such. (For more on this, I recommend you read the classic Peopleware, and the more recent (2001) Slack.)

Organize to optimize the value stream (flow organization), not to keep people busy (functional organization)! Both Lean and Theory Of Constraints have proved, both empirically and theoretically, that this is the best way to go.

Systems Archetype: Shifting the Burden

Description


Shifting the Burden is one of the more common systems archetypes. The problem occurs when a short term fix is applied without regard for possible side effects. The short term fix seems to work, but over time the side effects escalate. In many cases, the capability to apply a long term solution atrophies over time, making it very hard to correct the problem.

Example

A classic Shifting the Burden example in software development is when management tries to apply a brute force solution, like increasing the manpower of a project, or forcing developers to work an excessive amount of overtime, to compensate for low productivity, or poor planning.

The diagram above describes a situation where a project has productivity problems. In other words, it won't make the deadline. The obvious, but often flawed, solution is to add more people to the project.

Most projects measure progress in terms of hours worked. Unless you live on Mars, you are almost certainly familiar with the basic idea: if a job has been estimated to take 100 hours, and someone has worked on it for 80 hours, then there are 20 hours of work left to do. However, this simplistic notion of progress fails to take several important factors into account: the original estimate may be wrong; statistical fluctuations (see The Variance Trap) may reduce the productivity rate; time lost to task switching will be mistakenly counted as "progress"; developers may have to spend time on non-productive tasks, which will also be counted as progress, etc.

The upshot is, management often has no idea what happens to productivity when they increase manpower. Even worse, because the number of hours spent per week does go up when more people are added, and management assumes there is a direct relationship between the number of workers and the productivity rate, it is very hard for them to notice any side effects. All seems to be well.

Under the surface there may be a witches' brew of problems. The first thing that happens when more people are added to a project is that productivity goes down. Think of it like this: on her first day on the job, a new developer will not be very productive. She will need help setting up the development environment, being shown around, learning administrative routines, and so on. Someone else, usually several someones, will have to spend time helping her settle in. Therefore, the net productivity of the project is reduced on the first day. Over time the new developer will learn the ropes, and will gradually become more productive. This does take time though. If the system under development is complex, or just messy, it may be several months, maybe a year, before the new developer is up to speed. Until then, the project will not get the productivity increase the management hoped for.

If too many developers have been added at once, there may be major coordination problems. What should all the new people do? The system under development may not have an infrastructure that allows very many people to work on it efficiently. Management usually finds ways to keep everyone busy anyway, in the interest of "cost efficiency". The result is predictable: the software becomes incredibly messy. Consequently, changes become very hard to implement, and the defect count skyrockets. Even if there was an initial improvement in productivity, it is not sustainable.

Early Warning Symptoms

In software development, an early warning symptom is when developers begin to complain of poor quality, high complexity, or that the software "isn't object oriented" (assuming it is supposed to be). Most developers won't complain. They don't give two figs whether the software is a mess, as long as they get paid every month. I'm sad to say, they often do not even know the difference between high and low quality code. The people who will complain are the nerds, the people for whom software development is an all-consuming interest. (You know, the oddballs management often wish they hadn't hired in the first place.)

In general, a situation where workers begin to worry, and management can't understand why, is a sign of Shifting the Burden.

Management Principle

Focus on fundamental solutions! The only reason ever to use a short term solution is to gain time necessary for implementing a fundamental solution.

Of course, the trick is to be able to distinguish fundamental and short term solutions from each other. This can be difficult, but effective tools for doing that, such as the TOC Thinking Tools, do exist. All management has to do is learn to use them, and apply them. (There are no miracle cures, of course. However, I do believe that effects have causes. Therefore, using methods for reasoning about cause and effect is very useful.)

Another thing management can do is to just listen. Keep ears and minds open. Very often there are people in the organization who will recognize a false fix for what it is, and who are able to predict the long term effects very accurately.

The problem is exacerbated by measuring the wrong thing, as we in the software development business are especially prone to do. Measuring the wrong thing blinds us to what is really going on, and makes it hard to take effective action. Most managers, project managers, middle management, and upper management, focus on measuring effort (hours worked, lines of code, task completion time, etc.). Measuring effort does not work, because there is no direct cause-effect relationship between effort and productivity, only a correlation. (The correlation isn't quite what most managers believe. The Critical Chain theory of business process management shows that a system yielding a high return on investment by necessity has parts that work at less than full capacity. Conversely, a system where all the parts work at full capacity never yields optimal return on investment.)

I have seen several cases where management did not understand that something was wrong until years after the symptoms first occurred. Even worse, because the measurements were wrong, it was very hard for them to see what to fix.

On the other hand, productivity can be measured directly, so it is a good thing to measure. There is a direct relationship between Work-In-Process (WIP, or Inventory) and return on investment, (the ROI equation), so that is also a good thing to measure. Know those things, and you will know what really happens in a project.

Saturday, July 15, 2006

Systems Thinking

Systems thinking is a theory that argues that social systems must be studied as a whole. It is impossible to predict the effects of the whole system by studying its parts in isolation. A "social system" can be anything from a family unit, to a system of interacting economic units (companies, countries, etc.), or even an ecological system.

Systems thinking is an important part of the theoretical foundation for agile software development methodologies. I got interested in it some years ago because I kept stumbling on references to the systems thinking bible, The Fifth Discipline, by Peter M. Senge, in books about agile methodologies. Somehow, there was always one more programming book to buy first, but then I spied a copy at my parents-in-law's house, and borrowed it. After a few pages, I was hooked. Since then, I have worked systems thinking into the way I work. I have found it to be a valuable tool when trying to figure out what is really happening in different situations. With systems thinking I can answer questions like "why is the project careening out of control?" and "what can be done about it?"

I won't go into details about systems theory. Instead I'll point you to some introductory material. What I will do over the next few months is to blog about Systems Archetypes. A Systems Archetype is a pattern describing a frequently occurring system configuration.

Systems Archetypes are the business world counterparts to design patterns in software development and architecture. A manager who knows her systems archetypes has a considerable advantage when creating business strategies.

Systems Archetype Diagrams

Systems Archetypes can be described using simple diagrams. The diagrams have four symbols:


From left to right, a Balancing Process is a process where a condition or an action causes a response that tends to slow or cancel out the initial action.

A Reinforcing Process is a process where actions cause a reinforcing loop that causes a snowballing effect.

An Influence Arrow simply indicates that one action or process has an influence on another part of the process.

A Delay is a time delay between cause and effect. Time delays are tricky, because they make it hard to see the relationship between an action and an effect. They also make it hard to judge how much of an action is appropriate.

Let's diagram a simple process: adjusting the water temperature when taking a shower. This is a balancing process. If the water is too cold, you increase the flow of hot water until the temperature is right. If the water is too hot, you reduce the flow of hot water.

I have started by diagramming a situation where the water temperature is too cold. The response is of course to increase the flow of hot water (or reduce the flow of cold water, but we'll ignore that in this simple model). The complicating factor in this simple system is the delay. The response when we increase the hot water flow is not immediate. The natural tendency when nothing happens at once is to turn up the hot water flow a bit more. If the delay is great, you might try turning it up again, or even decide that you must have turned the knob the wrong way, and reverse direction.

Adjusting the shower temperature is a specific case of the Balancing Process With Delay archetype. You can see this archetype everywhere where there is a long delay between action and reaction, for example in the job and stock markets, and when companies expand, or downsize, too much.
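The shower is easy to simulate. In the sketch below (all constants are invented), the bather reacts to the temperature felt right now, while each knob adjustment takes three time steps to reach the shower head. The result is the familiar overshoot, followed by overcorrection:

```python
target, temp = 38.0, 20.0   # desired and initial water temperature, degrees C
delay = 3                   # time steps before a knob turn takes effect
gain = 0.5                  # how aggressively the bather turns the knob
pipeline = [0.0] * delay    # adjustments still traveling through the pipe

temps = []
for step in range(15):
    adjustment = gain * (target - temp)  # react to what we feel *now*
    pipeline.append(adjustment)
    temp += pipeline.pop(0)              # ...but an *old* adjustment arrives
    temps.append(temp)
    print(f"step {step:2d}: temp = {temp:5.2f}")
```

The temperature shoots well past 38 degrees, then plunges below the starting point: exactly the hunting behavior the archetype predicts.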

I hope this is enough of an introduction. I'll follow up with complete descriptions of Balancing Process With Delay and other systems archetypes.

Thursday, July 06, 2006

A Manager's Mind is a Strange Place

A while ago I worked for a company (a different one from where I work now) that did not believe in making things easy for its developers. Everyone had to work in an open landscape. The landscape was divided into rectangular cells. Developers sat at corner desks, so that developers in a cell faced away from each other. To make communication between developers even more difficult, people working on the same project were usually located in different cells.

A new and very complicated project began. We developers realized that the seating arrangements would never work. There was no way we could succeed if we could not talk to each other. We decided to ask our department manager for a project room of our own.

The department manager realized that we needed to work close by to have the slightest chance of success. "OK," he said, "you'll get the room, for this project, because it is an unusually difficult one, but of course you can't get one every time you need it."

Read the previous paragraph again. What do you think went on in the manager's head?

Tuesday, July 04, 2006

The Variance Trap Images and Animations are Back

A couple of people have emailed me about missing images and animations in my Variance Trap series of postings. The broken links were due to a reorganization of my web site. I have finally fixed the problem. You can view the postings here: Part 1, Part 2, Part 3, Part 4, Part 5.

Thursday, June 15, 2006

Forthcoming: Pro Ruby

Apress has listed a new Ruby book, Pro Ruby, among its forthcoming titles.

Pro Ruby shows how to combine Ruby and agile methodologies into a unified, powerful package.

I do have a special interest in this book, because I'm writing it. (Which is why I have blogged less than usual lately.)

I don't want to give anything away prematurely, but I can tell you two things about the book:
  1. It does not have a Hello World example
  2. There is a sample application, but it isn't an online shopping application
It will be some time before Pro Ruby hits the bookstores. When it does, I hope you like it.

The Daily WTF

This site is too good to miss: http://thedailywtf.com/

Sunday, June 11, 2006

The Declaration of Interdependence

You know all about the Agile Manifesto, of course, but what is the Declaration of Interdependence?

It is the next logical step: the answer to the question of how Agile principles can be extended to non-software projects, and to management in general.

Alistair Cockburn has written about it on his site, and in an article for Better Software Magazine.

The declaration is a year and a half old, but the article was published in June. Do read the article. It does show how profoundly different an Agile business is from the ordinary kind. (Technically, "the ordinary kind" is a business based on the principles of Scientific Management and cost accounting.)

Just a little something to show that the blog is alive. Have been quite busy the past few weeks, but will make time for more blogging. (I am not sure if that is a threat or a promise.)

Friday, May 19, 2006

A Failure I Can Live With

I had some good news a couple of days ago, and I just can't resist tooting my own horn a little.

A couple of years ago I was subcontracted to write a web based print formatting system. The system could automatically produce brochures, and used FrameMaker as a formatting engine.

I haven't counted this system as one of the great successes of my career. FrameMaker and Internet Information Server (IIS) did not play well together. (Neither did Apache and FrameMaker.) In the end, I solved the problem by writing a small web server of my own. Though the solution worked, my client was not happy with the design. He insisted that the application run under IIS, even though that was not technically possible. Fortunately, my client's client did accept the solution.

Still, my client wasn't happy, so I wasn't happy. Earlier this week a friend of mine told me what became of my 'failed' system.

The system has worked very well for more than five years now. The customer estimates that it has saved about 97% of the costs that would have been incurred without it.

My friend mentioned an estimated cost of about SEK 50,000,000 if the system had not been in place, which would have made the cost of running the system about SEK 1,500,000, and the savings about SEK 48,500,000. As my friend put it, "talk about Return On Investment".

Suddenly I feel a lot better about the whole thing. That kind of failure I can live with.

Thursday, May 18, 2006

Bug Hunt

A friend and I spent an enjoyable afternoon debugging a Rails application yesterday. "Enjoyable" and "debugging" usually don't go together, but in this case they did.

My friend, who happens to be a much better programmer than I am, was stuck on a bug. His application had a form that worked correctly the first time it was used. The second time around, values filled in the first time reappeared in the form. In the end it turned out that an automatically generated find_xxx_by_id method returned the wrong data the second time it was called. (My friend is filing a bug report.)

Our excursion into bug country was enjoyable because we solved the problem as a team. It made it easier to keep focused. There was communication. We could bounce ideas off each other. I learned a few new things about ActiveRecord. We solved the problem, and the bug report may save other people some trouble in the future.

It was a good half day's work.

Slow connection to www.henrikmartensson.org

The connection to www.henrikmartensson.org is incredibly slow at the moment. Don't know why. My site host and I will look into it.

Saturday, May 13, 2006

Ruby Design Pattern: Pollution Control

Avoid method name collisions by mixing modules into an instance of an inner class.



class Outer
  class Inner
  end

  def initialize(mixin)
    @inner = Inner.new
    @inner.extend(mixin)
  end
  ...
end


When extending objects with mixin modules, there is a risk of method name
collisions. Given the example in the Module Injection post, the execute_before and execute_after methods in the Strategy object might collide with method names in a rule module. Suppose the rule module looks like this:

module TransformationRules
  def functiondef_before(element)
    ...
  end

  def functiondef_after(element)
    ...
  end

  def execute_before(element)
    ...
  end

  def execute_after(element)
    ...
  end
end


When the dispatch mechanism in the Strategy object finds an execute element, it will dispatch the element to the wrong execute method. It will call the execute_before method defined in Strategy, not the one defined in the TransformationRules module.

Pollution Control reduces the risk of name collisions when extending classes with new functionality by extending an inner class. Thus, functionality is added to a class without affecting the interface of the class.

Related Patterns


Pollution Control is a very close relative of Strategy, Inversion of Control, Dependency Injection and Module Injection.

The thing that makes Pollution Control a pattern in its own right is the purpose: mixing in functionality encapsulated in a module without changing the interface of the client.
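Here is a self-contained sketch of the pattern in action. The module and method names (Shouter, greet) are made up for illustration:

```ruby
module Shouter
  def greet
    "HELLO!"
  end
end

class Outer
  class Inner; end

  def initialize(mixin)
    @inner = Inner.new
    @inner.extend(mixin)   # mixin methods land on the inner object
  end

  def greet                # Outer keeps its own greet; no collision
    "hello"
  end

  def inner_greet          # delegate explicitly when the mixin version is wanted
    @inner.greet
  end
end

o = Outer.new(Shouter)
puts o.greet        # hello
puts o.inner_greet  # HELLO!
```

Even though Shouter defines a method with the same name as one of Outer's own methods, nothing collides, because the mixin pollutes only the inner object.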

Ruby Design Pattern: Module Injection

Signal that objects of a class are designed to be extended via mixins by passing the mixin module via the constructor.


class ExtensibleClass
  def initialize(mixin)
    extend(mixin)
  end
  ...
end

Ruby allows objects to be extended with new functionality at runtime. Any object can be extended in this manner by a call to the extend method. Some classes are designed to have their objects extended with new functionality.

Example


Consider a Strategy object that encapsulates a set of transformation rules for XML documents. The rules are applied by a treewalker, which walks the node tree representing the XML document. The Strategy class uses Module Injection:

strategy = Strategy.new(io, TransformationRules)
treewalker = TreeWalker.new(strategy)
treewalker.walk(xml_document)

The strategy object contains dispatch code that chooses which transformation rules to call depending on such things as node type (element, processing instruction, text, etc.) and node name.

The Strategy object also needs transformation rules. Keeping the transformation rules in a mixin module makes it possible to reuse the Strategy object with different rule sets.

Because the rules are kept in a module, it is possible to compose the module from several smaller mixins. In this example, TransformationRules could itself mix in several smaller rule modules. This is what the Strategy class might look like:

class Strategy
  def initialize(io, rules)
    @io = io
    extend(rules)
  end

  def execute_before(node)
    # Dispatch code
  end

  def execute_after(node)
    # Dispatch code
  end
end
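The composition mentioned above is straightforward. A standalone sketch, with made-up module and method names:

```ruby
module ElementRules
  def functiondef_before(element)
    "element rule fired"
  end
end

module TextRules
  def text_rule(node)
    "text rule fired"
  end
end

# One module to inject, composed of the smaller rule sets.
module TransformationRules
  include ElementRules
  include TextRules
end

strategy = Object.new
strategy.extend(TransformationRules)
puts strategy.functiondef_before(nil)  # element rule fired
puts strategy.text_rule(nil)           # text rule fired
```

The injected module stays a single argument to the constructor, while the rules themselves can be organized in as many small modules as you like.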

And here is a sample treewalker. This one is for REXML:

module XmlUtil
  class TreeWalker
    def initialize(strategy)
      @strategy = strategy
    end

    def walk(node)
      @strategy.execute_before(node) if @strategy.respond_to?(:execute_before)
      if node.instance_of?(REXML::Document)
        walk(node.root)
      elsif node.instance_of?(REXML::Element)
        node.children.each { |child| walk(child) }
      end
      @strategy.execute_after(node) if @strategy.respond_to?(:execute_after)
    end
  end
end

When to Use Module Injection


Use Module Injection when you want to signal clearly that an object must be extended with a mixin module.

You can also use it when you want to encapsulate a family of algorithms, similar to Strategy, but with the added flexibility of using modules.

Related Patterns


Dependency Injection, Inversion of Control, Strategy, Pollution Control.

Tuesday, May 09, 2006

Rails Author Wanted

My publisher (Apress) is looking for an author interested in writing an advanced book on Rails. They work with new authors all the time, so prior writing experience isn't necessary. If you're interested, send my editor (Jason Gilmore) an email at jason@apress.com.

Wednesday, May 03, 2006

Integration Driven Development and Anatomies

I attended a talk by Lars Taxén about Integration Driven Development and Anatomies today. IDDA is a development methodology for large projects. "Large" means 1,000+ people.

IDDA was developed at Ericsson for use in telecommunications projects. One of the things I found interesting was that the anatomies part, a method for modelling complex systems, is quite similar to Domain Modelling. That is, both methods aim to create models that mirror real world objects or processes. Anatomies, like Domain Models, make use of bounded contexts to separate different parts of the system.

Some things were very different from what I am used to. For example, IDDA subsystems are implemented in the order they are activated when the system is used. That is very different from the Agile practice of implementing the most valuable functionality first.

For example, consider a document production system consisting of three subsystems: an XML authoring system, a Document Management System (DMS), and a printing system.

With IDDA, the first system to be built would be the authoring system, then the DMS, and finally the printing system. This is the order in which the subsystems are used. First an author writes something, then she stores it in the DMS, then it is printed.

With an Agile methodology, the most likely order is authoring system, printing system, DMS. The authoring system is built first, to enable authors to start using the system as soon as possible. Then the printing system would be built, because that makes it possible to print documents, even though there is no DMS yet. That makes it possible to get value from the system, even though the central piece is missing. The DMS is added last.

The situation may be a bit more complex, of course. For example, once the basic printing engine works, it may be most profitable to implement the basic DMS system, go back and finish the printing engine, and finally put the finishing touches on the DMS.

IDDA focuses on the dependencies between subsystems. For very large and complex systems, this may be a better strategy than focusing directly on business value. (I am not quite convinced, but I am certainly interested in learning more.) Culture and cooperation are other areas that are of particular interest. Ericsson's projects often involve people from all over the world. It is important that they can communicate over the barriers of culture, language, and physical distance.

IDDA hasn't been directly influenced by Agile methodologies, but Systems Thinking has played a part. This may to some extent explain the similarities between Agile methodologies and IDDA, even though they target very different projects.

Thursday, April 27, 2006

I'm On Rails

Finally, I have put www.henrikmartensson.org on Rails! I have rebuilt the site from the ground up with Ruby on Rails. It was a pleasure to do it.

The visible changes are small. There are a few more menu choices, including a link to this blog. The site also lists the most recent postings to this blog.

Behind the scenes, the site is now database driven. Articles, presentations and images are stored in a database and retrieved on the fly. This makes the site a lot easier for me to update and maintain. I hope this will encourage me to write more articles. We'll see what happens.

I had an unexpected problem with reading the Atom feed from this blog. I tried two different Atom modules, but neither could parse the feed properly. Both had trouble with Atom entries that contained more than one link. The problem was easily solved with REXML and XPath. The current implementation is a bit of a hack, but I'll improve it in the near future. I think there is a blog post just waiting to be written there.
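For the curious, the workaround looked roughly like this. This is a reconstruction, not the actual site code, and the feed snippet is made up; the point is that REXML's XPath support (with an explicit namespace mapping) has no trouble with entries that carry several link elements:

```ruby
require 'rexml/document'

atom_xml = <<-XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First post</title>
    <link rel="alternate" href="http://example.com/posts/1"/>
    <link rel="replies" href="http://example.com/posts/1/comments"/>
  </entry>
</feed>
XML

NS  = { 'a' => 'http://www.w3.org/2005/Atom' }
doc = REXML::Document.new(atom_xml)

REXML::XPath.each(doc, '//a:entry', NS) do |entry|
  title = REXML::XPath.first(entry, 'a:title', NS).text
  # Pick exactly the link we want, ignoring the others.
  link  = REXML::XPath.first(entry, "a:link[@rel='alternate']", NS)
  puts "#{title}: #{link.attributes['href']}"
end
```

Selecting `link[@rel='alternate']` explicitly is what the generic Atom modules failed to do when an entry had more than one link.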

Saturday, April 22, 2006

The Best Kind Of Blog Post

Pascal Van Cauwenberghe at Thinking For A Change has made the kind of blog post I like the most: it's about me. Well, almost: it is about the Variance Trap series and the Theory Of Constraints based software development simulations I have run.

His post also reminds me that I haven't quite finished the series yet. One more article to go. Coming real soon now...

Wednesday, April 19, 2006

Test::Unit::XML 0.1.5 Has Been Released

Paul Battley emailed me about a bug in Test::Unit::XML. Attribute strings containing entity references were reported as not being equal even if the strings were exactly alike. I fixed the bug, added an assert_xml_not_equal assertion for good measure, and made a new release.

The assert_xml_not_equal assertion is the inverse of assert_xml_equal. It is mostly a convenience method. That is, it was convenient for me to have it when I wrote a test that compared attribute values.

I haven't worked much on my open source projects lately. It is not that I have lost interest. It is just that I have been otherwise occupied. Thus I am doing defect-driven development at the moment. When someone finds a bug, I fix it. If I can make some small improvement without spending a lot of time on it, I do that too.

Thursday, April 13, 2006

Spolsky on How to Organize Software Companies

Joel Spolsky has written an interesting article about how to organize and run software companies. Here is a quote:
The command-hierarchy system of management has been tried, and it seemed to work for a while in the 1920s, competing against peddlers pushing carts, but it's not good enough for the 21st century. For software companies, you need to use a different model.
And here is another:
...if only 20% of your staff is programmers, and you can save 50% on salary by outsourcing programmers to India, well, how much of a competitive advantage are you really going to get out of that 10% savings?
I think you will find the article an interesting read.

Friday, April 07, 2006

The 80/20 Rule

At Google, they have an 80/20 rule. Developers spend 80% of their time on work that has been assigned to them, and 20% is their own time, to spend on projects that are just interesting. That is one day per week that a Google employee can sit thinking things through, dreaming new stuff up, and implementing it.

Does it work? AdWords was created this way, and it is a major revenue maker. Google employees seem pretty enthusiastic. Google does attribute part of its success to the system. It seems to work very well, at Google.

Of course, at Google they have a pretty open and enthusiastic attitude to new ideas. "When someone comes up with a new idea, the most common response is excitement and a brainstorming session" according to a post by Joe Beda at Google. According to Beda, managers at Google actively encourage employees to come up with new ideas.

It is something to think about.

Tuesday, April 04, 2006

The Danger of Success

Pascal Van Cauwenberghe has posted an interesting article on the danger of succeeding with Theory Of Constraints. He is quite right. If your part of a process has been the system constraint, and you elevate it, some other part of the process will be the new constraint. The people responsible for that part of the process may be rather unhappy with you.

A friend of mine took a job as a subcontractor. He finished his job ahead of time, with no defects at all in his work. His manager was not happy. My friend was accused of "idling", because he had finished ahead of schedule.

The idea that it was the manager who was slow in obtaining more work was, of course, never considered. Nor was it ever discussed that if the contract had been anything other than a time-and-materials contract, finishing early would not have reduced the net profit.

Quite often the slowpokes protect each other without even realizing they are doing it. I once worked at a company where nothing had less than 6 months lead time. Set up a web server, 6 months. Set up a build system, 9 months. Start working on a project, at least 6 months of up front design.

I have never been good at shutting up, so I pointed out that 6 months of up front design not only slowed the project down (see my postings about iteration size), it also deprived us of feedback, so it was impossible to verify that the design really worked. I was told by the project manager that having a 6 month design phase did not matter, because we were waiting for the web server anyway.

Of course, the IT department that took 6 months to do a 1 day job felt no pressure either. The project was just in the design phase anyway, so why would we need a web server?

If we had set up the web server ourselves, it still would not have solved our problems, because we did not have the build machine. Still, none of that mattered, for getting a development database set up also took 6 months.

There was one thing more that was 6 months: the time estimate for the project.

Ah, well. There is no business like the IT business.

Monday, March 27, 2006

Separation Of Concerns In Ruby

I once worked on a Java project where cross-cutting concerns were a big issue. One of the problems was logging. Error handling was another. Logging and error handling code was sprinkled throughout 15 MB of source code. Fixing it was no picnic.

Today, I once again had reason to reflect on how to separate cross-cutting concerns. This time in Ruby. For example, if I have a Ruby class like this:


class TemperatureConverter
  def celsius_to_fahrenheit(c)
    9.0 / 5 * c + 32
  end

  def fahrenheit_to_celsius(f)
    5.0 / 9 * (f - 32)
  end
end


it would be very nice if I could add a cross-cutting concern without having to modify the original code. I would like to make the methods in the class loggable by opening the class and declaring that I want logging, like this:

class TemperatureConverter
  loggable_method :celsius_to_fahrenheit, :debug
  loggable_method :fahrenheit_to_celsius, :debug
end

How do I do that? I can add loggable_method as an instance method to Object. Object is the parent class of all other classes. With Ruby, thought and action is one:

require 'logger'

class Object
  def loggable_method(method_name, level)
    self.class_eval(wrap_loggable_method(method_name, level))
  end

  private

  def wrap_loggable_method(method_name, level)
    new_method_name = "original_#{method_name}"
    alias_method new_method_name, method_name
    <<-"END_METHOD"
      def #{method_name}(*args)
        begin
          result = #{new_method_name}(*args)
          logger.#{level} {"Called #{method_name}"}
          result
        rescue
          logger.error {$!}
          raise
        end
      end
    END_METHOD
  end

  def logger
    @logger ||= Logger.new('errors.log')
  end
end


loggable_method uses the method class_eval to evaluate a string in the context of self. The string is a method definition generated by wrap_loggable_method.

wrap_loggable_method first renames the original method, then generates a method definition that calls the original method, logs the call, and then returns the return value from the original method.

If there is an error, it is rescued, an error message is logged, and the error is raised again.
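For clarity, here is what the generated wrapper boils down to if written by hand. This is a self-contained sketch that records calls in an in-memory array instead of a log file, so the effect is easy to see:

```ruby
CALL_LOG = []

class TemperatureConverter
  def celsius_to_fahrenheit(c)
    9.0 / 5 * c + 32
  end
end

# Hand-written equivalent of the code wrap_loggable_method generates:
# rename the original, then define a wrapper under the old name.
class TemperatureConverter
  alias_method :original_celsius_to_fahrenheit, :celsius_to_fahrenheit

  def celsius_to_fahrenheit(*args)
    result = original_celsius_to_fahrenheit(*args)
    CALL_LOG << "Called celsius_to_fahrenheit"
    result
  rescue
    CALL_LOG << "Error: #{$!}"
    raise
  end
end

converter = TemperatureConverter.new
puts converter.celsius_to_fahrenheit(100.0)  # 212.0
puts CALL_LOG.inspect                        # ["Called celsius_to_fahrenheit"]
```

The string-evaluating version in the post does exactly this, but for any method in any class, without the boilerplate.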

This is pretty neat, because what we have here is a method of adding any type of cross-cutting concerns to any method in any class, with very little work, and without modifying the original class.

The big deal here is that doing this in Ruby is no big deal. In Java, I would probably use a framework like Spring to do something like this. I would write an XML configuration file to wire everything together. This can quickly get complex, and I have to preplan for it, making sure that I can program against interfaces, so that Spring (or whatever) can generate proxy classes.

With Ruby, the whole thing is much simpler. I need no framework, there is no configuration file, and I can add cross-cutting concerns to any class, even if I have not written it myself.

In the end, it means Ruby has an edge in productivity here, not just in writing the application, but throughout the entire lifecycle. The benefits to maintenance can be several times more valuable than the productivity gains when creating the application.

Sunday, March 26, 2006

A Day In the Life

Noel Llopis, who runs the Games from Within blog, has posted an interesting article about a day in the life at High Moon Studios, the Agile games company where he works.

High Moon Studios uses Scrum and Extreme Programming. Judging from the article, they have got it down very well.

Wednesday, March 15, 2006

Unpredictability and Coupled Systems

Scott Klebe asked me a very interesting question in a comment to my previous posting: does my research show whether other planning techniques are effective in increasing predictability?

Here is my reply. The short version of the reply is: yes, I believe some methods are effective, but if there is a lot of unpredictability, it is better to just track what happens and then extrapolate from the results. The catch is that the variance that causes unpredictability does not necessarily arise from within the project. That makes the effects hard to predict, and difficult to control.

Expounding on this a bit further: in The Variance Trap series I have modeled projects as systems standing in isolation. This is a huge simplification. In real life, there is a lot of interaction between the project system, and other systems. A typical development team, consisting of a number of consultants (that's typical in my line of work) will be influenced by:

  • Groups within the team's own organization, with their own agendas. This includes the team's own management. (There are some examples in my reply to Scott.)
  • Team members will have their own agendas.
  • If the team members come from different organizations, each one of those will influence the project. (They do it whether they want to or not. For example, if one team member comes from an organization that does not use automated tests and version management systems, that may have quite an impact on the overall performance of the team.)
  • The customer's organization is not homogeneous. It consists of a lot of groups, each of which may have its own agenda.
  • The world outside may influence the project. Have you ever had a project finish successfully (you think), and then get dropped because a competing product had just hit the market? Legislation often affects a project. A legislative change can be the reason for starting a project, but it can also stop one. Cultural bias definitely affects projects. Our culture is steeped in the doctrines of Scientific Management and Cost Accounting, which is perhaps the most important reason why software projects fail so often. One reason why accountants are disliked within many companies is that they often give very bad advice about how to run the company. (Upper management believes them, but nobody else does.) They do this because they base their recommendations on Cost Accounting, a mathematical model of a company that was shown not to work more than 20 years ago. Companies are still required by law to use it. (They may use other accounting methods internally, but many don't.)
There is a discipline called Systems Thinking that deals with how systems interact. After The Variance Trap, that will probably be the next thing I focus on. I have about 15-20 article ideas...

Monday, March 13, 2006

TocSim available at RubyForge

I have just set up a TocSim home page at http://tocsim.rubyforge.org/, and made TocSim available on RubyForge. There is no gem package or tarball available yet, so anyone interested will have to download directly from the repository.

Monday, March 06, 2006

Loopy Learning

Sean P. Goggins has written an interesting blog entry about Orchestrating Enterprise Architecture. If you read it, spend a little extra time thinking of the idea of single and double loop learning. It is worth it.

Do you work in an organization where double loop learning is common? If so, why not drop me a line? I'd like to know more about how your organization works.

The Variance Trap, Part 5

This is the fifth part in an ongoing series of articles. You might wish to read part 1, part 2, part 3, and part 4 before reading this one.

Until now every simulation I have described has used an extremely simplified model of a development process. It is time to try a more accurate model, and see where that gets us.

Here is the model I am going to use:


(Click on the image to see a full scale version.)

The numbers in the boxes at each stage of the process indicate how many goal units can be produced in a single day. To this number, I add a random variation of +/- 2 units to represent fluctuations in production capacity. (Judging from experience, this is a lot less than in many projects. I want to err on the side of caution.)

50% of all goal units fail at Unit Test. The effort needed to fix a defect at unit test is 10% of the original effort. Integration tests have a 20% failure rate, and the effort to fix is 10% of original effort. System tests have a 10% failure rate, but since fixing a failed system test requires a bit of redesign, the effort to fix is 20%. Acceptance tests also have a 10% failure rate, but a failure means the customer did not like the functionality at all, so fixing it is a complete rework, at 100% of original effort.
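As a sanity check, the failure rates and fix efforts above imply roughly how much extra work each goal unit carries. A quick back-of-the-envelope calculation (ignoring rework that itself fails and needs further rework):

```ruby
# Expected extra effort per goal unit, as a fraction of the original
# effort, given the failure rates and fix efforts in the model.
rework = [
  [0.5, 0.1],  # unit test:        50% fail, fix costs 10% of original effort
  [0.2, 0.1],  # integration test: 20% fail, 10% effort
  [0.1, 0.2],  # system test:      10% fail, 20% effort (some redesign)
  [0.1, 1.0],  # acceptance test:  10% fail, complete rework
]
extra = rework.sum { |fail_rate, fix_effort| fail_rate * fix_effort }
puts extra.round(2)  # 0.19 -- roughly a fifth of all work is rework
```

So even this fairly forgiving model has the process spending close to 20% of its capacity redoing things, before we have said anything about batch sizes.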

The thing I am going to focus on this time around is batch sizes. Most software development methodologies focus on iterations as a way of steering a project. As we shall see, the size of iterations also has a great effect on development speed.

I am going to show two simulated projects that differ only in the size of their batches. The first project uses large iterations: Integration tests are made once every 30 days, system tests comprise 60 days of new functionality, and the release cycle is 60 days.

The second project runs integration tests every 5 days, system and acceptance tests every 10 days, and has a 30 day delivery cycle.



As you can see, the project with the smaller iterations is more than twice as fast as the project with the large iterations. How can that be? The projects process the same number of goal units, and the processing stages in the two projects have the same capacity. Is it the statistical fluctuations that cause the difference? No, not this time. I get similar results every time I run the simulation. The project with the smaller iterations is faster every time, so the iteration size must affect the velocity of the project.

To solve the mystery, let's look at the problem from the point of view of a goal unit. A goal unit will spend time being processed, i.e. being transformed from an idea into a set of requirements, from requirements to large scale design, from design to code, etc. It will also spend time in queues, waiting to be processed. After processing, it may spend time waiting, for example waiting for other units in the same batch to be processed, before it can move on to the next stage. In many processes, there may also be a significant move time, when a goal unit is moved from one stage to another. In software development processes, the move time is usually rather short.

A simple diagram (not to scale) showing how a goal unit spends time in the system looks like this:

To create a batch consisting of 10 goal units, a processing stage has to perform its task 10 times. This means the first goal unit in the batch will have to wait for the other 9 units to be processed, the second unit will have to wait for 8 units, and so on. The 10th unit won't have any waiting time. (On the other hand, it may have spent a lot of time in a queue before being processed, but we will leave that out for now.)

It should be clear that on average, a goal unit spends a lot more time waiting to be processed, than actually being processed.

Look at what happens if we halve the batch size:

The average wait time is reduced considerably. This is why the development process with the smaller batch sizes is so much faster.
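The arithmetic behind those two diagrams can be sketched in a couple of lines (assuming, for simplicity, that every goal unit takes the same processing time):

```ruby
# Average time a goal unit waits for the rest of its batch to be
# processed. The unit finished first waits for (n - 1) units,
# the next for (n - 2), and the last one not at all.
def average_batch_wait(batch_size, unit_time = 1.0)
  total_wait = (0...batch_size).sum { |i| (batch_size - 1 - i) * unit_time }
  total_wait / batch_size
end

puts average_batch_wait(10)  # 4.5 -- waits 4.5 units of processing time
puts average_batch_wait(5)   # 2.0 -- halving the batch more than halves the wait
```

The wait grows linearly with batch size, at every stage of the process, which is why the cumulative effect on a whole project is so large.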

In real projects, there are many factors that may obscure the benefits of using smaller batches. Here are a few:
  • Management does not know about the effects, and therefore never sees them.
  • Requirements are changed in mid-iteration, which disrupts the iterations and leads to build-up of half finished work. Sometimes this effect is so severe that a project never manages to deliver anything.
  • A process stage is blocked for some reason. If the project manager keeps track of hours worked instead of goal units processed, it is easy to miss this problem. It's like trying to measure the speed of a car by looking at the tachometer. The engine is running, but the wheels don't necessarily turn.
Nevertheless, minimizing batch sizes is a sound strategy. Provided that a project is reasonably well run, the effects of having shorter iterations and release cycles can be dramatic.

From a strategic point of view, is there anything more we can do? Yes, there is. One thing should be pretty obvious: the batches in this example are still pretty large. A project that runs integration tests once per week still uses pretty large batches. With a build machine, that time can be reduced to once per day, or perhaps once per hour. With automated tests, the unit and system test cycles can be as short, or shorter. In many projects, even acceptance tests can be automated, except for tests that have to do with the look and feel of the user interface.

In a project that uses automated tests, build machines, and scripts that automate the deployment process, it is entirely reasonable to cut iteration and release cycle length to one or two weeks. (Well, not always. If users have to install their software themselves, they may not appreciate having to do so every Monday morning.)

Another thing to note is that both processes described in this installment are unregulated. That is, all process stages run at full speed all the time. This is how most companies work, and it is how most project managers run their project. However, looking at the animation above, we can see that this leads to a build-up of quite a lot of unfinished work in the development process.

It should be possible to feed the process slower, without reducing the speed of the system. This would reduce the build-up of unfinished work. This would be an advantage, because goal units that have not yet entered the process can be changed or removed without incurring extra cost to the development process.

Finally, it might be worth studying the effects of spending more effort on testing. If we automate testing, this will reduce the throughput at the Coding stage, but it will also reduce the rework that has to be done. Will more testing increase or reduce the total throughput of the system?

Plenty of fodder for thought, and material for at least a couple more installments in this series of articles.

Thursday, March 02, 2006

Alive and Kicking

I have poured all my spare time into TocSim, the Theory Of Constraints simulator, lately. Things are going well, but I have to constantly fight creeping featuritis.

Declan, another of my projects, hasn't seen much development lately, but that will change. I am going to present it at XTECH 2006 in May. If you are going to the same conference, why not drop me a line?

Monday, February 20, 2006

The Variance Trap, Part 4

This installment of The Variance Trap compares two similar development process simulations. They differ mainly in the amount of variance in production capability in the process stages. As the animated diagram shows, there is a great deal of difference in productivity. (If you haven't read the earlier postings on this topic, you might find that reading part 1, part 2, and part 3 makes this posting easier to understand.)
The animation above shows the result of two development project simulations. As before, the simulation model is extremely simple, with no errors or feedback loops. To simulate variations in productivity, the simulation system rolls a die for each process stage, for each tick of the simulation system clock.

The yellow line represents a simulation with a six-sided die. The blue line represents a three-sided die, with two added to each die roll. (A computer has no problem rolling a three-sided die. If you want to do it for real, use a six-sided die, and count 1-2 as 1, 3-4 as 2, and 5-6 as 3.) Let's call the six-sided die 1d6 and the other one 1d3+2. (If you have ever played a roleplaying game, you won't have a problem with this notation.)

The 1d6 has a range of 1-6 and an average roll of 3.5. The 1d3+2 has a range of 3-5 and an average roll of 4. As you can see, the 1d3+2 process is much faster than the 1d6 process. If you have read the previous parts of this monologue, this should come as no surprise. The 1d3+2 process has less variance than the 1d6 process. The flow is steadier, with less inventory build-up during a simulation run.

The implication is that if we can reduce the statistical fluctuations in a software development process, we can increase the productivity.
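To make the comparison concrete, here is a small Ruby sketch of the idea (my own illustration; the function name `drain_time` and the batch size of 188 are made up for the example). It pushes the same batch of goal units through a seven-stage pipeline with each die and counts the sequences needed to drain it:

```ruby
# How many roll-per-stage sequences does it take to push a fixed batch
# through a seven-stage pipeline? Each stage moves at most its die roll.
def drain_time(batch, die, rng)
  bowls = Array.new(7, 0)
  bowls[0] = batch        # all work starts in the first bowl
  done = 0
  sequences = 0
  until done == batch
    sequences += 1
    (1...7).each do |i|   # move work downstream, stage by stage
      moved = [die.call(rng), bowls[i - 1]].min
      bowls[i - 1] -= moved
      bowls[i] += moved
    end
    released = [die.call(rng), bowls[-1]].min
    bowls[-1] -= released
    done += released
  end
  sequences
end

d6   = ->(rng) { rng.rand(1..6) }      # range 1-6, average 3.5
d3p2 = ->(rng) { rng.rand(1..3) + 2 }  # range 3-5, average 4.0

runs = 200
rng = Random.new(42)
avg = ->(die) { runs.times.sum { drain_time(188, die, rng) } / runs.to_f }
puts "1d6:   #{avg.call(d6).round(1)} sequences on average"
puts "1d3+2: #{avg.call(d3p2).round(1)} sequences on average"
```

Averaged over many runs, the 1d3+2 pipeline drains the batch noticeably faster, even though its average roll is only half a point higher; most of the gain comes from the reduced variance.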

Let's take stock of what we have learned so far:
  • Because of statistical fluctuations, an unregulated development process will be slower than the slowest of the process steps. Therefore, it is impossible to accurately estimate the time required by adding together the time estimates for individual process steps. Even if the individual estimates are correct, the combined result won't be. (See Part 1 and Part 2)
  • We can make measurements and extrapolate the time required from the aggregated process. This allows us to make fairly accurate estimates relatively early on in the project. (Part 2)
  • The productivity will increase if the statistical fluctuations in the development process can be reduced. (Part 3)
It is time to set up a more accurate project simulation, and study the effects of different management strategies. Part 5 in this series uses a more accurate model of the development process, and explores the effects of changing the length of test, iteration, and release cycles.

Feed Me

I had forgotten to add a link to my Atom feed in the Kallokain page template. Thanks to Chris Hedgate for pointing it out.

Saturday, February 18, 2006

The Variance Trap, Part 3

If you read the 1st and 2nd parts of The Variance Trap, you know how statistical fluctuations can slow a project down to a velocity far below the velocity of each of the stages in the process. I demonstrated the effects of statistical fluctuations with a simple game involving a die, a set of bowls representing stages in a project, and a lot of colored glass beads. (This is a slight variation of a classic example by Eli Goldratt.)

I got tired of rolling dice by hand, so I started writing a Theory Of Constraints based process simulation engine. It is still very much a prototype, but today I wrote a computerized version of the beads and bowls game. I also managed to coax a simple animation from the simulator. It shows what happens with work buffers at different stages during the course of the project.

The simulation is extremely simplified. For one thing, there are no feedback loops from the test stages (Init = Unit Test; Intgr = Integration Test; Sys = System Test; Acc = Acceptance Test). For another, the variance is caused by a six sided die that changes the production capacity of each of the stages. There is no variance in the size of goal units (Use Cases, Story Points, or Feature Descriptions, depending on your preferences). Still, the simulation suffices to show the effects of fluctuations on a process.

The simulation run used to create the animation needed 8 (7.5) iterations to process the goal units. The average throughput per iteration was about 25. At the end of iteration 5, there were 96 goal units locked up in the process, out of a total of 188 goal units. This is quite a hefty inventory build-up. Still, it can be quite hard to even see the problem until it is too late. In many projects, work buffers are not even monitored.

At the end of the first iteration there is only a slight buildup of inventory. Seemingly nothing to worry about.

If this had been a real project, and if we had estimated its length by calculating the production capacity at each stage, we would probably not suspect that anything was amiss. It is like the old story about how you can boil a live frog by putting it in cold water and increasing the temperature slowly. The frog won't notice that anything is wrong.

The image to the left is from the end of iteration 6. As you can see, there is a lot of inventory in the system. Even though Coding is done, the project is far from finished.

By this time, the frog is boiling. If we didn't know about the effects of statistical fluctuations, we might easily be led to believe that there is a problem with integration test here. There is certainly a backlog of work, but the integration test stage itself has the same capacity as the other stages.

The last goal unit leaves the system midway through iteration 8. If this had been a real project, there would be quite a lot of frustration. What can we do? The first thing to do is to realize that there are two problems:

The first problem is how to make more reliable predictions about projects. The second problem is how to improve the development process itself.

Simulations like the one I've written about here can be a great help. In part 2, I showed a simple method of making estimates by extrapolating from the known throughput rate of the whole system. This method of estimating projects can be deduced from the simulation model. (It is not the only solution to the problem.)

The model also shows us that it is possible to improve the capacity of the system without having to increase the capacity of the parts. How? By reducing the fluctuations. Expect a new simulation run, with the same average capacity but less variability in the process steps, in the near future.

Until next time: good afternoon, good evening, and good night.

Thursday, February 16, 2006

Mixin' Language and Culture

I've had some very positive feedback about the Extract Mixin entry from a while ago. Pat Eyler wrote a blog entry about it, and Chris Hedgate also found it useful.

In his blog, Pat asks the question whether Ruby will spawn a set of new patterns, or change the way we look at existing patterns. I firmly believe it will. It's not just Ruby though. Languages have different features, and they also emphasize different things, and have different cultures built around them.

There is a lot to learn from immersing oneself in different languages and cultures. What you learn in one place, can often be used in another. I was a Perl programmer before I began working with Java.

Perl taught me a lot about how to just get things done and that there may be more than one valid approach to solving a problem. This, I hope, fostered a spirit of openness. (Though it is really for others to say how that turned out.) It also gave me a different perspective on object oriented programming and design patterns than I got from Java. Perl also taught me the importance of unit testing, something I have brought with me ever since.

On the less technical side, the Perl community and CPAN taught me a lot about the value of working together, and that even if you contribute only a little, you will get it back a thousand times. (Quite literally in the case of CPAN.)

One thing Java taught me is the value of a good IDE. I rarely leave home without Eclipse in my backpack (tucked away on the hard drive of my laptop, of course). Java also taught me some things about how to coordinate team efforts, and (by not having it) why it is a really good idea to have a standardized way to install libraries. I also learned to avoid tightly coupled frameworks (anything with EJB in it), and to appreciate the power of simplicity (Spring and friends).

Java also taught me a lot about the importance of refactoring and many other development practices. Beyond that, the Java projects I worked on rekindled my interest in development methodology and management.

And now, it is Ruby. Ruby is a very nice language to work with. It allows you to cut to the chase and focus on solving a problem. The Ruby community is much like the language: friendly, open, and lazy (in the Perl sense, i.e. believing strongly in getting the maximum of value with the minimum of effort).

I like it. The Ruby community is a place where I can grow.

Saturday, February 11, 2006

The Variance Trap, Part 2

This morning I continued the variance experiment I wrote about in my previous blog entry.

I am going to show more details now, because it is interesting to follow what happens closely. (Well, it is if you are a management nerd, like me.) Remember that our original estimate, based on the average roll of the die, was that we'd get a throughput of 35 beads in an iteration. (An iteration consists of 10 sequences of 8 die rolls.) That prediction failed. The average throughput was only 28.4.

The second try to predict the end of the project used another method. I used the average flow rate, measured over the first five iterations. This prediction indicated that 1.6 more iterations would be needed. 5+1.6, rounded up, is a total of 7.

Let's see how the flow-based prediction holds up. Here is the state of the system two sequences into iteration 6:
The first sequence had a throughput of 0, the second had a throughput of 2. I am not feeding the system any more beads, so we can expect the group of beads in the Analysis bowl to begin to thin out. It has. There were 26 beads there, but the last two sequences have reduced that to 18. The Unit Test bowl (the 5th one) now holds 12 beads, one more than at the end of iteration five.

After four sequences in iteration 6: rolling a highly unlikely series of fours and fives, sequence 3 yielded 5 beads. Sequence 4 yielded only 1 though, so it evens out:
The distribution continues to be rather uneven, but since there are now groups of beads closer to the end of the process chain, we can expect to make good time.

At the end of iteration 6, the model looked like this:
There are now no beads at all in Analysis, Design, Code, and Unit Test. Of course there is a weakness in the model here, because none of the test stages have a feedback loop to earlier process stages. The resulting effect is that no test ever fails. If tests did fail, that would of course slow down the process.
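A rough way to gauge how much a feedback loop would slow things down is to add one to the bead model. The sketch below is my own assumption, not part of the original game: for simplicity, failures are detected only when a bead leaves the last bowl, and a failed bead is sent back to the Code bowl for rework. The function name `drain_with_rework` and the failure rate are made up for the example:

```ruby
# Drain a batch through seven bowls; each bead released from the last
# bowl fails with probability fail_rate and goes back to the Code bowl.
def drain_with_rework(batch, fail_rate, rng)
  # Bowls: 0 Analysis, 1 Design, 2 Code, 3 Unit, 4 Integration,
  #        5 System, 6 Acceptance
  bowls = Array.new(7, 0)
  bowls[0] = batch
  done = 0
  sequences = 0
  while done < batch
    sequences += 1
    (1...7).each do |i|
      moved = [rng.rand(1..6), bowls[i - 1]].min
      bowls[i - 1] -= moved
      bowls[i] += moved
    end
    released = [rng.rand(1..6), bowls[-1]].min
    bowls[-1] -= released
    failed = (0...released).count { rng.rand < fail_rate }
    bowls[2] += failed          # rework: failed beads return to Code
    done += released - failed   # only passing beads count as finished
  end
  sequences
end

rng = Random.new(9)
puts "No failures:  #{drain_with_rework(100, 0.0, rng)} sequences"
puts "20% failures: #{drain_with_rework(100, 0.2, rng)} sequences"
```

Even a modest failure rate lengthens the project by more than the rework itself, because the returning beads add to the inventory waves in the middle of the pipeline.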

This is the system four sequences into iteration seven:
Turns out the flow-based prediction came fairly close, predicting the end of the project 6 sequences into iteration seven. However, I made that prediction pretty late in the project. What would the flow-based predictions have been earlier on? Let's look at the average flow after each iteration, and use that to calculate how many iterations we need to move 188 beads:
  1. 21/1 = 21 ==> 188/21 ≈ 9.0
  2. (21+32)/2 = 26.5 ==> 188/26.5 ≈ 7.1
  3. (21+32+28)/3 = 27 ==> 188/27 ≈ 7.0
  4. (21+32+28+36)/4 = 29.25 ==> 188/29.25 ≈ 6.4
  5. (21+32+28+36+25)/5 = 28.4 ==> 188/28.4 ≈ 6.6
  6. (21+32+28+36+25+31)/6 = 28.83 ==> 188/28.83 ≈ 6.5
If we had watched the flow rate, we would never have underestimated, and we would have had a pretty accurate estimate after iteration 3. This suggests that monitoring the flow rate of the complete system makes it possible to make more accurate predictions than we will get by making time estimates (remember 35 beads per iteration) for each stage in the process.

In other words, measuring flow is better than estimating time!
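The flow-based estimate is trivial to compute. Here is the calculation from the table above as a small Ruby helper (`estimate_iterations` is just an illustrative name of my own):

```ruby
# Flow-based estimation: given the measured throughput per completed
# iteration and the total amount of work, extrapolate the number of
# iterations the whole job will need.
def estimate_iterations(throughputs, total_work)
  average_flow = throughputs.sum / throughputs.length.to_f
  (total_work / average_flow).round(1)
end

throughputs = [21, 32, 28, 36, 25, 31]  # beads per iteration, as measured
(1..throughputs.length).each do |n|
  estimate = estimate_iterations(throughputs.first(n), 188)
  puts "After iteration #{n}: ~#{estimate} iterations"
end
```

Each pass re-estimates using only the iterations completed so far, which is exactly how the estimate would have evolved during the project.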

One thing to note is that the model used here was balanced, i.e. the capacity was the same at each stage. In reality that is rarely the case. Such differences in capacity would make traditional time estimates even more unreliable. I'll look into that, and into more sophisticated methods of calculating project duration, in a future blog entry. First I'll write myself a little simulation software. I'm getting tired of rolling the die.

Friday, February 10, 2006

The Variance Trap

This posting is inspired by the matchsticks and bowls game in Eliyahu Goldratt's book The Goal.

It is hard to estimate projects. Everyone knows that. What everyone does not know, is that even if you get fairly accurate estimates for each stage in the development project, it is still possible, and quite probable, that a project will come in later than expected due to the effects of statistical variation. These effects can be proven mathematically, but it is easier, more fun, and more convincing, to do it with an experiment. Let's create a simple model of a project, and run a simulation of how variance affects development speed.



When I ran this experiment this morning, I used seven bowls that represent the stages of the project: Analysis, Design, Code, Unit Test, Integration Test, System Test, and Acceptance Test. I used colored glass beads to represent ideas that are transformed into customer valued software features in the development process.

To simulate the variance in production capabilities, I used a six-sided die. The idea is to roll the die, count the pips, and move that many glass beads into the Analysis bowl. I then roll the die again, and move that number of glass beads to the Design bowl, roll again and move to the Code bowl, etc.

I can't move more beads than there are in a bowl, so if I rolled 3 the first time, and moved 3 beads into the Analysis bowl, I can't move more than 3 beads to the Design bowl, even if I roll 4, 5, or 6.

With a six-sided die, the average roll is 3.5. If I roll for each of the bowls in sequence 10 times, I might expect to move about 35 glass beads from start to end. Let's call 10 such sequences one iteration.
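The game is easy to script. Here is a minimal Ruby sketch of the rules above (my own sketch, written for this example; the name `run_pipeline` is made up). Rolls are made left to right, so a bead can cascade through several bowls in one sequence, just as in the physical game:

```ruby
# The beads-and-bowls game: each sequence, roll 1d6 to feed the first
# bowl, 1d6 per transfer between bowls, and 1d6 to release finished
# beads from the last bowl (8 rolls per sequence in total).
STAGES = %w[Analysis Design Code UnitTest IntegrationTest SystemTest AcceptanceTest]

# Returns [completed beads, leftover beads per bowl] after `sequences`
# passes over the whole pipeline.
def run_pipeline(sequences, die: -> { rand(1..6) })
  bowls = Array.new(STAGES.size, 0)
  done = 0
  sequences.times do
    bowls[0] += die.call                      # feed the Analysis bowl
    (1...STAGES.size).each do |i|             # move beads downstream
      moved = [die.call, bowls[i - 1]].min    # can't move what isn't there
      bowls[i - 1] -= moved
      bowls[i] += moved
    end
    released = [die.call, bowls[-1]].min      # beads leaving Acceptance Test
    bowls[-1] -= released
    done += released
  end
  [done, bowls]
end

rng = Random.new(1)
done, bowls = run_pipeline(10, die: -> { rng.rand(1..6) })
puts "Iteration throughput: #{done}, still in process: #{bowls.sum}"
```

With a constant roll of 3, exactly 30 beads pass through in 10 sequences; with a real die of the same average, the throughput falls short and inventory builds up in the bowls, which is the whole point of the experiment below.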

After the first 10 sequences (80 die rolls), I had moved 21 beads through all my bowls. 21 is significantly less than 35, but on the other hand, it was the startup iteration. In the beginning the bowls are empty, so it takes a while to get the process flowing.

On the second iteration I moved 32 beads through the process. A little bit below average, but it doesn't seem impossible to make an average of 35 over a few more iterations.

On the third iteration, I got 28 beads. A bit behind schedule, but if you were a project manager in this situation, you would probably not be unduly alarmed. (If your focus is on time reports, rather than completed features, you would probably not even know you are falling behind.)

This picture shows the bowls at the end of the third iteration:



Note how the beads have clustered together a bit in the Analysis bowl (top left). There are 3 beads in the Design bowl, nothing in the Code and Unit Test bowls, 1 bead in the Integration Test bowl, and 3 beads in the Acceptance Test bowl. One would have expected the beads to be a bit more evenly distributed, since this is a balanced system, where all parts have the same "production capacity".

On the fourth iteration I moved 36 beads, the first time above average. If this were a software project, it would be taken as a sign that the work is getting up to speed. However, if you look at the picture below, you can see that there is still an uneven distribution of beads in the bowls:


There is still a cluster of beads in the Analysis bowl. There are 4 beads in the Unit Test bowl. It looks a bit funny with a group of beads in a bowl surrounded by empty bowls. It looks even stranger when you roll the die yourself, because then you can see that the beads tend to travel in waves.

At the start of each iteration, I moved the beads from the output end back to the input end, so that I could reuse them. However, towards the end of the fifth iteration, I found that I was out of input beads. I had to go and borrow some extra beads from one of my wife's flower pots.

This is what the bowls looked like after five iterations. You can see that there is a new wave of beads working its way towards the end of the process:


The throughput on iteration five was 25 beads. At this point I had entered a total of 188 beads into the system. There were 46 beads in the bowls, 26 of them in the Design bowl. The average throughput was 28.4 beads per iteration.

More than 1.6 iterations' worth of beads were locked up in the system. If we had expected a throughput of 35 beads per iteration, it is well past time to be concerned. 188 beads should not take more than 6 iterations, but it seems clear that won't happen.

I won't feed the system any more beads. Tomorrow, I'll just process the beads already there. Care to guess how long it will take to drain the system of beads and complete the project? Remember, there are 46 beads, and the average throughput rate is 28.4. It seems clear that two iterations will do the job, with time to spare.

Well, we will see tomorrow...

See also the 2nd part of the Variance Trap entry.