Thursday, December 4, 2008

Cross - Cost of Carry versus Cost of Switching

By Brad Cross - 4 December 2008

In the technical balance sheet, the cost of carry and cost of switching are proxies for cash flow.

When you take on technical liabilities, you incur cash flow penalties in the form of ongoing interest payments, i.e. going slow.

When you pay down principal on your technical liabilities, you incur cash flow penalties in the form of payments against principal. However, these cash flow penalties are of a different nature: they come in the form of paying down principal in the short term (going slower right now) in exchange for paying less interest (going faster as the principal is paid down).

The notion of going faster or slower shows the connection between cash flows and time. The cost of ongoing interest payments is an incremental reduction in speed, whereas the cost of payments against principal is an investment of time in exchange for an increase in speed. Restated, there is a trade off between cash flow penalties now (paying the cost of switching) for decreased cash flow penalties in the future (reducing the cost of carry).

Before we look further at these terms, we need to first introduce another concept: volatility. Volatility is the extent of change in a component, including maintenance and new development. The more work there is to do in an area of the code that carries high technical liability, the greater the cost to complete that work. You can determine the volatility of a component by looking at previous project trackers, defect logs, the raw size of the component, source control history, and backlog work items.
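
As a rough illustration, volatility can be approximated by tallying how often each component is touched in the change history. The sketch below is not from the article; it assumes components correspond to top-level directories and that you have a flat list of changed file paths, as you might export from source control (e.g. `git log --name-only`):

```python
from collections import Counter

def component_volatility(changed_paths, component_of=lambda p: p.split("/", 1)[0]):
    """Tally how often each component is touched across a change history.

    `changed_paths` is a flat list of file paths from past commits,
    defect fixes, or backlog items; `component_of` maps a path to its
    owning component (here: the top-level directory, an assumption).
    """
    return Counter(component_of(p) for p in changed_paths)

# Hypothetical change history:
history = [
    "trading/Strategy.cs", "trading/Signals.cs", "trading/Strategy.cs",
    "fix/Session.cs", "persistence/Mapper.cs", "trading/Orders.cs",
]
print(component_volatility(history).most_common(1))  # → [('trading', 4)]
```

The most frequently touched components are the ones where technical liabilities carry the highest cost.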

Cost of carry is the sustained cost of servicing your liabilities. It represents the decrease in speed and corresponding increase in time that it takes for developers to complete tasks due to technical debt. Volatility is a major component of cost of carry: the liabilities in a high volatility component are the liabilities with the highest cost of carry across your application.

Within a component, your cost of carry is the time spent on production issues, maintenance issues, adding new code to change or extend the system, as well as testing and debugging time. The cost of carry for an indebted component also includes costs from its effects on dependent components. In other words, some dependent components are more difficult to productionize, maintain, change, extend, test and debug because of their dependency on an indebted component.

Cost of switching is the cost associated with your mitigation strategy. When indebted code is an impediment to sustaining and evolving the functionality of your project, your mitigation strategy is how you go about transitioning away from indebted code in order to reduce or eliminate the impediments. This can entail completely replacing a component with another component, partially replacing a component, or simply small incremental refactorings that accumulate over time. Switching costs are impacted by the size and scope of the component you are replacing, the time you have to spend to find and evaluate commercial and open source alternatives, time spent on custom modifications to the replacement, and time spent on migrating infrastructure - for instance, migrating or mapping user data after switching to a different persistence technology.

The effect of dependencies and encapsulation

Both switching costs and cost of carry grow with the number of dependencies on a component, and shrink with the quality of its encapsulation.

If a technical liability is poorly encapsulated, other components that depend on the indebted component also incur higher switching and carry costs. This is an extremely important concept to understand, and one of the reasons why encapsulation is one of the most important ideas in the history of computing. (If you do not understand or track the dependency structure of your project, you can use a tool such as JDepend or NDepend.)

It is also important to understand that the impact of dependencies on switching costs and cost of carry can be transitive. This means that poor encapsulation of one high-liability component can have profound impact across not only its direct dependents, but also transitive dependencies that are several steps from the indebted component in the dependency graph.
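
To see why this matters, consider computing the full set of transitive dependents of an indebted component: every one of them carries some of the debt's cost. A minimal sketch (the graph shape and component names are hypothetical):

```python
def transitive_dependents(graph, component):
    """Return every component that depends on `component`, directly or
    transitively. `graph` maps each component to the components it
    depends on (edges point from dependent to dependency)."""
    # Invert the graph: dependency -> set of direct dependents.
    dependents = {}
    for node, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)
    # Walk outward from the indebted component.
    seen, stack = set(), [component]
    while stack:
        for d in dependents.get(stack.pop(), ()):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return seen

# Hypothetical dependency structure: Reports and Simulation depend on
# Trading, which depends on the indebted FIX component.
deps = {"Trading": ["FIX"], "Reports": ["Trading"], "Simulation": ["Trading"]}
print(sorted(transitive_dependents(deps, "FIX")))  # → ['Reports', 'Simulation', 'Trading']
```

Here the poorly encapsulated FIX component taxes not just its direct dependent, but everything two steps away in the graph as well.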

If a high liability component is not highly volatile, and the costs of switching are high, then making the effort to substitute the component is a low priority. However, dependencies can overrule this judgment because indebted components can impact your cash flows by trickling down into other components that are highly volatile.

A component may be replaceable with an open source counterpart, but if the use of the component to be replaced is not properly encapsulated, then the switching cost can be quite high. It is important to note that in this case the cost of switching is higher because poor encapsulation itself is a technical liability. Often you have to spend a lot of time paying down that debt before you can consider replacing a component.

In the Technical Liabilities article, we talked about benign versus malignant risk. Benign risks are well encapsulated risks in low volatility code. Malignant risks are poorly encapsulated risks, risks in high volatility code, or both. Clearly, the cost of carry on benign risks is far lower than the cost of carry on malignant risks.

Making trade-off decisions

You can evaluate different cost of carry vs. cost of switching scenarios by taking the present value of each cash flow stream for each scenario. You can then compare the net present value of the different scenarios and determine the highest-value course of action. Switching is attractive when the present value of the stream of future cash flows from cost of carry is higher than the present value of the stream of future cash flows from cost of switching.
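
As a toy illustration of that comparison, assume carrying the debt costs a steady amount of developer time per period, while switching costs more up front and much less afterward. The discount rate, horizon and cash flow figures below are invented for the example:

```python
def present_value(cash_flows, rate):
    """Discount a stream of per-period costs back to today."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical numbers: carrying the debt costs 10 units of developer
# time per period (approximated over an 8-period horizon), while
# switching costs 25 up front and 2 per period afterward.
rate = 0.10
carry = present_value([10] * 8, rate)
switch = present_value([25, 2, 2, 2, 2, 2, 2, 2], rate)
print(f"PV(carry) = {carry:.1f}, PV(switch) = {switch:.1f}")  # → PV(carry) = 53.3, PV(switch) = 31.6
print("Switch" if switch < carry else "Keep paying interest")  # → Switch
```

With these numbers the present value of switching is lower, so switching is the higher-value course of action; a higher discount rate or a shorter horizon can flip the decision.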

The next article in this series will be a field guide for execution that is dedicated to details and examples of using this mental model to make trade off decisions.

About Brad Cross:  Brad is a programmer.

Wednesday, November 19, 2008

Pettit - States of IT Governance

By Ross Pettit - 19 November 2008

As we head into a period of economic uncertainty, one thing we can count on is that IT effectiveness will be called into question. This isn’t so much because IT has grown excessively large in recent years, but because of results: industry surveys still show that as many as 60% of IT projects fail outright or disappoint their sponsors. As we head into a downturn, executives may look to IT to create business efficiency and innovation, but they won’t do that until they have scrutinised spend, practices and controls of IT.

This makes the need for good IT governance more urgent.

Governance is a curious concept. Governance doesn’t create value; it reduces the likelihood of self-inflicted wounds. It’s taken for granted, and there isn’t a consistent means to show that it actually exists. It is conspicuous only when it is absent, as it is easier to identify lapses in governance than disasters averted. And we tend not to think of an organisation as having “great” governance, but of having “good” or “bad” governance.

This strongly suggests that “good” is the best that governance gets. It also suggests, as the nursery rhyme goes, that when it is bad, it is horrid.

Any critical examination of our governance practices will focus on what we do poorly (or not at all) more than on what we do well. But rather than cataloging shortcomings, it is better to start by characterising how we’re governing. This will give us an opportunity to not only assess our current state, but forecast the likely outcome of our governance initiatives.

To characterise how we govern, we need some definition of the different types or states of governance. To do that, we can categorise governance into one state of “good” and multiple states of “bad.”

We'll start with the bad. The most benign case of “bad” governance is unaligned behaviour. There may be guiding principles, but they're not fully embedded in day-to-day decisions. Individual actions aren't necessarily malicious as much as they are uninformed, although they may be intentionally uninformed. Consider an engineer in a Formula 1 team who makes a change to the car, but fails to take the steps necessary to certify that the change is within regulation. This omission may be negligent at best, or a "don't ask / don't tell" mentality at worst. This first category of “bad governance” is a breakdown of participation.

The next category is naivety. Consider the board of directors of a bank staffed by people with no banking background. They enjoyed outsized returns for many years but failed to challenge the underlying nature of those returns.1 By not adequately questioning – in fact, by not knowing the questions that need to be asked in the first place – the bank has unknowingly acquired tremendous exposure. This lapse in rigor ultimately leads to a destruction of value when markets turn. We see the same phenomenon in IT: hardware, software and services are subjected to a battery of well-intended but often lightweight and misguided selection criteria. Do we know that we're sourcing highly-capable professionals and not just bodies at keyboards? How will we know that the solution delivered will not be laden with high-cost-to-maintain technical debt? Naïve governance is a failure of leadership.

Worse than being naïve is placing complete faith in models. We have all kinds of elaborate models in business for everything from financial instruments to IT project plans. We also have extensive rule-based regulation that attempts to define and mandate behaviour. As a result, there is a tendency to place too much confidence in models. Consider the Airbus A380. No doubt the construction plan appeared to be very thorough when Airbus committed $12b to the program. During construction, a team in Germany and another team in France each completed sections of the aircraft. Unfortunately, while those sections of the aircraft were "done", the electrical systems couldn’t be connected. This created a rather large, expensive and completely unanticipated system integration project to rewire the aircraft in the middle of the A380 program.2 We see the same phenomena in IT. We have detailed project plans that are surrogates for on-the-spot leadership, and we organise people in work silos. While initial project status reports are encouraging, system integration or quality problems seemingly appear out of nowhere late in development. Faith in models is an abrogation of leadership, as we look to models instead of competent leaders to guide behaviour toward results.

Finally, there is wanton neglect, or the absence of governance. It is not uncommon for organisations to make optimistic assumptions and follow through with little (if any) validation of performance. Especially at the high end of IT, managers may assume that because they pay top dollar, they must have the best talent, and therefore don’t need oversight. People will soon recognise the lack of accountability, and work devolves into a free-for-all. In the worst case, we end up with a corporate version of the United Nation’s oil for food program: there's lots of money going around, but only marginal results to show for it. Where there is wanton neglect of governance, there is a complete absence of leadership.

This brings us to a definition of good governance. The key characteristics in question are, of course, trust and competent leadership. Effective governance is a function of leadership that is engaged and competent to perform its duties, and trustworthy participation that reconciles actions with organisational expectation. Supporting this, governance must also be transparent: compliance can only be built-in when facts are visible, verifiable, easily collected and readily accessible to everybody. This means that even in a highly regulated environment, reaction can be swift because decisions can be effectively distributed. In IT this is especially important, because an IT professional – be it a developer, business analyst, QA analyst or project manager – constantly makes decisions, hundreds of times over the life of a project. Distributed responsibility enables rapid response, and it poses less of a compliance risk when there is a foundation of trust, competent leadership, and transparency.

This happy state isn’t a magical fantasy-land. This is achievable today by adhering to best practices, integrating metrics with continuous integration, using an Agile-oriented application lifecycle management process that enables localised decision-making, and applying a balanced scorecard. Good IT governance is in the realm of the possible, and there are examples of it today. It simply needs vision, discipline, and the will to execute.

In the coming months, we are likely to see new national and international regulatory agencies created. This, it is hoped, will provide greater stability and predictability of markets. But urgency for better governance doesn't guarantee that there will be effective governance, and regulation offers no solution if it is poorly implemented. The launch of new regulatory bodies - and the actions of the people who take on new regulatory roles - will offer IT a window into effective and ineffective characteristics of governance. By paying close attention to this, IT can get its house in order so that it can better withstand the fury of the coming economic storm. It will also allow IT leaders to emerge as business leaders who deliver operating efficiency, scalability and innovation at a time when it's needed the most.

1 See Hahn, Peter. “Blame the Bank Boards.” The Wall Street Journal, 26 November 2007.

2 See Michaels, Daniel. “Airbus, Amid Turmoil, Revives Troubled Plane” The Wall Street Journal, 15 October 2007.

About Ross Pettit: Ross has over 15 years' experience as a developer, project manager, and program manager working on enterprise applications. A former COO, Managing Director, and CTO, he also brings extensive experience managing distributed development operations and global consulting companies. His industry background includes investment and retail banking, insurance, manufacturing, distribution, media, utilities, market research and government. He has most recently consulted to global financial services and media companies on transformation programs, with an emphasis on metrics and measurement. Ross is a frequent speaker and active blogger on topics of IT management, governance and innovation. He is also the editor of

Thursday, November 13, 2008

Cross - Technical Equity

By Brad Cross - 13 November 2008

Recall that previously, we tallied code metrics by component to produce a table of technical liabilities. Then we scored the degree of substitutability of each component on a scale of 1 to 4 to assign each an asset value. To determine owner's equity, we need to compare asset and liability valuations, which means we need to transform the collection of metrics in the table of technical liabilities into an index comparable to our asset valuation.

Are you Equity Rich or Over-Leveraged?

There are two ways we can make this transformation. One is to devise a purely mathematical transformation that weights all the technical scores into a single metric. To do this, convert each metric to a percentage scale where 100% is the worst end of the scale, sum the scaled values and divide by 100. This will give you a number from 0 to n, with n being the number of metrics in your table. Then multiply by 4/n, which rescales the aggregate to a maximum of 4. For example, if you have 60% test coverage, 20% duplication and 150 bug warnings from static analysis (against a worst case of 200 warnings in any of your components), you can compute: (1 - 60%) + 20% + 150/200 = 135%. Since we have 3 metrics and we want the final result on a 4-point scale, we multiply 1.35 by 4/3. This gives a score of 1.8 out of 4.
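
The arithmetic above can be captured in a few lines. This sketch assumes each metric has already been expressed as a "badness" fraction between 0 and 1:

```python
def liability_index(bad_fractions, scale_max=4):
    """Aggregate per-metric 'badness' fractions (0.0 = clean, 1.0 = worst
    case) into a single score on a 0-to-scale_max scale."""
    return sum(bad_fractions) * scale_max / len(bad_fractions)

# The example from the text: 60% test coverage (so 40% uncovered),
# 20% duplication, and 150 of a worst-case 200 bug warnings.
score = liability_index([1 - 0.60, 0.20, 150 / 200])
print(round(score, 2))  # → 1.8
```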

Alternatively, we can devise a 1-to-4 scale similar to our asset valuation scale. This allows us to combine quantitative metrics with the qualitative experience we have from working with a code base.

As in finance, excessive liabilities can lead to poor credit ratings, which leads to increasing interest rates on borrowing. In software, we can think of the burden of interest rate payments on technical debt as our cost of change, something that will reduce the speed of development.

Technical debt, like any form of debt, can be rated. Bond credit ratings are cool, so we will steal the idea. Consider four credit rating scores according to sustainability of debt burden, and how these apply to code:

  1. AAA - Credit risk almost zero.
  2. BBB - Medium safe investment.
  3. CCC - High likelihood of default or other business interruption.
  4. WTF - Bankruptcy or lasting inability to make payments.

A AAA-rated component is in pretty good shape. Things may not be perfect, but there is nothing to worry about. There is a reasonable level of test coverage, the design is clean and the component's data is well encapsulated. There is a low risk of failure to service debt payments. Interest rates are low. The cost of change in this component is very low. Development can proceed at a fast pace.

A BBB component needs work, but is not scary. It is bad enough to warrant observation. Problems may arise that weaken the ability to service debt payments. Interest rates are a bit higher. This component is more costly to change, but not unmanageable. The pace of development is moderate.

A CCC is pretty scary code. Sloppy and convoluted design, duplication, low test coverage, poor flexibility and testability, high bug concentration, and poor encapsulation are hallmarks of CCC-rated code. The situation is expected to deteriorate, risk of interruption of debt payments is high and bankruptcy is a possibility. Interest rates are high. Changes in this component are lengthy, painful, and expensive.

A WTF component is the kind of code that makes you consider a new line of work. Bankruptcy or insolvency is the most likely scenario. Interest rates are astronomically high. An attempt to make a change in this component is sure to be a miserable, slow and expensive experience.
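
If you are using the 0-to-4 liability index from the transformation described earlier, the four ratings can be attached to it mechanically. The cutoffs below are illustrative assumptions, not part of the article's scale:

```python
def credit_rating(liability_score):
    """Map a 0-to-4 liability index to a letter rating. The cutoffs
    (1, 2, 3) are illustrative assumptions for this sketch."""
    for cutoff, rating in [(1, "AAA"), (2, "BBB"), (3, "CCC")]:
        if liability_score < cutoff:
            return rating
    return "WTF"

print(credit_rating(1.8))  # → BBB
```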

Expanding on the example we've been using, let's fill out the rest of the balance sheet and see what owner's equity looks like.

(Balance sheet table: components listed with asset value, liabilities, and resulting owner's equity.)
This table naturally leads to a discussion of tradeoffs such as rewriting versus refactoring. Components with negative equity and low asset value are candidates for replacement. Components with positive equity and middling asset value are not of much interest: while owning something of little value is neither exciting nor worrying, owning something of little value that carries a heavy debt burden is actually of negative utility. Components of high asset value but low equity are a big concern; these are the components we need to invest in.

In addition to thinking about how much equity we have in each component, we can also think about how leveraged each component is, i.e. how much of a given component's asset value is backed by equity, and how much is backed by liability. An asset value of 3 with a liability of 2 leaves you with an equity value of 1: your debt-to-equity ratio is 2-to-1, and you are leveraged 3-to-1, i.e. your asset value of 3 is backed by only 1 point of equity. Any asset with negative equity has infinite leverage, which indicates a severe debt burden.
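
The equity and leverage arithmetic is simple enough to sketch directly (the convention of reporting negative equity as infinite leverage follows the text):

```python
def equity_and_leverage(asset_value, liability):
    """Owner's equity and leverage (asset value backed per point of
    equity); components with negative equity get infinite leverage."""
    equity = asset_value - liability
    leverage = asset_value / equity if equity > 0 else float("inf")
    return equity, leverage

print(equity_and_leverage(3, 2))  # → (1, 3.0): leveraged 3-to-1
print(equity_and_leverage(2, 3))  # negative equity: infinite leverage
```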

The Technical Balance Sheet Applied

This exercise makes the decisions we face easier to make. In the example above, there are a number of components with an asset value of 2 and liabilities of 2 or 3. This led me to replace all the custom persistence code with a thin layer of my own design on top of db4o (an embeddable object database). I deleted the components Brokers and DataProviders, then developed my own components from scratch and extracted new interfaces.

The FIX component, with an asset value of 1 and liabilities of 4, obviously needed to go. However, although the component has high negative equity, I did some experimenting and found that the cost of removing this component was actually quite high due to proliferation of trivial references to the component. I have gradually replaced references to the FIX component and chipped away at dependencies upon it, and it will soon be deleted entirely.

There are a number of components with an asset value of 3 or 4 but with liabilities of 2 or 3. These are the most valuable parts of the product that contain the core business code. However, some have 20% or less test coverage, loads of duplication, sloppy design, and many worrisome potential bugs. Due to the high asset value, these components warrant investment. I thought about rewriting them, but in these cases the best bet is most often to pay down the technical debt by incrementally refactoring the code. A subtle bonus of refactoring instead of rewriting is that each mistake in the code reveals something that doesn't work well, which is valuable information for future development. When code is rewritten, these lessons are typically lost and mistakes are repeated.

We now have a sense of debt, asset value and technical ownership that we've accumulated in our code base. Our next step is to more fully understand trade off decisions based on weighing discounted cash flows: the cost of carry versus the cost of switching. Or alternatively stated, the cost of supporting the technical debt versus the cost of eliminating it.

Wednesday, November 12, 2008

Kehoe - Your Performance Is Our Business

By John Kehoe - 12 November 2008

I know of an outfit with the motto ‘Your Performance Is Our Business.’ It doesn’t matter what they sell (OK, it’s not an E.D. product, let’s get that snicker out of the way); what matters is the mindset the motto instills: ‘You’ve got IT problems with a real impact to business, and we have a solution focused on your business needs.’ It got me thinking: what mindset is at work in our IT shops? What would an IT shop look like with a performance mindset?

What the mindset means varies with how we define performance for our customers. Some shops have a simple requirement. I can’t count the times I find that the critical application is email. It is a straightforward domain of expertise, albeit with significant configuration, security and archiving requirements. The key metrics for performance are availability, responsiveness and recoverability. Not an insurmountable challenge except in the most complex of deployments.

There isn’t much performance to provide for messaging outside of availability and responsiveness. There isn’t alpha to be found. It’s a cost center, an enablement tool for information workers.

Performance comes into play with alpha projects. These are the mission-critical, money-making monstrosities (OK, they aren’t all monstrosities, but the ones I’m invited to see are monsters) being rolled out. This is very often the realm of enterprise message buses, ERP and CRM suites, n-tier applications and n-application applications (I like the term n-application applications; I made that one up. My mother is so proud). All are interconnected, cross-connected or outside of your direct control. Today, the standard performance approach is enterprise system management (ESM). These are the large central alerting consoles with agents too numerous to count. ESM tools approach the problem from the building blocks and work their way up.

This is not the way to manage performance.

The problem with the low-level hardware and software monitoring approach is that performance data cannot be mapped up to the transaction level (yes, I said it; consider it flame bait). Consider the conventional capacity planning centers we have today. The typical approach is to gather system and application statistics and apply a growth factor to them. The more sophisticated approach uses additional algorithms and perhaps takes sampled system load to extrapolate future performance. The capacity planning model lacks the perspective of the end user experience.

The need for Transaction Performance Management (TPM)

TPM looks at the responsiveness of discrete business transactions over a complex application landscape. Think of this in the context of n-application applications: TPM follows the flow regardless of the steps involved, the application owner or the “cloud” (did I ever say how much I loathe the term cloud?). The TPM method follows the transaction, determines the wait states and then drills into the system and application layers. Think of it as the inverse of ESM.

To exploit the TPM model you need a dedicated performance team. This is a break from the traditional S.W.A.T. team approach to application triage and the mishmash of tools and data used to ‘solve’ the problem. In a previous column, we discussed the implications of that approach, namely long resolution times and tangible business costs.

What would a TPM team look like?

  1. We need a common analysis tool and data source monitoring the application(s) architecture. This is essential to eliminate the ‘Everybody’s data is accurate and shows no problems, yet the application is slow’ syndrome.
  2. We need to look at the wait states of the transactions. It isn’t possible to build up transaction models from low level metrics. The ‘bottom up’ tools that exist are complex to configure and rigid in the definition of transactions. They invariably rely on a complex amalgamation of multiple vendor tools in the ESM space. Cleanliness of data and transparency of transactions are hallmarks of a TPM strategy.
  3. We need historical perspective. Historical perspective provides our definition of ‘normal’. It drives a model based on transaction activity over a continuous time spectrum, not just a performance analysis at a single point in time.
  4. We need ownership and strong management. TPM implementations don’t exist in a vacuum, they cannot be given to a system administrator on an ‘other duties’ priority. TPM implementations require a lot of attention. They are not a simple profiler we turn on when applications go pear shaped. Systems, storage and headcount are needed, subject matter experts retained and architects available. Management must set the expectation that TPM is the endorsed methodology for performance measurement, performance troubleshooting and degradation prevention.
  5. TPM must quickly show value. Our team and implementation must be driven by tactical reality with a clear relationship to strategic value. We need to show quick performance wins and build momentum. For instance, there needs to be a focus on a handful of core applications, systematically deployed and showing value. Build success from this basis. Failure is guaranteed by the big bang global deployment or a hasty deployment on the crisis du jour.
  6. There must be a core user community. Our user community consists of people up and down the line, using the tool for tactical problem solving to high level business transaction monitoring. These people know the application or the costs to the business. Training is essential to an organization-wide TPM buy-in.
  7. TPM must be embraced by the whole application lifecycle. It’s not enough to have operations involved. QA, load testing and development must be active participants. They can improve the quality of software product early on using the TPM methodology.
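
As a minimal illustration of point 3, a historical baseline gives you a working definition of ‘normal’ against which a degraded transaction stands out. The three-standard-deviation threshold and the sample timings below are assumptions for the sketch:

```python
from statistics import mean, stdev

def is_degraded(history, latest, k=3):
    """Flag a transaction time as degraded when it falls more than k
    standard deviations above the historical mean -- a minimal stand-in
    for the 'definition of normal' that a TPM history provides."""
    baseline = mean(history)
    spread = stdev(history)
    return latest > baseline + k * spread

# Hypothetical response times (ms) for one business transaction:
normal_days = [210, 195, 205, 199, 202, 208, 197, 203]
print(is_degraded(normal_days, 204))  # → False
print(is_degraded(normal_days, 450))  # → True
```

A real TPM implementation would model activity over a continuous time spectrum rather than a fixed window, but the principle is the same: the history, not a single snapshot, defines normal.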

An example of a TPM success is a web-based CRM application I worked on recently. The global user population reported poor performance on all pages. The TPM team saw isolated performance issues on the backend systems, but they did not account for the overall application malaise. The supposition was that the last mile was too slow or that the application was too fat. We implemented the last TPM module to analyze the ‘last mile’. It turned out each page had a single-pixel GIF used for tracking. This GIF was missing and caused massive connect timeouts on every page. Putting the image back in the file system made the problem disappear. This raises the question: how would we have found this problem without the TPM approach?

TPM and Alpha

This TPM model sounds great, but what is the practical impact? Has anyone done this and had any bottom-line result to show for it? Many shops have succeeded with the approach. Here are two examples.

In the first one, a major American bank (yes, one that still exists) wrote a real-time credit card fraud detection application. The application has to handle 20,000 TPS and provide split-second fraud analysis. Before we implemented the TPM model, the application couldn’t scale beyond 250 TPS. Within weeks of implementing TPM, we scaled the application to 22,500 TPS. This was a $5mm development effort that prevented $50mm in credit card fraud. The TPM implementation was $100k and done within development.

The second is one of the top five global ERP deployments. The capacity planning approach forecast that 800 CPUs would be needed over two years to handle the growth in application usage. The prior year, a TPM center had been established that performs ruthless application optimization based solely on end user response times and transaction performance. The planning data from the TPM center indicated the need for 300 CPUs. The gun-shy (hey, they’d only been active a year) TPM center recommended 400 CPUs. The reality was that only 250 CPUs were needed once the real-world performance of the application was measured and tuned over the subsequent year. The company saved $25mm in software and hardware upgrades. The savings provided the funds for a new alpha-driving CRM system that helped close $125mm in additional revenues over two years.

What is the cost of implementing a TPM center at a large operation? A typical outlay is $2mm including hardware, software, training, consulting and maintenance. An FTE is required as well as time commitments from application architects and application owners. Altogether we can have a rock solid TPM center for the price of a single server: the very same server we think we need to buy to make that one bad application perform as it should.

About John Kehoe: John is a performance technologist plying his dark craft since the early nineties. John has a penchant for parenthetical editorializing, puns and mixed metaphors (sorry). You can reach John at

Thursday, November 6, 2008

Cross - Discovering Assets and Valuing Intangibles

By Brad Cross 6 November 2008

In the last article in this series, we identified our technical liabilities by component, with each component representing a functional area. Now we will do the same with the asset side of the balance sheet.

The way a code base is structured determines the way it can be valued as an asset. If a code base matches up well to a business domain, chunks of code can be mapped to the revenue streams that they support. This is easier with software products than supporting infrastructure. Sometimes it is extremely difficult to put together a straightforward chain of logic to map from revenue to chunks of the code base. In situations like this, where mapping is not so obvious, you have to be a bit more creative. One possibility is to consider valuing it as an intangible asset, following practices of goodwill accounting.

An even more complicated scenario is a monolithic design with excessive interdependence. In this case, it is very difficult to even consider an asset valuation breakdown, since you cannot reliably map from functional value to isolated software components. This situation is exemplified by monolithic projects that become unwieldy and undergo a re-write. This is a case where bad design and lack of encapsulation hurt flexibility. Without a well encapsulated component-based design, you can only analyze value at the level of the entire project.

Substitutability as a Proxy for Asset Valuation

One way to think about asset valuation is substitutability. The notion of code substitutability is influenced by Michael Porter's Five Forces model, and is similar to the economic concept of a substitute good. Think of code components in a distribution according to the degree to which they are cheap or expensive to replace. "Cheap to replace" components are those that are replaceable by open source or commercial components, where customization is minimal. "Expensive to replace" components are those that are only replaceable through custom development, i.e. refactoring or rewriting.

Substitutability of code gives us four levels of asset valuation that can be rank ordered:

  1. Deletable (lowest valuation / cheapest to replace)
  2. Replaceable
  3. Valuable
  4. Critical (highest valuation / most expensive to replace)

Consider a collection of software modules and their corresponding valuation:

Module            Substitutability
------            ----------------
Trading Logic     Critical
Trading           Valuable
Simulation        Valuable
Broker API        Replaceable
Persistence       Replaceable
Providers         Deletable
A critical component is top level domain logic. In the example table, only the trading logic component is critical. This component represents the investment strategies themselves - supporting the research, development, and execution of trading strategies is the purpose of the software.

A valuable component exists to support the top level domain logic. Here, there is a trading component that supports the trading logic by providing building blocks for developing investment strategies. There is also a simulation component that is the engine for simulating the investment strategies using historical data or using Monte Carlo methods. You can also see a handful of other components that provide specialized support to the mission of research, development and execution of investment strategies.

A replaceable component is useful, but it is infrastructure that can be replaced with open source or commercial components. Components in this category may do more than off-the-shelf alternatives, but an off-the-shelf replacement can be easily modified to meet requirements. In the example above you can see four components in this category. They are related to APIs, brokers, or the persistence layer of the software. Both broker APIs and persistence are replaceable by a wide variety of different alternatives.

A deletable component is a component from which little or no functionality is used, allowing us to either delete the entire thing, or extract a very small part and then delete the rest. This includes the case when you "might need something similar" but the current implementation is useless. In the example, there is an entire component, “Providers,” which is entirely useless.
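These four levels lend themselves to a simple mechanical treatment. The sketch below is illustrative only; the module names and level assignments are hypothetical, loosely echoing the trading example above. It models the rank ordering and sorts a component inventory from cheapest to most expensive to replace:

```python
from enum import IntEnum

class Substitutability(IntEnum):
    """Rank-ordered valuation levels; lower value = cheaper to replace."""
    DELETABLE = 1    # little or no functionality actually used
    REPLACEABLE = 2  # swappable for open source / commercial components
    VALUABLE = 3     # supports the top level domain logic
    CRITICAL = 4     # the top level domain logic itself

# Hypothetical component inventory.
inventory = {
    "TradingLogic": Substitutability.CRITICAL,
    "Simulation": Substitutability.VALUABLE,
    "BrokerAPI": Substitutability.REPLACEABLE,
    "Persistence": Substitutability.REPLACEABLE,
    "Providers": Substitutability.DELETABLE,
}

# Review candidates from cheapest to most expensive to replace:
# the cheapest-to-replace components are the first candidates for
# deletion or substitution by off-the-shelf alternatives.
review_order = sorted(inventory, key=inventory.get)
print(review_order)
```

One design note: using an `IntEnum` rather than bare strings makes the rank ordering explicit and sortable, which is the whole point of the taxonomy.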

Accepting Substitution

It is important to consider psychological factors when making estimates of value. Emotionally clinging to work we have done in the past will derail this process. For example, perhaps I have written a number of custom components of which I am very proud. Perhaps I think that work is a technical masterpiece, and over time I may have convinced others that it provides a genuine business edge. If we can't put our biases aside, we talk ourselves into investing time into code of low value, and we miss the opportunity to leverage off-the-shelf or open-source alternatives that can solve the same problem in a fraction of the time.

In this article we've defined a way to identify our assets by business function and to value them either based on a relationship to revenue streams or by proxy using their degree of substitutability. In the next installment, we’ll take a look at switching costs and volatility to set the stage for thinking in terms of cash flows when making trade-off decisions in our code.

About Brad Cross: Brad is a programmer.

Wednesday, October 29, 2008

Breidenbach - Grain and ICBMs

By Kevin E. Breidenbach - 29 October 2008

As you drive around the countryside of the Midwestern United States, you'll see silos towering over farmland. Depending on the time of year they may even be filled with grain. If you know where to look you might even be able to find a subterranean silo which housed one or more Minuteman ICBMs from the days of the cold war. In either of these cases the silos are used for a good purpose – to keep things in that either feed people or vaporize them!

For some reason though, there is a group out there that likes to fill silos full of people. Even stranger is the fact that these people all work together but they may be put in different silos. It's really difficult to talk across silos, so nobody really knows what the people in the other silos are doing.

So who makes up this weird group that likes to segregate people? After studying this group in the wild, researchers have named them Technicilli Managerius, or Technical Managers to us common folk.

In all seriousness this has been a problem for a long time. As an example, consider a large application that has many different major components. When the system is being designed, the managers often see benefits in creating individual teams to work independently on each component, with each team having its own manager, development leader and several developers. The teams start building their components and everything is dandy, but apart from the odd conversation over a beer between different team members, communication between silos is minimal. And this is where the problems begin.

If one team is having problems, it is unlikely that other teams will know about it. Different teams may very well be struggling with the same problem and coming up with different – even incompatible – solutions. Or they can play a game where each team waits for another to signal “defeat” and admit to being behind schedule. This defaults the entire application team into delay, giving everybody more time to complete their work – or simply another chance to play the game.

This inter-team disconnect is particularly prevalent in waterfall projects, which are far from transparent. Of course, the problem isn’t unique to waterfall; it can happen to any project. Agile teams have metrics such as burn down and velocity that allow us to see where they are, and across a large program of work this gives very good insight into which teams are behind schedule. But visibility doesn’t address the underlying problems of teams working in silos: multiple teams will still be working independently of each other.

I believe that fluidity of people across teams would improve communications and benefit the project as a whole. How would this work? Each major component has its own iteration manager and possibly a lead developer (although I'm becoming convinced that there is really no need for lead developers and a flat structure really is better). A developer can, when needed, move among component teams. In this structure, the "team" exists at the application level and not at the component level.

It can be argued that this fluidity is difficult to manage, and that developer ramp-up on each component is time consuming, both of which will impact timelines. These are valid points, but they don’t make a compelling counter-argument to trying this approach. If you have competent developers on your team, if there is nothing drastically different in the design of each component, and if there aren’t very different languages for each component, then developers moving among components shouldn’t be too much of a problem. Of course, not every developer may want to move between components, and they shouldn't be forced to make a move. But most developers are inquisitive enough to want to see what is going on in different components.

Another way to combat silos is through consistent, highly focused communications within and across teams. For example, a good development team holds daily stand-up meetings. They’re a great way to concisely communicate what people are doing and the problems they’re facing. While it isn't possible to bring a very large application team together for a stand-up, there is nothing to stop iteration managers from having regular stand-up meetings to inform each other of what their teams are doing and what problems they are facing.

Leave silos to grain and thermonuclear weapons: they are much better suited to that working environment than people. Development teams just get tunnel vision and projects end up badly disabled and sometimes dead. Plus, silos look terribly uncomfortable!

About Kevin Breidenbach: Kevin has a BSc in computer science and over 15 years of development experience. He has worked primarily in finance but has taken a few brief expeditions into .com and product development. Having professionally worked with assembly languages, C++, Java and .Net, he's now concentrating on dynamic languages such as Ruby and functional languages like Erlang and F#. His agile experience began about 4 years ago. Since that time, he has developed a serious allergic reaction to waterfall and CMM.

Wednesday, October 22, 2008

Pettit - Volatility and Risk of IT Projects

by Ross Pettit - 22 October 2008

When we plan IT projects, we forecast what will be done, when we expect it will be done, and how much it will cost. While we may project a range of dates and costs, rarely do we critically assess a project’s volatility.

Volatility is a measure of the distribution of returns. The greater the volatility, the greater the spread of returns, the lower the expected value of the investment.

In finance, volatility is expressed in terms of price. In IT projects, we can express volatility in terms of time, specifically the length of time that a project may take. Time is a good proxy for price: the later an IT asset is delivered, the later business benefit will be realised, the lower the yield of the investment.

Using Monte Carlo simulation we can create a probability distribution of when we should expect a project to be completed. DeMarco and Lister’s Riskology tool calculates a probability curve of future delivery dates. It allows us to assess the impact of a number of factors such as staff turnover, scope change, and productivity on how long it will take to deliver. In addition to projecting the probability of completion dates, the Riskology tool will also calculate the probability of project cancellation. The inclusion of fat tail losses makes this a reasonably thorough analysis.

Monte Carlo simulation gives us both an average number of days to complete a project and a variance. Using these, we can calculate the coefficient of variation to get an indicator of project variability.

            Standard Deviation (days)
CV = --------------------------------------------
      Average (days) to complete the project

The coefficient of variation is a metric of project variance. The higher the CV, the greater the volatility, the greater the risk there is with the investment.
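As a sketch of the idea, a small Monte Carlo simulation can produce both the distribution of completion dates and the CV directly. The task counts and distribution parameters below are invented for illustration, not drawn from any real project:

```python
import random
import statistics

random.seed(42)  # reproducible runs for the example

# Hypothetical project: 20 tasks, each estimated around 5 days but
# subject to overruns, modeled here with a lognormal duration.
N_TASKS, N_RUNS = 20, 10_000

def simulate_project() -> float:
    """Return total days for one simulated run of the project."""
    return sum(random.lognormvariate(1.6, 0.25) for _ in range(N_TASKS))

runs = [simulate_project() for _ in range(N_RUNS)]

mean_days = statistics.mean(runs)
std_days = statistics.stdev(runs)
cv = std_days / mean_days  # coefficient of variation, as in the formula above

print(f"mean: {mean_days:.1f} days, std dev: {std_days:.1f} days, CV: {cv:.3f}")

# Probability of finishing within a given deadline, read directly
# off the simulated distribution rather than assumed.
deadline = 110
p_on_time = sum(r <= deadline for r in runs) / N_RUNS
print(f"P(done within {deadline} days) = {p_on_time:.2f}")
```

A higher CV from such a simulation flags a riskier project even when two projects share the same expected completion date.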

Historical project data allows us to put a project's CV in context. Suppose we have metrics on prior projects showing that the projects IT does for Accounting slip on average 1 day for every 10 days of plan, with a standard deviation of 0.5 days. Suppose further that a new project expected to last 6 months is forecast to be 6.5 days late with a standard deviation of 3 days. Somewhat akin to assessing the Beta (although it is not calculated entirely from historical data), this comparison allows the IT project investor to ask piercing questions. What makes us think this project will be different from others we have done? Why do we think the risk we face today is so different from what it has been in the past?

It is appealing to think that by supplying some indication of the volatility and Beta we’re giving investors in IT projects a complete picture, but there are limits to what this tells us. Our risk assessment is only as good as our model, and our model will be laden with assumptions about people, capabilities, requirements and opportunity. What if we fail to completely identify and properly anticipate both probability and impact of the risks that we are so carefully modeling? Come to think of it, isn’t this what got Wall Street into so much trouble recently?

Risk management is a primitive practice in IT. All too often, it is little more than an issue list with high/medium/low impact ratings. Typically, risks are neither quantified nor stated in terms of impact to returns. Worse still, because they’re simply lists in spreadsheets, they’re often ignored or denied. This means that the IT project manager is focused on internal risks, such as technology challenges or days lost due to illness. They don't pay much attention to external risk factors such as a regionally strong tech economy that threatens to lure away top talent, or a supplier teetering on the edge of receivership. By quantifying the impact of risks – both individually and collectively – the project manager comes face-to-face with the broader environment on a regular basis.

Focusing on volatility also makes it obvious that the “on time and on budget” goal is achieved not so much by what goes right as much as by what doesn’t go wrong. That, in turn, strongly suggests that being "on time and on budget" is a misdirected goal in the first place. It also suggests that traditional project management has less to do with project success than we might think.

Let’s consider why. Traditional IT manages to the expectation that everything can and should go according to plan. Changes in scope, changes in staff, and even mistakes in execution are treated as exceptions. Traditional IT management looks to reconcile all variations to plan and will attempt to do so even when the sum of the variations – 100% staff turnover, 100% requirements change – creates a completely new project. The traditional IT project manager is trying to maximise effort for available budget: “the project will be on time, what might go wrong to make it late, and what must we do to compensate.”

By comparison, Agile project management expects change. It focuses on reducing volatility by making deliveries early and often. The Agile project manager is trying to maximise yield given available time: “the project will be late, what can we do to bring it forward, and what can we do to reprioritise.” This is a subtle - but critical - shift in thinking that moves risks from the fringe to the center of the frame.

Risk analyses are supplements to, not replacements of, good management. Risk models are abstractions. No risk model will capture every last exposure. Assumptions in underlying numbers will be wrong or arrived at inexpertly. Worse still, they can be gamed by participants to tell a particular story. All this can lead to poor investment decisions and ultimately investment failure. Still, while working in blind adherence to risk models is hubris, working without them is reckless. Investment managers (or in IT, project managers) are paid for their professional judgment. Investors (IT project sponsors) are responsible to audit and challenge those entrusted with their capital. What managers and sponsors do pre-emptively to ensure project success is far more complete when risk analyses are brought to bear.

The higher the expected yield of an IT project, the more important it is to be aware of volatility. If volatility isn't quantified, the project lacks transparency, and its investors have no means by which to assess risk. This leads to a misplaced confidence in IT projects because long-term success is assumed. If the recent financial turmoil has taught us anything, it's that we can assume nothing about our investments.

Wednesday, October 15, 2008

Kehoe - The German General Staff Model and IT Organizations

By John Kehoe - 15 October 2008

A couple of stories caught my eye this month and dovetail with a model I’m investigating. First, a column, Meet the IT Guy, outlines the typical, hackneyed view of the IT archetype (still funny stuff). The second, an excerpt from Numerati in Business Week, examines IBM’s effort to model consultant abilities and cost to map the right people to the right job and model how a successful expert is created (neat for high level experts, but a bit scary for lower level consultants).

This leads me to the characteristics an IT organization needs to excel. Curiously enough, they are descendants of the German General Staff (GGS) of the period between 1814 and 1900.

Staffing Model

The German General Staff (GGS) was created at the end of the Napoleonic Wars as a reaction to the military officer corps being staffed by men who purchased their positions. The "commissioned" officers (as in, paid for their commission) were on a quest for personal glory. As a result, they fought by rote, did not create new strategies, got a lot of men killed and wasted resources. The GGS mission was to create a professional officer corps to change that mindset and with it, achieve better results.

Candidates for the GGS were rigorously vetted for competency and motivation before an invitation was extended. Once in, GGS officers were categorized along two dimensions, motivation and competency, forming a 2 x 2 grid: motivated or lazy on one axis, competent or incompetent on the other.

This is how the GGS staffed the right person for the right role.

Clearly, in any organization, a mismatch breaks the lot. Play the mental exercise of putting your people in these positions. It is easy to find an implementer in a manager position or a manager in the general’s seat. Recognizing the mismatch makes obvious why things bog down or spin out of control.

Organizational Values

The GGS instilled the following values in its members:

  • Dedication: By investing in an individual, that individual is more committed to the organization
  • Motivation: Devise good strategies that are efficient, flexible and victorious
  • Dissemination: Get the ideas into the field
  • Doctrine: Get a common process in play
  • Innovation: Think outside the box to adjust to the circumstances
  • Philosophy: How do we go about our business? What are the ways and means we choose to use or not use?

Now, why put forward the GGS as a role model? Because the Germans were the first to do it. Today, every major nation has some sort of joint military command and training structure. In order for a military to succeed in a campaign, it must leverage every resource to maximum productivity and align tactical activities with strategic goals. The most successful operations – the most successful businesses – have the concept ingrained from top to bottom.

This approach makes clear how important it is to place each person according to his or her abilities. Suppose a junior level person is designing a technical architecture for performance management. This mismatch is a high risk for the organization and the individual. Don't axe the person for performing poorly; move him or her to a task commensurate with their ability. Make clear that this move is not a punishment, just a rebalancing of the skills portfolio. If, however, a person falls into the lazy/incompetent quadrant, well, you know what to do.

Another interesting characteristic of the GGS is that it rotated people off the line into staff positions, and then back to the line. Many IT organizations have one set of people who put code into production and a cadre of architects in staff (or non-line) positions. Or they have people managing projects in their own way, ignoring efforts of a central PMO to create consistent and professional PM practices. Either is an example of an organizational separation within IT, with “central office” people on one side and “executors” on the other. By rotating people in and out of staff positions, IT policies are more likely to be actionable and not academic. There is also more likely to be buy-in and not resistance to IT standards. Perhaps less obvious, it contains the expansion of IT overhead as there are few career "staffers:" everybody will be back to the line before long. Finally, it makes IT less of a cowboy practice, and more of a professionally executing capability.

Mapping the Concepts

How do we map the GGS to an organization to measure its potential?

  • Dedication is evident in the rate of employee exit and replacement. IT is a highly mobile profession; "healthy" IT organizations will have 15% turnover or less.
  • Motivation is recognized by the creation of value, not the number of overtime hours worked. Do we deliver in a timely fashion? Are we receptive to other organizations? Do we know and appreciate the impact of our action and inaction?
  • Dissemination is judged by joint cooperation. Do tasks get done in a timely fashion across a joint team? Do people know the resources and responsibilities around them?
  • Doctrine can be summed up simply by asking: do rules define the exceptions or do exceptions define the rules? If our rules are exception based, we have a problem. We have multiple options and cannot rationally consider their impact.
  • Innovation is the game changer that gets us ahead of the competition. What can we change that drives revenue or improves margins?
  • Philosophy defines the actions we accept and reject. What motivations do we accept or reject? Are they telegraphed throughout the organization? For instance, we should have profitability as our guide. How do we achieve it: growth or cost cutting? Does advantage come from outsourcing or offshoring? How do we balance the other five values?

The GGS was not a rarified career opportunity devoid of delivery expectations and obligations, but provided a means by which to circulate expertise and provide experience. It offers IT a straightforward way to fashion planning and career development, as well as a means to incubate ideas and individuals. It starts with a clean sheet of paper, a cup of coffee and insight into your business, organization and services portfolio. Give it a try today.

About John Kehoe: John is a performance technologist plying his dark craft since the early nineties. John has a penchant for parenthetical editorializing, puns and mixed metaphors (sorry). You can reach John at

Wednesday, October 8, 2008

Ververs - Generation Y: Enabling Autostart

By Carl Ververs - 8 October 2008

A pack of Millennials sits in a conference room. The phone rings. Says one to the next: “Dude, you gonna get that?” Says the other: “Like, yah! And get, like, reprimanded or something!” Five minutes later their manager storms in, yelling: “Why aren’t you picking up the phone?!? I was wondering where you were!” Says one in reply: “We didn’t know we were supposed to!”

This would be cute if it were just a joke. But this is the life of managers of Millennials.

When compared to previous generations, Generation-Y comes off as a bunch of lazy, indecisive airheads. But there is more to it than that. And after a bit of closer examination, you will find that the Gen-Y attitude has some real advantages.

Millennials didn’t jump on their Facebooks and collectively decide to give every prior generation a hard time. Their work attitudes have a clear origin: the overzealous, results-driven approach to child rearing that their parents have practiced. Late Boomers and Gen-Xers couldn’t just enjoy their children; like many things in their lives, children were made into projects, with a results baseline and earned-value metrics. Consider a recent remark overheard from a mother of a 5-year-old: “Jason still can’t write a full sentence. I’m considering occupational therapy.”

The cause for excessive parent involvement may be rooted in their investment approach to raising their children. In a Chicago Tribune article from September 9th, 2008, a helicopter parent justifies her behavior by saying that she feels parents must stay involved in their children’s life in college to make the $200K investment worthwhile. This cynical perspective will not be lost on the “assets” and will no doubt make them feel reduced to Boomer accessories (that is, if the dual-career-hound, passed-off-to-sitters pattern hasn’t done so already). There is, not surprisingly, an entrenched defense for helicopter parenting, as a CPA article from September 2006 shows.

As involved and engaged as these parents have been with their offspring, they have missed something crucial: their children can actually think for themselves and have drawn the conclusion that, like all generations before them, their parents are wrong. They have revolted against the workaholic, asset-oholic, keeping-up-with-the-Gateses attitude and are simply rejecting the now-bankrupt notion that personal success is measured by how much stuff one has. "Turn on, tune in, drop out" has been replaced with "iTune in, Log on, Check out your friends’ Twitter updates."

Millennials demand a better work/life balance. There are several advantages to this. For starters, they are less likely to burn out. They have to be very efficient to complete the work assigned to them. They tend to get auxiliary work input from multiple communications channels. Rather than working long hours, they challenge the status quo, demanding (as they should) solutions to lingering problems that create the need for excessive work hours.

However, an inflexible attitude to working late can make them unreliable in crisis situations. Evening and weekend work is sometimes necessary. A Gen-Y person in a key role unwilling to put in the effort when it is needed most can mean that deadline-driven work languishes and projects are delayed. Striking a compromise is difficult. A developer on one of my teams had no higher priority than running. He was consistently late with his work, which was often also incomplete. We agreed to small, very concrete deliverables. While they were completed properly, the experience left me feeling like I was micromanaging and impeding his growth.

While Millennials want to have a life, they also want to have a meaningful job. They want their ideas to be heard and their opinions to count. The upshot of this is that bright minds will not go to waste. Staff can be motivated by big ideas and by seeing clearly how they contribute to revenue. But youthful enthusiasm often gets dismissed as daydreaming, discredentializing fresh insights. Millennials must be taught that most people don’t “get” new ideas at first. Promoting and realizing them requires discipline and tenacity.

A bigger problem is the sense of entitlement Gen-Y, and to a greater extent Gen-I, feels. They expect everything to be on-demand and immediate. If something requires application, study and practice, it’s dismissed, considered to be not worth the effort. This puts Gen-Y at a disadvantage with people from other cultures who have not followed the Western patterns of generations. In the West, the mantra is "work smarter, not harder." But some skills cannot be obtained without actual practice. And without practice, one has to work harder to obtain a similar result as a skilled worker. To make things worse, the stimulus for Millennials and Generation-I to try harder is muted by being told that “everyone is special”. In the infamous words of Dash Parr: “If everybody’s special, nobody is.” See also the Cerebral Dad’s point of view on this.

Do not expect Gen-Y to share your habit of getting up at 5:30, checking the market, wolfing down a processed breakfast and getting to work by 7:30. That Jack So-and-so or Rupert Whatisname did things that way means nothing to these kids. “Legendary CEOs” aren't Gen-Y role models; they want to be like the Google guys.

So what can we do with all this? How can we turn Generation Y’s work attitude into something beneficial?

To cater to their demand for meaningful work, tie anything that needs to be done to revenue. Coach them to set high personal development goals and track these at review time. Have prospective young employees define “meaningful”. If that does not align with what your company’s business is, they may not be a good hire. Alternatively, people being hired for non-revenue positions may very well be motivated by your company's participation in community outreach programs.

The pack mentality of Gen-Y is well suited for collaborative teams, and makes them uniquely prepared for Agile work environments. Rather than putting them in single cubicles with a window to an Internet full of distractions, put them in larger rooms with shared table-space. The peer pressure will keep momentum up and distractions down. The collaborative setup will bring out the best in them. The small chunks of work will fit their attention span and the rapid results will cater to their zest for instant gratification.

The concept of people being in an office 9-5 is bunk. Urban sprawl and associated traffic congestion, rising fuel prices and subsequent public transportation congestion conspire against this ancient rite and make telecommuting more attractive. Bosses who want people in an office at all times do not understand how to measure the value that their staff brings. Either the work gets done at a desired quality level or it doesn’t.

Generation-Y will be much more open to alternative ways of working and will profess to have a higher set of morals than our contemporaries do. This creates an opportunity for us to groom our future leaders, driven by their own ambitions. So if we let go of our idée fixe of how people are supposed to work and focus on achieving results, we will get full cooperation from our new recruits and just may learn a new trick or two.

Who knows, you and your executives may like this new-found efficiency and take up a new, Boomer-appropriate hobby. Harley, anyone?

Wednesday, October 1, 2008

Spillner - The Eternal Struggle Between Good Enough and Perfect

By Kent Spillner - 1 October 2008

Software development projects should not be managed with the assumption that the final product would be perfect if only there weren't budgets or deadlines. Although theoretically true -- given infinite time and effort, the perfect solution would eventually be developed (along with an infinite number of imperfect ones) -- this style of thinking only sets the team and business up for failure. Instead, project teams should focus on delivering a working product today, and a better one tomorrow.

Perfect Ain't Possible

I applaud the good intentions of teams that plan to build the perfect solution, but that's not a realistic way to work. No amount of planning can account for every contingency (well, short of infinite planning, but then there wouldn't be time to develop anything!) Ditto development. Endlessly cranking out code is not a recipe for perfection. Unfortunately, it's hard to argue against such behavior. That's partly because, again, the assumption is theoretically true, and partly because it's in our nature. When people can visualize a goal, they're able to figure out the necessary steps towards achieving it. That's a very powerful capability. It's also very misleading. It requires tremendous genius to correctly figure out every step required to build something of modest complexity. And it requires even greater will power to backtrack several steps when you start heading down the wrong path.

The rapid evolution of modern business makes it impossible to correctly figure out every step, often because the goal itself is constantly changing. And the dynamic nature of software makes it especially painful to head down a dead end because developers can push code quite a ways before eventually hitting the wall. Now throw in budgets and deadlines. Not only do teams have to do everything right, they also have to do it on time the first time. Trying to build perfect components that make up a perfect whole is an uphill struggle, even without the business constraints.

Lather, Rinse, Repeat

Instead of pretending to build the perfect system, software development teams should build a good one, then make it better. Not in the next release or the next update, but in the very next build. Continuous improvement, not ultimate perfection, should be the goal. The difference in approach can be staggering. Where teams that strive for perfection end up floundering, teams that focus on continuous improvement end up delivering. The uphill struggle becomes a leisurely stroll. This is not meant to imply that continuous improvement is a convenient excuse for laziness or slack. On the contrary, continuous improvement is hard work that requires discipline and dedication, just like striving for perfection does. But whereas achieving perfection is impossible, stepwise monotonic improvements are manageable.


When I was an undergraduate, I worked as a programmer at a research lab on campus. It was a formative experience. Super smart people, great work environment, interesting problems. Then we lost our funding. That's when reality sunk in: we weren't as good as we thought. To get our ship back in order, we hired a full-time project manager. Under his guidance, we rewrote two whole research systems from scratch in about nine months. Our experience from the previous systems certainly helped our productivity -- we knew several mistakes not to make again -- but it was his approach to building software more than anything else that made us successful. I've experienced similar situations three more times since then (I'm sure I'll learn eventually). In each case, the deciding factor in the team's turn-around was a dedication to steady improvement. Instead of shooting for the moon, we worried about building better software.

Stop Hitting Yourself

That's what this column is all about. I've been a part of underperforming teams. I've pulled all-nighters to meet deadlines, fixed bugs right up to the start of demos, broken my share of builds, and had my projects cancelled. I've also been a part of alpha IT teams. I've helped put systems into production ahead of schedule and under budget, I've worked on projects that release new versions week after week without any serious bugs, and I've felt the excitement return to the office as teams respond to disaster by building overwhelming success. I want to share with you the lessons I've learned, and help you avoid the mistakes I've made. I want to offer advice borne from my own personal experiences, and help you build successful software.

It's All About the Build

I used to think that the best place to start changing software development teams was at the bottom. Get the developers to write better code and focus more closely on building what the customer wants. Now, I think it's best to go straight for the gold: the build. Builds are the only true measure of success on a software project; they're the only product that delivers value. Hence, my first column on the importance of frequent, regular releases into production.

Then what? What should you do after delivering new features and functionality in individual increments? Make the product better. You don't have to invest in making it radically better, just better. And make it better with every release. And what then? Your product is improving, let's improve the process, too! I want to help you iterate, and iterate well. I want to help you organize your work, and then I want to help you manage it better. I want to help you engage every member of your team, and I want to help you keep every member engaged. And all the while, I want you to continue pushing out new versions of your software, and I want you to make each better than the last.

About Kent Spillner: Kent loves writing code and delivering working software. He hates cubicles, meetings and Monday mornings.

Wednesday, September 24, 2008

Robarts/Rickmeier - How Deep Should Your Pockets Be?

By Jane Robarts and Mark Rickmeier - 24 September 2008

Teams and organizations usually choose to distribute an Agile project across multiple locations for one of two reasons: cost savings or location of resources. In the latter case, organizations often have no choice but to distribute a project due to the location of delivery team members or subject matter experts, or the availability of office space. Cost savings, however, are another matter entirely. Many organizations are choosing to distribute projects overseas to development shops in Asia. Wooed by the reduced costs overseas development can offer through lower salaries and expenses, organizations often leap into distributed development expecting a margin equal to the difference in salaries of the employees.

Distributed delivery, especially captive offshore or outsourced overseas delivery, has additional costs often not considered. Identifying and quantifying these costs before deciding to distribute can be the difference between a well-budgeted and planned project and one that continuously leaks money and time.

On the surface, it may seem simple to forecast the extra costs associated with distributed delivery: budget a few trips to transition knowledge at the start and end of the project, ensure the teams all have phone lines to communicate, and recognize that people may have to adjust their schedules a bit due to time zone challenges. However, as one digs deeper, it becomes clear that there are additional prices to be paid to successfully distribute a project.

Take, for example, the costs associated with travel to and from the distributed sites. An organization has undertaken a distributed project and is sending a subject matter expert (SME) to the offshore location to transfer knowledge early in the project. Flights to and from the site have been priced and accommodations budgeted. On the surface, all seems planned out. It turns out this employee needs a visa to travel. But first he needs a passport. So the organization digs into its pockets, pays for the passport and visa, and absorbs the cost of the SME having to spend two days queueing at the passport office and consulate. Now, the SME is ready to travel. Almost. There are the immunizations to be taken care of. Digging further into the organization’s pockets, immunizations are paid for and a further half day is taken off to visit the doctor.

Costs continue to mount once the SME realizes that he’ll need a computer while at the offshore site. His company-issued desktop can’t travel with him. Yet again, the pockets open up and a laptop is procured. A day of the SME’s time is spent configuring this computer for his needs.

Finally, our SME starts his voyage offshore. It’s a 36-hour trip overseas to the offshore facility. He’s travelling during the week, and the organization reaches a little further into its pockets as the SME is unavailable during this time to answer questions and productivity of the development team slows down. Despite the SME arriving at the offshore site a bit jetlagged, the visit is a huge success. Knowledge is transferred to the offshore team, rapport is built, and only once does the organization have to dig into its pockets unexpectedly when the SME realizes his cell phone won’t work overseas.

These small costs, while individually insignificant, accumulate to become a big expenditure for the organization, not only in terms of cash but also in terms of loss of the SME’s valuable time and the compounding impact this has across the team.

Beyond travel, there are other costs of distributed delivery. All projects plan a certain amount of contingency. Distributed projects need to plan for more. Events at one site can have a ripple effect on the other. A snowstorm at one site may mean half the team is out, creating a bottleneck to the other team. A power outage on the offshore side may mean developers onshore can’t get the latest code. These two independent events can have double the impact, requiring the organization to reach into its pockets to pay for the loss of productivity.

Employee loyalty is expensive. Distributed delivery, especially where it involves extreme time zone changes, can take a toll on an employee. This ‘flat world’ of software delivery still has activity regulated by daylight and normal operating hours. Distributed delivery teams have to work outside of these normal hours to collaborate with their distanced team members. It’s not uncommon for distributed teams to be burdened with early morning and late evening phone calls, emails, or video conferences. People tire. Social lives suffer. And employee loyalty is challenged. Compensating for this with perks such as extra paid leave, in-office meals, and even late-night cab rides home all add to the cost of a distributed project.

Adding up all the factors for this distributed project, the organization’s pockets had better go deeper than simply the cost of resources and travel. The little items add up and often compound, and the embedded costs escalate. Planning for all of these costs prior to embarking on a distributed project will help ensure that the true cost advantages of distributing are realized.

All of this makes distributed Agile delivery sound like an expensive venture not worth pursuing. This isn't necessarily true: cost benefits can definitely be realized. There is a point where the investment becomes worthwhile. It's just important to understand and factor in all those hidden costs when budgeting for distributed delivery. This is particularly true with distributed Agile delivery which expects changing requirements, relies on a solid business understanding by the team, and expects designs to evolve.

At what point does the investment in distributed delivery make sense? There is no set rule, no formula that can be applied. Having the right economy of scale to justify the cost of distributed development is critical to reaping the benefits of lower salaries, 24-hour development cycles, and available resources. The simple rule is: don't be blinded by the lower burn rates offered by offshore development. Be prepared to do the homework and fully understand the hidden costs of distributed development:

  • infrastructure and hardware, and more complex (and time-consuming) procurement rules for each
  • travel costs, especially last-minute emergency trips
  • cross pollination: transferring team members from each location to work for long periods with teams in other locations de-risks a project, but isn't done in zero time or for zero cost
  • additional roles: duplication of roles in each location may be necessary to ensure the right individuals are available to answer questions, guide the team, etc.
  • latency in communication: at best, work is delayed because somebody isn't instantly available; at worst, work proceeds and leads to a false positive that work is complete and requires rework to correct what was done
  • additional contingency to reflect the exponential loss of capacity when key people are unavailable or where events in one location impact progress in another
  • long distance calls, communication devices, video recorders, digital cameras, international mobile phone roaming, etc.
  • personal strain of communicating (or not communicating) via lo-fi channels that leads to frustration, hostility and even withdrawal
  • changes in personal schedules to account for timezones
  • learning the rules and procedures - all countries have different visa requirements
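To see how quickly these hidden items can erode a naive salary-based savings estimate, here is a minimal sketch of the arithmetic. All figures and category names are hypothetical, chosen only to illustrate the pattern the list above describes:

```python
# Hypothetical cost model: naive salary savings vs. savings remaining after
# the hidden costs of distributed delivery. All figures are illustrative.

def distributed_savings(onshore_cost, offshore_cost, hidden_costs):
    """Return (naive_savings, actual_savings) for a project budget."""
    naive = onshore_cost - offshore_cost
    actual = naive - sum(hidden_costs.values())
    return naive, actual

hidden = {
    "travel_and_visas": 60_000,       # flights, passports, emergency trips
    "duplicated_roles": 90_000,       # extra analyst/PM presence at each site
    "infrastructure": 40_000,         # laptops, video gear, phone lines
    "communication_latency": 75_000,  # rework from delayed or wrong answers
    "extra_contingency": 50_000,      # cross-site ripple effects
    "retention_perks": 35_000,        # meals, extra leave, late-night cabs
}

naive, actual = distributed_savings(
    onshore_cost=2_000_000, offshore_cost=1_200_000, hidden_costs=hidden)
print(f"Naive savings:  ${naive:,}")   # $800,000
print(f"Actual savings: ${actual:,}")  # $450,000
```

In this made-up example the hidden items consume nearly half of the headline savings, which is the point of budgeting them explicitly before committing to distribution.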

Longer-running Agile projects composed of multiple releases typically see the cost savings to a greater extent than short-term ones. Many of the additional costs for distributed development are one-off or early-stage events in the life of a project. Hardware investments typically occur only once. If there is a stable distributed team, knowledge transfer visits and rotations may decrease over the lifetime of the project. As Agile practices mature for the distributed team and they find their pace and understand the business, communication needs between the locations may diminish. The longer an Agile team works together on a project, the better able it will be to absorb startup costs and the more likely it is to become efficient in its work patterns, yielding the expected benefits of distributing work.

Distributed Agile delivery in most cases is a worthwhile undertaking. The benefits may not be realized as early as you might anticipate, and the costs may be higher, so your pockets may have to be deeper than initially thought. However, careful and considered planning and budgeting early on will help predict if the project is worth the distributed investment.

About Jane Robarts: Jane has over 10 years of experience in custom development of enterprise applications as a developer, analyst, project manager, and coach. She has consulted on and managed projects in India, China, the UK, the US, Australia and Canada. Jane has spent most of the last several years coaching distributed teams and working directly on distributed Agile delivery projects both onshore and offshore. She has also developed and delivered many sessions on Agile delivery to a variety of audiences.

About Mark Rickmeier: Mark has 8 years' experience as a quality assurance tester, business analyst, iteration manager, Scrum master and project manager working on large scale enterprise applications. He has extensive distributed development experience working with project teams in Australia, India, China, Canada, the UK, and the US. His industry background includes equipment leasing, retail, insurance, healthcare and retail banking. He is currently a project manager specializing in Agile transformation and project governance with a focus on metrics and measurement.

Wednesday, September 17, 2008

Pettit - Is Your Project Team "Investment Grade?"

by Ross Pettit - 17 September 2008

One of the most important indicators of risk in debt markets is the grade (or rating) assigned to a debt instrument. The rating agencies have drawn heavy criticism recently for their assessments of mortgage-backed securities and collateralized debt obligations. Despite the controversy, the rating agencies remain the authority on assessing credit quality. Their impact on AIG's efforts to raise capital this week indicates how much market influence the rating agencies have.

There are several independent companies that assess the credit quality of bonds. The bond rating gives an indication of the probability of default. Although the bond is what is rated, the rating is really a forecast of the ability of the entity behind the bond – e.g., a corporation or sovereign nation – to meet its debt service obligation.

Each rating firm uses a different and proprietary approach to assess credit quality, involving both quantitative and qualitative factors. For example, bond ratings by Moody’s Investors Service reflect long-term risk consideration, predictability of cash flow, multiple negative scenarios, and interpretation of local accounting practices. In practical terms, this means that things such as macro and micro economic factors, competitive environment, management team, and financial statements are all factors in determining the credit worthiness of a firm.

Rating agencies are subsequently able to characterise the risk of debt investments. An investment grade bond will have lower yield but offer higher safety (that is, lower probability of default). A junk bond will have higher yield but lower safety. Between these extremes are intermediate levels of quality: a bond that is rated AA will have very high credit quality, but lower safety than a AAA bond, while a bond rated at A or BBB, while still investment grade, indicates lower credit quality than a AA bond.

This concept is portable to IT. Just as the entity behind a bond is rated, the team behind an IT asset under development can be rated for its “delivery worthiness.” The difference is that we look to the rating not as an indicator of the risk premium we demand, but as a threat (and therefore a discount) to the yield we should expect from the investment.

To rate an IT team, we can look at quantitative factors, such as the raw capacity of hours to complete an estimated workload, variance in the work estimates, and so forth. But we also need to look to qualitative factors. Consider the following:

  • Are we working on clear, actionable statements of business need? Are requirements independent statements of business functionality that can be acted upon, or are they descriptions of system behaviour laden with dependencies and hand-offs?
  • Are we creating technical debt? Is code quality good, characterised by a high degree of technical hygiene (is code written in a manner that it can be tested?) and an absence of code toxicity (e.g., code duplication and cyclomatic complexity)?
  • Are we working transparently? Just as local accounting practices may need to be interpreted when rating debt, we must truly understand how project status is reported. Are we managing and measuring delivery of complete business functionality (marking projects to market), or are we measuring and reporting the completion of technical tasks (marking to model), with activities that complete business functionality, such as integration and functional test, deferred until late in the development cycle?
  • Are we delivering frequently, and consistently translating speculative value into real asset value? In the context of rating an IT team, delivered code can be thought of synonymously with cash flow. The more consistent the cash flow, the more likely a firm will be able to service its debt.
  • Are we resilient to staff turnover? Is there a high degree of turnover in the team? Is this a “destination project” for IT staff? Is there a significant amount of situational complexity that makes the project team vulnerable to staff changes?

At first glance, this may simply look like a risk inventory, but it’s more than that. It’s an assessment of the effectiveness of decisions made to match a team with a set of circumstances to produce an asset.
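One way to make such an assessment repeatable is a weighted scorecard over the qualitative factors listed above. The sketch below is an illustrative assumption, not an established rating methodology: the factor names, weights, and grade cut-offs are invented for the example, and in practice each would be calibrated by judgment and experience, as the article goes on to argue:

```python
# Illustrative "delivery worthiness" scorecard for an IT team. Factor names,
# weights, and grade cut-offs are assumptions made for this sketch.

GRADES = [(0.85, "AAA"), (0.75, "AA"), (0.65, "A"), (0.55, "BBB")]

def delivery_rating(scores, weights):
    """scores: factor -> assessment in 0.0..1.0; returns (composite, grade)."""
    total_weight = sum(weights.values())
    composite = sum(scores[f] * w for f, w in weights.items()) / total_weight
    for cutoff, grade in GRADES:
        if composite >= cutoff:
            return composite, grade
    return composite, "junk"  # below investment grade

weights = {
    "actionable_requirements": 2.0,
    "technical_debt_control": 2.0,
    "transparent_status": 1.5,
    "frequent_delivery": 2.0,
    "turnover_resilience": 1.5,
}

scores = {
    "actionable_requirements": 0.8,
    "technical_debt_control": 0.6,
    "transparent_status": 0.9,
    "frequent_delivery": 0.7,
    "turnover_resilience": 0.5,
}

composite, grade = delivery_rating(scores, weights)
print(f"{composite:.2f} -> {grade}")  # 0.70 -> A
```

The value of such a sketch is less the number it produces than the conversation it forces about which factors matter most for a particular team and situation.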

There are few, if any, absolute rules to achieving a high delivery rating. For example, assigning the top talent to the most important initiative may appear to be an obvious insurance policy for guaranteeing results. But what happens if that top talent is bored to tears because the project isn't a challenge? Such a project – no matter how much assurance is given to each person that they are performing a critical job – may very well increase flight risk. If that risk materialises, the expectation for returns on that project will crater instantly. If it goes unrecognised, a team can appear to change from investment grade to junk very quickly.

While the rules aren’t absolute, the principles are. An IT team developing an asset expected to yield alpha returns will generally be characterised as a destination opportunity offering competitive compensation, operating transparently with actionable requirements, maintaining capability "liquidity" and a healthy “lifestyle,” and delivering functionally complete assets frequently to reduce operational exposure. All of these are characteristics that separate a team that is investment grade from one that is junk.

While these factors are portable across projects they may not be identically weighted for every team. This doesn’t undermine the value of the rating as much as it means we need to be acutely aware of the circumstances that any team faces. This also means that assessing the delivery worthiness of a team is borne of experience, and not a formulaic or deterministic exercise. While the polar opposites of investment-grade and junk may be clear, it takes a deft hand to recognise the subtle differences between a team that is worthy of a triple-A rating and one that is worthy of a single A, and even why that distinction matters. It also requires a high degree of situational awareness – employment market dynamics, direct inspection of artifacts (review of requirements, code, software), and certification of intermediate deliverables – so that the rating factors are less conjecture and more fact. Finally, it is an exercise to be repeated constantly, as the “market factors” in which a team operates – people, requirements, technology, suppliers and so forth – change constantly. This is consistent with how the rating agencies bring benefit to the market: they are not formulaic, they spend significant effort to interpret data, and they are updated with changing market conditions.

FDIC Chairman Sheila Bair commented recently that we have to look at the people behind the mortgages to really understand the risk of mortgage-backed securities. With IT projects, we have to look at the people and the situations behind the staffing spreadsheets and project plans. IT is a people business. We can measure effectiveness based on asset yield, but we are only going to be as effective as the capability we bring to bear on the unique situation – technological, geographical, economic, and even social-political – that we face. Rating is one means by which we can do that.

Investors in financial instruments have a consistent means by which to assess the degree of risk among different credit instruments. IT has no such mechanism to offer. Just as debt investors want to know the credit worthiness of a firm, so should IT investors know the delivery worthiness of their project teams.

Especially when alpha returns are on the line.

About Ross Pettit: Ross has over 15 years' experience as a developer, project manager, and program manager working on enterprise applications. A former COO, Managing Director, and CTO, he also brings extensive experience managing distributed development operations and global consulting companies. His industry background includes investment and retail banking, insurance, manufacturing, distribution, media, utilities, market research and government. He has most recently consulted to global financial services and media companies on transformation programs, with an emphasis on metrics and measurement. Ross is a frequent speaker and active blogger on topics of IT management, governance and innovation. He is also the editor of