ITIL is considered essential, but does it internalize transaction cost?

By John Kehoe - 19 February 2010

I have slept through many an ITIL briefing. There is something about those process charts that just causes me to glaze over; I must have been thinking about the cost of applying ITIL. It is a business process, and process must be paid for. Does it earn its keep, or does it waste money on needless feedback loops for monitoring the quality of printing services? Certainly, it should only be applied in a supporting role for revenue-facing activities. If so, why is there no mention of application management and revenue alignment? Transaction Performance Management (TPM) fits that niche nicely.

My first encounter with ITIL occurred some years back at a companywide sales kickoff for a $5B software company. We didn't sell ITIL software, but we wanted all the consultants, system engineers and account executives fluent in ITIL so we could position our products in that context. We had a very nice, engaging and intelligent fellow for an instructor. He had impeccable credentials and came from a name-brand global consultancy.

He introduced the history of ITIL, which was my first warning of impending doom (and eye glazing). ITIL was created by UK bureaucrats in the late 1980s. It details exacting processes and frameworks for IT organizations, the idea being that process can bring order to chaos. The process is well suited to change management and run books, but there are parts that are just plain silly. Do we need a TQM-style approach with elaborate feedback loops for printing, file and network services and availability? Most IT activities are utilities. There are two variables of concern. One, is the service available? Two, is it performing? There is no need for detailed statistics; they don't matter.
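To make that concrete, here is a minimal sketch (in Python, with a hypothetical endpoint and a hypothetical latency threshold) of the only two questions a utility service really warrants:

```python
# A minimal sketch, assuming a hypothetical intranet endpoint and SLO:
# for a utility service, the only two variables worth tracking are
# availability and performance.
import time
import urllib.request

SERVICE_URL = "http://intranet.example.com/print-queue"  # hypothetical
LATENCY_SLO_SECONDS = 2.0                                # hypothetical

def probe(url: str) -> tuple[bool, float]:
    """Return (available, elapsed_seconds) for a single check."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            available = 200 <= resp.status < 300
    except OSError:  # covers URLError, timeouts, connection refusals
        available = False
    return available, time.monotonic() - start

available, elapsed = probe(SERVICE_URL)
print(f"available={available}, "
      f"performing={available and elapsed <= LATENCY_SLO_SECONDS}")
```

Anything beyond that pair of answers is, for a utility, statistics nobody will act on.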

Another concern about ITIL is cost. ITIL is a process, and process costs money. I've not seen ITIL performance statistics tied to revenue in any shop I've visited. There are claims of increased output, reports showing three-digit improvements in productivity, downtime reports to two decimal places. Well, that is great news. How much did it cost? How much revenue did it contribute? Did we apply resources efficiently? ITIL doesn't answer those questions.

What is missing from ITIL (other than affordable documentation)? Performance management is the key piece. There is no mention of Transaction Performance Management. By using a TPM approach one can quantify end-user experience, determine the root cause of degradation and use an ingrained feedback loop. Why do this? To focus effort on real business impact.

Let's lay out a scenario. End users are complaining about order entry. We hit our run books and start the triage. We hit the server and application stats. We review the log files. We have each team investigate its respective technology with its own tools. Maybe there are some sophisticated workflows to follow. Notice what happened: where is the end user? Where is the business alignment? A formal framework around dodgy process is a dodgy formal framework. Worse still, it does nothing to resolve the performance issue.

Now let's keep the eye on the transaction. We see what it is, who it affects and where it is coming from. Maybe it's a slow network link; then it's a last-mile problem and we needn't bother the developers or DBAs. Maybe there's something unique about the transactions; we see that and notify the middleware team. You see where I'm going with this. You see all the waits for transactions down to the storage array. You see the characteristics of the transactions and where and how they break. This is Transaction Performance Management: one keeps an eye on all of the transactions over the whole of the architecture in the context of the business, the users and the technologies. This is what ITIL lacks, a tight, comprehensive approach to the performance of the business and the underlying technologies.
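As a rough illustration of the idea (not any particular TPM product's API), here is a sketch that times one transaction tier by tier, so triage starts from the transaction rather than from each team's silo:

```python
# An illustrative sketch: record where one order-entry transaction spends
# its time (network, middleware, database) so the slow tier is visible
# immediately. Tier names and the work inside them are placeholders.
from collections import defaultdict
from contextlib import contextmanager
from time import monotonic

waits = defaultdict(float)  # cumulative wait per tier, for one transaction

@contextmanager
def tier(name: str):
    start = monotonic()
    try:
        yield
    finally:
        waits[name] += monotonic() - start

def order_entry():
    with tier("network"):     # the last-mile link to the app server
        pass
    with tier("middleware"):  # the app server assembles the order
        pass
    with tier("database"):    # query waits, down to the storage array
        pass

order_entry()
slowest = max(waits, key=waits.get)
print(f"slowest tier: {slowest} ({waits[slowest]:.4f}s of wait)")
```

The point is the vantage: one record per transaction, spanning every tier, instead of one dashboard per team.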

Recently I worked with a very clever group of consultants. Their specialty is mapping business transactions to risk, and then mapping that risk to revenue. The old style of doing that is fixing a cost to an outage. For example, I might lose $60 for every minute I'm out of service. At two 9's of availability (88 hours of downtime a year), I'll lose $315k. At four 9's (52 minutes), I lose $3,100. Is it worth investing $300k to reduce this number? Ah, that is the question, isn't it? We don't have enough information to justify it. Where does the $60 come from? Is it a semi-factually derived value (a SWAG)? What transactions are in play? What happens if the transactions are slow as opposed to unavailable? What opportunity costs are incurred?
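For the record, the arithmetic above is easy to reproduce (modulo rounding); the $60-per-minute figure is just the example's assumption:

```python
# Reproducing the back-of-the-envelope outage math above.
# The $60/minute figure is the example's assumption (possibly a SWAG).
COST_PER_MINUTE = 60.0
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for label, availability in [("two 9's", 0.99), ("four 9's", 0.9999)]:
    downtime_min = MINUTES_PER_YEAR * (1 - availability)
    print(f"{label}: {downtime_min:,.0f} minutes down per year "
          f"-> ${downtime_min * COST_PER_MINUTE:,.0f} lost")
# two 9's:  5,256 minutes -> $315,360
# four 9's:    53 minutes -> $3,154
```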

Are you investing in the right application? How do you know?

To answer those questions, my consulting friends built a sophisticated model based on how business transactions are executed and the revenue associated with each. They then added their secret risk models and voilà: my customer can map her transaction data to her transaction risk model. She can see exactly where transactions are breaking down, apply sliding scales of performance impact to risk and understand the opportunity cost for a given outage or SLO (Service Level Objective) breach.
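Their risk model is a trade secret, so the following is purely a hypothetical stand-in, but it shows the shape of the mapping: revenue per transaction, a response-time objective, and a sliding scale so slowness costs something short of a full outage:

```python
# A hypothetical stand-in for a proprietary risk model: each transaction
# type carries revenue, and a sliding scale converts degradation (not
# just hard downtime) into opportunity cost. All numbers are invented.
from dataclasses import dataclass

@dataclass
class TransactionProfile:
    name: str
    revenue_per_txn: float  # dollars earned per completed transaction
    rate_per_min: float     # normal executions per minute
    slo_seconds: float      # agreed response-time objective

def opportunity_cost(p: TransactionProfile,
                     response_s: float, minutes: float) -> float:
    """No cost within the SLO; full outage-equivalent cost at 5x the SLO."""
    if response_s <= p.slo_seconds:
        return 0.0
    degradation = min((response_s - p.slo_seconds) / (4 * p.slo_seconds), 1.0)
    return degradation * p.revenue_per_txn * p.rate_per_min * minutes

order_entry = TransactionProfile("order entry", revenue_per_txn=12.0,
                                 rate_per_min=5.0, slo_seconds=2.0)
print(opportunity_cost(order_entry, 6.0, 30))   # 30-min brownout: $900
print(opportunity_cost(order_entry, 12.0, 30))  # 5x SLO or worse: $1,800
```

A brownout and an outage land on the same scale, which is exactly what the flat dollars-per-minute-of-downtime figure cannot express.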

This is a compelling approach. You can now make focused expenditure judgments based on transactional risk. You can immediately see the impact of a transaction change on the business. You can't have a tighter or more focused feedback loop than that.

This is why the TPM approach makes so much sense. It ties business transactions to risk in a way that ITIL, ITSM or Six Sigma cannot. In doing so it complements the structured approach, except TPM is the process that pays for itself and builds business value.



About John Kehoe: John is a performance technologist plying his dark craft since the early nineties. John has a penchant for parenthetical editorializing, puns and mixed metaphors (sorry). You can reach John at exoticproblems@gmail.com.