In this video, we’re going to discuss the concept of ITIL Hacks for Problem Management in IT Service and Support. We’ll use a case study involving the ITIL practice of Problem Management.
As many of you know, ITIL is a multi-billion dollar industry consisting of training, certification, assessments, and other activities that are designed to improve the effectiveness of IT service and support.
ITIL 4 was rolled out in February of 2019, and with 34 practices it was even more complex than ITIL 3, which comprised 26 processes. But even before ITIL 4 was rolled out, we noticed that many in the service and support industry were taking a different approach when it came to ITIL.
And instead of doggedly pursuing training, certification, and maturation, and then waiting years to see any kind of benefit from all that effort, they decided to hack ITIL. An ITIL hack is simply a shortcut that allows you to achieve the benefits of a mature ITIL practice without having to go through months or even years of effort.
This is best illustrated by example, and I will use a case study involving the ITIL practice of Problem Management.
In a benchmark that MetricNet performed for one of the largest insurance companies in the world, it was discovered that end users in this company were logging an average of 2 service desk incidents per month. That’s about double the industry average, which was concerning not only because the average cost of a ticket at level 1 was about $25, but also because every ticket represents productive time that’s lost by the user. And that’s usually the bigger cost – that is, the lost user productivity – rather than the direct cost of support.
As a result of this benchmarking revelation (two tickets per user per month), the CIO of this insurance company appointed a team of individuals who worked in IT support and gave them a mandate: reduce the ticket volume by 50% in one year.
It was a big goal! It was an audacious goal! But it was one that was absolutely achievable, as you will soon see.
How did they do it? Well, they started with the problem management database. It had more than 6,000 known problems in it. That’s not unusual, by the way, for a large enterprise. But when they looked at the number of incidents associated with each problem, they quickly realized that fewer than 200 problems accounted for more than 90 percent of all incidents.
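To make that concentration check concrete, here is a minimal sketch of how you might count the problems needed to cover a given share of incidents. The function name and the sample counts are hypothetical, not data from the case study.

```python
def problems_covering(incident_counts, share_percent=90):
    """Return how many of the largest problems are needed to reach
    share_percent of all incidents (integer math avoids float rounding)."""
    total = sum(incident_counts)
    running = 0
    for needed, count in enumerate(sorted(incident_counts, reverse=True), start=1):
        running += count
        if running * 100 >= share_percent * total:
            return needed
    return len(incident_counts)


# Made-up incident counts per problem, for illustration only.
sample_counts = [500, 300, 100, 50, 30, 20]
print(problems_covering(sample_counts))  # 3 of the 6 problems reach 90%
```

Run against a real problem database, a small result relative to the total number of problems is the signal that a short list is worth attacking first.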
So, the obvious thing to do is focus on the problems with the highest number of incidents. That’s where you get the biggest bang for the buck in problem management. But they added one other element to the equation, and they called it problem velocity. That is simply the number of incidents associated with a given problem divided by the number of months the problem has been active. So, a 10-month-old problem with 1,000 associated incidents has a velocity of 100 incidents per month. By contrast, a 2-month-old problem with 1,000 associated incidents has a velocity of 500 incidents per month.
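The velocity calculation is simple enough to sketch in a few lines. The function name is my own, and the guard against a zero-month age is an assumption the case study does not address.

```python
def problem_velocity(incident_count, months_active):
    """Incidents per month: total incidents divided by months active."""
    if months_active <= 0:
        # Assumption: a problem must have a positive age to have a velocity.
        raise ValueError("months_active must be positive")
    return incident_count / months_active


print(problem_velocity(1000, 10))  # 100.0 incidents per month
print(problem_velocity(1000, 2))   # 500.0 incidents per month
```

Both problems have the same incident count, but the younger one is generating incidents five times faster, which is why it deserves attention first.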
They focused on the roughly 200 problems with the highest velocity and dissected them one by one. They looked at Splunk records, they looked at ticket logs, they looked at assignment groups or resolver groups, and a variety of other factors. And once they understood the nature of each problem, they identified what I would call a logical problem owner. The logical owner was most often an IT manager or executive who had ownership over a particular application or a piece of the enterprise infrastructure. For the 200 problems they were attacking, there were about 70 problem owners since some owners had more than one problem assigned to them.
Once they had identified the problem owners, they began producing a weekly report that ranked the top 200 problems from highest velocity to lowest velocity. So, the problems that were generating incidents at the fastest rate ranked highest, while those in the top 200 that generated the fewest incidents each month ranked near the bottom.
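A report like that could be produced with something as small as the sketch below. The `Problem` fields, the IDs, and the owner names are hypothetical placeholders; the team's actual tooling is not described.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    problem_id: str       # hypothetical identifier
    owner: str            # logical problem owner
    incidents: int        # incidents associated with the problem
    months_active: float  # how long the problem has been open

    @property
    def velocity(self):
        """Incidents per month for this problem."""
        return self.incidents / self.months_active


def weekly_report(problems, top_n=200):
    """Rank problems from highest to lowest velocity, keeping the top_n."""
    ranked = sorted(problems, key=lambda p: p.velocity, reverse=True)
    return ranked[:top_n]


# Made-up sample data, for illustration only.
sample = [
    Problem("PRB-001", "owner-a", 1000, 10),
    Problem("PRB-002", "owner-b", 1000, 2),
    Problem("PRB-003", "owner-a", 300, 1),
]

for rank, p in enumerate(weekly_report(sample), start=1):
    print(f"{rank:>3}  {p.problem_id}  {p.owner}  {p.velocity:.0f}/month")
```

Notice that one owner can appear more than once in the ranking, just as some owners in the case study had more than one problem assigned to them.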
And because the problem management team had the backing of the CIO, they got very little pushback from the logical problem owners. Moreover, the problem owners were given a mandate of their own: you have 90 days to reduce your problem velocity by 50% (that is, cut the rate at which a problem produces incidents in half), and you have 180 days to reduce your problem velocity by 75%.
Now, in some cases, problem management can eliminate the incidents associated with a problem entirely. But more often than not, because problems might have many root causes, you are merely slowing the rate at which incidents are generated by a given problem.
This team reached their goal of a 50% reduction in total incidents, not in a year, but in less than six months.
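As a rough sketch, the 90-day and 180-day targets can be checked like this. The names are hypothetical, and I am assuming the reduction is measured against each problem's velocity on the day the mandate was issued, which the case study does not spell out.

```python
# (deadline in days, required reduction from baseline velocity)
MANDATE = [(90, 0.50), (180, 0.75)]


def required_velocity(baseline, days_elapsed):
    """Highest velocity that still satisfies the mandate at days_elapsed."""
    required = baseline
    for deadline, reduction in MANDATE:
        if days_elapsed >= deadline:
            required = baseline * (1 - reduction)
    return required


def meets_mandate(baseline, current, days_elapsed):
    """True if the current velocity is at or below the required level."""
    return current <= required_velocity(baseline, days_elapsed)


print(required_velocity(500, 90))    # 250.0 incidents per month
print(required_velocity(500, 180))   # 125.0 incidents per month
print(meets_mandate(500, 240, 90))   # True
print(meets_mandate(500, 240, 180))  # False
```

A problem that starts at 500 incidents per month has to be at 250 or below by day 90, and at 125 or below by day 180.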
So, what lessons can we take away from this? Well, there are a few.
- First, they had CIO sponsorship. That was invaluable, particularly when assigning problem owners.
- Secondly, there was no ITIL training and certification. They bypassed that altogether. That’s why I call this an ITIL hack. They got all the benefits of ITIL, without going through the pain of maturing the formal ITIL practice of problem management.
- Thirdly, they started with a measurable goal – a 50% reduction in incident volume – and they worked backwards from there.
- And finally, they used design thinking to achieve their end goal. They invented a metric called problem velocity; they discovered that just 200 problems drove the vast majority of incidents; they used Splunk records; and they assigned problem owners. These are all examples of design thinking that led to the success of this Problem Management ITIL hack.
Thanks for joining me today. I hope you found my vlog on ITIL hacks for problem management to be informative and insightful.