Thursday, August 16, 2007

MTTR out: MTRS in

The Mean Time to Repair (MTTR) metric is widely known, but it has been replaced by a more meaningful and 'holistic' measurement.

The Mean Time to Restore Service (MTRS) is considered to be a better measurement for issues relating to availability and change management as it encompasses all aspects of service restoration and not just one element.

The problem with MTTR was that while a component (or part of a service) may have been repaired, the service itself might still not be available to an end user. Take a simple example: a server hard drive crashes, the service is unavailable, an emergency change is rushed through and the hard drive is replaced.

In this example the MTTR is measured from the time of the actual crash until the new hard drive is snapped into place. It is a good metric that can reflect on the overall performance of the IT department, but not one that is of any interest to the customers and users of that service.

Once the hard drive is snapped into place, the server is powered on and 15 or 20 minutes later the customers and end users can get access to the service. It is these 15 or 20 minutes that MTTR leaves out and MTRS includes, and that make MTRS a better overall metric.

MTRS is measured from the point of failure to the point where the service can be accessed again by customers and end users.
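To make the difference concrete, here is a minimal sketch that computes both metrics from the same incident log. The timestamps, the `incidents` list and the `mean_minutes` helper are all made up for illustration; they are not taken from any real tool or dataset.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log: (failure, component repaired, service restored).
incidents = [
    (datetime(2007, 8, 1, 9, 0), datetime(2007, 8, 1, 11, 0), datetime(2007, 8, 1, 11, 20)),
    (datetime(2007, 8, 9, 14, 0), datetime(2007, 8, 9, 15, 0), datetime(2007, 8, 9, 15, 15)),
]

def mean_minutes(intervals):
    """Average length, in minutes, of a series of (start, end) intervals."""
    return mean((end - start).total_seconds() / 60 for start, end in intervals)

# MTTR stops the clock when the component is repaired.
mttr = mean_minutes((failed, repaired) for failed, repaired, _ in incidents)

# MTRS stops the clock when users can reach the service again.
mtrs = mean_minutes((failed, restored) for failed, _, restored in incidents)

print(f"MTTR: {mttr} min, MTRS: {mtrs} min, gap: {mtrs - mttr} min")
```

With these invented figures the gap between the two averages is the boot-and-recovery time that users experience as downtime but that MTTR never records.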

Mean Time Between Failures (MTBF) remains the way to measure the uptime of a component in the service chain, but MTRS is now the measure of the true downtime of a service.
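As a rough illustration of how the two figures fit together, the sketch below estimates availability as uptime divided by uptime plus downtime. It assumes, for simplicity, that MTBF and MTRS are both measured for the same service and in the same unit; the `availability` function and the example figures are hypothetical.

```python
def availability(mtbf_hours: float, mtrs_hours: float) -> float:
    """Fraction of time the service is usable: uptime / (uptime + downtime)."""
    return mtbf_hours / (mtbf_hours + mtrs_hours)

# Invented figures: a failure every 500 hours on average, 2 hours to restore service.
estimate = availability(mtbf_hours=500, mtrs_hours=2)
print(f"Estimated availability: {estimate:.2%}")
```

Plugging in MTTR instead of MTRS would give a more flattering figure, which is exactly why it says less about what the customer actually experiences.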
