Treating Downtime Problems


Communication Plan
Typically, when an outage occurs, staff turn inward and stop communicating. Having a communication plan and a communication manager is key. Progress should be shared with users at least once per hour. The message should be short and concise, containing the following:

1. Time downtime began
2. Short description of the known cause of the downtime
3. Impact of the downtime
4. Estimated time for restoration
5. Time of the next scheduled update
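
As a minimal sketch, the five-part message above could be assembled as follows in Python. The class name StatusUpdate, its field names, and the one-hour default interval are illustrative assumptions (the interval mirrors the "at least once per hour" guidance), not part of any prescribed tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StatusUpdate:
    """Hypothetical container for one outage status message."""
    started_at: datetime   # 1. time downtime began
    cause: str             # 2. short description of the known cause
    impact: str            # 3. impact of the downtime
    eta: str               # 4. estimated time for restoration
    sent_at: datetime      # when this update is sent
    interval: timedelta = timedelta(hours=1)  # assumed cadence: hourly

    def next_update(self) -> datetime:
        # 5. the next update is due one interval after this one is sent
        return self.sent_at + self.interval

    def render(self) -> str:
        fmt = "%Y-%m-%d %H:%M"
        return "\n".join([
            f"Downtime began: {self.started_at.strftime(fmt)}",
            f"Known cause: {self.cause}",
            f"Impact: {self.impact}",
            f"Estimated restoration: {self.eta}",
            f"Next update: {self.next_update().strftime(fmt)}",
        ])
```

Keeping the five items as separate fields makes it harder to send an update that silently omits one of them, for example the time of the next update.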

Tried and True Rollback Process
Since a lot of outages are caused by bad changes, no change should be implemented without a rollback process. The rollback procedure can be unique for every technology, so document how to back out a change for each one.

Post Downtime Plan
Once the systems have been restored, the best practice is to create a report with the following information:

1. Short description of the incident
2. Downtime duration
3. SLA impact
4. Short incident history
5. How the incident was resolved
6. Root cause
7. Necessary steps to prevent this kind of downtime from happening again
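
Two of the report items, downtime duration and SLA impact, are simple arithmetic. The sketch below shows one way to compute them; the function names, the 30-day measurement period, and the 99.9% target used in the usage example are assumed values for illustration, so substitute the terms of your own SLA.

```python
from datetime import datetime, timedelta


def downtime_duration(start: datetime, end: datetime) -> timedelta:
    """Length of the outage, from first failure to restoration."""
    if end < start:
        raise ValueError("restoration time precedes outage start")
    return end - start


def sla_impact(downtime: timedelta, period: timedelta, target_pct: float) -> dict:
    """Compare achieved availability with an availability target.

    target_pct is the promised availability, e.g. 99.9 (an assumed example).
    """
    # Downtime budget the target allows over the measurement period
    allowed = period * (1 - target_pct / 100)
    # Availability actually achieved, as a percentage
    availability = 100 * (1 - downtime / period)
    return {
        "availability_pct": round(availability, 3),
        "allowed_downtime": allowed,
        "breached": downtime > allowed,
    }


# Usage: a 90-minute outage measured against an assumed 99.9% monthly target
outage = downtime_duration(datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 10, 30))
report = sla_impact(outage, timedelta(days=30), 99.9)
```

This treats a single incident in isolation. In practice, the downtime from every incident in the measurement period would be summed before comparing against the target.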

Knowledge Based Repository
The report is then stored in a knowledge-based repository for easy reference should the event occur again.

4 Steps to Stability
An environment which suffers consistent instability requires drastic measures. Here is a 4-step approach for addressing consistent technology instability in an organization:

1. Freeze changes. Absolutely no unauthorized changes can be made by anyone until further notice.
2. Implement a CAB (Change Advisory Board) and a management-sanctioned change management process.
3. Identify the fragile systems causing the outages.
4. Put a plan in place to stabilize those systems.


What is Downtime?

Downtime is another way of saying a system is not available to its users. It is also referred to as an outage. While downtime can be planned months in advance, it typically is not and often comes as a surprise.



Downtime is a term used to describe when a service is unavailable to its intended recipients.

Most downtime events are unplanned. They are either caused by a failure or triggered on short notice by an attempt to fix a service that is not performing at its optimal level.

Signs & Symptoms
Unplanned downtime is the number one cause of financial harm, yet most IT leaders don't understand the signs and symptoms of an environment that experiences too much unplanned downtime.

Sure, it's easy to surmise that the systems are offline more than they should be, especially when management is enraged. But there are legitimate signs and symptoms, and recognizing them will allow you to reduce the frequency and impact of unplanned outages.

  • Unauthorized Changes
  • High amounts of Unplanned Work
  • Low Throughput of Effective Change
  • Server to Administrator Ratios < 100:1
  • Lack of Indicator Measurements
  • SLA Commitment Breaches

Related Conditions

Low Throughput of Effective Change

Top 3 Ways To Prevent Downtime

1. Implement Preventative Maintenance Schedules

2. Execute Pre-Business System Checks

3. Implement Measurements & Indicators

Implement Preventative Maintenance Schedules

Take care of your IT assets and they will take care of you. Implement a consistent, high-quality preventative maintenance schedule. Let Allari do the chore-based tasks while you focus on the important stuff!