Root cause analysis (RCA) is a class of problem-solving methods aimed at identifying the root causes of problems or events. The practice of RCA is predicated on the belief that problems are best solved by attempting to address, correct, or eliminate root causes, as opposed to merely addressing the immediately obvious symptoms. Directing corrective measures at root causes makes it more likely that recurrence of the problem will be prevented. However, it is recognized that complete prevention of recurrence by a single corrective action is not always possible. Conversely, there may be several effective measures (methods) that address the root cause of a problem. Thus, RCA is often considered to be an iterative process and is frequently viewed as a tool of continuous improvement.

RCA is typically used as a reactive method of identifying the causes of events, revealing problems and solving them: the analysis is done after an event has occurred. However, insights gained from RCA can also make it useful as a proactive method, in which case it can be used to forecast or predict probable events before they occur. Although RCA often follows Incident Management, it is a separate process.

Root cause analysis is not a single, sharply defined methodology; there are many different tools, processes, and philosophies for performing RCA. However, several very broadly defined approaches or “schools” can be identified by their basic approach or field of origin: safety-based, production-based, process-based, failure-based, and systems-based.

  • Safety-based RCA descends from the fields of accident analysis and occupational safety and health.
  • Production-based RCA has its origins in the field of quality control for industrial manufacturing.
  • Process-based RCA is basically a follow-on to production-based RCA, but with a scope that has been expanded to include business processes.
  • Failure-based RCA is rooted in the practice of failure analysis as employed in engineering and maintenance.
  • Systems-based RCA has emerged as an amalgamation of the preceding schools, along with ideas taken from fields such as change management, risk management, and systems analysis.

Despite the different approaches among the various schools of root cause analysis, there are some common principles. It is also possible to define several general processes for performing RCA.

 

General principles of root cause analysis

  • The primary aim of RCA is to identify the root cause(s) of a problem in order to create effective corrective actions that will prevent that problem from ever recurring, or otherwise address the problem with virtual certainty of success. (“Success” is defined as the near-certain prevention of recurrence.)
  • To be effective, RCA must be performed systematically, usually as part of an investigation, with its conclusions and the root causes it identifies backed up by documented evidence. A team effort is usually required.
  • There may be more than one root cause for an event or a problem; the difficult part is demonstrating the persistence and sustaining the effort required to uncover them.
  • The purpose of identifying all solutions to a problem is to prevent recurrence at lowest cost in the simplest way. If there are alternatives that are equally effective, then the simplest or lowest cost approach is preferred.
  • Root causes identified depend on the way in which the problem or event is defined. Effective problem statements and event descriptions (as failures, for example) are helpful, or even required.
  • To be effective, the analysis should establish a sequence of events or timeline to understand the relationships between contributory (causal) factors, root cause(s), and the defined problem or event to be prevented in the future.
  • Root cause analysis can help to transform a reactive culture (one that reacts to problems) into a forward-looking culture that solves problems before they occur or escalate. More importantly, it can reduce the frequency with which problems occur over time in the environment where the RCA process is used.
  • RCA can be perceived as a threat in many cultures and environments, and threats to cultures often meet with resistance. Other forms of management support may be required to make RCA effective and successful; for example, a “non-punitive” policy towards those who identify problems may be required.
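The preference stated above for the simplest or lowest-cost option among equally effective alternatives can be sketched as a small selection rule. This is a minimal illustration under stated assumptions, not a standard RCA algorithm: the `CorrectiveAction` type, its `effectiveness` and `cost` fields, the `tolerance` used to decide that two actions count as "equally effective", and the example actions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectiveAction:
    """A candidate corrective action (hypothetical fields, for illustration only)."""
    name: str
    effectiveness: float  # assumed estimate in [0, 1] of how reliably it prevents recurrence
    cost: float           # assumed relative cost; lower means cheaper or simpler


def select_action(candidates, tolerance=0.01):
    """Return the lowest-cost action among those about as effective as the best one.

    Actions whose effectiveness is within `tolerance` of the maximum are treated
    as equally effective; among those, the cheapest one is preferred.
    """
    if not candidates:
        raise ValueError("no candidate corrective actions")
    best = max(a.effectiveness for a in candidates)
    equally_effective = [a for a in candidates if best - a.effectiveness <= tolerance]
    return min(equally_effective, key=lambda a: a.cost)


if __name__ == "__main__":
    options = [
        CorrectiveAction("Redesign the component", effectiveness=0.95, cost=50_000),
        CorrectiveAction("Add an interlock", effectiveness=0.95, cost=8_000),
        CorrectiveAction("Retrain operators", effectiveness=0.60, cost=2_000),
    ]
    print(select_action(options).name)  # Add an interlock
```

Note that the cheapest option overall ("Retrain operators") is not chosen, because cost only breaks ties between options that are equally effective at preventing recurrence.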

 

General process for performing and documenting an RCA-based Corrective Action

Notice that RCA (in steps 3, 4 and 5) forms the most critical part of successful corrective action, because it directs the corrective action at the true root cause of the problem. The root cause is secondary to the goal of prevention, but without knowing the root cause, we cannot determine what an effective corrective action for the defined problem will be.

  1. Define the problem or describe the event factually.
  2. Gather data and evidence, classifying it along a timeline of events leading to the final failure or crisis.
  3. Ask “why” and identify the causes associated with each step in the sequence towards the defined problem or event.
  4. Classify causes into causal factors, which relate to an event in the sequence, and root causes, which, if eliminated, can be agreed to have interrupted that step of the sequence chain.
  5. If there are multiple root causes, which is often the case, reveal those clearly for later optimum selection.
  6. Identify corrective action(s) that will with certainty prevent recurrence of the problem or event.
  7. Identify solutions that are effective, prevent recurrence with reasonable certainty with the consensus agreement of the group, are within your control, meet your goals and objectives, and do not introduce other new, unforeseen problems.
  8. Implement the recommended root cause correction(s).
  9. Ensure effectiveness by observing the implemented solutions.
  10. Other methodologies for problem solving and problem avoidance may also be useful.
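Steps 3 to 5, which the text identifies as the core of the analysis, can be sketched as a walk over a graph of “why” answers. In this minimal sketch (an assumed data model, not a prescribed one) the team records, for each cause, the deeper causes that explain it; causes with no recorded deeper cause are treated as candidate root causes and the intermediate ones as causal factors. The function name `analyze` and the example chain are hypothetical.

```python
def analyze(problem, why):
    """Walk the recorded 'why' answers outward from the defined problem.

    `why` maps each cause to a list of deeper causes that explain it. Causes with
    no recorded deeper cause are returned as candidate root causes; every other
    cause reached along the way is returned as a causal factor.
    """
    causal_factors, root_causes = [], []
    stack, seen = [problem], {problem}
    while stack:
        cause = stack.pop()
        deeper = why.get(cause, [])
        if cause != problem:
            # Leaves of the 'why' graph are candidate root causes; the rest are causal factors.
            (causal_factors if deeper else root_causes).append(cause)
        for d in deeper:
            if d not in seen:  # guard against cycles and causes shared by several branches
                seen.add(d)
                stack.append(d)
    return sorted(causal_factors), sorted(root_causes)


if __name__ == "__main__":
    # Hypothetical answers recorded by the team while repeatedly asking "why".
    why = {
        "Machine stopped": ["Fuse blew"],
        "Fuse blew": ["Bearing overloaded"],
        "Bearing overloaded": ["Insufficient lubrication"],
        "Insufficient lubrication": ["Pump worn out", "No inspection schedule"],
    }
    factors, roots = analyze("Machine stopped", why)
    print(factors)  # ['Bearing overloaded', 'Fuse blew', 'Insufficient lubrication']
    print(roots)    # ['No inspection schedule', 'Pump worn out']
```

Because the walk follows every branch, it naturally surfaces multiple root causes (step 5). Whether a leaf really is a root cause still has to be agreed by the team and backed by documented evidence; the code only organizes what was recorded.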

 

Root cause analysis techniques

  • Barrier analysis – a technique often used in process industries. It is based on tracing energy flows, with a focus on barriers to those flows, to identify how and why the barriers did not prevent the energy flows from causing harm.
  • Bayesian inference
  • Change analysis – an investigation technique often used for problems or accidents. It is based on comparing a situation that does not exhibit the problem to one that does, in order to identify the changes or differences that might explain why the problem occurred.
  • Current Reality Tree – a method developed by Eliyahu M. Goldratt in his theory of constraints that guides an investigator to identify and relate all root causes using a cause-effect tree whose elements are bound by rules of logic (Categories of Legitimate Reservation). The CRT begins with a brief list of the undesirable things we see around us and then guides us towards one or more root causes. This method is particularly powerful when the system is complex, there is no obvious link between the observed undesirable things, and a deep understanding of the root cause(s) is desired.
  • Failure mode and effects analysis
  • Fault tree analysis
  • 5 Whys – repeatedly ask “why” until the chain of causes is exhausted.
  • Ishikawa diagram, also known as the fishbone diagram or cause-and-effect diagram. The Ishikawa diagram is used by project managers, among others, to conduct RCA.
  • Pareto analysis (the “80/20 rule”)
  • RPR Problem Diagnosis – An ITIL-aligned method for diagnosing IT problems.
  • Kepner-Tregoe Approach
  • Common cause analysis (CCA) and common modes analysis (CMA) are evolving engineering techniques for complex technical systems that determine whether common root causes in hardware, software, or highly integrated system interactions may contribute to human error or improper operation of a system. Systems are analyzed for root causes and causal factors to determine the probability of failure modes, fault modes, or common mode software faults due to escaped requirements. Complete testing and verification are also used to ensure that complex systems are designed with no common causes that lead to severe hazards. Common cause analysis is sometimes required as part of the safety engineering tasks for theme parks, commercial/military aircraft, spacecraft, complex control systems, large electrical utility grids, nuclear power plants, automated industrial controls, medical devices, or other safety-critical systems with complex functionality.
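As an illustration of fault tree analysis from the list above, the following sketch computes the probability of a top event from basic events combined through AND and OR gates. It assumes the basic events are independent and that each appears only once in the tree; shared events (exactly the common causes that CCA looks for) would make this simple product rule wrong. The class names, gate labels, and probabilities are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class BasicEvent:
    name: str
    p: float  # assumed probability that this basic event occurs


@dataclass(frozen=True)
class Gate:
    kind: str      # "AND" or "OR"
    inputs: tuple  # child BasicEvent or Gate nodes


def probability(node):
    """Top-event probability, assuming independent basic events that each appear once."""
    if isinstance(node, BasicEvent):
        return node.p
    ps = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        return prod(ps)  # every input must occur
    if node.kind == "OR":
        return 1 - prod(1 - p for p in ps)  # at least one input occurs
    raise ValueError(f"unknown gate kind: {node.kind!r}")


if __name__ == "__main__":
    # Hypothetical tree: cooling is lost if both redundant pumps fail, or if power is lost.
    both_pumps_fail = Gate("AND", (BasicEvent("pump A fails", 0.1),
                                   BasicEvent("pump B fails", 0.1)))
    loss_of_cooling = Gate("OR", (both_pumps_fail, BasicEvent("power lost", 0.01)))
    print(round(probability(loss_of_cooling), 4))  # 0.0199
```

In this example the redundant pumps and the power supply contribute roughly equally to the top event, which points the analysis at the power supply even though it looks like a single, reliable component.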

 

Basic elements of root causes, classified using the Management Oversight and Risk Tree (MORT) approach

Materials

  • Defective raw material
  • Wrong type for job
  • Lack of raw material

Manpower

  • Inadequate capability
  • Lack of knowledge
  • Lack of skill
  • Stress
  • Improper motivation

Machine / Equipment

  • Incorrect tool selection
  • Poor maintenance or design
  • Poor equipment or tool placement
  • Defective equipment or tool

Environment

  • Orderly workplace
  • Job design or layout of work
  • Surfaces poorly maintained
  • Physical demands of the task
  • Forces of nature

Management

  • No or poor management involvement
  • Inattention to task
  • Task hazards not guarded properly
  • Other (horseplay, inattention, etc.)
  • Stress demands
  • Lack of process
  • Lack of communication

Methods

  • No or poor procedures
  • Practices are not the same as written procedures
  • Poor communication

Management system

  • Training or education lacking
  • Poor employee involvement
  • Poor recognition of hazard
  • Previously identified hazards were not eliminated
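Pareto analysis, from the techniques listed earlier, can be applied to findings classified with categories like the ones above, to see which few categories account for most of the problems. This is a minimal sketch, assuming each finding has already been tagged with exactly one category; the function name `pareto`, the 80% default threshold, and the counts in the example are invented for illustration and are not real data.

```python
from collections import Counter


def pareto(findings, threshold=0.8):
    """Rank categories by frequency and find the 'vital few'.

    Returns (rows, vital): rows are (category, count, cumulative share) in
    descending order of count; vital lists the top categories needed for the
    cumulative share to reach `threshold`.
    """
    ranked = Counter(findings).most_common()
    total = sum(n for _, n in ranked)
    rows, vital, running = [], [], 0
    for category, n in ranked:
        if running / total < threshold:  # still short of the threshold, so keep this category
            vital.append(category)
        running += n
        rows.append((category, n, running / total))
    return rows, vital


if __name__ == "__main__":
    # Invented counts of classified findings, for illustration only.
    findings = (["Methods"] * 9 + ["Environment"] * 6 + ["Machine / Equipment"] * 3
                + ["Materials"] + ["Management"])
    rows, vital = pareto(findings)
    for category, n, share in rows:
        print(f"{category:20} {n:3} {share:6.0%}")
    print(vital)  # ['Methods', 'Environment', 'Machine / Equipment']
```

Here three of the five categories account for 90% of the invented findings, so corrective effort would be concentrated there first; the remaining categories are not ignored, only deprioritized.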