Human Error or a Bigger Problem? When to Dig Deeper

by Julius DeSilva

In the world of process improvement and problem-solving, human “user” error can often become the go-to explanation when things go wrong. A mis-entered data point, a forgotten step in a procedure, or a misconfigured setting—blaming the user is quick and easy. But how do you know when an issue is bigger than just user error?

Understanding when to dig deeper and identify systemic flaws is critical. By integrating structured approaches like Root Cause Analysis (RCA) and the PDCA (Plan-Do-Check-Act) cycle, organizations can shift from a reactive blame culture to a proactive, continual improvement mindset that eliminates recurring problems at their source.

The Prevalence of User Error in Different Industries

Human error has been identified as a significant contributor to operational failures across multiple sectors:

  • Cybersecurity: According to the World Economic Forum, 95% of cybersecurity breaches result from human error.
  • Manufacturing: A study by Vanson Bourne found that 23% of unplanned downtime in manufacturing is due to human error, making it a key contributor to production inefficiencies. The American Society for Quality (ASQ) reports that 33% of quality-related problems in manufacturing are due to human error.
  • Healthcare: A study published in the British Medical Journal (BMJ) estimates that medical errors, many due to human factors, cause approximately 250,000 deaths per year in the U.S. alone.
  • Aviation & Transportation: The Federal Aviation Administration (FAA) attributes 70-80% of aircraft incidents to human error, but deeper analysis often reveals process design issues, poor training, or missing safeguards.

These statistics reinforce a key point: Human error isn’t always the root cause—it’s often a symptom of a deeper, systemic issue.

Recognizing When to Look Beyond User Error

Here’s how to tell when an issue isn’t just a one-time mistake but a signal that the system itself needs improvement:

  1. Recurring Issues Across Multiple Users – If multiple employees are making the same mistake, the problem likely isn’t individual human error—it’s a flaw in the process, system design, or training. For example, if multiple operators incorrectly configure a machine setting, it might indicate confusing controls, inadequate training, or unclear documentation rather than simple user mistakes.
  2. Workarounds and Process Deviations – If employees consistently find alternative ways to complete a task, the system may not be designed for real-world conditions. If workers routinely bypass a safety feature because it “slows them down,” the process needs reevaluation, whether through retraining, redesign, or better automation. At QMII, we emphasize building the system for its users: start from how work is actually done (the as-is) and then make incremental improvements.
  3. High Error Rates Despite Training – If errors persist even after proper training, the issue might be process complexity, unclear instructions, or a lack of intuitive system design. If employees consistently make minor mistakes, the system interface or workflow rules might need simplification rather than just retraining staff.
  4. Error Spikes in High-Stress Situations – Mistakes often increase under time pressure, fatigue, or stress. This suggests a workload or process issue rather than simple carelessness. In a maritime environment, high error rates during critical operations could signal staffing shortages, ineffective safety interlocks, or poor user interfaces on devices.
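
To make the first signal concrete, the short Python sketch below groups a hypothetical error log by error type and flags any type made by several different people. The record layout, the names `error_log` and `flag_systemic_errors`, and the three-user threshold are illustrative assumptions, not a prescribed method. It is a first screen that suggests where to start a root cause analysis, not a verdict.

```python
from collections import defaultdict

# Hypothetical error-log records: (error_type, user_id).
# Names and values are illustrative assumptions, not real data.
error_log = [
    ("wrong_machine_setting", "op_01"),
    ("wrong_machine_setting", "op_02"),
    ("wrong_machine_setting", "op_03"),
    ("missed_checklist_step", "op_02"),
    ("missed_checklist_step", "op_02"),
    ("typo_in_batch_id", "op_04"),
]


def flag_systemic_errors(log, min_users=3):
    """Return error types made by at least `min_users` distinct people.

    The threshold is an assumed rule of thumb; each organization
    would set its own.
    """
    users_by_error = defaultdict(set)
    for error_type, user in log:
        users_by_error[error_type].add(user)  # a set counts each person once
    return sorted(
        error_type
        for error_type, users in users_by_error.items()
        if len(users) >= min_users
    )


print(flag_systemic_errors(error_log))
```

Here only `wrong_machine_setting` is flagged. The repeated `missed_checklist_step` entries come from a single operator, so that pattern points toward individual follow-up rather than a system redesign.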

Instead of just fixing errors after they happen, organizations should use the PDCA (Plan-Do-Check-Act) cycle to continually improve processes and reduce the probability of recurring failures.

The PLAN-DO-CHECK-ACT Approach

PLAN – Identify the context and potential risks

  1. Identify the context of the process, including the competence of personnel, the user environment, complexity, and other influencing factors.
  2. Apply Failure Mode and Effects Analysis (FMEA) to predict where failures are likely to happen before they occur.
  3. Identify representatives of the users and involve them throughout the development of the FMEAs and the process.
  4. When planning controls and resources, determine whether it is feasible to implement and provide them.
  5. Simplify procedures, redesign workflows, or introduce automation to eliminate failure points.

DO – Implement the Process and Improvements

  1. Implement the process and test it to check its effectiveness. In the initial stages, more frequent monitoring and measurement will be required; the frequency of checks can be reduced as the process matures.
  2. Provide user training and assess its effectiveness. When errors occur, retrain personnel, but only if training is truly the issue; don’t use training as a Band-Aid for bad system design.
  3. Look beyond documented “standard operating” procedures. For example, a company might place a visual step-by-step guide near its machines so that operators follow a standard calibration process.

CHECK – Evaluate the Results

  1. Track performance data to see if the changes have reduced errors.
  2. Get user feedback to ensure the new system is intuitive and efficient. For example, error rates may drop by 40% while operators still struggle with a specific step, prompting another refinement.
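
A figure like the 40% above is a relative reduction in error rate. The minimal sketch below shows that arithmetic, assuming error counts are normalized by workload (errors per calibration). The function name `relative_reduction` and the figures are hypothetical.

```python
def relative_reduction(errors_before, units_before, errors_after, units_after):
    """Percentage drop in error rate between two periods.

    Rates are normalized by volume (errors per unit of work) so that
    periods with different workloads can be compared fairly.
    """
    rate_before = errors_before / units_before
    rate_after = errors_after / units_after
    return (rate_before - rate_after) / rate_before * 100


# Hypothetical figures: 50 errors in 1,000 calibrations before the change,
# 36 errors in 1,200 calibrations after it.
print(round(relative_reduction(50, 1000, 36, 1200), 1))  # 40.0
```

Normalizing by volume matters: in this example the raw count fell only from 50 to 36 (28%), because more calibrations were performed after the change.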

ACT – Standardize & Scale

  1. If the improvement is successful, integrate it as the new standard process.
  2. Scale the change across other departments or sites where similar issues might exist. For example, the company implements the same calibration guide and training approach across all locations, preventing similar errors company-wide.

Conclusion: From Blame to Solutions

While human error is a reality, it’s often a symptom of a deeper process flaw, not the root cause. Those conducting a root cause analysis or an investigation must ask, “How did the system fail the individual?” and “Why did the system fail the individual?” By shifting from a blame mindset to a continual improvement approach, organizations can:

  • Reduce costly errors and downtime
  • Improve employee engagement (less frustration = higher productivity)
  • Enhance conformity and compliance
  • Increase process reliability and efficiency

Monitoring of the system must continue, because as the context changes, the controls implemented may become less effective than before. A proactive system will not guarantee that things never go wrong. When they do, however, the key is to dig deeper. Using tools like PDCA, FMEA, and RCA will help identify long-term solutions to recurring problems. Because in most cases, fixing the system is better than blaming the human.

AIAG-VDA FMEA vs Traditional FMEA – The Differences

FMEA, or Failure Mode and Effects Analysis, has been in use since the 1940s. It was first used primarily in the aerospace industry and then gradually made its way into the automotive sector, where it gained popularity. In 2019, AIAG (the Automotive Industry Action Group in the US) and VDA (Verband der Automobilindustrie, its German counterpart) issued a joint FMEA handbook that changed how the analysis is carried out. For companies, this does not mean that an immediate changeover is required; the need to adopt the new methodology will be driven by customers as part of their requirements.
What is FMEA? FMEA is a tool used to assess risk. There are two main types: process FMEA and design FMEA. Using the tool, organizations can identify potential threats within their processes and designs and take action to address them before they develop into nonconformities. In essence, therefore, it is a preventive tool. While there are differences between the traditional and new methodologies, both follow the same basic process to identify and mitigate risks.
Both still rate risk on three factors. The first is the likelihood of occurrence, the second is the severity of the consequences, and the third is the likelihood of detecting the failure before it reaches the customer. The harder a failure is to detect, the greater the risk; the easier it is to detect, the lower the overall risk. Because overall risk is judged against criteria set by the organization rather than by one individual, an FMEA should always be conducted by a team rather than by a single person.
FMEAs are not static documents that, once created, never need to change. They are living documents that are reviewed and updated at periodic intervals to confirm that no change has altered the overall risk. The traditional FMEA calculates an RPN, or Risk Priority Number. Over the years, a number of people have critiqued the RPN approach because the threshold at which a risk is considered unacceptable is often arbitrary. The AIAG-VDA approach replaces the RPN with an Action Priority (AP), and the handbook provides tables that assign each combination of severity, occurrence, and detection ratings a priority of high, medium, or low. The new methodology is also broken down into seven steps.
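
To illustrate the traditional calculation, the sketch below computes RPN as severity × occurrence × detection, each rated 1 to 10, and flags failure modes at or above a cut-off. The failure modes, ratings, and the threshold of 100 are hypothetical. The sketch does not reproduce the AIAG-VDA Action Priority tables, which should be taken from the handbook itself.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) .. 10 (hazardous)
    occurrence: int  # 1 (remote) .. 10 (very likely)
    detection: int   # 1 (almost certain to detect) .. 10 (almost undetectable)

    def __post_init__(self):
        # Reject ratings outside the 1-10 scale assumed here
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # Traditional Risk Priority Number: S x O x D, range 1..1000
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and ratings
modes = [
    FailureMode("Wrong torque applied", severity=7, occurrence=4, detection=6),
    FailureMode("Label misprinted", severity=3, occurrence=6, detection=2),
    FailureMode("Seal omitted", severity=9, occurrence=2, detection=3),
]

# Hypothetical action threshold: the kind of cut-off critics call arbitrary
RPN_THRESHOLD = 100

for mode in sorted(modes, key=lambda m: m.rpn, reverse=True):
    flag = "ACTION" if mode.rpn >= RPN_THRESHOLD else "monitor"
    print(f"{mode.rpn:4d}  {flag:7s}  {mode.description}")
```

With these ratings the torque failure (RPN 168) is flagged, while the omitted seal, despite a severity of 9, scores only 54 and falls below the cut-off. Because the three ratings are multiplied with equal weight, a high-severity risk can be outranked by a moderate one, which is part of why a fixed RPN threshold is criticized.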
To learn more about FMEA and how to conduct a design FMEA or a process FMEA, join QMII’s training, offered in both onsite and virtual instructor-led formats.

AS9100: Risk-Based Thinking in the Airline Industry – It’s About Time

The airline industry statistically has one of the best safety records. AS9100 defines the framework for a quality management system for aerospace parts manufacturers across the globe. Over the past decade, however, several airline accidents have brought the safety of airlines to the forefront. In the recent case of the Boeing 737 MAX, a software glitch was identified as the cause. As investigations proceed, the general consensus is that this glitch should have been identified earlier.

Risk is generally associated with ‘uncertainty’ or ‘negativity’. This changed with ISO 9001:2015 and the onset of risk-based thinking, which asks companies to also consider the opportunities for improvement that may arise from taking a ‘calculated’ risk. Further, AS9100, which is built on ISO 9001, includes requirements to consider strategic and operational risks and to take action to address each. The impact of coronavirus or a similar pandemic is a great example of a strategic risk that can affect business continuity.

Risk-based thinking in the AS9100 standard promotes customer focus within an organization. While risk-based thinking was inherent in previous versions of the standard through preventive action, the current standard addresses risk at each stage of the PDCA cycle, making the entire AS9100 management system a preventive tool at every stage.

The aerospace and automotive industries are leaders in the implementation of Failure Mode and Effects Analysis (FMEA) and the Plan-Do-Check-Act (PDCA) cycle of process management. Originally adopted by the military in the 1950s, FMEA was later embraced by the automotive and aerospace industries. The FMEA process identifies risks that can then be addressed through mistake-proofing and team-based problem solving. FMEA can be applied to either a product or a process, and when used properly it can be very effective at addressing risks. FMEA is a great core tool for meeting the operational risk requirements of AS9100 clause 8.1.1.

AS9100 asks top management to take accountability for the quality of the products and services their organization produces, keeping customer focus at the core of all they do. The influence of end users, customers, and the company’s marketing department on the product’s design needs to be constantly reviewed. There are potential risks at each stage of the AS9100 system: requirements gathering, design and development, and manufacturing. A single FMEA may therefore not be sufficient; it should be reviewed at periodic intervals, because a change in inputs to the process or product may change the associated risks or reveal new ones.

Management wants to encourage continuous improvement and innovative recommendations from all stakeholders, but changes must be reviewed. Whenever a change is made to an AS9100-certified product or service, that change should follow the PDCA cycle, just as the product did when it was first introduced. This will reduce the number of recalls and the risk of injury to the end users of the products.

A single nonconforming product that leaves the organization and reaches the market results in an intangible loss, because no value can be put on lost reputation. It only takes a single incident! Starting with risk appreciation at the Plan stage of the PDCA cycle and continuing throughout the rest of the cycle, with a focus on customer satisfaction, will help the aerospace industry prevent nonconformities before they occur and, hopefully, improve its AS9100-certified products.

What Are SMEA and FMEA?

Success Modes and Effects Analysis

An organization is likely to succeed if it understands the system that runs its business. It can then identify where it needs to make improvements and use its system to succeed. QMII helps clients develop their process-based management systems by using success modes and effects analysis (SMEA). Unlike FMEA, SMEA focuses on the areas of success (opportunities) the organization is trying to achieve and determines the potential risks to achieving them. The organization then takes action to address these risks. While not all risks can be eliminated given resource constraints, SMEA gives the organization an opportunity to prioritize the risks and take appropriate action.

To implement SMEA, top management needs to analyze and document what their organization does to convert customer needs into cash (success modes). This enables them to see where waste can be eliminated by applying lean principles to achieve lean design, lean manufacturing, lean administration, and lean service. This determines the key processes in the system that runs the business. The next step involves working with the process owners to analyze each key process for the fulfillment of its process objectives (effects analysis). This results in a flowcharted procedure for each key process; if you’re not fond of flowcharts, any other method of documentation will do. These procedures refer to the interacting processes and supporting documents.

Competent employees, from the recruiting and training processes, are coached by their leaders to use their system to eliminate causes of waste and succeed. These systems include procedures for creating new products and new processes with inputs from successful designs (see FMEA below).

Organizations can use SMEA to build and grow the success of their organizations.

Failure Modes and Effects Analysis

FMEAs performed during product and process design help prevent failures of products and processes. A team representing customers, designers, manufacturers, installers, users, and suppliers agrees on the rules for evaluating risk. The team works through each of the ways in which the process or product could fail (potential failure modes) and assigns a score, per the rules, to signify the frequency and impact of each type of failure (effects analysis).

Failure modes that potentially are the most frequent or could have the biggest impact (or both!) are the highest priority. Teams remove the root causes of such failure modes to prevent their occurrence. These preventive actions make processes and products much more reliable from the beginning.
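
The prioritization rule above (most frequent, biggest impact, or both) can be expressed as a simple screen. In this sketch the 1 to 10 scales, the cut-offs, and the failure modes are hypothetical stand-ins for the rules a team would agree on; it is not a standard scoring scheme.

```python
# Hypothetical team-agreed rules: ratings on a 1-10 scale, and a rating at
# or above the cut-off counts as "high".
HIGH_FREQUENCY = 7
HIGH_IMPACT = 8


def priority(frequency: int, impact: int) -> str:
    """Classify a failure mode by whether it is frequent, severe, or both."""
    frequent = frequency >= HIGH_FREQUENCY
    severe = impact >= HIGH_IMPACT
    if frequent and severe:
        return "highest"
    if frequent or severe:
        return "high"
    return "normal"


# Hypothetical failure modes: name -> (frequency, impact)
failure_modes = {
    "Connector not fully seated": (8, 9),
    "Cosmetic scratch on housing": (7, 2),
    "Brake line chafing": (2, 10),
    "Loose trim clip": (3, 3),
}

# Sort by priority class first, then alphabetically within a class
order = {"highest": 0, "high": 1, "normal": 2}
ranked = sorted(
    failure_modes.items(),
    key=lambda item: (order[priority(*item[1])], item[0]),
)
for name, (freq, impact) in ranked:
    print(f"{priority(freq, impact):8s} {name}")
```

The connector failure is both frequent and severe, so it ranks first; the scratch and the chafing brake line each trip one threshold, and the trim clip trips neither.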

As you might expect, the entire automotive industry now uses FMEA to improve reliability. Yet not one carmaker considered the sudden loss of global financing: a rare failure mode with dire consequences! Organizations that fail to use FMEA suffer the many losses caused by incapable processes and poor products. Repeated failure may teach them the hard way, if they remain in business.

FMEA works best as a preventive action tool within a process-based management system (see above).

QMII facilitates failure modes and effects analysis (FMEA) and success modes and effects analysis (SMEA) for our clients.