Human Error or a Bigger Problem? When to Dig Deeper

by Julius DeSilva

In the world of process improvement and problem-solving, human “user” error can often become the go-to explanation when things go wrong. A mis-entered data point, a forgotten step in a procedure, or a misconfigured setting—blaming the user is quick and easy. But how do you know when an issue is bigger than just user error?

Understanding when to dig deeper and identify systemic flaws is critical. By integrating structured approaches like Root Cause Analysis (RCA) and the PDCA (Plan-Do-Check-Act) cycle, organizations can shift from a reactive blame culture to a proactive, continual improvement mindset that eliminates recurring problems at their source.

The Prevalence of User Error in Different Industries

Human error has been identified as a significant contributor to operational failures across multiple sectors:

  • Cybersecurity: According to the World Economic Forum, 95% of cybersecurity breaches result from human error.
  • Manufacturing: A study by Vanson Bourne found that 23% of unplanned downtime in manufacturing is due to human error, making it a key contributor to production inefficiencies. The American Society for Quality (ASQ) reports that 33% of quality-related problems in manufacturing are due to human error.
  • Healthcare: The British Medical Journal (BMJ) estimates that medical errors—many due to human factors—cause approximately 250,000 deaths per year in the U.S. alone.
  • Aviation & Transportation: The Federal Aviation Administration (FAA) attributes 70-80% of aircraft incidents to human error, but deeper analysis often reveals process design issues, poor training, or missing safeguards.

These statistics reinforce a key point: Human error isn’t always the root cause—it’s often a symptom of a deeper, systemic issue.

Recognizing When to Look Beyond User Error

Here’s how to tell when an issue isn’t just a one-time mistake but a signal that the system itself needs improvement:

  1. Recurring Issues Across Multiple Users – If multiple employees are making the same mistake, the problem likely isn’t individual human error—it’s a flaw in the process, system design, or training. For example, if multiple operators incorrectly configure a machine setting, it might indicate confusing controls, inadequate training, or unclear documentation rather than simple user mistakes.
  2. Workarounds and Process Deviations – If employees consistently find alternative ways to complete a task, the system may not be designed for real-world conditions. If workers routinely bypass a safety feature because it “slows them down,” the process needs reevaluation, whether through retraining, redesign, or better automation. At QMII, we always reinforce building a system for its users, based on the as-is of how work is actually done, and then making incremental improvements.
  3. High Error Rates Despite Training – If errors persist even after proper training, the issue might be process complexity, unclear instructions, or a lack of intuitive system design. If employees consistently make minor mistakes, the system interface or workflow rules might need simplification rather than just retraining staff.
  4. Error Spikes in High-Stress Situations – Mistakes often increase under time pressure, fatigue, or stress. This suggests a workload or process issue rather than simple carelessness. In a maritime environment, high error rates during critical operations could signal staffing shortages, inefficient safety interlocks, or poor user interfaces on devices.
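The first signal above, the same mistake made by several different people, can be checked directly from an error log. The sketch below is only an illustration: the record layout, the `classify_errors` helper, and the threshold of three distinct users are assumptions, not part of any standard.

```python
from collections import defaultdict


def classify_errors(records, min_users=3):
    """Label each error type by how many distinct users have made it.

    records   -- iterable of (user, error_type) pairs (hypothetical log format)
    min_users -- assumed threshold at which an error is treated as a system signal
    """
    users_by_error = defaultdict(set)
    for user, error_type in records:
        users_by_error[error_type].add(user)

    return {
        error_type: "systemic" if len(users) >= min_users else "isolated"
        for error_type, users in users_by_error.items()
    }


# Illustrative data only: three operators make the same setting mistake,
# while a single operator repeats a different one.
records = [
    ("op_a", "wrong_machine_setting"),
    ("op_b", "wrong_machine_setting"),
    ("op_c", "wrong_machine_setting"),
    ("op_a", "missed_signature"),
    ("op_a", "missed_signature"),
]

result = classify_errors(records)
for error_type, label in result.items():
    print(f"{error_type}: {label}")
```

Counting distinct users rather than total occurrences is the point of the design: one person repeating a mistake may call for coaching, while many people making it points to the process, the controls, or the documentation.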

Instead of just fixing errors after they happen, organizations should use the PDCA (Plan-Do-Check-Act) cycle to continually improve processes and reduce the probability of recurring failures.

The PLAN-DO-CHECK-ACT Approach

PLAN – Identify the context and potential risks

  1. Identify the context of the process including the competence of personnel, user environment, complexity and influencing factors.
  2. Apply Failure Mode and Effects Analysis (FMEA) to predict where failures are likely to happen before they occur.
  3. Identify representatives of the users and involve them throughout the development of the FMEAs and the process.
  4. When planning controls and resources, determine the feasibility of implementing and providing them.
  5. Simplify procedures, redesign workflows, or introduce automation to eliminate failure points.
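One common way to prioritize the output of an FMEA is the Risk Priority Number (RPN), the product of severity, occurrence, and detection ratings. The article does not prescribe a scoring method, so the 1 to 10 scales, the `FailureMode` class, and the example ratings below are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    step: str
    description: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) to 10 (almost certain)
    detection: int   # assumed scale: 1 (almost always caught) to 10 (rarely caught)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_failure_modes(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes and ratings, as a user-involved team might score them.
modes = [
    FailureMode("calibration", "wrong offset entered", 7, 5, 4),
    FailureMode("data entry", "units confused", 5, 6, 2),
    FailureMode("handover", "step skipped at shift change", 8, 3, 6),
]

ranked = rank_failure_modes(modes)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.step}: {mode.description}")
```

The ranking is only a starting point for discussion. A failure mode with a very high severity may deserve attention even when its RPN is modest.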

DO – Implement the Process and Improvements

  1. Implement the process and test it to check its effectiveness. In the initial stages, more frequent monitoring and measurement will be required. The frequency of checks can be reduced as the process matures.
  2. Provide user training and assess its effectiveness. When errors occur, retrain personnel, but only if training is truly the issue. Don’t use training as a Band-Aid for bad system design.
  3. Look beyond documented “standard-operating” procedures. For example, the company posts a visual step-by-step guide near its machines so that operators follow a standard calibration process.

CHECK – Evaluate the Results

  1. Track performance data to see if the changes have reduced errors.
  2. Get user feedback to ensure the new system is intuitive and efficient. For example, error rates drop by 40%, but operators still struggle with a specific step, prompting another refinement.
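The CHECK step can be made concrete with a small calculation: compare the error rate before and after the change, then look at where the remaining errors cluster. All counts and step names below are made up for illustration, and the helper functions are hypothetical.

```python
def error_rate(errors, opportunities):
    """Errors per opportunity (for example, per calibration performed)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return errors / opportunities


def relative_reduction(before, after):
    """Fraction by which the error rate fell relative to the baseline."""
    if before == 0:
        raise ValueError("baseline error rate is zero; nothing to reduce")
    return (before - after) / before


# Illustrative counts: 50 errors in 1,000 runs before the change, 30 after.
before = error_rate(50, 1000)
after = error_rate(30, 1000)
reduction = relative_reduction(before, after)
print(f"Relative reduction: {reduction:.0%}")  # prints "Relative reduction: 40%"

# Remaining errors broken down by process step (hypothetical step names).
remaining_by_step = {"load": 4, "calibrate": 21, "verify": 5}
worst_step = max(remaining_by_step, key=remaining_by_step.get)
print(f"Step needing another refinement: {worst_step}")
```

An overall improvement can hide a step that is still causing trouble, which is why the per-step breakdown sits alongside the headline figure.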

ACT – Standardize & Scale

  1. If the improvement is successful, integrate it as the new standard process.
  2. Scale the change across other departments or sites where similar issues might exist. For example, the company implements the same calibration guide and training approach across all locations, preventing similar errors company-wide.

Conclusion: From Blame to Solutions

While human error is a reality, it’s often a symptom of a deeper process flaw, not the root cause. Those conducting a root cause analysis or an investigation must ask “How did the system fail the individual?” and “Why did the system fail the individual?” By shifting from a blame mindset to a continual improvement approach, organizations can:

  • Reduce costly errors and downtime
  • Improve employee engagement (less frustration = higher productivity)
  • Enhance conformity and compliance
  • Increase process reliability and efficiency

Monitoring of the system must continue because, as the context changes, the controls implemented may not be as effective as before. A proactive system will not guarantee that things never go wrong. When they do, however, the key is to dig deeper. Using tools like PDCA, FMEA, and RCA will help in identifying long-term solutions to recurring problems. Because in most cases, fixing the system is better than blaming the human.

AS9100: Risk-Based Thinking in the Airline Industry – It’s About Time

The airline industry statistically has one of the best safety records. AS9100 defines the framework for a quality management system for aerospace parts manufacturers across the globe. Over the past decade, however, there have been several airline accidents that have brought the safety of airlines to the forefront. In the most recent case, that of the Boeing 737 MAX, a software glitch was identified as the cause. As investigations proceed, the general consensus is that this glitch should have been identified earlier.

Risk is generally associated with ‘uncertainty’ or ‘negativity’. This changed with ISO 9001:2015 and the onset of risk-based thinking, which now asks companies to consider the opportunities for improvement that may arise from taking a ‘calculated’ risk. Further, AS9100, which is built on ISO 9001, includes requirements to consider strategic risks and operational risks and to take action to address each. The impact of coronavirus or a similar pandemic is a great example of a strategic risk that can affect business continuity.

Risk-based thinking in the AS9100 standard promotes customer focus within an organization. While risk-based thinking was inherent in previous versions of the standard through preventive action, the new standards address risk at each stage of the PDCA cycle, thus enabling the entire AS9100 management system to act as a preventive tool at each stage.

The aerospace and automotive industries are leaders in the implementation of Failure Mode and Effects Analysis (FMEA) and the Plan-Do-Check-Act (PDCA) cycle of process management. Originally adopted by the military in the 1950s, FMEA was later embraced by the auto and aerospace industries. The FMEA process identifies risks that can then be addressed using mistake proofing and problem solving with a team approach. FMEA can be used for either product or process. When used properly, it can be very effective at addressing risks. FMEA is a great core tool that can be applied to address the AS9100 clause 8.1.1 operational risk requirements.

AS9100 asks top management to take accountability for the quality of products and services produced by their organization, keeping a customer focus at the core of all they do. The influence of end users, customers and the company’s marketing department on the product’s design needs to be constantly reviewed. At each of the requirements gathering, design & development, and manufacturing stages of the AS9100 system, there are potential risks. As such, a single FMEA may not be sufficient; the FMEA may need to be reviewed at periodic intervals, because a change in inputs to the process or product may change the associated risks or introduce new ones.

Management wants to encourage continuous improvement and innovative recommendations by all stakeholders, but changes must be reviewed. Whenever a change is made to an AS9100 certified product or service, that change should follow the PDCA cycle approach, the same way it was done when the product was first introduced. This will reduce the number of recalls and the risk of injuries to end users of the products.

A single non-conforming product that leaves the organization and reaches the market results in an intangible loss, for no value can be put on the loss of reputation. It only takes a single incident! Starting with risk appreciation at the Plan stage of the PDCA cycle and continuing throughout the rest of the cycle, with a focus on customer satisfaction, will help the aerospace industry prevent non-conformities before they occur and, hopefully, improve its AS9100 certified products.

Monitoring Outsourced Processes is a Primary Responsibility of Every Organization

The international standards provide a world of wisdom, enabling organizations to plan robustly to achieve results. In this global economy, doing all the work in-house is often not a cost-effective solution. Moreover, with super-specialized industry requirements, many quality products and services can be procured at reasonable prices. Yet it seems organizations fail to act in the spirit of the standard when putting in place requirements for monitoring outsourced processes. Clause 8.1 of ISO 9001:2015, on operational planning and control, has a sting in the tail: a clear requirement that “the organization shall ensure that outsourced processes are controlled.”

Statutory requirements are created to provide the required oversight, maintain customer focus and protect the interests of the customer when products and services are cleared for use. The caveat is that the statutory body should be well resourced, have the infrastructure, and maintain organizational knowledge levels (Clauses 7.1.5.1, 7.1.3 & 7.1.6 of ISO 9001) with competent manpower (Clause 7.2). This is often not possible, or not sustainable over time, because of budgetary constraints, knowledge levels dropping with time, and Leadership forgetting its primary role (Clause 5.1.1) of taking accountability for the effectiveness of the QMS (Quality Management System). As a result, the resources (5.1.1 e) needed for the QMS are not provided or budgets are not available. The statutory bodies rationalize this by pointing to their helplessness, since the government does not provide the funding and budgetary support.

Whatever the reasons, the question is who suffers? A ship is sunk, an aircraft with all on board has crashed, dangerous drugs are in use. It is the customer who suffers. Helpless to perform their duties, the statutory bodies outsource the work to contracted parties or, worse, to the manufacturer itself! The whole logic of creating a statutory body is lost with this.

What then is the remedy? The essential rulemaking that implements compliance requires competence, resources, and infrastructure, with a committed Leadership ensuring the continuing suitability, adequacy and effectiveness of the system. When budgetary constraints do not allow this role to be fulfilled, the risk to the system, along with the products and services it provides, must be assessed and mitigated, or the opportunity for improvement taken (Clause 6.1 of ISO 9001). This would require the authority to appreciate the FMEA (Failure Mode and Effects Analysis) and take measures to remedy the risk. If this risk is not recognized as an NC (Non-conformity), the CA (Corrective Action) will not take place, nor will the government learn of the consequences of underfunding, recognize the failure, or consider alternatives. If the manufacturer has the resources, the government may consider this an asset and avoid duplication of resources, thinking in national terms. Outsourcing to the manufacturer, as has been seen, can mean losing customer focus and runs strictly counter to the very philosophy of statutory work. It would call for aggressive, proactive and strict monitoring of the outsourced processes.

In my opinion, monitoring the outsourced processes diligently, as clearly prescribed in the standard, is the answer. New options may not be necessary if the existing clauses of ISO 9001 and related industry-specific standards, where applicable, are understood in the spirit of the standard and vigorously implemented.

  • Dr. IJ Arora