Accelerating AI Adoption With Explainability in SaMD

Many recognize the potential of artificial intelligence (AI) in healthcare and life sciences. Some early adopters are already starting to reap the benefits, especially in fields such as radiology and digital pathology, where it is now more common for AI to be used to identify specific markers or patterns in medical images. However, the regulatory pathway for how AI can be leveraged in this space is still evolving, and the adoption of AI continues to produce unintended consequences, including unmitigated bias.

Given these developments, one area in particular where industry leaders are focusing is the use of AI in the clinical decision support process, and specifically its intersection with the FDA’s regulatory policies on AI/ML-based Software as a Medical Device (SaMD).

In January 2021, industry and regulatory leaders convened and drafted key recommendations and action plans to better frame regulatory needs in this area. One of the focal points was “Good Machine Learning Practices”: what they entail and how they can ensure quality systems across the entire development lifecycle of AI/ML-based SaMD, from inception to development, deployment, and maintenance. Some readers may be more familiar with the related term “ML Ops” (Machine Learning Operations), although Good Machine Learning Practices are broader in scope. Within this framework (“Figure 2”), in the section on model validation (clinical evaluation), an area where research is advancing rapidly is explainable AI (“XAI”) and, in parallel, counterfactuals. This is the specific topic we will deep-dive into, since the limited explainability of AI/ML “black box” algorithms is one of the biggest obstacles to AI adoption in SaMD and beyond today. Integrating XAI into the AI/ML-based SaMD lifecycle will help build trust in and adoption of these solutions and help clinicians in the decision-making process.

Four topics at the intersection of the SaMD ecosystem and XAI will be covered here:

  1. Where XAI fits within the SaMD action plan and FDA proposed framework
  2. Combining XAI and the rise of counterfactuals
  3. New research frontiers within counterfactuals that can be incorporated into SaMD:
    1. Counterfactuals and Causal Inference (model level)
    2. Counterfactual Explanations (input/output level)
  4. Future considerations

XAI within SaMD

When asking where XAI fits within this proposed FDA SaMD framework, I would place it in the “Model Validation” box in Figure 2. In particular, clinical evaluation and its analytical validation subsection align well with the principles of XAI, because XAI is all about bringing transparency and explainability to AI/ML models through multiple modalities: process, algorithms, experience design, documentation, etc. As a result, any solution and result should be interpretable by humans (see more here from our IEEE archive). This ultimately builds trust because we can understand how decisions are being made, and it allows us to mitigate unintended consequences like biased decision-making.

More details on the subcategories of clinical evaluation are shown below in Figure 3 of the FDA paper, where we can clearly see multiple levels of evaluation. It is important to differentiate these three because XAI can also be applied at different levels:

  • The first level is valid clinical association: making sure that, medically, the output of an AI/ML model is associated with the targeted clinical condition. Is there clinical accuracy and rigor?
  • The second is analytical validation, where we look for consistent, reliable outputs that we can track transparently throughout the process from feature engineering to prediction. Can we understand how A got to B, and is this scientifically reproducible/auditable?
  • The third is clinical validation, which looks at intent. Does the tool actually do what I intended it to do for my patients? Are there unwanted impacts?

For the component of XAI we will be looking at today (counterfactuals), we will focus primarily on the second and third levels.


Image from FDA discussion paper. Link.

Combining XAI and the rise of Counterfactuals

As mentioned before, XAI has evolved into an umbrella term that addresses human-interpretable AI/ML across different modalities, including but not limited to processes, documentation, experience design, and models that target all parts of the AI/ML development lifecycle. Over the last couple of years, most major AI companies have adopted some perspective on XAI. One example is IBM’s diagram below:


Image source: IBM AI Explainability Blog. Link here.

One area that falls within this umbrella is the quickly advancing field of “counterfactuals”, an established logical paradigm originating in philosophy that essentially explores what would have been true under a different set of circumstances (see full definition here). These concepts, while not new, have more recently been integrated with AI/machine learning and computer science paradigms in computationally efficient ways that can be applied at multiple levels, hence the differentiation earlier.

As applied to AI/ML, counterfactuals can help us programmatically bring transparency at multiple levels to a given prediction. Let’s pretend we are working with an AI/ML-based SaMD application that is used for clinical decision support using an algorithm that predicts whether a patient should be recommended to a specialist or not based on their profile, medical history, and medical images.

In this situation, let’s assume there is some class imbalance, where the majority of patients (and therefore the data) fit into a certain type of demographic profile, socioeconomic class, or other grouping.

If this is done incorrectly with a black box algorithm, we might achieve high accuracy in matching human recommendations for the majority group while making incorrect, biased decisions for everyone else, with no way of knowing what is driving the prediction. Why is the recommendation yes or no? Which data points caused it? Are there hidden relationships or bias in or amongst the data/variables? Which inputs could we change to get a different outcome?
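To make the class imbalance concern concrete, here is a minimal sketch in Python. The records, group names, counts, and the `accuracy_by_group` helper are all made up for illustration and do not come from any real dataset or FDA guidance. It shows how a high overall accuracy can hide poor performance for an underrepresented group.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return (overall accuracy, accuracy per group).

    Each record is a (group, true_label, predicted_label) tuple.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group

# Hypothetical toy data: 90 majority-group patients, 10 minority-group patients.
# Label 1 means "refer to specialist", 0 means "do not refer".
records = (
    [("majority", 1, 1)] * 45      # correct referrals
    + [("majority", 0, 0)] * 43    # correct non-referrals
    + [("majority", 1, 0)] * 2     # missed referrals
    + [("minority", 1, 0)] * 6     # missed referrals
    + [("minority", 0, 0)] * 4     # correct non-referrals
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.2f}")
```

In this toy data the overall accuracy is 0.92, yet the minority group sits at 0.40 because every minority patient who needed a referral was missed. Reporting metrics per group is only a first diagnostic; it shows that something is wrong, not why, which is where the counterfactual techniques below come in.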

New research frontiers within Counterfactuals

Given the scenario above, there are two recent research publications that I wanted to highlight from the RAI research network that could improve explainability and transparency of AI/ML solutions in new ways. RAI is already investing in similar applications of counterfactuals in financial services, so it is not a stretch to believe this could also benefit AI/ML solutions in SaMD and increase adoption of these clinical decision support tools.

Counterfactual fairness at the model level

The first counterfactual research applies to a scenario common in healthcare and life sciences, where we remove identifying information to comply with HIPAA, GDPR, CCPA, and many other compliance frameworks so that no sensitive data is compromised and patient privacy is protected. However, there are many cases where simply removing gender, name, race, etc., may not mitigate algorithmic bias because other proxy data/variables may remain. This can lead to biased patterns and results being systematically incorporated into analyses and predictions, despite the fact that we removed all sensitive information.

This research from Tavares et al. explores a novel way to capture these causal relationships at the model level, using a combination of techniques including statistics, probabilistic programming, and causal inference to capture algorithmic fairness, such that models are counterfactually fair from the beginning, even during the training process. Applying this research successfully could help AI/ML engineers, data scientists, or end users identify earlier in the development lifecycle which variables could be causing bias by proxy, despite the removal of sensitive attributes from the base dataset.
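The proxy problem can be illustrated with a much simpler check than the approach in the paper. The sketch below is not the authors' method and involves no causal inference; it is a plain in-sample association check on hypothetical toy data (the field names `zip_region`, `age_band`, and `group`, and the `proxy_strength` helper, are assumptions made for this example). It asks how well each remaining feature, on its own, predicts the sensitive attribute that was removed.

```python
from collections import Counter, defaultdict

def proxy_strength(rows, feature, sensitive):
    """Measure how well `feature` alone predicts `sensitive`.

    Returns (baseline_accuracy, feature_accuracy), where the baseline
    always guesses the most common sensitive value, and the feature-based
    guess picks the most common sensitive value for each feature value.
    """
    overall = Counter(r[sensitive] for r in rows)
    baseline = overall.most_common(1)[0][1] / len(rows)

    by_value = defaultdict(Counter)
    for r in rows:
        by_value[r[feature]][r[sensitive]] += 1
    hits = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, hits / len(rows)

# Hypothetical records. "group" is the sensitive attribute that would be
# dropped before training; the other fields would remain in the dataset.
rows = (
    [{"group": "A", "zip_region": "north", "age_band": "young"}] * 3
    + [{"group": "A", "zip_region": "north", "age_band": "old"}] * 2
    + [{"group": "A", "zip_region": "south", "age_band": "old"}] * 1
    + [{"group": "B", "zip_region": "south", "age_band": "young"}] * 2
    + [{"group": "B", "zip_region": "south", "age_band": "old"}] * 2
)

for feature in ("zip_region", "age_band"):
    baseline, score = proxy_strength(rows, feature, "group")
    print(f"{feature}: baseline={baseline:.1f}, from feature={score:.1f}")
```

Here `zip_region` recovers the removed attribute for 9 of 10 records against a baseline of 6 of 10, while `age_band` does no better than the baseline. A feature that lifts accuracy well above the baseline is a candidate proxy worth investigating, although a check like this only detects association and says nothing about the causal structure that the research above is designed to model.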

Counterfactual explainability at the input/output level

The second counterfactual research applies to another common scenario. Imagine we receive a model prediction that says a patient should not be recommended to a specialist given a set of inputs (e.g., medical images, medical history, etc., as in the example above). If the prediction comes from an opaque model, such as a deep learning model or a managed third-party API service, clinicians will have questions about how the prediction was made, why, and what an alternative recommendation might look like.

This research from Schleich et al. provides a package (“GeCo”) that offers this type of explainable “what-if” analysis at the output level in a computationally efficient way, using a novel way of programming complex constraints. Not only can it give clinicians or end users an alternate recommendation, but it can provide one that is feasible and identify when models produce outputs that are nonsensical by human logic. For instance, using the same example, the input change required to get a “yes, specialist needed” recommendation might be to change gender. This is clearly not a feasible short-term change and would not be included in the alternate recommendation to get to a “yes”, but surfacing it would still expose a potential problem with the model. This goes beyond widely used techniques such as LIME or SHAP, which attribute a prediction to its input features but do not by themselves propose feasible changes that would alter the outcome.
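The idea of a feasible counterfactual can be sketched in a few lines of Python. This is not GeCo's API or algorithm; it is a brute-force search over a tiny, hypothetical feature space, and the `toy_model` scoring rule (including its deliberately biased gender term), the feature names, and the candidate values are all assumptions made for illustration. The search is run twice: once over every feature, and once over only the features a patient or clinician could plausibly change.

```python
from itertools import combinations, product

def toy_model(p):
    """Hypothetical stand-in for an opaque model; True means 'refer'."""
    score = 2 * p["symptom_score"] + p["abnormal_findings"]
    if p["gender"] == "F":   # deliberately biased term, for illustration
        score += 3
    return score >= 8

def counterfactuals(model, patient, candidates, max_changes=2):
    """Return the smallest sets of feature changes that flip the prediction.

    `candidates` maps each feature that may change to its allowed values.
    """
    original = model(patient)
    for k in range(1, max_changes + 1):
        found = []
        for feats in combinations(sorted(candidates), k):
            for values in product(*(candidates[f] for f in feats)):
                changes = {f: v for f, v in zip(feats, values)
                           if v != patient[f]}
                if len(changes) < k:
                    continue   # skip combinations that change fewer than k features
                if model({**patient, **changes}) != original:
                    found.append(changes)
        if found:
            return found       # stop at the smallest number of changes
    return []

patient = {"gender": "M", "symptom_score": 2, "abnormal_findings": 1}

all_candidates = {
    "gender": ["M", "F"],
    "symptom_score": [0, 1, 2, 3, 4],
    "abnormal_findings": [0, 1, 2, 3],
}
# Feasibility constraint: gender is treated as not actionable.
feasible_candidates = {f: v for f, v in all_candidates.items() if f != "gender"}

all_cfs = counterfactuals(toy_model, patient, all_candidates)
feasible_cfs = counterfactuals(toy_model, patient, feasible_candidates)

print("unconstrained:", all_cfs)
print("feasible:     ", feasible_cfs)
if any("gender" in c for c in all_cfs):
    print("warning: prediction flips on gender alone")
```

With these toy values, the unconstrained search finds that changing gender alone flips the recommendation, which flags a problem with the model, while the constrained search returns only the actionable change (a higher symptom score). Exhaustive search like this does not scale beyond a handful of features, which is the gap that purpose-built tools aim to close.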

Future Considerations

AI/ML-based SaMD has massive potential to help people at all points across the healthcare and life sciences ecosystem. However, the lack of transparency, and of the ability to audit the entire decision-making process, has traditionally been a challenge in driving adoption of these solutions. This is especially true when it comes to implementing “good machine learning practices” from a regulatory standpoint. Black box solutions have proven to have too many unintended consequences, making it difficult to have any trust in these systems. To mitigate any negative impacts of using AI/ML-based SaMD, we should consider the following:

  • Incorporate XAI into any “good machine learning practice”, especially in the clinical evaluation process component. Specifically, incorporate counterfactuals to address the three pillars of valid clinical association, analytical validation, and clinical validation within the FDA AI/ML-based SaMD framework.
  • Implement counterfactual counterparts with any AI/ML model. Both counterfactual fairness (uncovering hidden relationships between variables and biased patterns through causal relationships in data) and counterfactual explainability (what-if analysis regardless of AI/ML model that helps determine what an alternative outcome looks like and the associated path of recourse) should be leveraged.
  • Encourage support for scientific research in this area across regulatory agencies, industry associations, and practitioners, to advance the development of counterfactuals and XAI and bring more health equity to all the patients we serve.


FDA. Artificial Intelligence and Machine Learning in Software as a Medical Device. January 2021. Link.

FDA. Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) - Discussion Paper and Request for Feedback. April 2019. Link.

FDA. Action Plan. January 2021. Link.

FinRegLab. Explainability and Fairness of Machine Learning in Credit Underwriting. 2021. Link.

Zenna Tavares, James Koppel, Xin Zhang, Ria Das, and Armando Solar-Lezama. A Language for Counterfactual Generative Models. 2021. Link.

Maximilian Schleich, Zixuan Geng, Yihong Zhang, and Dan Suciu. GeCo: Quality Counterfactual Explanations in Real Time. 2021. Link.

Explainable artificial intelligence: A survey. 2018. Link.

IEEE. P2976 - Standard for XAI – eXplainable Artificial Intelligence - for Achieving Clarity and Interoperability of AI Systems Design. 2021. Link.

Stanford Encyclopedia of Philosophy. Counterfactual Theories of Causation. January 2001. Link.

IBM. AI Explainability Blog. Link.