Understanding Facets of Instance Level Effects in Explainable Artificial Intelligence Tasks using Shapley Values
Abstract
Explainable Artificial Intelligence (XAI) is reshaping the machine learning (ML) landscape
and is a driving factor behind the adoption of these methods across the sciences. However,
XAI research typically focuses on feature-level explanations. The instance-level task of
understanding the effects of individual instances on a model is equally important, yet even
when instances are identified as important, why they are important and how this importance
manifests remain relatively unexplored.
The aim of this PhD thesis in computer science was to extend and develop methods for explaining
the relationships between individual data instances and a model, in the context of Shapley
Values. In particular, it builds upon existing work to provide methods that justify what made
each data instance important and to use this information to generate actionable feedback for
an ML workflow. This information can be used to critically analyse the model-fitting process
and to inform future data acquisition, ensuring that expensive and time-consuming experiments
are focused on data that will tangibly improve the models built for that application.
A facet is a distinct feature or component of a larger problem. This thesis explores three
facets of what makes data important, where each facet represents a dimension or summary of
how an individual instance impacts an ML model or data transformation. It demonstrates how
these methods can be applied in the materials and health domains by taking a project from
theory, implementing methods that extend and build upon the existing literature, and, finally,
demonstrating the impactful insights that can be derived from materials and health datasets.
Chapters 4 and 5 demonstrate how breaking down existing concepts in instance importance
can uncover facets of data importance. In particular, Chapter 4 introduces RSHAP, which
decomposes an instance's contribution to the loss further into its contribution to the
residuals of a model. Since loss terms are typically a function of the residuals, RSHAP
provides a lower-level view of how instances interact with the model and with each other.
RSHAP proved particularly effective on materials datasets, quantifying how certain elements
can have different impacts across the spectrum of elements present in the data.
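As a hedged illustration only (the precise RSHAP formulation is developed in Chapter 4 of the thesis), the sketch below estimates instance-level Shapley contributions to a single validation residual by Monte Carlo permutation sampling, treating training instances as the players and the signed residual as the value function. The dataset, model, and sampling budget are all assumptions made for the example.

```python
# Illustrative sketch (not the thesis's RSHAP method) of attributing a single
# validation residual to individual training instances, Data-Shapley style:
# training instances are the "players" and the value function is the residual
# of a model trained on a subset of them. All choices below are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy regression data plus one validation point whose residual we attribute.
coef = np.array([1.0, -2.0, 0.5])
X_train = rng.normal(size=(30, 3))
y_train = X_train @ coef + rng.normal(scale=0.1, size=30)
x_val = rng.normal(size=(1, 3))
y_val = float(x_val @ coef)

def residual(subset_idx):
    """Signed residual at the validation point for a model trained on subset_idx."""
    if len(subset_idx) < 2:
        return y_val  # degenerate subsets: treat the prediction as zero
    model = LinearRegression().fit(X_train[subset_idx], y_train[subset_idx])
    return y_val - float(model.predict(x_val)[0])

n = len(X_train)
phi = np.zeros(n)
n_permutations = 200  # Monte Carlo estimate over random instance orderings

for _ in range(n_permutations):
    order = rng.permutation(n)
    prev = residual(order[:0])
    for k, i in enumerate(order):
        value = residual(order[: k + 1])
        phi[i] += value - prev  # marginal change in the residual from adding i
        prev = value

phi /= n_permutations
print(np.argsort(np.abs(phi))[::-1][:5], "have the largest residual contributions")
```

Because Shapley values satisfy the efficiency property, the per-instance contributions sum (up to Monte Carlo error) to the change in the residual between the empty and full training sets, which is what allows a residual to be read as a sum of instance effects.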
Chapter 5 demonstrates how breaking the loss term down into bias and variance can reveal
different types of instance importance. In particular, two instances that are equally
important in terms of loss may affect bias and variance in different ways. These two
quantities form a pair of importance axes along which instances fall into quadrants,
presenting an opportunity to distinguish the different impacts of data.
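A crude way to see how a single instance can move bias and variance in different directions is a leave-one-out comparison of bootstrap ensembles. This is only a stand-in for the decomposition developed in Chapter 5; the data, model, and ensemble sizes below are illustrative assumptions.

```python
# Leave-one-out sketch of how removing one training instance can shift the
# bias and variance of an ensemble differently, illustrating the quadrant
# idea. Not the Shapley-based decomposition developed in the thesis.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

def true_fn(x):
    return np.sin(3 * x).ravel()

X_train = rng.uniform(-1, 1, size=(40, 1))
y_train = true_fn(X_train) + rng.normal(scale=0.2, size=40)
X_val = rng.uniform(-1, 1, size=(200, 1))
y_val = true_fn(X_val)

def bias_variance(train_idx, n_boot=50):
    """Squared bias and variance of a bagged tree ensemble trained on train_idx,
    averaged over the validation grid."""
    preds = []
    for _ in range(n_boot):
        boot = rng.choice(train_idx, size=len(train_idx), replace=True)
        model = DecisionTreeRegressor(max_depth=3).fit(X_train[boot], y_train[boot])
        preds.append(model.predict(X_val))
    preds = np.array(preds)
    bias_sq = np.mean((preds.mean(axis=0) - y_val) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

all_idx = np.arange(len(X_train))
base_bias, base_var = bias_variance(all_idx)

# Each instance gets a (delta bias, delta variance) pair: removing it may
# change one, the other, both, or neither, placing it in a different quadrant.
for i in all_idx[:5]:
    b, v = bias_variance(np.delete(all_idx, i))
    print(f"instance {i}: Δbias²={b - base_bias:+.4f}, Δvariance={v - base_var:+.4f}")
```

In this sketch, an instance whose removal increases bias but decreases variance lands in a different quadrant from one whose removal does the opposite, even when their overall effects on the loss are similar.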
Chapter 6 introduces behavioural space transformations, which connect the XAI concept of
Shapley Values with interpretable data transformations. By visualising the patterns that
emerge under these interpretable transformations, relationships can be inferred from outliers
or trends in the data.
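Chapter 6's transformations are specific to the thesis, but the general idea of inspecting instances inside a Shapley-derived space can be sketched as follows: compute a matrix of feature-level Shapley values with the shap library, project it to two dimensions, and look for outliers. The dataset, model, and outlier rule are assumptions made purely for illustration, not a reimplementation of the chapter's method.

```python
# Illustrative sketch: treat the per-instance matrix of feature-level Shapley
# values as a transformed space and look for outliers within it. This is an
# assumption-laden stand-in for the behavioural space transformations of
# Chapter 6, not a reimplementation of them.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Feature-level Shapley values per instance: one row per instance.
explainer = shap.TreeExplainer(model)
shap_matrix = explainer.shap_values(X)  # shape (n_samples, n_features)

# Project the Shapley matrix to 2D so trends and outliers become visible.
embedding = PCA(n_components=2).fit_transform(shap_matrix)

# Simple outlier flag: distance from the centroid in the embedded space.
dist = np.linalg.norm(embedding - embedding.mean(axis=0), axis=1)
print("Instances with the most unusual Shapley signatures:", np.argsort(dist)[-5:])
```

Instances with unusual Shapley signatures are candidates for closer inspection, which matches the spirit of using interpretable transformations to surface outliers and trends.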
This thesis demonstrates that facets of data importance are a useful addition to the data
analysis toolkit and a promising direction for future research. In particular, these kinds
of methods contribute to the understanding phase of XAI workflows by prompting us to
critically question what our models are doing with the data.