Week 6 – Model Interpretability & Explainability
25 Jul 2025 - Hirra Asif - NLP, XAI, Interpretability
This week I finalised the performance results for both the machine learning baselines and the deep learning models. I then applied two explainability techniques, LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations), to the top-performing models to understand why they predict what they do and which words most influence each class.
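To give a flavour of the instance-level workflow, here is a minimal LIME sketch against a toy TF-IDF + logistic-regression baseline. The pipeline, texts, and labels below are placeholders, not the project's actual models or data:

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for a baseline classifier (1 = ADR, 0 = non-ADR).
texts = [
    "mild headache and nausea after the second dose",
    "feeling great today, no side effects at all",
    "severe joint pain since the jab",
    "lovely sunny walk this morning",
]
labels = [1, 0, 1, 0]
clf_pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf_pipeline.fit(texts, labels)

# Explain a single prediction: LIME perturbs the text and fits a local
# linear model to the black-box probabilities.
explainer = LimeTextExplainer(class_names=["non-ADR", "ADR"])
exp = explainer.explain_instance(
    texts[0],                    # the tweet to explain
    clf_pipeline.predict_proba,  # black-box probability function
    num_features=10,             # top tokens to report
)
print(exp.as_list())             # (token, weight) pairs for this prediction
```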
I generated:
- Instance-level explanations with LIME 🍋🟩 and SHAP 🔴 to inspect individual predictions.
- Global views with SHAP to see overall feature/token influence (see the SHAP sketch after this list).
- Token importance plots (top‑15) for each task: positive vs. negative on the COVID‑19 classifier, and ADR vs. non‑ADR on the ADR classifier.
- SHAP force plots to visualise how specific tokens push predictions toward each class 📊
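For the SHAP side, the sketch below shows roughly what the global and instance views look like in code. It assumes a Hugging Face text-classification pipeline; the checkpoint name, class label, and example texts are placeholders for the actual fine-tuned models, and SHAP's plotting API varies a little between versions:

```python
import shap
from transformers import pipeline

# Placeholder checkpoint; swap in the fine-tuned COVID-19 / ADR classifier.
clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,  # return scores for every class, which SHAP needs
)

explainer = shap.Explainer(clf)  # auto-selects a Text masker for pipelines
texts = [
    "mild headache and nausea after the second dose",
    "no side effects at all, feeling great",
]
shap_values = explainer(texts)

# Global view: mean |SHAP| per token across the explained examples (top 15).
shap.plots.bar(shap_values[:, :, "POSITIVE"].abs.mean(0), max_display=15)

# Instance view: shap.plots.text renders the force-style token plot,
# showing how each token pushes this prediction toward the class.
shap.plots.text(shap_values[0, :, "POSITIVE"])
```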
Challenges 🤔❓
- Computing SHAP values across the dataset takes significantly longer than instance-level LIME or standard evaluation, especially on the transformer models ⏳ (one possible mitigation is sketched below).
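One mitigation I'm considering (a sketch with hypothetical names; `test_texts`/`test_labels` stand in for the actual held-out set): explain a stratified random subsample rather than every tweet, which should keep the global token rankings reasonably stable while cutting runtime roughly in proportion to the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_sample(texts, labels, per_class=100):
    """Return up to `per_class` texts from each class, chosen at random."""
    labels = np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        pool = np.flatnonzero(labels == c)
        keep.extend(rng.choice(pool, size=min(per_class, pool.size), replace=False))
    return [texts[i] for i in keep]

# Hypothetical usage with the SHAP explainer from the sketch above:
# shap_values = explainer(stratified_sample(test_texts, test_labels))
```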
Next steps ➡️
- Finish the data visualisations and start writing up the findings ✍️