In2Research Journeys

Week 6 – Model Interpretability & Explainability

25 Jul 2025 - Hirra Asif - NLP, XAI, Interpretability

This week I finalised the performance results for both the machine learning baselines and the deep learning models. I then applied explainability techniques to the best-performing models, using LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations), to understand why they predict what they do and which words most influence each class.

I generated:

  • Instance-level explanations with LIME 🍋‍🟩 and SHAP 🔴 to inspect individual predictions (see the LIME sketch after this list).
  • Global views with SHAP to see overall feature/token influence.
  • Token importance plots (top‑15) for each task: positive vs. negative on the COVID‑19 classifier, and ADR vs. non‑ADR on the ADR classifier.
  • SHAP force plots to visualise how specific tokens push predictions toward each class 📊 (see the SHAP sketch after this list).
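
To make the instance-level side concrete, here is a minimal LIME sketch. The tiny TF-IDF + logistic-regression pipeline and the example tweets are illustrative stand-ins, not the project's actual ADR classifier or data.

```python
# Minimal LIME sketch: explain one prediction from a small text classifier.
# The toy pipeline and example tweets are stand-ins for the real ADR model.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical examples, not the project dataset)
texts = [
    "the new tablets gave me terrible nausea and dizziness",
    "felt awful headaches after starting this medication",
    "picked up my prescription at the pharmacy today",
    "doctor says my blood pressure is back to normal",
]
labels = [1, 1, 0, 0]  # 1 = ADR, 0 = non-ADR

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["non-ADR", "ADR"])
example = "started the medication and the nausea has been brutal"

# LIME perturbs the text, queries clf.predict_proba on the perturbations,
# and fits a local linear surrogate whose weights are the token importances.
exp = explainer.explain_instance(example, clf.predict_proba, num_features=15)
print(exp.as_list())                       # (token, weight) pairs
exp.save_to_file("lime_adr_example.html")  # interactive HTML view
```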
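
And a companion SHAP sketch covering the global top-15 token view and a force plot for a single instance. It reuses the same toy TF-IDF model, so the data, feature names and plot settings are assumptions rather than the project's actual setup.

```python
# Minimal SHAP sketch: global token importance (top 15) plus a force plot
# for one instance, using a toy TF-IDF + logistic-regression model.
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "the new tablets gave me terrible nausea and dizziness",
    "felt awful headaches after starting this medication",
    "picked up my prescription at the pharmacy today",
    "doctor says my blood pressure is back to normal",
]
labels = [1, 1, 0, 0]  # 1 = ADR, 0 = non-ADR

vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(texts).toarray()
model = LogisticRegression().fit(X, labels)
feature_names = vectoriser.get_feature_names_out()

# Exact SHAP values for a linear model, using the training data as background
explainer = shap.LinearExplainer(model, X)
shap_values = explainer.shap_values(X)

# Global view: tokens ranked by mean |SHAP value| across the dataset
shap.summary_plot(shap_values, X, feature_names=feature_names, max_display=15)

# Force plot: how each token pushes one prediction away from the base value
shap.force_plot(explainer.expected_value, shap_values[0], features=X[0],
                feature_names=feature_names, matplotlib=True)
```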

Challenges 🤔❓

  • Computing SHAP values across the dataset takes significantly longer than instance-level LIME and standard evaluation, especially on transformer models ⏳ (a small mitigation sketch follows).
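
One way I could imagine keeping this tractable (a hedged sketch, not what I actually ran): only ask SHAP to explain a small random subsample of tweets through a Hugging Face pipeline, rather than the full test set. The model name, sample texts and class label below are placeholders, not the project's fine-tuned COVID-19 sentiment model.

```python
# Hedged sketch: keep transformer SHAP runs manageable by explaining only a
# small subsample. Model name and example texts are placeholders.
import shap
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,  # return scores for every class (newer equivalent of return_all_scores=True)
)

explainer = shap.Explainer(clf)  # SHAP wraps the pipeline with a text masker

# In practice this would be a random subsample (e.g. a few hundred tweets),
# since this call is the expensive step.
sample = [
    "case numbers are falling and the vaccine rollout is going well",
    "another lockdown announced, things are getting worse again",
]
shap_values = explainer(sample)

# Top-15 tokens pushing predictions toward the POSITIVE class, averaged over the sample
shap.plots.bar(shap_values[:, :, "POSITIVE"].mean(0), max_display=15)
```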

Next steps ➡️

  • Create data visualisations and start writing up the findings ✍️