I'm using the Python packages `lime` and `shap` to explain single (test-set) predictions that a basic, trained model makes on new, tabular data. In some cases, the explanations generated by both methods do not agree with user intuition.
For example, when applying the methods in a healthcare setting, they might list the presence of a comorbidity (a disease that frequently co-occurs with the outcome disease of interest) as a factor that decreases a patient's risk of an adverse event.
Intuitively, such behavior is incorrect. We shouldn't see that a history of heart attacks lowers the risk of adverse events, for example. What are some reasons why we might see these inconsistencies?
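For concreteness, here is roughly the pattern I'm following. The dataset, model, and feature names below are synthetic placeholders, not my actual pipeline:

```python
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in imbalanced tabular data and a basic model (placeholders for my real setup).
X, y = make_classification(n_samples=4000, n_features=10, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
feature_names = [f"f{i}" for i in range(X.shape[1])]

# LIME: the training set supplies the statistics used to perturb/discretize features.
lime_explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["no_event", "event"],
    mode="classification",
)
lime_exp = lime_explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(lime_exp.as_list())

# SHAP (KernelExplainer): the background sample defines the baseline the
# attributions are measured against.
background = shap.sample(X_train, 100)
shap_explainer = shap.KernelExplainer(model.predict_proba, background)
print(shap_explainer.shap_values(X_test[:1]))
```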
Some of my ideas:
- Class label imbalance: I tried balancing the dataset, but it did not solve the issue (what I mean by "balancing" is sketched after this list).
- Kernel width for LIME: I'm still tuning this, but a cursory sweep showed no benefit (see the kernel-width sweep after this list).
- Relationship to training data: for tabular data, both `lime` and `shap` require the training dataset as input to build the explainer class. If there are instances in which a feature such as history of heart attacks is associated with a no-adverse-event outcome, such instances would "confuse" the methods, so to speak. However, I'm not sure I have the intuition correct there (the checks I had in mind are sketched after this list).
- Error in understanding on my part: there may be nuances in intuition here that I've missed. Specifically, I am trying to make sure I correctly understand the relationship between the generated explanations and the training dataset used to build them.
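For the class-imbalance point, this is the kind of balancing I tried, continuing the synthetic setup above (naive random undersampling of the majority class; my actual step may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Undersample the majority (no-event) class to match the minority class size.
rng = np.random.default_rng(0)
pos_idx = np.where(y_train == 1)[0]
neg_idx = np.where(y_train == 0)[0]
keep = np.concatenate([pos_idx, rng.choice(neg_idx, size=len(pos_idx), replace=False)])

X_bal, y_bal = X_train[keep], y_train[keep]
model_bal = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
# Re-running the explainers against model_bal still produced the counterintuitive signs.
```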
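For the kernel-width point, this is the sweep I'm running, again on the synthetic setup (the specific widths are arbitrary; `kernel_width=None` falls back to LIME's default of 0.75 * sqrt(n_features)):

```python
from lime.lime_tabular import LimeTabularExplainer

# Check whether the sign of the suspect feature's weight is stable across kernel widths.
for kw in [None, 0.5, 1.0, 2.0, 5.0]:
    expl = LimeTabularExplainer(
        X_train,
        feature_names=feature_names,
        class_names=["no_event", "event"],
        mode="classification",
        kernel_width=kw,
    )
    exp = expl.explain_instance(X_test[0], model.predict_proba, num_features=5)
    print(kw, exp.as_list())
```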
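For the training-data point, these are the two checks I had in mind, still on the synthetic setup (`f3` is a stand-in for the hypothetical heart-attack-history feature): how often the feature co-occurs with each outcome in the training data, and whether the choice of SHAP background sample changes the attribution.

```python
import pandas as pd
import shap

# 1) Empirical event rate with vs. without the "history" feature in the training data.
df = pd.DataFrame(X_train, columns=feature_names)
df["outcome"] = y_train
has_history = df["f3"] > df["f3"].median()  # placeholder definition of "history present"
print(df.groupby(has_history)["outcome"].mean())

# 2) SHAP values are relative to the background sample handed to the explainer,
#    so attributions (including their sign) can shift with the background choice.
for bg in (shap.sample(X_train, 100), shap.sample(X_train[y_train == 0], 100)):
    e = shap.KernelExplainer(model.predict_proba, bg)
    print(e.shap_values(X_test[:1]))
```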