I really struggled to have the sklearn, XGBoost and CatBoost APIs function smoothly with the PDPBox, eli5, lime and shap packages. Some just accept pd.DataFrame but then the explanation part is compatible with np.array only, or the other way around. Also, not all classifiers output probabilities by default: XGBoost produces a raw score which needs to be fed into a sigmoid function to convert it into a probability. Nothing dramatic per se. It just means some extra care must be taken when turning to those methods.
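Just for reference, the score-to-probability conversion mentioned above is the logistic sigmoid; a minimal sketch (not tied to any specific library call):
import numpy as np

def sigmoid(raw_score):
    # maps a raw margin / log-odds score into a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-raw_score))

sigmoid(0.0)  # -> 0.5, i.e. a zero margin corresponds to a 50% probability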
For the time being, I managed to get both sklearn and XGBoost to work with all the relevant explanation-related packages, with the caveat that PDPBox only accepts models trained on pd.DataFrame. For CatBoost, instead, I could not get a lime pipeline to work, even though shap does just fine.
Molnar, Christoph. "Interpretable Machine Learning. A Guide for Making Black Box Models Explainable", 2019. https://christophm.github.io/interpretable-ml-book/
%load_ext autoreload
%autoreload 2
%matplotlib inline
from interpretable_ml_utils import *
shap.initjs()
path = Path("C:/Users/francesco.pochetti/Notebooks/data/Varied")
fname = path/"adult.feather"
if not os.path.isfile(fname):
    cols = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
            'marital_status', 'occupation', 'relationship', 'race', 'sex',
            'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income']
    df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data',
                     names=cols)
    df.to_feather(fname)
else:
    df = pd.read_feather(fname)
df.shape
df.head()
missing_values(df)
features = ['age', 'workclass', 'education', 'marital_status', 'occupation',
'relationship', 'race', 'sex', 'hours_per_week', 'native_country',
'capital_gain', 'capital_loss']
for c in features+['income']:
    if not is_numeric_dtype(df[c]):
        df[c] = df[c].str.strip()
        df = reduce_cardinality(df, c)
raw_train, raw_valid = train_test_split(df[features+['income']], train_size=0.8, random_state=1234)
raw_train.shape, raw_valid.shape
display_all(raw_train.head())
cats = train_cats(raw_train)
X_train, y_train, nas = proc_df(raw_train, y_fld='income', max_n_cat=15)
apply_cats(raw_valid, raw_train)
X_valid, y_valid, _ = proc_df(raw_valid, y_fld='income', na_dict=nas, max_n_cat=15)
feats = list(X_train.columns)
raw_train.shape, raw_valid.shape, X_train.shape, X_valid.shape
cats['workclass']
clf = xgb.XGBClassifier(learning_rate=0.03, n_estimators=300, n_jobs=-1)
clf.fit(X_train, y_train)
_ = print_report(clf, X_valid, y_valid, t=0.5, cal=False, roc=False,
conf=False, classes=cats['income'])
#clf = RandomForestClassifier(n_estimators=150, min_samples_leaf=5, n_jobs=-1)
#clf.fit(X_train, y_train)
#_ = print_report(clf, X_valid, y_valid, t=0.5, cal=False, roc=False,
# conf=False, classes=cats['income'])
Pros and cons of this approach, quoting Molnar:
- Nice interpretation: Feature importance is the increase in model error when the feature’s information is destroyed.
- The importance measure automatically takes into account all interactions with other features.
- It is very unclear whether you should use training or test data to compute the feature importance.
- If features are correlated, the permutation feature importance can be biased by unrealistic data instances.
- Adding a correlated feature can decrease the importance of the associated feature by splitting the importance between both features.
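To make the shuffle-and-score idea concrete, here is a minimal sketch of what permutation importance computes (the get_permutation_imp call below comes from my utils module and is not necessarily implemented exactly this way):
import numpy as np
from sklearn.metrics import roc_auc_score

def permutation_importance_sketch(model, X, y, n_repeats=3, seed=0):
    rng = np.random.RandomState(seed)
    baseline = roc_auc_score(y, model.predict_proba(X)[:, 1])
    importances = {}
    for col in X.columns:
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[col] = rng.permutation(X_perm[col].values)  # destroy the feature's information
            drops.append(baseline - roc_auc_score(y, model.predict_proba(X_perm)[:, 1]))
        importances[col] = np.mean(drops)  # average drop in AUC = importance of the feature
    return importances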
try: imp = get_permutation_imp(clf, X_train, y_train, feats=feats)
except Exception: imp = get_permutation_imp(clf, X_train.values, y_train, feats=feats)
Methodology currently implemented in scikit-learn. Source: SO.
It is sometimes called "gini importance" or "mean decrease impurity" and it is defined as the total decrease in node impurity (weighted by the probability of reaching that node, which is approximated by the proportion of samples reaching that node), averaged over all trees of the ensemble.
Compared to permutation-based feature importance, this method comes at no additional cost (the gains are directly calculated as part of training) and does not suffer from the drawback of potentially correlated features, as data points are not shuffled.
Probably the biggest disadvantage is that the absolute importances are less interpretable, compared to the more concise and transparent increase in error rate we had before.
Note that, as expected, the two illustrated feature importance methods yield slightly different results in terms of the ordering of the top variables.
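For reference, these impurity/gain-based importances are exposed directly by the fitted estimator (both sklearn ensembles and the XGBoost sklearn wrapper have a feature_importances_ attribute); get_styled_feat_importance below is my own wrapper, so a minimal equivalent sketch looks like:
import pandas as pd

imp_df = (pd.DataFrame({'feature': feats, 'importance': clf.feature_importances_})
            .sort_values('importance', ascending=False))
imp_df.head(15)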
get_styled_feat_importance(clf, feats, 15)
The idea behind PDP is to check the relationship between the dependent variable and a feature of interest, in a way that isolates the dependence of the output on that input, all other things being equal.
This is achieved in the following way:
1. Pick the feature of interest x and define a grid of N values spanning its range.
2. For each value in the grid: (A) replace x with that value for every data point, leaving all other features untouched; (B) run the model on this modified dataset and collect the predictions.
Say that your dataset is composed of n data points. Each time you run 2B, you'd get n predictions, i.e. you'd come up with N x n predictions at the end of the procedure.
Now, you have two options:
- Average the n predictions for each grid value and plot the resulting N points: this is the classic PDP.
- Plot all n lines over the N points on the X-axis. This is what ICE plots look like.
The charts below show both all lines and the average at the same time.
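A minimal numpy sketch of this grid-replace-predict procedure (the actual plots below are produced with PDPBox via my plot_pdp wrapper; names here are purely illustrative):
import numpy as np

def pdp_ice_sketch(model, X, feature, n_grid=20):
    grid = np.linspace(X[feature].min(), X[feature].max(), n_grid)
    ice = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value                    # force every data point to take this grid value
        ice.append(model.predict_proba(X_mod)[:, 1])
    ice = np.array(ice)                           # shape (N grid values, n data points): the ICE lines
    pdp = ice.mean(axis=1)                        # averaging the ICE lines gives the PDP
    return grid, ice, pdp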
- The computation of partial dependence plots is intuitive: the partial dependence function at a particular feature value represents the average prediction if we force all data points to assume that feature value.
- If the feature for which you computed the PDP is not correlated with the other features, then the PDPs perfectly represent how the feature influences the prediction on average. In the uncorrelated case, the interpretation is clear.
- Unlike partial dependence plots, ICE curves can uncover heterogeneous relationships.
- The realistic maximum number of features in a partial dependence function is two.
- The assumption of independence is the biggest issue with PD plots.
- To meaningfully plot both PDPs and ICE for categorical variables, the variable must be one-hot-encoded. This is a constraint as tree-ensemble algorithms (the de-facto go-to solution in the non-deep-learning world) can easily handle features which are not encoded in this way.
WARNING! As you will notice from the first plot below (capital_gain), the technique does not work for highly skewed features. I could not manage to get PDPBox to reduce the range of the variable in question by cutting the highest quantiles.
plot_pdp(clf, X_valid.sample(2000), feats, 'capital_gain')
plot_pdp(clf, X_valid.sample(2000), feats, 'age', center=False)
plot_pdp(clf, X_valid.sample(2000), feats, 'hours_per_week')
cat = 'marital_status'
f = [c for c in feats if c.startswith(cat)]
if len(f)>1: plot_pdp(clf, X_valid.sample(2000), feats, f, classes=cats[cat], plot_params={'title': f'PDP for feature {cat}'}, center=False)
cat = 'occupation'
f = [c for c in feats if c.startswith(cat)]
if len(f)>1: plot_pdp(clf, X_valid.sample(2000), feats, f, classes=cats[cat], plot_params={'title': f'PDP for feature {cat}'}, center=False)
cats = train_cats(raw_train)
X_train, y_train, nas = proc_df(raw_train, y_fld='income', max_n_cat=None)
apply_cats(raw_valid, raw_train)
X_valid, y_valid, _ = proc_df(raw_valid, y_fld='income', na_dict=nas, max_n_cat=None)
feats = list(X_train.columns)
raw_train.shape, raw_valid.shape, X_train.shape, X_valid.shape
Notice that in the cell below, in the case of XGBoost, we call clf.fit(X_train.values, y_train), i.e. we fit on np.array and not on pd.DataFrame. This is due to the downstream packages (lime, eli5, shap) not always being compatible with pandas when used with XGBoost or CatBoost. The sklearn API is always compatible with pandas, though.
clf = xgb.XGBClassifier(learning_rate=0.03, n_estimators=300, n_jobs=-1)
clf.fit(X_train.values, y_train)
_ = print_report(clf, X_valid.values, y_valid, t=0.5, cal=False, roc=False,
conf=False, classes=cats['income'])
#clf = RandomForestClassifier(n_estimators=150, min_samples_leaf=5, n_jobs=-1)
#clf.fit(X_train, y_train)
#_ = print_report(clf, X_valid, y_valid, t=0.5, cal=False, roc=False,
# conf=False, classes=cats['income'])
imp = get_permutation_imp(clf, X_train.values, y_train, feats=feats)
get_styled_feat_importance(clf, feats)
The following approach is not known in the community under a specific name and it does not come with a paper.
Even though not highly advertised, it is incredibly powerful, and got integrated (in an independent implementation) within the eli5 library.
Ando Saabas proposed the method in this series of posts. The idea is that we can decompose the contribution of each feature to a specific prediction by following the path from the root of the tree to the node where the instance of interest resides. This is done by recording, at each split of the feature space, the difference in the proportion of the positive class (in case of classification) between the parent and the child node, and assigning that difference to the variable the split was performed on. If we start from the root, the percentage of the positive class simply represents the average across the entire training set. Let's assume this is 70%. Now assume that the first split happens on feature X_1, that our instance of interest sits on the right branch of the tree, and that within this subset the percentage of the positive class is 80%. Then we can safely state that X_1 drives an increase in probability of 10 percentage points. Imagine doing this until the tree is fully grown, and averaging the result across all trees (in case of a forest).
This is how the method works in a nutshell.
The contributions, by definition, add up to the final model's outcome, making them easy to interpret and to visualize. Both a tabular form, as eli5 proposes, and a more business-oriented waterfall chart, are nice ways of doing so.
This method relies on the metrics which actually get computed at each tree's node. In the case of XGBoost, the algorithm calculates a score which, unlike in RandomForest, cannot directly be interpreted as a probability, so beware of the numbers eli5 displays.
# id = X_valid.sample().index.values[0]
# fixing the ID value for illustration purposes
id = 20150
explain_pred_contrib(id, clf, X_valid, feats, cats=cats, waterfall={'rotation_value':60, 'threshold': None})
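For reference, a hedged sketch of calling eli5 directly instead of my explain_pred_contrib wrapper (remember that for XGBoost the displayed weights live in margin/log-odds space, not in probability space):
import eli5

# decompose the single prediction into a bias term plus per-feature contributions
expl = eli5.explain_prediction(clf, X_valid.loc[id].values, feature_names=feats)
print(eli5.format_as_text(expl))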
The principle behind LIME is based on fitting a local interpretable model (a linear regression, for instance) on the predictions of the black-box model. First thing to notice: LIME does not fit on the actuals, but on the black-box model's outputs. Those are the ones we have to explain, after all.
The recipe for training local surrogate models (a raw lime sketch follows this list):
- Select your instance of interest for which you want to have an explanation of its black box prediction.
- Perturb your dataset and get the black box predictions for these new points.
- Weight the new samples according to their proximity to the instance of interest.
- Train a weighted, interpretable model on the dataset with the variations.
- Explain the prediction by interpreting the local model.
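Below is a minimal sketch of the raw lime API which my get_lime_explainer/explain_pred_lime helpers (used further down) presumably wrap; all parameter choices here are illustrative:
from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train.values,                      # training data used to sample perturbations
    feature_names=feats,
    class_names=list(cats['income']),
    discretize_continuous=True,
    kernel_width=None)                   # default: 0.75 * sqrt(number of columns)
# (in practice categorical columns should also be declared via categorical_features/categorical_names)
lime_exp = lime_explainer.explain_instance(
    X_valid.loc[id].values,              # instance of interest
    clf.predict_proba,                   # black-box prediction function
    num_features=5)                      # length of the explanation
lime_exp.show_in_notebook()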
- Explanations are short (= selective) and possibly contrastive.
- It works for tabular data, text and images.
- The fidelity measure (how well the interpretable model approximates the black box predictions) gives us a good idea of how reliable the interpretable model is.
- The correct definition of the neighborhood is a very big, unsolved problem when using LIME with tabular data. In my opinion it is the biggest problem with LIME and the reason why I would recommend to use LIME only with great care. [...] LIME currently uses an exponential smoothing kernel to define the neighborhood. A smoothing kernel is a function that takes two data instances and returns a proximity measure. The kernel width determines how large the neighborhood is: A small kernel width means that an instance must be very close to influence the local model, a larger kernel width means that instances that are farther away also influence the model. If you look at LIME’s Python implementation (file lime/lime_tabular.py) you will see that it uses an exponential smoothing kernel (on the normalized data) and the kernel width is 0.75 times the square root of the number of columns of the training data.
- Another really big problem is the instability of the explanations. In an article the authors showed that the explanations of two very close points varied greatly in a simulated setting.
Conclusion: Local surrogate models, with LIME as a concrete implementation, are very promising. But the method is still in development phase and many problems need to be solved before it can be safely applied.
WARNING! The explanation below, unfortunately, makes no sense, showing LIME's limitations. I decided to keep it for illustration purposes.
When asked to explain an instance, the LIME library produces three outputs (see below):
- The predicted probabilities of the event (>50K) and the non-event (<=50K) occurring.
- A bar chart with the feature contributions. Taking the capital_gain bar extending to the left of the black vertical line as an example: if capital_gain were not <=0 (it is 0 for this specific data point), the probability of the event >50K would be 0.63 higher. Differently stated, capital_gain<=0 pushes the prediction towards the non-event by 0.63. The second orange bar, instead, can be interpreted as: had marital_status been different from Married-civ-spouse, the probability of the event would be 0.13 lower, so this feature is acting in favour of >50K. Note that this explanation contradicts the model's prediction that the person earns more than 50K with probability 60%. Another resource to look at to understand how to interpret those numbers is the official LIME GitHub repo: the screenshots section provides a couple of interesting examples.
- A table with the actual values of the top features for the instance of interest.
An interesting fact to call out is the significant difference between the LIME local prediction and the black-box model prediction. There is no guarantee those two would be equal, but still, such a big difference should hold us back from trusting the explanations, as they might be inconsistent, as is actually the case here.
explainer = get_lime_explainer(X_train, cats, feats, kernel_width=None)
exp = explain_pred_lime(id, clf, explainer, X_valid, num_features=5)
Fiddling around with kernel_width to try to get better explanations...
If we reduce the kernel_width to 0.5, things improve a lot.
explainer = get_lime_explainer(X_train, cats, feats, kernel_width=0.5)
exp = explain_pred_lime(id, clf, explainer, X_valid, num_features=5)
#if 'XGB' in str(type(clf)): print(measure_lime_accuracy(clf, explainer, X_valid.sample(2000), num_features=5))
Quotes taken from here and here
Goal
The goal of SHAP is to explain the prediction of an instance x by computing the contribution of each feature to the prediction. The SHAP explanation method computes Shapley values from coalitional game theory. [...] A prediction can be explained by assuming that each feature value of the instance is a "player" in a game where the prediction is the payout. Shapley values [...] tell us how to fairly [i.e. in an additive fashion] distribute the "payout" among the features.
Conditions Shapley values meet
source Consider the following scenario: a group of people are playing a game. As a result of playing this game, they receive a certain reward; how can they divide this reward between themselves in a way which reflects each of their contributions? There are a few things which everyone can agree on; meeting the following conditions will mean the game is fair according to Shapley values:
- [Efficiency] The sum of what everyone receives should equal the total reward
- [Symmetry] If two people contributed the same value, then they should receive the same amount from the reward
- [Dummy] Someone who contributed no value should receive nothing
- [Additivity] If the group plays two games, then an individual’s reward from both games should equal their reward from their first game plus their reward from the second game
Intuition
An intuitive way to understand the Shapley value is the following illustration: The feature values enter a room in random order. All feature values in the room participate in the game (= contribute to the prediction). The Shapley value of a feature value is the average change in the prediction that the coalition already in the room receives when the feature value joins them.
source Shapley values calculate the importance of a feature by comparing what a model predicts with and without the feature. However, since the order in which a model sees features can affect its predictions, this is done in every possible order, so that the features are fairly compared.
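For completeness (this formula is not part of the quotes above, but it is the standard definition the intuition describes), the Shapley value of feature j is the weighted average, over all coalitions S not containing j, of the change in the value function when j joins:

$$\phi_j = \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}} \frac{|S|!\,(p - |S| - 1)!}{p!}\,\big(val(S \cup \{j\}) - val(S)\big)$$

where p is the number of features and val(S) is the prediction obtained using only the features in coalition S (marginalizing the others out).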
Interpretation
The interpretation of the Shapley value ϕ_j for feature value j is: the value of the j-th feature contributed ϕ_j to the prediction of this particular instance compared to the average prediction for the dataset.
Be careful to interpret the Shapley value correctly: the Shapley value is the average contribution of a feature value to the prediction in different coalitions. The Shapley value is NOT the difference in prediction when we would remove the feature from the model.
What does the SHAP package offer?
- Single-instance force plots: each feature value is a force that either increases or decreases the prediction. The prediction starts from the baseline, which for Shapley values is the average of all predictions. In the plot, each Shapley value is an arrow that pushes to increase (positive value) or decrease (negative value) the prediction. These forces balance each other out at the actual prediction of the data instance.
- Aggregated SHAP-based feature importance: sort features based on mean absolute Shapley values.
- Instance-level SHAP-based feature importance: sort features by aggregated SHAP-based importance and show instance-granular Shapley values dispersed along the y-axis.
- Clustering instances based on Shapley values.
- Dependence plots: show the relationship between the Shapley values of feature j and the values of feature j itself, with the additional possibility to colour-code by the intensity of a second variable.
- The difference between the prediction and the average prediction is fairly distributed among the feature values of the instance [i.e. the contributions must add up to that difference] – the Efficiency property of Shapley values. This property distinguishes the Shapley value from other methods such as LIME. LIME does not guarantee that the prediction is fairly distributed among the features.
- The Shapley value is the only explanation method with a solid theory. The axioms – efficiency, symmetry, dummy, additivity – give the explanation a reasonable foundation. Methods like LIME assume linear behavior of the machine learning model locally, but there is no theory as to why this should work.
- The Shapley value requires a lot of computing time.
- The Shapley value can be misinterpreted. The Shapley value of a feature value is not the difference of the predicted value after removing the feature from the model training. The interpretation of the Shapley value is: Given the current set of feature values, the contribution of a feature value to the difference between the actual prediction and the mean prediction is the estimated Shapley value.
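For reference, a hedged sketch of the vanilla shap API which the get_shap_* helpers below presumably wrap (shap.TreeExplainer covers XGBoost, RandomForest and CatBoost models):
# build the explainer from the fitted model and compute one row of SHAP values per instance
raw_explainer = shap.TreeExplainer(clf)
raw_shap_values = raw_explainer.shap_values(X_valid)

# aggregated + instance-level SHAP-based importance
shap.summary_plot(raw_shap_values, X_valid, feature_names=feats)
# dependence plot for a single feature, colour-coded by an automatically chosen second variable
shap.dependence_plot('age', raw_shap_values, X_valid, feature_names=feats)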
shap_explainer = get_shap_explainer('tree', clf)
shap_values = get_shap_values(shap_explainer, X_valid)
The red "forces" in the below plot push the probability of the event high, whereas the "blue" ones work against them pushing in the opposite direction. Starting from the base_value
, all forces add up to the final prediction.
# `link='logit'` only for classifiers such as XGBoost not producing probs directly
show_shap(shap_explainer.expected_value, shap_values, X_valid, id, cats=cats,
link='logit', matplotlib=True, feats=feats, clf=clf)
cat_cols = [c for c in cats.keys() if c!='income']
train_data = Pool(
data=X_train,
label=y_train,
cat_features=cat_cols
)
holdout_data = Pool(
data=X_valid,
label=y_valid,
cat_features=cat_cols
)
clf_c = CatBoostClassifier(
iterations=100,
random_seed=random_state,
learning_rate=0.1,
custom_loss=['AUC']
)
_ = clf_c.fit(train_data, eval_set=holdout_data, plot=True, verbose=False)
#clf_c.fit(X_train, y_train, cat_features=cat_cols, eval_set=(X_valid, y_valid), plot=True, verbose=False)
_ = print_report(clf_c, X_valid.values, y_valid, t=0.5, cal=False,
roc=False, conf=False, classes=cats['income'])
shap_explainer = get_shap_explainer('tree', clf_c)
shap_values = get_shap_values(shap_explainer, holdout_data)
shap_imp = get_shap_feat_importance(shap_values, holdout_data, feats)
get_shap_feat_importance(shap_values, X_valid)
show_shap(shap_explainer.expected_value, shap_values, X_valid, id, cats=cats,
matplotlib=True, link='logit', feats=feats, clf=clf_c)
clf_c.predict_proba(X_valid.loc[id].values)[1]
Checking additivity of Shapley values
assert_shap_proba(clf_c, shap_explainer, shap_values, X_valid, id)
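Under the hood the check boils down to the following (a hedged sketch, assuming get_shap_values returns a 2-D array of per-instance SHAP values in log-odds space and that expected_value is a scalar):
import numpy as np

row = X_valid.index.get_loc(id)                              # positional index of the instance
raw_score = shap_explainer.expected_value + shap_values[row].sum()
manual_proba = 1 / (1 + np.exp(-raw_score))                  # sigmoid: log-odds -> probability
manual_proba, clf_c.predict_proba(X_valid.loc[id].values)[1]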
show_shap(shap_explainer.expected_value, shap_values, X_valid, id=None)
show_shap(shap_explainer.expected_value, shap_values, X_valid, id=None, x="hours_per_week",
y='sex', disp=raw_train)