Don’t be wrong because you might be fooled: Tips on how to secure your ML model
Figuring out why your ML model is consistently less accurate on certain classes than on others can help you increase not only its overall accuracy but also its adversarial robustness.
“Cat”, “bird” and “dog” classes are harder to correctly classify and easier to attack
- An untargeted attack, which is considered successful when the predicted class label changes (to any other label).
- A targeted attack with the least-likely target, which is considered successful only when the predicted class label changes specifically to the label for which the model has the least confidence on that instance. Both modes are illustrated in the sketch after this list.
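The sketch below illustrates both modes with one-step FGSM in PyTorch. FGSM is just one convenient choice of attack, and the `model` and `x_batch` names are hypothetical stand-ins for your own trained classifier and image batch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins: swap in your own trained classifier and images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x_batch = torch.rand(8, 3, 32, 32)  # image batch scaled to [0, 1]

def fgsm(model, x, eps=0.03, target=None):
    """One-step FGSM. Untargeted when `target` is None (push the prediction
    toward any other label); targeted otherwise (pull it toward `target`)."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    if target is None:
        loss = F.cross_entropy(logits, logits.argmax(dim=1))
        step = eps    # ascend the loss: move away from the current prediction
    else:
        loss = F.cross_entropy(logits, target)
        step = -eps   # descend the loss: move toward the target label
    loss.backward()
    return (x + step * x.grad.sign()).clamp(0, 1).detach()

# Least-likely target: the class the model is least confident about.
with torch.no_grad():
    least_likely = model(x_batch).argmin(dim=1)

x_untargeted = fgsm(model, x_batch)                     # any label flip counts
x_targeted = fgsm(model, x_batch, target=least_likely)  # must hit least-likely
```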
Root cause analysis
1. Mislabeled/Confusing Training Data
2. Is this a “cat” or a “dog”?
3. Is this a “bird” or an “airplane”?
- Good data means a good model: spend some time probing your data and try to detect whether there are any systematic errors in your training set (see the per-class sketch below).
- Use explanation methods as a debugger, in order to understand why your model misses certain groups of instances more than others (see the saliency sketch below).
- Apply adversarial attacks, such as the FGSM sketch above, to test the robustness of your model.
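To make the first tip concrete, here is a small probing sketch using NumPy and scikit-learn; the random `y_true`/`y_pred` arrays are placeholders for your validation labels and your model’s predictions. Per-class accuracy read off the confusion matrix surfaces the weak classes, and the largest off-diagonal cell names the most frequent confusion (e.g. “cat” vs. “dog”).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

class_names = ["airplane", "car", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]  # e.g. CIFAR-10
y_true = np.random.randint(0, 10, size=1000)  # placeholder: validation labels
y_pred = np.random.randint(0, 10, size=1000)  # placeholder: model predictions

cm = confusion_matrix(y_true, y_pred, labels=np.arange(10))
per_class_acc = cm.diagonal() / cm.sum(axis=1)

# Rank classes by accuracy to spot the systematically weak ones.
for i in np.argsort(per_class_acc):
    print(f"{class_names[i]:>10}: {per_class_acc[i]:.2%}")

# The largest off-diagonal count is the most frequent confusion.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
t, p = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"Most frequent confusion: {class_names[t]} -> {class_names[p]}")
```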
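For the second tip, a vanilla-gradient saliency map is one of the simplest explanation methods to wire up; the model and input below are the same kind of hypothetical stand-ins as above, and libraries such as Captum provide more robust variants (Integrated Gradients, SmoothGrad, and so on).

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins, as in the sketches above.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32, requires_grad=True)

logits = model(x)
logits[0, logits.argmax()].backward()  # gradient of the winning class score

# Pixels with large gradients are the ones the prediction leans on; if they
# fall on the background rather than the object (e.g. the sky around a
# "bird"), the model has learned a spurious cue rather than the class itself.
saliency = x.grad.abs().max(dim=1).values  # (1, 32, 32) heat map
```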