5. New Challenges for Machine Learning
• The use of machine learning opens up big new possibilities, but also new security risks
• Proliferation and sophistication of attacks and cyberthreats
  – skilled, economically-motivated attackers (e.g., ransomware)
• Several security systems use machine learning to detect attacks
  – but... is machine learning secure enough?
7. Is Machine Learning Secure Enough?
• Problem: how to evade a linear (trained) classifier?
• Toy example: a trained linear spam filter with binary word features, $f(x) = \text{sign}(w^\top x)$

  Original email x:  "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year ..."
  Modified email x': "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO's first winner of the year ... campus"

  word          w     x    x'
  start        +2     1    0
  bang         +1     1    0
  portfolio    +1     1    1
  winner       +1     1    1
  year         +1     1    1
  ...          ...   ...  ...
  university   -3     0    0
  campus       -4     0    1

  – $w^\top x = +6 > 0$ → SPAM (correctly classified)
  – $w^\top x' = +3 - 4 = -1 < 0$ → HAM (misclassified email)
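A minimal numeric re-creation of this toy example (weights and feature values copied from the slide; variable names are ours):

```python
import numpy as np

# Words and trained weights from the slide's toy spam filter
words = ["start", "bang", "portfolio", "winner", "year", "university", "campus"]
w     = np.array([+2, +1, +1, +1, +1, -3, -4])

x     = np.array([1, 1, 1, 1, 1, 0, 0])  # original spam email
x_adv = np.array([0, 0, 1, 1, 1, 0, 1])  # "start"/"bang" obfuscated, "campus" added

f = lambda v: np.sign(w @ v)
print(w @ x, f(x))          #  6, +1 -> SPAM (correctly classified)
print(w @ x_adv, f(x_adv))  # -1, -1 -> HAM (evades the filter)
```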
8. Evasion of Linear Classifiers
• Formalized as an optimization problem
  – Goal: minimize the discriminant function
    • i.e., have the sample classified as legitimate with maximum confidence
  – Constraints on input data manipulation
    • e.g., the maximum number of words that can be modified in each spam email
$$\min_{x'} \; w^\top x' \quad \text{s.t.} \quad d(x, x') \le d_{\max}$$
9. Dense and Sparse Evasion Attacks
• L2-norm noise corresponds to dense evasion attacks
  – all features are modified by a small amount
• L1-norm noise corresponds to sparse evasion attacks
  – few features are significantly modified
Dense (l2) attack:
$$\min_{x'} \; w^\top x' \quad \text{s.t.} \quad \|x - x'\|_2^2 \le d_{\max}$$

Sparse (l1) attack:
$$\min_{x'} \; w^\top x' \quad \text{s.t.} \quad \|x - x'\|_1 \le d_{\max}$$
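For a linear classifier these two problems have closed-form solutions, sketched below (function names and the budget parameter eps are ours; feature-domain constraints, e.g. binary word features, are ignored in this sketch):

```python
import numpy as np

def dense_evasion(x, w, eps):
    """L2-budget attack on f(x) = w^T x: the minimizer of w^T x' over
    ||x' - x||_2 <= eps moves along -w, nudging every feature a little."""
    return x - eps * w / np.linalg.norm(w)

def sparse_evasion(x, w, eps):
    """L1-budget attack: the minimizer over ||x' - x||_1 <= eps spends the
    whole budget on the single feature with the largest |weight|."""
    x_adv = x.astype(float).copy()
    j = np.argmax(np.abs(w))
    x_adv[j] -= eps * np.sign(w[j])
    return x_adv
```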
12. Robustness and Regularization
[Xu et al., JMLR 2009]
• SVM learning is equivalent to a robust optimization problem
$$\min_{w,b} \; \underbrace{\tfrac{1}{2} w^\top w}_{1/\text{margin}} + C\, \underbrace{\sum_i \max\big(0,\, 1 - y_i f(x_i)\big)}_{\text{classification error on training data (hinge loss)}} \;\equiv\; \min_{w,b} \; \max_{u_i \in\, \mathcal{U}} \sum_i \max\big(0,\, 1 - y_i f(x_i + u_i)\big)$$

where $\mathcal{U}$ is a set of bounded perturbations of the training points.
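A compressed sketch of the intuition behind this equivalence, assuming perturbations bounded as $\|u_i\| \le \rho$ (the precise statement and its conditions are in Xu et al., JMLR 2009):

```latex
% Worst-case inflation of a single hinge term under norm-bounded noise.
% Since max(0, .) is nondecreasing, the inner maximization passes inside it.
\begin{align*}
\max_{\|u_i\| \le \rho} \big( 1 - y_i\,(w^\top (x_i + u_i) + b) \big)
  &= 1 - y_i f(x_i) + \max_{\|u_i\| \le \rho} \big( -y_i\, w^\top u_i \big) \\
  &= 1 - y_i f(x_i) + \rho\, \|w\|_* ,
\end{align*}
% where \|.\|_* is the dual norm of the attacker's norm: the worst-case
% attacker adds \rho\|w\|_* to every hinge term, which acts like a
% dual-norm regularizer on w.
```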
13. Generalizing to Other Norms
• The optimal regularizer is the dual norm of the noise uncertainty set
$$\min_{w,b} \; \tfrac{1}{2} w^\top w + C \sum_i \max\big(0,\, 1 - y_i f(x_i)\big)$$
– l2-norm regularization is optimal against l2-norm noise!

$$\min_{w,b} \; \|w\|_\infty + C \sum_i \max\big(0,\, 1 - y_i f(x_i)\big), \qquad \|w\|_\infty = \max_{i=1,\dots,d} |w_i|$$
– infinity-norm regularization is optimal against l1-norm noise!
14. Interesting Fact
• The infinity-norm SVM is more secure against l1 attacks, as it bounds the maximum absolute value of the feature weights
• This explains the heuristic intuition of using more uniform feature weights in previous work [Kolcz and Teo, 2009; Biggio et al., 2010]
[Figure: feature weight distributions of the two classifiers]
16. Security vs Sparsity
• Problem: the SVM and the infinity-norm SVM provide dense solutions!
• Trade-off between security (against l2 or l1 attacks) and sparsity
  – sparsity reduces computational complexity at test time!
[Figure: feature weight distributions, showing the dense solutions]
17. Elastic-Net Regularization
[H. Zou & T. Hastie, 2005]
• Originally proposed for feature selection
– to group correlated features together
• Trade-off between sparsity and security against l2-norm attacks
$$\|w\|_{\text{elastic}} = (1 - \lambda)\, \|w\|_1 + \frac{\lambda}{2}\, \|w\|_2^2$$
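A hedged sketch of training a hinge-loss linear classifier with elastic-net regularization via scikit-learn's SGDClassifier (a stochastic stand-in, not the exact elastic-net SVM of the next slide; the hyperparameter values are illustrative only):

```python
from sklearn.linear_model import SGDClassifier

# Hinge loss + elastic-net penalty: sklearn's l1_ratio plays the role of
# (1 - lambda) in the formula above (l1_ratio=0 -> pure l2, dense but secure
# against l2 attacks; l1_ratio=1 -> pure l1, sparse but less secure).
clf = SGDClassifier(loss="hinge", penalty="elasticnet",
                    l1_ratio=0.5, alpha=1e-4, max_iter=1000)
# clf.fit(X_train, y_train)  # training data assumed available
```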
20. Linear Classifiers
• SVM (quadratic prog.):
  $$\min_{w,b} \; \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \max\big(0,\, 1 - y_i f(x_i)\big)$$
• Infinity-norm SVM (linear prog.):
  $$\min_{w,b} \; \|w\|_\infty + C \sum_{i=1}^{n} \max\big(0,\, 1 - y_i f(x_i)\big)$$
• 1-norm SVM (linear prog.):
  $$\min_{w,b} \; \|w\|_1 + C \sum_{i=1}^{n} \max\big(0,\, 1 - y_i f(x_i)\big)$$
• Elastic-net SVM (quadratic prog.):
  $$\min_{w,b} \; (1 - \lambda)\|w\|_1 + \tfrac{\lambda}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \max\big(0,\, 1 - y_i f(x_i)\big)$$
• Octagonal SVM (linear prog.):
  $$\min_{w,b} \; (1 - \rho)\|w\|_1 + \rho\|w\|_\infty + C \sum_{i=1}^{n} \max\big(0,\, 1 - y_i f(x_i)\big)$$

with $f(x) = w^\top x + b$ in all cases.
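As a concrete illustration of the "linear prog." entries above, a minimal sketch casting the infinity-norm SVM as a linear program with scipy (the function name, the HiGHS solver choice, and the default C are our assumptions, not from the slides):

```python
import numpy as np
from scipy.optimize import linprog

def linf_svm(X, y, C=1.0):
    """Infinity-norm SVM as an LP (sketch). X: (n, d) features; y: labels in {-1, +1}.

    min_{w,b,t,xi}  t + C * sum(xi)
    s.t.  |w_j| <= t,   xi_i >= 1 - y_i (w^T x_i + b),   xi_i >= 0
    """
    n, d = X.shape
    # Variable vector z = [w (d), b (1), t (1), xi (n)]
    c = np.concatenate([np.zeros(d + 1), [1.0], C * np.ones(n)])

    # |w_j| <= t  ->  w_j - t <= 0  and  -w_j - t <= 0
    A1 = np.hstack([np.eye(d), np.zeros((d, 1)), -np.ones((d, 1)), np.zeros((d, n))])
    A2 = np.hstack([-np.eye(d), np.zeros((d, 1)), -np.ones((d, 1)), np.zeros((d, n))])
    # Hinge constraints: -y_i (w^T x_i + b) - xi_i <= -1
    A3 = np.hstack([-y[:, None] * X, -y[:, None], np.zeros((n, 1)), -np.eye(n)])

    A_ub = np.vstack([A1, A2, A3])
    b_ub = np.concatenate([np.zeros(2 * d), -np.ones(n)])
    # w and b free; t and xi non-negative
    bounds = [(None, None)] * (d + 1) + [(0, None)] * (1 + n)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d], res.x[d]  # w, b
```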
21. Security and Sparsity Measures
• Sparsity
– Fraction of weights equal to zero
• Security (Weight Evenness)
– E=1/d if only one weight is different from zero
– E=1 if all weights are equal in absolute value
• Parameter selection with 5-fold cross-validation optimizing:
AUC + 0.1 S + 0.1 E
$$S = \frac{1}{d}\, \Big|\big\{ w_k \;\big|\; w_k = 0,\; k = 1, \dots, d \big\}\Big|$$

$$E = \frac{1}{d}\, \frac{\|w\|_1}{\|w\|_\infty} \;\in\; \Big[\frac{1}{d},\, 1\Big]$$
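Both measures are straightforward to compute; a minimal sketch (the zero tolerance tol is our assumption, since numerical solvers rarely return exact zeros):

```python
import numpy as np

def sparsity(w, tol=1e-8):
    """S: fraction of (near-)zero weights; the slide counts exact zeros."""
    return np.mean(np.abs(w) <= tol)

def evenness(w):
    """E = ||w||_1 / (d * ||w||_inf), in [1/d, 1]: 1/d when a single weight
    is nonzero, 1 when all weights have equal absolute value."""
    d = len(w)
    return np.linalg.norm(w, 1) / (d * np.linalg.norm(w, np.inf))
```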
22. Results on Spam Filtering (Sparse Evasion Attack)
• 5000 samples from TREC 07 (spam/ham emails)
• 200 features (words) selected to maximize information gain
• Results averaged over 5 repetitions, using 500 TR/TS samples
• (S, E) measures reported in the figure legend (in %)
[Figure: AUC10% vs. d_max, the maximum number of words modified in each spam email. Legend, with (S, E) in %: SVM (0, 37), ∞-norm (4, 96), 1-norm (86, 4), el-net (67, 6), 8gon (12, 88).]
23. Results on PDF Malware Detection (Sparse Evasion Attack)
• PDF: hierarchy of interconnected objects (keyword/value pairs)
• Setup: 11,500 samples; features: keyword counts, with 114 features (keywords) selected with information gain; 5 repetitions, using 500 TR/TS samples

[Figure: AUC10% vs. d_max, the maximum number of keywords added to each malicious PDF file. Legend, with (S, E) in %: SVM (0, 47), ∞-norm (0, 100), 1-norm (91, 2), el-net (55, 13), 8gon (69, 29).]

• Example PDF objects and the keyword counts extracted from them:

  13 0 obj
  << /Kids [ 1 0 R 11 0 R ]
     /Type /Page
     ... >>
  endobj

  17 0 obj
  << /Type /Encoding ... >>
  endobj

  /Type      2
  /Page      1
  /Encoding  1
  ...
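For illustration, a simplified sketch of the keyword-count feature extraction described above (the keyword list is a tiny illustrative subset of the 114 features; a real extractor would parse the PDF object tree rather than scan raw text):

```python
import re
from collections import Counter

# Illustrative subset of the keyword features (the actual 114 were selected
# by information gain, as stated on the slide).
KEYWORDS = ["/Type", "/Page", "/Encoding", "/Kids"]

def keyword_counts(pdf_bytes):
    """Count keyword occurrences in the raw PDF source (simplified sketch:
    real PDFs need object-tree parsing, stream decoding, etc.)."""
    text = pdf_bytes.decode("latin-1", errors="ignore")
    counts = Counter(re.findall(r"/[A-Za-z]+", text))
    return [counts[k] for k in KEYWORDS]
```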
24. Conclusions and Future Work
• We have shed light on the theoretical and practical implications of sparsity and security in linear classifiers
• We have defined a novel regularizer to tune the trade-off between sparsity and security against sparse evasion attacks
• Future work: to investigate a similar trade-off for
  – poisoning (training-time) attacks
  – nonlinear classifiers