Battista Biggio @ ICML 2015 - "Is Feature Selection Secure against Training Data Poisoning?"

Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.


  1. Is Feature Selection Secure against Training Data Poisoning?
     Huang Xiao (2), Battista Biggio (1), Gavin Brown (3), Giorgio Fumera (1), Claudia Eckert (2), Fabio Roli (1)
     (1) Pattern Recognition and Applications Lab, Dept. of Electrical and Electronic Engineering, University of Cagliari, Italy
     (2) Department of Computer Science, Technische Universität München, Germany
     (3) School of Computer Science, University of Manchester, UK
     ICML 2015, Jul 6-11, 2015
  2. Motivation
     http://pralab.diee.unica.it
     •  Increasing number of services and apps available on the Internet
        –  Improved user experience
     •  Proliferation and sophistication of attacks and cyberthreats
        –  Skilled / economically-motivated attackers
     •  Several security systems use machine learning to detect attacks
        –  but … is machine learning secure enough?
  3. Is Feature Selection Secure?
     •  Adversarial ML: security of learning and clustering algorithms
        –  Barreno et al., 2006; Huang et al., 2011; Biggio et al., 2014; 2012; 2013a; Brueckner et al., 2012; Globerson & Roweis, 2006
     •  Feature Selection
        –  High-dimensional feature spaces (e.g., spam and malware detection)
        –  Dimensionality reduction to improve interpretability and generalization
     •  How about the security of feature selection?
     [Figure: features x1, x2, …, xd reduced to the selected features x(1), x(2), …, x(k)]
  4. Feature Selection under Attack
     Attacker Model
     •  Goal of the attack
     •  Knowledge of the attacked system
     •  Capability of manipulating data
     •  Attack strategy
     [Figure labels: PD(X,Y)?, f(x)]
  5. Attacker’s Goal
     •  Integrity Violation: to perform malicious activities without compromising normal system operation
        –  enforcing selection of features to facilitate evasion at test time
     •  Availability Violation: to compromise normal system operation
        –  enforcing selection of features to maximize generalization error
     •  Privacy Violation: gaining confidential information on system users
        –  reverse-engineering feature selection to get confidential information
     [Figure: security violation types: integrity, availability, privacy]
  6. Attacker’s Knowledge
     •  Perfect knowledge
        –  upper bound on performance degradation under attack
     •  Limited knowledge
        –  attack on surrogate data sampled from same distribution
     [Figure: system components (training data, feature representation, feature selection algorithm); features x1, x2, …, xd reduced to x(1), x(2), …, x(k)]
  7. Attacker’s Capability
     •  Inject points into the training data
     •  Constraints on data manipulation
        –  Fraction of the training data under the attacker’s control
        –  Application-specific constraints
     •  Example on PDF data
        –  PDF file: hierarchy of interconnected objects
        –  Objects can be added but not easily removed without compromising the file structure
     Example PDF objects:
         13 0 obj
         << /Kids [ 1 0 R 11 0 R ]
         /Type /Page
         ... >> endobj
         17 0 obj
         << /Type /Encoding
         /Differences [ 0 /C0032 ] >>
         endobj
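The "objects can be added but not removed" constraint can be read as a box constraint on count features. The following is a minimal sketch under that reading; the function name `project_feasible` and the per-feature upper bound `upper` are hypothetical, introduced only for illustration.

```python
def project_feasible(candidate, original, upper):
    """Clip a candidate attack point so that no feature drops below its
    original count (objects are only added) and none exceeds an
    assumed per-feature upper bound."""
    if not (len(candidate) == len(original) == len(upper)):
        raise ValueError("all vectors must have the same length")
    # For each feature: raise to the original count if below it,
    # then cap at the upper bound.
    return [min(u, max(o, c)) for c, o, u in zip(candidate, original, upper)]


projected = project_feasible([0, 5, 12], original=[1, 2, 3], upper=[10, 10, 10])
```

The sketch assumes `upper[j] >= original[j]` for every feature; it does not check that the resulting vector corresponds to a valid PDF file.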
  8. Attack Scenarios
     •  Different potential attack scenarios depending on assumptions on the attacker’s goal, knowledge, capability
        –  Details and examples in the paper
     •  Poisoning Availability Attacks: enforcing selection of features to maximize generalization error
        –  Goal: availability violation
        –  Knowledge: perfect / limited
        –  Capability: injecting samples into the training data
  9. Embedded Feature Selection Algorithms
     •  Linear models
        –  Select features according to |w|
     •  LASSO (Tibshirani, 1996)
     •  Ridge Regression (Hoerl & Kennard, 1970)
     •  Elastic Net (Zou & Hastie, 2005)
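Selecting features according to |w| amounts to ranking the learned weights by magnitude and keeping the largest ones. A minimal sketch, assuming the weight vector of a linear model is already available; the helper name `select_top_k` is hypothetical:

```python
def select_top_k(weights, k):
    """Return the indices of the k features with the largest |w|,
    ordered from most to least important."""
    if not 0 <= k <= len(weights):
        raise ValueError("k must be between 0 and the number of features")
    # Rank feature indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return ranked[:k]


selected = select_top_k([0.1, -2.0, 0.0, 0.5], k=2)
```

Ties are broken by feature index because Python's sort is stable; that tie-breaking rule is a choice of this sketch, not something stated on the slide.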
  10. Poisoning Embedded Feature Selection
      •  Attacker’s objective
         –  to maximize generalization error on untainted data
      •  Solution: subgradient-ascent technique
      [Equation annotations]
         –  Loss estimated on surrogate data (excluding the attack point)
         –  Algorithm is trained on surrogate data (including the attack point)
         –  … w.r.t. choice of the attack point
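The formulas on this slide did not survive extraction. As a hedged sketch of the setting the annotations describe, assume a linear model f(x) = wᵀx + b, n training samples that include the attack point x_c, m surrogate samples marked with hats, and the standard textbook forms of the three regularizers. The symbols λ, ρ, n, m and the hat notation are chosen here for illustration and are not taken from the slide.

```latex
% Learner (inner problem): trained on the surrogate data plus the attack point x_c
\min_{w,b}\; \mathcal{L} = \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i, f(x_i)\bigr) + \lambda\,\Omega(w),
\qquad
\Omega(w) =
\begin{cases}
\|w\|_1 & \text{LASSO} \\
\tfrac{1}{2}\|w\|_2^2 & \text{ridge regression} \\
\rho\,\|w\|_1 + (1-\rho)\,\tfrac{1}{2}\|w\|_2^2 & \text{elastic net}
\end{cases}

% Attacker (outer problem): loss on the surrogate data, excluding the attack point,
% maximized w.r.t. the choice of the attack point
\max_{x_c}\; \mathcal{W}(x_c) = \frac{1}{m}\sum_{j=1}^{m} \ell\bigl(\hat{y}_j, f(\hat{x}_j)\bigr)
```

Here f in the outer problem is the solution of the inner problem, so the attacker's objective depends on x_c only through the learned w and b.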
  11. Gradient Computation
      •  How does the solution change w.r.t. xc?
      •  [Equation: KKT conditions]
      •  Subgradient is unique at the optimal solution!
  12. Gradient Computation
      •  We require the KKT conditions to hold under perturbation of xc
      •  Gradient is now uniquely determined
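The equations on these two slides were lost in extraction. The following is a sketch of the argument, not a transcription of the slides. It assumes a linear model f(x) = wᵀx + b, the quadratic loss ℓ = ½(f(x) − y)², an unpenalized bias b, and n training points that include the attack point (x_c, y_c). The symbol v denotes a subgradient of the regularizer Ω at w, and Σ, μ, M are shorthand introduced here.

```latex
% KKT (stationarity) conditions of the learner's objective
\frac{\partial \mathcal{L}}{\partial w} = \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i)-y_i\bigr)\,x_i + \lambda\,v = 0,
\qquad
\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i)-y_i\bigr) = 0

% Requiring both conditions to keep holding as x_c is perturbed gives a linear system
\begin{bmatrix}
\Sigma + \lambda\,\dfrac{\partial v}{\partial w} & \mu \\[4pt]
\mu^{\top} & 1
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial w}{\partial x_c} \\[6pt]
\dfrac{\partial b}{\partial x_c}
\end{bmatrix}
= -\frac{1}{n}
\begin{bmatrix}
M \\ w^{\top}
\end{bmatrix},
\qquad
\Sigma = \frac{1}{n}\sum_{i} x_i x_i^{\top},\quad
\mu = \frac{1}{n}\sum_{i} x_i,\quad
M = x_c w^{\top} + \bigl(f(x_c)-y_c\bigr) I

% Chain rule for the attacker's objective (quadratic loss on m surrogate points)
\frac{\partial \mathcal{W}}{\partial x_c}
= \frac{1}{m}\sum_{j=1}^{m}\bigl(f(\hat{x}_j)-\hat{y}_j\bigr)
\left(\hat{x}_j^{\top}\frac{\partial w}{\partial x_c} + \frac{\partial b}{\partial x_c}\right)
```

Under these assumptions ∂v/∂w is the identity for ridge regression, (1 − ρ) times the identity for the elastic net, and zero for LASSO wherever the sign pattern of w is locally constant. In the LASSO case the system matrix may be singular, so solving it may require restricting to the active features or using a pseudo-inverse.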
  13. Poisoning Attack Algorithm
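The algorithm listing on this slide was an image and is not recoverable from the transcript. As a stand-in, here is a toy sketch of a projected gradient-ascent poisoning loop. It is not the authors' algorithm: it uses 1-D ridge regression with a closed-form fit, a finite-difference gradient in place of the KKT-based gradient, and a box constraint [lo, hi]. All function names, data, and parameters are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression with an unpenalized bias."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    w = sxy / (sxx + lam)
    return w, my - w * mx


def val_loss(xc, yc, train, val, lam):
    """Train on the clean data plus the attack point, score on held-out data."""
    xs = [x for x, _ in train] + [xc]
    ys = [y for _, y in train] + [yc]
    w, b = fit_ridge_1d(xs, ys, lam)
    return sum(0.5 * (w * x + b - y) ** 2 for x, y in val) / len(val)


def poison_point(xc, yc, train, val, lam, lo=0.0, hi=1.0,
                 step=0.5, eps=1e-5, max_iter=200):
    """Move the attack point uphill on the validation loss, inside [lo, hi]."""
    best = val_loss(xc, yc, train, val, lam)
    for _ in range(max_iter):
        # Central finite difference stands in for the analytic gradient.
        grad = (val_loss(xc + eps, yc, train, val, lam)
                - val_loss(xc - eps, yc, train, val, lam)) / (2 * eps)
        cand = min(hi, max(lo, xc + step * grad))  # projection onto the box
        cand_loss = val_loss(cand, yc, train, val, lam)
        if cand_loss > best:
            xc, best = cand, cand_loss   # accept only improving moves
        else:
            step *= 0.5                  # otherwise shrink the step
            if step < 1e-6:
                break
    return xc, best


train = [(0.0, -1.0), (0.2, -1.0), (0.4, -1.0), (0.6, 1.0), (0.8, 1.0), (1.0, 1.0)]
val = [(0.1, -1.0), (0.3, -1.0), (0.7, 1.0), (0.9, 1.0)]
x_star, loss_star = poison_point(0.5, -1.0, train, val, lam=0.1)
```

Because a move is accepted only when it increases the held-out loss, the objective never decreases across iterations; the price is extra loss evaluations per step.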
  14. Experiments on PDF Malware Detection
      •  PDF: hierarchy of interconnected objects (keyword/value pairs)
      •  Learner’s task: to classify benign vs malware PDF files
      •  Attacker’s task: to maximize classification error by injecting poisoning attack samples
         –  Only feature increments are considered (object insertion)
         –  Object removal may compromise the PDF file
      •  Features: keyword counts
             /Type      2
             /Page      1
             /Encoding  1
             …
      Example PDF objects:
          13 0 obj
          << /Kids [ 1 0 R 11 0 R ]
          /Type /Page
          ... >> endobj
          17 0 obj
          << /Type /Encoding
          /Differences [ 0 /C0032 ] >>
          endobj
      Maiorca et al., 2012; 2013; Smutz & Stavrou, 2012; Srndic & Laskov, 2013
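The keyword-count features can be illustrated with a small sketch that counts name tokens in the textual form of PDF objects. This is a simplification for illustration only: the regular expression, the function name `keyword_counts`, and the three-keyword vocabulary are assumptions, and real PDF parsing (compressed streams, escaped names) is out of scope.

```python
import re
from collections import Counter

# Simplified pattern for PDF name tokens such as /Type or /Page.
KEYWORD = re.compile(r"/[A-Za-z0-9]+")


def keyword_counts(pdf_text, vocabulary):
    """Return one count per vocabulary keyword, in vocabulary order."""
    counts = Counter(KEYWORD.findall(pdf_text))
    return [counts[keyword] for keyword in vocabulary]


sample = (
    "13 0 obj << /Kids [ 1 0 R 11 0 R ] /Type /Page >> endobj\n"
    "17 0 obj << /Type /Encoding /Differences [ 0 /C0032 ] >> endobj\n"
)
features = keyword_counts(sample, ["/Type", "/Page", "/Encoding"])
```

On the two objects shown on the slide this gives the counts listed there: /Type 2, /Page 1, /Encoding 1.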
  15. Experimental Results
      Perfect knowledge
      •  Data: 300 training (TR) and 5,000 test (TS) samples, 114 features
      •  Similar results obtained for limited-knowledge attacks!
  16. Experimental Results
      Perfect knowledge
      •  A: selected features in the absence of attack
      •  B: selected features under attack
      •  k: number of features selected out of d
      •  r: common features between the two sets
      Kuncheva et al., 2007
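The symbols A, B, k, d, and r match the inputs of Kuncheva's consistency index, I(A, B) = (r·d − k²) / (k·(d − k)). The formula itself is missing from the transcript, so treating it as the measure plotted on this slide is an inference from the citation. A minimal sketch, with a hypothetical function name:

```python
def kuncheva_index(a, b, d):
    """Kuncheva's consistency index between two feature subsets of equal size k
    drawn from d features: (r*d - k^2) / (k*(d - k)), with r = |A intersect B|."""
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("both subsets must contain the same number of features")
    if not 0 < k < d:
        raise ValueError("the index is undefined for k = 0 or k = d")
    r = len(a & b)
    return (r * d - k * k) / (k * (d - k))


identical = kuncheva_index(range(5), range(5), d=10)
disjoint = kuncheva_index(range(5), range(5, 10), d=10)
```

The index equals 1 when the two subsets coincide, and it is close to 0 when the overlap is what two random subsets of size k would share by chance (r ≈ k²/d).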
  17. Conclusions and Future Work
      •  Framework for security evaluation of feature selection under attack
         –  Poisoning attacks against embedded feature selection algorithms
      •  Poisoning can significantly affect feature selection
         –  LASSO significantly vulnerable to poisoning attacks
      •  Future research directions
         –  Error bounds on the impact of poisoning on learning algorithms
         –  Secure / robust feature selection algorithms
      L1 regularization: stability against random noise, but not against adversarial (worst-case) noise?
  18. Any questions? Thanks for your attention!
  19. Experimental Results
      [Figure panels: Perfect Knowledge, Limited Knowledge]
