MaMaDroid: Detecting Android
Malware by Building Markov Chains of
Behavioral Models
Android & Malware
Android market share is growing…
In 2016, 85% of smartphone sales
…and so is the interest of cybercriminals
Bypassing two-factor authentication
Stealing sensitive information, etc.
Current Defenses
Can’t use complex on-device operations
Limited battery and memory resources
Google’s centralized analysis
Not perfect, after-the-fact
Many apps installed outside Play Store
Lots of research in the field! However…
Permission-based models prone to false positives
Relying on API calls frequently used by malware needs
constant, costly retraining
Our Idea
Rely on the sequence of abstracted calls
1. Sequence captures the behavioral model
2. Abstraction provides resilience to API changes
Intuition: malware uses calls for different actions
and in different order than benign apps
E.g., android.media.MediaRecorder can be used by any app
with permission to record audio
Using it only after calls to getRunningTasks(), which
allows recording conversations, may suggest
maliciousness
Overview
Call Graph Extraction
Based on static analysis
Given an apk, extract call graphs
Tools
Soot (Java optimization and analysis framework)
FlowDroid (ensures contexts & flows preserved)
Call Graph
Overview
Sequence Extraction
Soot gives the sequence of functions that are
potentially called by the program, but…
Each execution could take a specific branch of the
graph and only execute a subset of the calls
When running the example multiple times…
Execute() may be followed by different calls, e.g.,
getShell() only in try or getShell() + getMessage() in
catch
Sequence Extraction (cont’d)
We proceed as follows…
1. Identify set of entry nodes
2. Enumerate reachable paths
3. Output set of all paths as the sequences of API calls
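The three steps above can be sketched as a depth-first enumeration over a call graph. This is a minimal illustration, not the authors' implementation: the toy graph, the node names, and the helpers `entry_nodes` and `enumerate_paths` are hypothetical. It assumes entry nodes are nodes that are never called by another node, and it cuts cycles by never revisiting a node already on the current path.

```python
def entry_nodes(graph):
    """Return nodes that never appear as a callee (assumed definition of an entry node)."""
    callees = {callee for targets in graph.values() for callee in targets}
    return sorted(node for node in graph if node not in callees)


def enumerate_paths(graph, node, path=None):
    """Yield every call path that starts at `node` and ends at a node with no unvisited successor."""
    path = (path or []) + [node]
    # Skip successors already on the path so recursive calls cannot loop forever.
    successors = [s for s in graph.get(node, []) if s not in path]
    if not successors:
        yield path
        return
    for successor in successors:
        yield from enumerate_paths(graph, successor, path)


# Hypothetical call graph, loosely modelled on the try/catch example in the slides.
call_graph = {
    "onCreate": ["execute"],
    "execute": ["getShell", "getMessage"],
    "getShell": [],
    "getMessage": [],
}

sequences = [
    path
    for entry in entry_nodes(call_graph)
    for path in enumerate_paths(call_graph, entry)
]
```

Each element of `sequences` is one candidate call sequence; together they stand in for the branches that different executions might take.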
Abstraction
Packages
Using the list of 243 packages (as of API level 24) + 95
from the Google API
Packages defined by developers → “self-defined”
If we can’t tell what its class implements → “obfuscated”
Families
9 families: android, google, java, javax, xml, apache,
junit, json, dom
Plus self-defined and obfuscated
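A minimal sketch of the family-level abstraction described above. The slide lists only the nine family names, so the prefix table below is an assumption, as are the function name `abstract_to_family` and the `is_obfuscated` hook. How obfuscation is actually detected is not stated on the slide, so the caller supplies that test.

```python
# Assumed mapping from package prefix to family name (the slide gives names only).
FAMILY_PREFIXES = {
    "android.": "android",
    "com.google.": "google",
    "java.": "java",
    "javax.": "javax",
    "org.xml.": "xml",
    "org.apache.": "apache",
    "junit.": "junit",
    "org.json.": "json",
    "org.w3c.dom.": "dom",
}


def abstract_to_family(call, is_obfuscated=lambda name: False):
    """Map a fully qualified call to one of the 11 family-level states."""
    for prefix, family in FAMILY_PREFIXES.items():
        if call.startswith(prefix):
            return family
    # Hypothetical hook: the caller decides what counts as obfuscated.
    if is_obfuscated(call):
        return "obfuscated"
    return "self-defined"


abstracted = [
    abstract_to_family(call)
    for call in (
        "com.example.app.Main.run()",
        "android.media.MediaRecorder.start()",
        "javax.crypto.Cipher.getInstance()",
    )
]
```

The package-level abstraction would work the same way with a longer table (243 + 95 entries according to the slide), which is why it is not spelled out here.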
Example
Overview
Markov Chain
Memoryless models
Probability of transitioning from one state to another depends
only on the current state
Represented as a set of nodes
Each corresponding to a different state, and a set of edges
labeled with the probability of transition.
Sum of the probabilities on all edges leaving
any node is exactly 1
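The two properties above can be written compactly. The notation is an assumption introduced here, not taken from the slides: $X_t$ is the state at step $t$, $s_i$ and $s_j$ are states, and $p_{ij}$ is the label on the edge from $s_i$ to $s_j$.

```latex
\Pr(X_{t+1} = s_j \mid X_t = s_i, X_{t-1}, \dots, X_0)
  = \Pr(X_{t+1} = s_j \mid X_t = s_i) = p_{ij},
\qquad
\sum_{j} p_{ij} = 1 \quad \text{for every state } s_i \text{ with outgoing edges.}
```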
Markov-chain based modeling
Building the Markov Chains
From the sequences of abstracted API calls: each
package/family is a state, and each transition is labeled with
the probability of moving from one state to another
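One way to build such a chain is to count consecutive pairs in the abstracted sequences and normalise each state's outgoing counts. The sketch below assumes this frequency-based estimate; the slide does not spell out the estimator, and the function name `build_markov_chain` and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_markov_chain(sequences):
    """Estimate transition probabilities from consecutive pairs of abstracted calls."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        for src, dst in zip(sequence, sequence[1:]):
            counts[src][dst] += 1

    chain = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        # Normalise so the probabilities leaving each state sum to 1.
        chain[src] = {dst: n / total for dst, n in outgoing.items()}
    return chain


# Toy sequences, already abstracted to families.
sequences = [
    ["self-defined", "java", "android"],
    ["self-defined", "java", "java"],
    ["self-defined", "android"],
]
chain = build_markov_chain(sequences)
# chain["java"] is {"android": 0.5, "java": 0.5}
```

States that never appear as a source simply have no outgoing edges in this representation.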
Feature Extraction
For each app:
Feature vector = probabilities of transitioning from one
state to another in the Markov chain
With families, 11 possible states → 121 possible
transitions in each chain
With packages, 340 states → 115,600 transitions
Principal Component Analysis (PCA)
Standard way to reduce/refine features
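The transition counts quoted above come from squaring the number of states (11 × 11 = 121, 340 × 340 = 115,600). A minimal sketch of turning a chain into a fixed-length feature vector in family mode follows. The ordering of `STATES`, the function name `feature_vector`, and the toy chain are assumptions; transitions that never occur are filled with 0. The PCA step is not shown.

```python
# The 11 family-level states; this particular ordering is an arbitrary assumption.
STATES = [
    "self-defined", "obfuscated", "android", "google", "java", "javax",
    "xml", "apache", "junit", "json", "dom",
]


def feature_vector(chain, states=STATES):
    """Flatten a chain into one probability per (source, destination) pair."""
    return [
        chain.get(src, {}).get(dst, 0.0)  # unseen transitions get probability 0
        for src in states
        for dst in states
    ]


# Toy chain in the dict-of-dicts form {source: {destination: probability}}.
toy_chain = {
    "self-defined": {"java": 1.0},
    "java": {"android": 0.5, "java": 0.5},
}
vector = feature_vector(toy_chain)
```

Because every app is mapped onto the same ordered list of state pairs, the vectors of different apps are directly comparable.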
Overview
Classification
Build a classifier using the extracted features
Each app labeled as benign or malware
Can use a few standard algorithms for this task…
Random Forests
1-NN, 3-NN
SVM
Maybe deep learning?
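As an illustration of the simplest option listed above, here is a minimal k-nearest-neighbour sketch in plain Python. It is not the authors' setup: Euclidean distance, the function name `knn_predict`, and the two-dimensional toy vectors are assumptions made for the example. Real feature vectors would have 121 or 115,600 entries, or fewer after PCA.

```python
import math
from collections import Counter


def knn_predict(train, query, k=1):
    """Label `query` by majority vote among its k nearest training vectors."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy labelled feature vectors (hypothetical values).
train = [
    ([0.9, 0.1], "benign"),
    ([0.8, 0.2], "benign"),
    ([0.2, 0.8], "malware"),
    ([0.1, 0.9], "malware"),
]

label_1nn = knn_predict(train, [0.85, 0.15], k=1)
label_3nn = knn_predict(train, [0.15, 0.85], k=3)
```

With `k=1` the label comes from the single closest training sample; with `k=3` it is decided by a majority vote among the three closest.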
Datasets
How many API calls?
Android/Google family calls?
Evaluation
(1) Accuracy of classification on benign and malicious
samples developed around the same time
(2) Robustness to the evolution of malware as well as
of the Android framework (using older datasets for
training and newer ones for testing and vice-versa)
Same Year
[Figure panels: family, package]
Training on older samples
[Figure panels: family, package]
Training on newer samples
[Figure panels: family, package]
MaMaDroid vs DroidAPIMiner
Case Studies (2016/newbenign)
False Positives (164 samples)
Most of them request “dangerous permissions”
E.g., SMS permissions requested for no clear reason
False Negatives (114 samples)
Not classified as malware by VirusTotal, so they might
actually be legitimate
Most of them adware
Evasion
Repackaging benign apps
Difficult to embed malicious code while keeping similar
Markov chain; vice versa is also hard
Imitating Markov chains
Likely ineffective
Obfuscation/Mangling
Still captured by the [obfuscated] abstraction
More in the paper…
Limitations
Classification is memory hungry
Soot is buggy, we lose ~4% of the samples
Limits of static-analysis-only methods
Future Work
Further investigate resilience to evasion
Focus on repackaged malicious apps
Injection of API calls to mess with Markov chains
Enhancements
Fine-grained abstractions (e.g., class)
Seed with dynamic analysis
Thank you!
Paper to appear at NDSS 2017:
E. Mariconti, L. Onwuzurike, P. Andriotis,
E. De Cristofaro, G. Ross, G. Stringhini.
MaMaDroid: Detecting Android Malware by
Building Markov Chains of Behavioral Models

