ABSTRACT :
--------------------
Modern metamorphic and polymorphic malware mutates its code using obfuscation and encryption to thwart detection, so conventional signature-based scanners fail to detect it. To address the problem of detecting known variants of metamorphic malware, we propose a method based on bioinformatics techniques that are used effectively for protein and DNA matching. Instead of exact signature matching, more sophisticated signature(s) are extracted using multiple sequence alignment (MSA). The results show that the proposed method is capable of identifying malware variants with minimum false alarms and misses. The detection rate achieved with the proposed method is also better than that of the commercial antivirus products used in the study.
Status:
----------
This work has been accepted by the 8th IEEE International Conference on Innovations in Information Technology (Innovations'12).
Link:
-------
http://ieeexplore.ieee.org/xpl/login.jsp?reload=true&tp=&arnumber=6207739&url=http://ieeexplore.ieee.org/iel5/6203543/6207707/06207739.pdf?arnumber=6207739
e-mail: grijesh.mnit@gmail.com
1. Bioinformatics Techniques for Metamorphic Malware Analysis and Detection
Malaviya National Institute of Technology, Jaipur
Supervisors:
Dr. M. S. Gaur
Dr. V. Laxmi
By:
Grijesh Chauhan
(2009PCP116)
2. Outline
Malware & Metamorphic malware
Motivation
Objective
Bioinformatics Techniques
MOMENTUM
Dataset
Result & Analysis
References
3. Malware
Malware is software written with the intention to infect and replicate.
Threats
Loss of data
Degraded computer system performance
Identity theft
Two broad categories
Metamorphic: the virus body changes on each replication
Polymorphic: encrypts the malicious payload to avoid detection
4. Metamorphic Malware[1/2]
Metamorphic malware variants have similar functionality but different structure and signature.
Similar to genetic diversity in biology.
[Diagram: a Metamorphic Engine producing Variant-1, Variant-2, and Variant-3 of the malware with reordered code]
5. Metamorphic Malware[2/2]
Metamorphic malware automatically re-codes itself each time it propagates or is distributed.
Conventional signature-based scanners are ineffective for detecting variants of the same malware.
Sophisticated signature(s) are required to detect metamorphic variants of malware.
6. Motivation
Variants of metamorphic malware are generated using a small embedded metamorphic engine to defeat detection [2].
A limited number of instructions is used to generate variants so as to preserve functionality.
Like DNA/protein sequences, metamorphic malware mutates from generation to generation, inheriting functionality and some structural similarity from ancestral malware.
7. Objective
To devise a method for the detection of metamorphic malware and its variants.
To extract abstract signature(s) using bioinformatics sequence alignment.
Base code is preserved across generations but obfuscated using junk code, equivalent instructions, etc.
To identify unseen malware samples using the best representative signatures (group/single) of a family.
8. Sequence Alignment [1/2]
Sequence alignment is a way of arranging DNA/protein sequences to identify regions of similarity and so infer functional, structural, or evolutionary relationships.
Alignment Methods
Global Alignment - aligns sequences end to end.
Local Alignment - aligns a substring of one sequence with a substring of the other.
Multiple Sequence Alignment (MSA) - aligns more than two sequences.
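As an illustration of global alignment applied to mnemonic sequences, the following is a minimal Needleman-Wunsch sketch. The scoring values (match, mismatch, gap) and the example sequences are assumptions chosen for illustration; they are not the parameters used in this work.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of two mnemonic lists.

    Scoring values are illustrative assumptions, not the study's parameters.
    Returns (score, aligned_a, aligned_b) with "-" marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Hypothetical variants: the second one lacks the "push" instruction.
variant_1 = ["add", "push", "mov", "jmp"]
variant_2 = ["add", "mov", "jmp"]
total, row_1, row_2 = global_align(variant_1, variant_2)
```

Here the missing instruction shows up as a gap in the second row, which is the kind of end-to-end correspondence global alignment is meant to expose.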
9. Sequence Alignment [2/2]
Global alignment
L G P S S K Q T G K G S - S R I D N
L N - I T K S A G K G A I M R L D A
Local alignment
- - - - - - T G - G - - - - - - -
- - - - - - A G K G - - - - - - -
Alignment Parameter
Match
Mismatch
Gap
Point of Mutation
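Local alignment can be sketched with the Smith-Waterman recurrence, which resets negative scores to zero so that only the best-matching region is reported. The scoring values and the example sequences below are assumptions for illustration only.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two mnemonic lists.

    Scoring values are illustrative assumptions. Returns the best local
    score and the two aligned regions ("-" marks a gap).
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Scores never drop below zero, so a new region can start anywhere.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    sub_a, sub_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            sub_a.append(a[i - 1]); sub_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            sub_a.append(a[i - 1]); sub_b.append("-"); i -= 1
        else:
            sub_a.append("-"); sub_b.append(b[j - 1]); j -= 1
    return best, sub_a[::-1], sub_b[::-1]


# Hypothetical samples sharing a common region surrounded by unrelated code.
sample_1 = ["nop", "nop", "push", "mov", "call", "ret"]
sample_2 = ["xor", "push", "mov", "call", "jmp", "jmp"]
local_score, region_1, region_2 = local_align(sample_1, sample_2)
```

Only the shared region contributes to the score, which is why local distances can stay low even when junk code makes the sequences differ end to end.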
10. Multiple Sequence Alignment
MSA is an extension of pairwise alignment to more than two sequences.
It is used to identify conserved regions across a group of sequences.
M1    M2    M3    M4    M5
add   add   add   -     add
-     push  push  push  push
mov   mov   mov   mov   mov
-     call  jmp   jz    jmp
jmp   jmp   mov   mov   mov
• Mi – ith Malware instance
11. Implementation of MSA
MSA is implemented using the progressive technique (ClustalW [9]).
Progressive MSA follows three steps:
Determine the similarity between each pair of sequences by pairwise alignment.
Construct a guide tree (phylogenetic tree) to represent the evolutionary relationship.
Build the MSA by aligning the most closely related groups first and progressing to the most distant group, according to the guide tree.
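The guide-tree step can be sketched as agglomerative clustering over pairwise distances. Note that this is a simplified average-linkage (UPGMA-style) sketch, whereas ClustalW builds its guide tree with neighbor-joining; the function name and the distances below are hypothetical.

```python
from itertools import combinations


def guide_tree(names, dist):
    """Build a nested-tuple guide tree by average-linkage clustering.

    `dist` maps frozenset({x, y}) to the pairwise distance between leaves.
    Simplified stand-in for the guide-tree step, not ClustalW's own method.
    """
    # Each cluster is (subtree, tuple of leaf names under it).
    clusters = [(name, (name,)) for name in names]

    def average_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two closest clusters; they would be aligned first.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Hypothetical distances: A-B and D-F are close, E is nearest to A and B.
names = ["A", "B", "D", "E", "F"]
dist = {frozenset(p): 0.8 for p in combinations(names, 2)}
dist[frozenset(("A", "B"))] = 0.1
dist[frozenset(("D", "F"))] = 0.2
dist[frozenset(("E", "A"))] = 0.3
dist[frozenset(("E", "B"))] = 0.3
tree = guide_tree(names, dist)
```

The nesting depth of the returned tuple gives the merge order: the innermost pairs are the groups a progressive aligner would combine first.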
12. Phylogenetic Tree
A phylogenetic tree depicts the evolutionary relationship among the sequences.
It is used to form groups of similar viruses.
It guides MSA progressively so that closer groups are aligned first.
[Diagram: phylogenetic tree with leaves A, B, D, E, and F, written in bracket notation as ((E,(A,B)),(D,F))]
13. Similarity Measurement
Alignment score: the sum of the scores specified for each aligned pair of mnemonics. The higher the score, the more similar the sequences.
Distance (d): calculated using the following formulas. The higher the distance, the more dissimilar the sequences.
$N_d = \frac{\#\text{mismatch}}{\#\text{match} + \#\text{mismatch}}$
$L_d = (\#\text{mismatch} + \#\text{match} + \#\text{gap})$
• Nd is Normalized distance, Ld is Levenshtein distance
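A minimal sketch of how the normalized distance Nd (mismatches divided by matches plus mismatches) could be computed from two rows of an alignment. The helper names are hypothetical, and treating any column that contains a gap as a gap column is an assumption.

```python
def alignment_counts(row_a, row_b, gap="-"):
    """Count match, mismatch, and gap columns of a pairwise alignment."""
    match = mismatch = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            gaps += 1          # assumption: any column with a gap is a gap column
        elif x == y:
            match += 1
        else:
            mismatch += 1
    return match, mismatch, gaps


def normalized_distance(row_a, row_b):
    """Nd = #mismatch / (#match + #mismatch); gap columns are not counted."""
    match, mismatch, _ = alignment_counts(row_a, row_b)
    total = match + mismatch
    return mismatch / total if total else 0.0


# Columns M2 and M3 of the MSA example on slide 10.
m2 = ["add", "push", "mov", "call", "jmp"]
m3 = ["add", "push", "mov", "jmp", "mov"]
nd = normalized_distance(m2, m3)
```

For these two columns there are three matches and two mismatches, so Nd is 2/5.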
14. Identification of Base Malware
The base malware of a family is the instance most similar to all the others, i.e. the one with the highest sum of pairwise alignment scores (SoP [3]).
Score Matrix
      M1    M2    M3    M4    SoP
M1    -     7     -2    1     6
M2    7     -     -3    0     4
M3    -2    -3    -     1     -4
M4    1     0     1     -     2
M1 is the base malware.
• Mi – ith Malware instance
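The selection of the base malware can be sketched as a row sum over the pairwise score matrix. The scores below are the ones shown on this slide; the function name is hypothetical.

```python
def base_malware(pairwise_scores):
    """Return the instance with the highest sum of pairwise scores (SoP)."""
    sop = {name: sum(row.values()) for name, row in pairwise_scores.items()}
    return max(sop, key=sop.get), sop


# Pairwise alignment scores from the slide's score matrix.
pairwise_scores = {
    "M1": {"M2": 7, "M3": -2, "M4": 1},
    "M2": {"M1": 7, "M3": -3, "M4": 0},
    "M3": {"M1": -2, "M2": -3, "M4": 1},
    "M4": {"M1": 1, "M2": 0, "M3": 1},
}
base, sop = base_malware(pairwise_scores)
```

With these values M1 has the highest sum (6), which agrees with the slide's choice of M1 as the base malware.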
15. Implementation Method
MetamOrphic Malware ExploratioN Technique Using MSA (MOMENTUM) demonstrates the applicability of bioinformatics techniques to metamorphic malware analysis and detection.
The two phases of MOMENTUM are:
Analysis of Metamorphism in Tools/Real Malware
Signature Modelling and Testing
16. MOMENTUM [1/2]
[Flow diagram for metamorphism analysis. Boxes: Metamorphic Families (Virus Tools and Real Malware); Intra-Family pair-wise Alignment; Distance Matrix; Base file; Alignments of two files; Inter-Family pair-wise Alignment. Decisions: Metamorphic?; Families Overlap?; Obfuscation?]
17. MOMENTUM [2/2]
[Diagram depicting Signature Modelling and Testing. Labels: Divide data set in two parts (Training Set, Testing Set); Extract Group Signature; Single Signature; Threshold (one for each signature type); Testing with single and group signatures; Scan Logs]
18. MSA Signature
An MSA signature (single signature) is the sequence of mnemonics preserved in the alignment.
A mnemonic that appears in more than 50% of the entries in a row is included in the MSA signature.
M1    M2    M3    M4    M5    MSA Sign   Mt
push  push  -     -     push  push       push
-     -     jump  jump  jump  jump       jump
mov   mov   -     lea   xor              lea
call  call  call  call  call  call       call
push  mov   mov   -     mov   mov        push
In the third row no mnemonic exceeds 50%, so that row contributes nothing to the signature.
• Mi – ith Malware instance and Mt – Test Sample
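The 50% rule can be sketched as a majority vote over each row of the alignment. The rows below are the ones from this slide; counting the percentage over all entries of a row, gaps included, is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def msa_signature(rows, gap="-"):
    """Keep the mnemonic of each row that occurs in more than half its entries."""
    signature = []
    for row in rows:
        counts = Counter(m for m in row if m != gap)
        if not counts:
            continue
        mnemonic, n = counts.most_common(1)[0]
        # Assumption: the 50% is taken over all entries, including gaps.
        if n > len(row) / 2:
            signature.append(mnemonic)
    return signature


# Alignment rows for M1..M5 as shown on the slide.
rows = [
    ["push", "push", "-", "-", "push"],
    ["-", "-", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["push", "mov", "mov", "-", "mov"],
]
single_signature = msa_signature(rows)
```

The third row is dropped because mov occurs in only two of its five entries, so the resulting signature has four mnemonics.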
19. Group Signature
A group signature is extracted from the single signatures of the subgroups.
Subgroups are formed using the evolutionary relationship.
A single signature is extracted for each subgroup, and the signatures are combined in the form of a wildcard.
Sign1  Sign2  Sign3  Sign4  Sign5  Group Sign     Mt
push   push   -      -      push   push           push
jz     jz     jump   jump   jump   jump|jz        jz
mov    mov    -      lea    xor    mov|lea|xor    lea
call   call   call   call   call   call           call
-      mov    mov    -      push   mov|push       push
• Signi – Signature for ith sub-group in a family and Mt – Test Sample
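Combining subgroup signatures into a wildcard can be sketched as collecting the alternatives seen in each row. The rows are taken from this slide; the position-wise matching function is a hypothetical scoring scheme that assumes the test sample is already aligned to the signature, since the slides do not specify how a sample is scored.

```python
from collections import Counter


def group_signature(rows, gap="-"):
    """For each row, list the distinct mnemonics, most frequent first."""
    signature = []
    for row in rows:
        counts = Counter(m for m in row if m != gap)
        signature.append([m for m, _ in counts.most_common()])
    return signature


def match_count(signature, sample):
    """Hypothetical score: rows where the sample's mnemonic is an alternative."""
    return sum(1 for alts, m in zip(signature, sample) if m in alts)


# Subgroup signatures Sign1..Sign5 as shown on the slide, one row per position.
rows = [
    ["push", "push", "-", "-", "push"],
    ["jz", "jz", "jump", "jump", "jump"],
    ["mov", "mov", "-", "lea", "xor"],
    ["call", "call", "call", "call", "call"],
    ["-", "mov", "mov", "-", "push"],
]
group = group_signature(rows)
wildcards = ["|".join(alts) for alts in group]

test_sample = ["push", "jz", "lea", "call", "push"]   # Mt from the slide
hits = match_count(group, test_sample)
```

Unlike the single signature, the wildcard keeps the minority mnemonics at each point of mutation, so the test sample Mt matches every row.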
20. Threshold
[Diagram: score axis starting at 0 with Bmin, Bmax, Mmin, and Mmax marked in that order; benign scores span Bmin to Bmax, malware scores span Mmin to Mmax, and the threshold lies between Bmax and Mmin]
Where:
Bmin: benign sample with minimum score
Bmax: benign sample with maximum score
Mmin: malware sample with minimum score
Mmax: malware sample with maximum score
Threshold = (Bmax + Mmin) / 2, with Threshold > Bmax
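The threshold rule can be sketched as the midpoint between the highest benign score and the lowest malware score. The scores below are made-up illustrative values, and classifying a sample as malware when its score exceeds the threshold is an assumption based on the ordering of the score axis.

```python
def detection_threshold(benign_scores, malware_scores):
    """Threshold = (Bmax + Mmin) / 2, as defined on the slide."""
    b_max = max(benign_scores)
    m_min = min(malware_scores)
    return (b_max + m_min) / 2


def classify(score, threshold):
    """Assumption: scores above the threshold are flagged as malware."""
    return "malware" if score > threshold else "benign"


# Illustrative scores only; not taken from the study.
benign_scores = [2, 5, 8]
malware_scores = [12, 20]
threshold = detection_threshold(benign_scores, malware_scores)
```

The condition Threshold > Bmax holds only when Mmin is greater than Bmax, that is, when the benign and malware score ranges do not overlap.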
21. Dataset [1/2]
Dataset Description:
Type       Source                      #Family   #Instances
Synthetic  NGVCK, PSMPC, G2, MPCGEN    46        1051
Real       User Agencies, VxHeavens    52 + 1*   1209
Benign     System32, Cygwin etc.       1         150
* consists of unknown viruses (in test set).
The dataset is divided equally into a training set and a testing set.
22. Dataset [2/2]
All samples are in Portable Executable (PE) format.
Samples are unpacked using
Dynamic unpacker (EtherUnpack [7])
Signature-based unpacker (GUnPacker [10])
Malware families are created from the combined scan results of 14 antivirus products.
Benign samples are also scanned.
23. Result for Intra Family
[Chart: average distance (Global, Local, Levenshtein) for NGVCK, PSMPC, G2, and MPCGEN; y-axis from 0 to 0.3]
Non-zero values indicate the presence of metamorphism in the synthetic data.
Levenshtein distance is high due to junk code insertion.
In spite of high global distances, local distances are low in most of the samples. This indicates the presence of similar regions in the code.
• Average distance is between 0 and 1
24. Result for Inter Family
[Chart: average distance (Global, Local, Levenshtein) for NGVCK, PSMPC, G2, MPCGEN, and VX HEAVENS; y-axis from 0 to 0.7]
Distance is less than the intra-family distance. This indicates that most of the malware share some base code.
Levenshtein distance is higher because of the change in functionality.
• Average distance is between 0 and 1
25. Comparative Analysis
VIRUS TYPE   Replacements/Alignment   Avg. SoD   OBFUSCATION
NGVCK        47                       1.03       Average   Simple
G2           3                        1.45       Low       Simple
MPCGEN       31                       0.61       Average   Simple
PSMPC        1                        1.35       Low       Weak
Vx-Heavens   122                      8.3        Large     Complex
Viruses generated using tools belong to the same family.
Families of real malware are distinct.
In PSMPC, loop and jump instructions contribute to obfuscation; this increases the distance between samples.
NGVCK viruses overlap with real malware (Savior).
• SoD – Sum of distances of a family from all other families
26. Detection Results
[Chart: evaluation metrics (TPR, FPR) for MSA Single and Group Signature; y-axis from 0 to 1]
95.5% of malware is detected with the MSA signature; detection with the group signature is 72.4%.
53% of benign samples are falsely detected as malware with the MSA signature, due to the loss of the mnemonics used for mutation in malware.
The group signature preserves the points of mutation, which are absent in benign samples.
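For reference, the two evaluation metrics can be computed from scan counts as in the sketch below. The counts are toy values for illustration and are not results from this study.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from true/false positive and negative counts."""
    tpr = tp / (tp + fn)   # fraction of malware samples correctly detected
    fpr = fp / (fp + tn)   # fraction of benign samples wrongly flagged
    return tpr, fpr


# Toy counts only: 10 malware samples and 10 benign samples.
tpr, fpr = detection_rates(tp=8, fn=2, fp=1, tn=9)
```

A good signature pushes TPR towards 1 while keeping FPR close to 0.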
28. Scope for Improvement
Instead of using the same mismatch score for every pair, compute a weighted score for each pair of mnemonics based on the frequency of mismatches.
In the alignment, the operand part can be considered to verify the actual changes (replacement/gap).
This can reveal how the morpher preserves functionality.
29. List of Publications
[1] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, Detecting Malicious Files using Non-Signature based Methods, (To appear) Oxford Computer Journal.
[2] Vinod P., V. Laxmi, M. S. Gaur, Grijesh Chauhan, Malware Detection using Non-Signature based Method, In Proceedings of IEEE International Conference on Network Communication and Computer-ICNCC 2011, pp-427-43, DOI: 978-1-4244-9551-1/11.
30. References
[1] E.Karim, A.Walenstein, A.Lakhotia, “Malware Phylogeny using Permutation
of code”, In Proceedings of EICAR 2005, pp 167-174
[2] M.R. Chouchane and A. Lakhotia , “Using engine signature to detect
metamorphic malware”, In Proceedings of the 4th ACM workshop on
Recurring malcode, WORM '06, 2006,73-78.
[3] Mona Singh, " Multiple Sequence Alignment ", Lecture Notes:
www.cs.princeton.edu/~mona/Lecture/msa1.pdf (Last viewed on 14-6-2011)
[4] Mona Singh, " Phylogenetics ", Lecture Notes:
www.cs.princeton.edu/~mona/Lecture/msa1.pdf (Last viewed on 14-6-2011)
[5] T. Smith and M. Waterman, “Identification of Common Molecular
Subsequences”, Journal of Molecular Biology, pp 195-197, 1981
[6] Mark Stamp, Wing Wong. "Hunting for metamorphic engines". Journal in
Computer Virology, 2(3):211-229
31. References
[7] Ether for Malware Unpacking: http://ether.gtisc.gatech.edu/malware.html
(Last viewed on 14-6-2011)
[8] Jian Li, Jun Xu, Ming Xu, HengiLi Zhao, Ning Zheng, “Malware
Obfuscation Measuring via Evolutionary Similarity”, In Proceedings of IEEE
Int. Conference on Future Information Network 2009.
[9] Larkin MA et al, " Clustal W and Clustal X version 2.0 ".
Bioinformatics, 23, 2947-2948, 2007.
[10] GUnPacker :
http://www.woodmann.com/collaborative/tools/index.php/GUnPacker
(Last viewed on 14-6-2011)