Analyzing malware and correlating huge databases of samples is a job for the few. Big AV companies have their own systems for cataloging and analyzing malware, and our goal is to bring that power to the masses through our open-source malware analysis pipeline, Aleph <https: />.
Aleph is not restricted to malware, since it is artifact-oriented. It was built with no specific file type in mind: it can work with any file type and relies on plugins to extract information and correlate it with other artifacts for further analysis. This also makes Aleph very useful in forensics and other types of work.
Aleph is a compartmentalized framework. Sample collectors fetch samples from local folders, RSS feeds and IMAP folders (for now). These samples are queued, and sample workers grab them and apply specific plugins depending on their file type. Those plugins may enrich sample metadata and extract other artifacts, feeding them back into Aleph for further analysis and keeping the whole cross-reference chain in place.
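The retrofeed idea can be sketched in a few lines of Python. This is a minimal illustration, not Aleph's code: `analyze_with_retrofeed` and `zip_members` are hypothetical names, ZIP archives stand in for any container format only because the standard library can unpack them, and an in-memory `deque` replaces Aleph's real sample queue.

```python
import hashlib
import io
import zipfile
from collections import deque


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def zip_members(data: bytes) -> list:
    """Toy extractor plugin: return the file members of a ZIP archive, or [] otherwise."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [archive.read(name) for name in archive.namelist() if not name.endswith("/")]


def analyze_with_retrofeed(root: bytes, extractors=(zip_members,)) -> dict:
    """Process a sample plus every artifact extracted from it, keeping parent links."""
    catalog = {}  # sha256 -> {"size": ..., "parents": {...}}
    pending = deque([(root, None)])  # (artifact bytes, sha256 of its parent or None)
    while pending:
        data, parent = pending.popleft()
        digest = sha256(data)
        if digest in catalog:
            # Seen before: only record the extra parent, do not reprocess.
            if parent is not None:
                catalog[digest]["parents"].add(parent)
            continue
        catalog[digest] = {"size": len(data), "parents": set() if parent is None else {parent}}
        for extract in extractors:
            for child in extract(data):
                # Retrofeed: extracted artifacts go back into the queue, linked to this sample.
                pending.append((child, digest))
    return catalog
```

Because artifacts are keyed by SHA-256, a payload that shows up inside several archives is processed once but keeps a link to every parent, which is what preserves the cross-reference chain.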
Plugins may also raise warning flags based on their findings, giving the researcher more digestible information than interpreting all the raw data.
All sample data is stored in an Elasticsearch database, which makes it easy to query and manage its metadata fields without rebuilding tables.
All date and time data is stored in UTC and converted on the fly to the user's time zone. Internationalization and localization are fully implemented, and Aleph is currently available in English, Brazilian Portuguese and Spanish.
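Storing UTC and converting only at display time keeps every stored timestamp comparable. A minimal sketch, assuming the user's time zone is already available as a `tzinfo` object (how Aleph obtains it is not covered here); `to_user_time` is a hypothetical helper:

```python
from datetime import datetime, timezone, tzinfo


def to_user_time(stored: datetime, user_tz: tzinfo) -> datetime:
    """Convert a timestamp read from storage (UTC) to the user's time zone for display."""
    if stored.tzinfo is None:
        # Naive values coming out of storage are taken to be UTC, per the convention above.
        stored = stored.replace(tzinfo=timezone.utc)
    return stored.astimezone(user_tz)
```

With Python 3.9+, a named zone such as `zoneinfo.ZoneInfo("America/Sao_Paulo")` can be passed instead of a fixed offset, so daylight-saving rules are applied automatically.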
2. Who are we?
Jan Seidl @jseidl
Aleph Project Lead Developer
*NIX/BSD freak
Digital tools blacksmith / Python & C lover
Lousy guitar player
Coffee dependent
Hates printers, doesn't like social networks or anything
Selectively social
5. Definition
'Malware' is an umbrella term used to refer to a variety of forms of hostile or intrusive software, including computer viruses, worms, trojan horses, ransomware, spyware, adware, scareware, and other malicious programs. It can take the form of executable code, scripts, active content, and other software.
(Wikipedia)
8. Detecting malware
● Signature-based
  ● The sample must be previously known and flagged as malicious
● Heuristics-based
  ● Can trigger loads of false positives
● Behavior-based
  ● Can trigger loads of false positives
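The weakness of signature-based detection is easy to demonstrate. The sketch below reduces a signature to an exact SHA-256 match against a hypothetical `KNOWN_BAD` set; real AV signatures are usually byte patterns or more elaborate rules, but the limitation is the same: the sample must already be known.

```python
import hashlib
from typing import Set

# Hypothetical signature database: SHA-256 digests of samples already flagged as malicious.
KNOWN_BAD: Set[str] = {hashlib.sha256(b"known malicious sample").hexdigest()}


def signature_match(data: bytes, known_bad: Set[str] = KNOWN_BAD) -> bool:
    """Flag a sample only if its exact digest is already in the signature database."""
    return hashlib.sha256(data).hexdigest() in known_bad
```

Appending a single byte (`b"known malicious sample!"`) produces a different hash, so the modified sample slips through; heuristic and behavior-based approaches try to close that gap at the cost of more false positives.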
10. Understanding malware
● Feature extraction
  ● Which characteristics does this file have?
● Feature correlation
  ● Make sense of feature combinations and their disposition
● Sample correlation & family classification
  ● Identify common features between different samples
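One simple way to correlate samples is to compare their feature sets. The sketch below uses Jaccard similarity plus single-linkage grouping; that technique, the `threshold` value and the feature names in the test data are assumptions for illustration, not necessarily what Aleph does.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of features two samples share: len(A & B) / len(A | B); 1.0 means identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_samples(features: Dict[str, Set[str]], threshold: float = 0.5) -> List[Set[str]]:
    """Single-linkage grouping: samples end up together if a chain of pairs meets `threshold`."""
    parent = {name: name for name in features}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if jaccard(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in features:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

For instance, two samples sharing three of four features (similarity 0.75) end up in the same group, while a sample with no features in common with them stays on its own.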
11. Understanding malware
● Enables you to identify families
● Enables you to identify acting groups
● Enables you to identify techniques
● Enables you to identify trends
13. Manual approach
● Use lots of separate tools to extract data from the sample (each with its own output format)
● Correlate the tools' output using spreadsheets, Word files, napkins, tears
15. Manual approach
● Find new samples embedded in the original sample
● Rinse, repeat, get more whiskey/coffee
18. Automated approach
● Insert sample into one end
● Wait until processing is done
● Get report on the other end
● Get emotional about hours of work saved
● Focus on the most important evidence
25. Main Features
● Cross-platform (tested on Windows, Linux and OS X)
  ● Almost all modules are pure Python
● Scalable + Easily Extensible
● Web Interface for browsing reports
29. Aleph Process: Collection
● Detect new files on a medium (filesystem, email account, etc.)
● Check whether they meet predefined criteria (min/max size)
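A polling collector for a local folder might look like the following. The size limits, the `seen` set and the function name are hypothetical; Aleph's actual collectors (local folders, RSS feeds, IMAP) may detect new files differently.

```python
from pathlib import Path
from typing import Iterator, Set

MIN_SIZE = 1                 # bytes (hypothetical default)
MAX_SIZE = 50 * 1024 * 1024  # 50 MiB (hypothetical default)


def collect_new_files(folder: str, seen: Set[str],
                      min_size: int = MIN_SIZE, max_size: int = MAX_SIZE) -> Iterator[Path]:
    """Yield files in `folder` not seen on earlier polls whose size is within limits."""
    for path in sorted(Path(folder).iterdir()):
        key = str(path.resolve())
        if not path.is_file() or key in seen:
            continue
        # Remember every file, accepted or not, so it is not re-evaluated on the next poll.
        seen.add(key)
        if min_size <= path.stat().st_size <= max_size:
            yield path
```

Calling it periodically with the same `seen` set yields only files that appeared since the previous poll.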
30. Aleph Process: Triage
● Detect file type (mimetype)
● Calculate hashes
● Add sample to process queue
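The triage step can be sketched with the standard library. Mimetype detection here uses a tiny, illustrative magic-number table; a real deployment would more likely rely on libmagic (for example through the python-magic package). The hash list, the function names and the `(sample, data)` queue item are assumptions.

```python
import hashlib
import queue

# Illustrative magic-number table (an assumption, not Aleph's): leading bytes -> mimetype.
MAGIC_NUMBERS = [
    (b"MZ", "application/x-dosexec"),
    (b"%PDF-", "application/pdf"),
    (b"\x7fELF", "application/x-executable"),
    (b"PK\x03\x04", "application/zip"),
]
HASH_ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def detect_mimetype(data: bytes) -> str:
    """Guess the type from leading 'magic' bytes rather than from the file name."""
    for prefix, mime in MAGIC_NUMBERS:
        if data.startswith(prefix):
            return mime
    return "application/octet-stream"


def triage(data: bytes, work_queue: queue.Queue) -> dict:
    """Detect the type, hash the content and enqueue the sample for processing."""
    sample = {
        "mimetype": detect_mimetype(data),
        "size": len(data),
        "hashes": {algo: hashlib.new(algo, data).hexdigest() for algo in HASH_ALGORITHMS},
    }
    work_queue.put((sample, data))
    return sample
```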
31. Aleph Process: Processing
● Enumerate plugins suitable for the sample's mimetype
● Run plugins and extract features
● Save features as structured data in the database
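A mimetype-keyed plugin registry is one way to implement the enumerate/run/save loop. In this sketch the decorator, both plugins and the `embeds URLs` flag are hypothetical, and the returned JSON string stands in for the document that would be indexed in Elasticsearch.

```python
import json
import re
from typing import Callable, Dict, List, Tuple

Plugin = Callable[[bytes], dict]
REGISTRY: Dict[str, List[Tuple[str, Plugin]]] = {}  # mimetype -> [(plugin name, function)]


def plugin(name: str, *mimetypes: str):
    """Decorator that registers a feature extractor for one or more mimetypes."""
    def register(func: Plugin) -> Plugin:
        for mime in mimetypes:
            REGISTRY.setdefault(mime, []).append((name, func))
        return func
    return register


@plugin("strings", "application/x-dosexec", "application/pdf")
def printable_strings(data: bytes) -> dict:
    # Runs of 4+ printable ASCII characters, like the classic `strings` tool.
    return {"strings": [s.decode("ascii") for s in re.findall(rb"[\x20-\x7e]{4,}", data)]}


@plugin("urls", "application/x-dosexec", "application/pdf")
def embedded_urls(data: bytes) -> dict:
    urls = sorted({m.decode("ascii") for m in re.findall(rb"https?://[\x21-\x7e]+", data)})
    result = {"urls": urls}
    if urls:
        result["flags"] = ["embeds URLs"]  # warning flag surfaced to the researcher
    return result


def run_plugins(sample: dict, data: bytes) -> str:
    """Run every plugin registered for the sample's mimetype; return a JSON document."""
    features, flags = {}, []
    for name, func in REGISTRY.get(sample["mimetype"], []):
        result = func(data)
        flags.extend(result.pop("flags", []))
        features[name] = result
    return json.dumps({**sample, "features": features, "flags": flags}, sort_keys=True)
```

Because each plugin simply returns a dictionary, a new plugin only has to register itself for the mimetypes it understands; samples of other types never reach it.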
33. Currently supported files
● Windows Portable Executable (PE) (exe, cpl & dll)
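Recognizing a PE file takes only a few header reads: the DOS header starts with `MZ`, the 32-bit `e_lfanew` field at offset 0x3C points at the `PE\0\0` signature, and the `IMAGE_FILE_DLL` bit in the COFF Characteristics field separates DLLs from executables. A minimal sketch with hypothetical function names, which treats any non-DLL image as an exe:

```python
import struct
from typing import Optional

IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag set on DLLs


def pe_header_offset(data: bytes) -> Optional[int]:
    """Return the offset of the 'PE\\0\\0' signature, or None if this is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew (little-endian uint32 at 0x3C in the DOS header) points at the PE header.
    (offset,) = struct.unpack_from("<I", data, 0x3C)
    return offset if data[offset:offset + 4] == b"PE\x00\x00" else None


def pe_kind(data: bytes) -> Optional[str]:
    """Classify a PE file as 'dll' or 'exe' from its COFF Characteristics field."""
    offset = pe_header_offset(data)
    if offset is None or len(data) < offset + 24:
        return None
    # Characteristics sits 18 bytes into the 20-byte COFF header that follows the signature.
    (characteristics,) = struct.unpack_from("<H", data, offset + 4 + 18)
    return "dll" if characteristics & IMAGE_FILE_DLL else "exe"
```

A `.cpl` Control Panel item is a DLL, so it would be classified as `dll`.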
Upcoming support for:
● Android APK
● PDF Documents
● Linux ELF
● iOS Apps
● URLs & Emails
● Apple Mach-O
● MS Office Documents
● SWF & Much more!
52. Deployment Types
Deployed on a single host running all the required services.
3rd Party Software
● Redis
● Local Filesystem
● Elasticsearch
● SQLite
Aleph Components
● Collector
● Processor
● Web Interface
53. Deployment Types
Deployed across multiple hosts to achieve high performance (HP) and high availability (HA).
● Datastore Host Group: Elasticsearch cluster nodes
● Transport Host Group: RabbitMQ cluster nodes
● Processing Host Group: Aleph cluster nodes
● Web Interface Host Group: nginx cluster nodes
● Collection Host Group: Aleph cluster nodes
● Storage Host Group: DFS cluster nodes
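The multi-host layout separates collection, transport and processing. The sketch below mimics that split inside one process: `queue.Queue` stands in for the RabbitMQ transport and threads stand in for processing nodes; `run_cluster` and `processing_node` are hypothetical names, and hashing stands in for the real plugin run.

```python
import hashlib
import queue
import threading
from typing import Dict, List


def processing_node(node_id: int, jobs: queue.Queue, results: Dict[str, int],
                    lock: threading.Lock) -> None:
    """Stand-in for one processing host: consume samples until a None sentinel arrives."""
    while True:
        data = jobs.get()
        if data is None:
            break
        digest = hashlib.sha256(data).hexdigest()  # placeholder for the real plugin run
        with lock:
            results[digest] = node_id


def run_cluster(samples: List[bytes], nodes: int = 3) -> Dict[str, int]:
    """Publish samples to a shared queue and let `nodes` workers split the load."""
    jobs: queue.Queue = queue.Queue()
    results: Dict[str, int] = {}
    lock = threading.Lock()
    workers = [threading.Thread(target=processing_node, args=(i, jobs, results, lock))
               for i in range(nodes)]
    for worker in workers:
        worker.start()
    for data in samples:   # collection side publishes to the transport
        jobs.put(data)
    for _ in workers:      # one shutdown sentinel per node, queued after all samples
        jobs.put(None)
    for worker in workers:
        worker.join()
    return results
```

Adding workers raises throughput (the HP goal). Across real hosts, a broker such as RabbitMQ with consumer acknowledgements enabled redelivers messages that a crashed consumer had not acknowledged, which covers part of the HA goal; this in-process sketch does not model failures.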