1) The document discusses how machine learning (ML) techniques can be used by red teams to drive offensive cyber operations in a more intelligent and adaptive manner.
2) Specifically, it provides examples of how ML can be used for spear phishing by analyzing past user click data to select the most relevant phishing emails tailored to each individual user's role.
3) The document argues that embedding intelligence into attacks using ML can make attacks more effective while making it more difficult for blue teams to build attackers' tactics, techniques, and procedures (TTPs).
10. Context Data available to all authenticated users
Identity used - 1
Exfiltration Size - ~4Gb
Data Sources
Active Directory User/Groups
Machine local group membership
Implementation
• SQL Azure
• Service Bus
• Azure Worker Role
• Remote Powershell
17. Strategic Advantages
• Surgical
• Fly under most radar
• Limited TTP exposure
• Routes can be saved/replayed/measured
• Long shelf life
• Not bound to PtH only
24. ML-Driven Spear Phishing
How can Red Teams use Machine Learning?
• Subvert existing ML algorithms that defenders have put in place
• Classic “Adversarial Machine Learning”
• Key goal: Game the ML System
• Check out: http://www.slideshare.net/RamShankarSivaKumar/subverting-machine-learning-detections-for-fun-and-profit (Derbycon 2014)
• Think of attacks as a large-scale optimization problem and use ML to solve it
26. ML Approach
• Problem: Which phishing mail should be sent to a victim?
• Why Use Machine Learning?
-> Targeted Phishing emails increase likelihood of compromise
• Distinguished Engineer: Subj: Country Club Invitation
• Program Manager: Subj: Kanban Notes
• Developer: Subj: Code check In?
-> Makes the blue team’s job of building the attacker’s TTPs and IOCs much more difficult
• Machine Learning task: How to pick the right email per person?
28. Contextual Bandit arms -- Intuition
• The world announces some context information (e.g., Program Managers like meetings).
• A policy chooses an arm a from k arms (i.e., 1 of k phishing emails).
• The world reveals the reward r_a of the chosen arm (i.e., whether the message is clicked on).
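The context / arm / reward loop above can be illustrated with a small simulation. This is only a sketch on synthetic data, not the presenters' implementation: the role names, the reward probabilities in TRUE_P, and the epsilon-greedy policy are all assumptions made for illustration.

```python
import random

# Hypothetical contexts (roles) and number of arms (candidate messages).
ROLES = ["program_manager", "developer", "engineer"]
K = 3

# Hypothetical ground truth: probability of a reward of 1 for each (role, arm).
TRUE_P = {
    "program_manager": [0.7, 0.1, 0.1],
    "developer":       [0.1, 0.7, 0.1],
    "engineer":        [0.1, 0.1, 0.7],
}

def run(rounds=6000, epsilon=0.1, seed=0):
    """Epsilon-greedy contextual bandit; returns the learned best arm per role."""
    rng = random.Random(seed)
    pulls = {r: [0] * K for r in ROLES}
    wins = {r: [0.0] * K for r in ROLES}

    def estimate(role, arm):
        # Empirical mean reward of this arm in this context (0 if never tried).
        n = pulls[role][arm]
        return wins[role][arm] / n if n else 0.0

    for _ in range(rounds):
        role = rng.choice(ROLES)                  # the world announces a context
        if rng.random() < epsilon:
            arm = rng.randrange(K)                # explore a random arm
        else:
            arm = max(range(K), key=lambda a: estimate(role, a))  # exploit
        # The world reveals the reward of the chosen arm only.
        reward = 1.0 if rng.random() < TRUE_P[role][arm] else 0.0
        pulls[role][arm] += 1
        wins[role][arm] += reward

    return {r: max(range(K), key=lambda a: estimate(r, a)) for r in ROLES}

policy = run()
```

With reward gaps as wide as the ones assumed here, the learned policy should map each role to its highest-reward arm; real click data would be far noisier.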
29. Experiment
Objective - Recommend the most appropriate email for the user, based on his role
Data Set:
1) Leverage data from (previously/currently) compromised hosts
2) Input: Email corpus, context (title of role), action (clicked, not clicked), featurization (time of click, number of words…)
Tooling - Vowpal Wabbit (I/O bound, parallelizable, specific to large-scale learning)
Result - Overall click-through rate (CTR) increased by 23%, with the highest increase in Program Managers (+22%) and the least in Developers (5.4%)
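As a rough illustration of how logged interactions could be prepared for Vowpal Wabbit, the sketch below formats one record in VW's contextual bandit text format, `action:cost:probability | features`. The slides do not say which VW mode or features were used, so the function name, the namespaces `c` and `m`, the feature names, and the cost convention (0 for a click, 1 otherwise) are assumptions.

```python
def to_vw_cb_line(action, clicked, probability, role, word_count):
    """Format one logged interaction as a Vowpal Wabbit contextual-bandit line.

    action      -- 1-based index of the arm that was chosen
    clicked     -- whether the reward was observed
    probability -- probability with which the logging policy chose this arm
    role, word_count -- hypothetical context and message features
    """
    if action < 1:
        raise ValueError("VW contextual-bandit actions are 1-based")
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    cost = 0.0 if clicked else 1.0  # VW minimizes cost, so a click costs nothing
    return f"{action}:{cost}:{probability} |c role={role} |m word_count:{word_count}"

line = to_vw_cb_line(2, True, 0.25, "program_manager", 120)
```

A file of such lines would typically be passed to something like `vw --cb 3 train.dat`, where 3 is the number of arms.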
30. Red Advantages
Takeaways
1) Embedding intelligence into attacks can make them more effective. ML can make attacks adaptive too!
2) The tricky part is mapping the attack goals to the right kind of problem - a short but steep learning curve.
-> Tip: Borrow the blue team’s behavioral detections and use the same tools against them.
Cloud vs Cloud – We use the cloud against itself.
Red vs Blue – At the end of the day, we are all blue. Our objective is to increase MTTC and MTTP and decrease MTTD/MTTR. I think the biggest impact at the moment is really between MTTC/MTTP. Focus on improving detection and recovery flows.
Engineering mindset – Everything needs to SCALE
We are probably more engineering heavy than others
Due to risk of introducing malware, we implement our own tools
The sheer scope of Azure requires that everything / everyone can scale.
We focus on capturing our scenarios for scalability and reproducibility
Also need a good flight recorder
Attribution is also something we worry about. We need to have very good auditing/evidence collection for the actions we perform.
Fixed Target
We focus only on Azure. We can monitor our target reaction over a very long period and adapt.
We model our design using Boyd OODA Loop.
Scalable infrastructure is now in the hands of anyone
Can learn from the same environment
Tools like Immunity SWARM seem to be going in that direction also
Cloud advantages
Very portable
Rich and stable environment
Very fast IO (super fast exfil)
Based on our experience in the cloud and threat intel, we anticipate that APTs will develop and leverage new technologies and tactics.
For the purpose of this presentation, we will focus on and share only the Intelligence-Driven and Machine Learning parts.
The data presented is based on red team exercises and sampled/adapted for this presentation. This is not a representation of Microsoft/Azure state.
Patient 0 was scoped using recon data, based on matching group membership with the target. We start post-compromise.
“How can I get to my target ?”
Talking Points:
- NextGen APT TTPs include the previous attributes plus these.
Artificial-Intelligence or AI-like attack automation?
Recon data tells us that patient 0 is an administrator (domain/local) on 7 servers. From that, we know we can move forward to them. We now have 7 possible pivoting routes to explore.
We explored the previous routes and found new identities (users) we can pivot to. “Rinse/repeat”: recon data gives us new routes for each identity.
Furthermore, recon data tells us that we now have a direct route to our target (admin on the same box).
Actions
We ran credential extraction on 7 servers managed by patient 0.
Routes
New routes were found. New assets remain untouched. We were able to discover paths based on recon data and the results of the credential extraction action.
Routes to other identities are also captured and can be reapplied for future pivoting purposes. Pivoting routes may have a long lifetime (bound to the users/roles).
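The route discovery described above is essentially a graph search over recon data, and the same search is just as useful to a blue team deciding which admin relationship to remove. The sketch below is a toy breadth-first search, not the team's tooling: the two dictionaries, every name in them, and the function `find_route` are hypothetical.

```python
from collections import deque

# Hypothetical recon data: identity -> machines where it is an administrator.
ADMIN_OF = {
    "patient0": ["srv1", "srv2"],
    "alice": ["srv3"],
    "bob": ["target"],
}
# Hypothetical observations: machine -> identities whose sessions were seen there.
CREDS_ON = {
    "srv1": ["alice"],
    "srv2": [],
    "srv3": ["bob"],
    "target": [],
}

def find_route(start_identity, target_machine):
    """Return the shortest identity/machine chain to the target, or None.

    Assumes identity names and machine names never collide.
    """
    queue = deque([[start_identity]])
    seen = {start_identity}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target_machine:
            return path
        # An identity leads to machines it administers;
        # a machine leads to identities observed on it.
        for nxt in ADMIN_OF.get(node, []) + CREDS_ON.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

route = find_route("patient0", "target")
```

Because the graph is built purely from recon data, a route can be stored and re-evaluated later, which matches the save/replay/measure idea in the notes.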
NEXT SLIDE ARE EXAMPLES
Mean time to compromise (MTTC) – whatever the method, this was not calculated. Recon ops should be added to MTTC.
Mean time to pwnage (MTTP) – seconds (the decision was made fast by C2).
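Metrics such as MTTC and MTTP reduce to averaging the time between two recorded events across exercises. A minimal sketch, assuming each exercise is logged as a (start, end) timestamp pair; the helper name and the sample timestamps are made up for illustration.

```python
from datetime import datetime
from statistics import mean

def mean_minutes(intervals):
    """Average duration, in minutes, over (start, end) datetime pairs."""
    return mean((end - start).total_seconds() for start, end in intervals) / 60.0

# Hypothetical exercise log: (initial access, objective reached).
exercises = [
    (datetime(2015, 1, 5, 9, 0), datetime(2015, 1, 5, 9, 30)),
    (datetime(2015, 2, 9, 14, 0), datetime(2015, 2, 9, 15, 30)),
]
mttc_minutes = mean_minutes(exercises)
```

The same helper applies to detection and recovery intervals, provided the flight recorder captures both endpoints.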
Actions
7 initial mimikatz runs, 1 connection to pivot and 1 more mimikatz run.
Pivots
We had to pivot from 2 servers to achieve our objective: one to reach other identities and the other to reach our target.
Routes
We discovered 12 routes in total, 1 level down. The pivoting playbook can be applied to find all routes using this specific TTP.
Blue learning
All data is available to them in machine-readable format.
The playbook can be reused in different contexts. All pivoting scenarios can be discovered.
Measure the impact of a change and Key Progress Indicators.
TRANSITION – We used one method of pivoting (PtH) but it’s not bound to it
Surgical
Very few unnecessary moves (sometimes the machines you query just don’t have what you need). The ‘mimikatz’ query is run only once, and no further activity takes place there.
Fly under most radar
It is very fast and can often reach its target before log/audit data is replicated, or before anyone gets around to investigating. Very useful for a quick smash & grab where you know the routes to your target asset.
Limited TTP exposure
They only see the actions you took. The logic, and what you based it on, is unknown to them. They have to connect the dots with the initial recon data.
The route and pivot method can be reversed by blue, but the entire logic, routes and capabilities remain hidden.
The exposure is also better understood.
Routes can be saved/replayed/measured
Routes can be captured and replayed over time.
Route efficiency/lifetime can also be measured. This is a great way to measure Blue team response/recovery time (KPI).
Long shelf life
The recon data it relies on has a very long shelf life. Corporate infrastructure doesn’t change that much. The “Moving Target” approach is very rare.
Therefore, the routes also have a long shelf life.
Not bound to PtH only
I consider PtH a simple credential harvesting flow, and many more exist. The scenarios also pave the way to other uses – TRANSITION TO NEXT SLIDE
NEXT SLIDE IS ML – SETUP TRANSITION
Paving Egress Routes
You can identify routes to egress servers based on location and pave the way via pivoting and infecting (port relay).
Path avoidance
If you want to avoid highly monitored areas and go down a safer route, you can build that.
You can also avoid certain paths based on prior failures.
Beachhead candidates
Looking for machines with FE, WEB, FTP, HTTP … may give a good beachhead with egress out. Can be combined with the Paving Egress Routes flow.
Cloud Pivoting
The pivoting scenarios we showed leverage one pivot method using PtH / creds in memory, but other credentials can be extracted and used for lateral movement.
We included Azure-specific credentials and implemented pivoting scenarios through storage accounts. SAFEGUARD your cloud storage (IaaS VHD in Storage).
The previous section was about how, as an attacker, you can observe (and collect data) at scale and finally act (i.e., pivot).
This section is about the Decide step: how do you use machine learning to reach your final attack goal?
Traditional programming;
The first operative word is distributed computing. Malicious actors using the cloud to cause harm is not new. One noteworthy example is Moxie Marlinspike's CloudCracker. This tool lets you upload the handshake of the MS-CHAPv2 protocol (widely used by VPN providers) and then cracks the NTHash, which in essence is the MD4 hash of the user's password. Using just 48 FPGAs, the researchers at ThoughtCrime labs showed that the worst case for breaking the DES keys, which are derived from the NTHash, is a mere 23 hours. So, in less than a day, you can crack someone's VPN password and go crazy with it. This tool has been around for two years now, and many such services have mushroomed since. However, at the core of things, though the actors are using multi-core processors, GPGPUs or, say, AWS, they really weren't doing any analysis. There is a word in machine learning for a model that memorizes every training example: a rote learner. Yes, it was cool that you could "crack" the password, but ultimately you are using dictionaries and lookups; like a rote learner, you have memorized your dictionary and are brute-forcing your way through it. And like rote learners, you are not doing any sort of learning.
The second operative word here is analysis. This is more novel than the use of distributed computing, in our opinion. Consider the use of image processing in solving CAPTCHAs. Researchers have shown that deep neural nets, combined with character matching, can potentially parse out the characters in the computational puzzle. However, most of these techniques do not operate at scale, because the algorithms haven't been parallelized and the systems lack training data.
An adversary can subvert existing ML algorithms that defenders have put in place
A classic example is an attacker trying to overwhelm an anomaly detection system, or a set of trolls trying to game a crowdsourcing system. This is a rising field, known in academic circles as “Adversarial Machine Learning”, wherein defenders try to build robust machine learning systems that perform adequately despite adversaries. Along with John Walton, I gave a talk on this topic at Derbycon and BlueHat last year, which you may find useful (slides available here: http://www.slideshare.net/RamShankarSivaKumar/subverting-machine-learning-detections-for-fun-and-profit)
An adversary can use Machine learning to sharpen his attacks.
This is going to be the focus at Infiltrate 2015: how attackers can leverage machine learning toolkits to make their systems more stealthy. While the former is a rising field, this is an area where there is extremely little prior art. The common focus is to take either a Markov decision process approach (treating attacks as a sequence of steps and optimizing for completing the steps within an objective) or the painfully slow symbolic execution approach (wherein you try to verify the logical correctness of a piece of code; if it is not correct, you know you have found an exploit). There has been virtually no research on how to leverage the analytics suite to sharpen attacks. The Azure Red Team and Azure Security Data Science team seized this opportunity to explore this blue ocean, which led to our belief that attackers have a lot to gain from using machine learning.
An attacker has gathered data about his victim. He now wants to construct a spear phishing message, so that he is assured that his victim would open it
Very well studied, and perhaps the most lucrative since search algorithms. Everything from Netflix,
Collaborative filtering –
Content-based filtering
Here is a hypothetical situation: you have a row of slot machines with varying levels of monetary reward (some in pennies, some in thousands of dollars). Which arms should you pull so that the reward is maximized?
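The slot machine question is the classic multi-armed bandit problem. One standard way to trade off exploring arms against exploiting the best one is UCB1, sketched below. To keep it short, each machine pays 0 or 1 with a fixed probability rather than varying dollar amounts; the payout probabilities and the function name are assumptions.

```python
import math
import random

def ucb1(payout_probs, rounds=5000, seed=1):
    """Play `rounds` pulls with UCB1 and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(payout_probs)
    pulls = [0] * k
    total = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize its estimate
        else:
            # Pick the arm with the highest optimistic estimate:
            # empirical mean plus a bonus that shrinks as the arm is pulled more.
            arm = max(
                range(k),
                key=lambda a: total[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        pulls[arm] += 1
        total[arm] += reward
    return pulls

pulls = ucb1([0.2, 0.5, 0.8])
```

Over time most pulls should concentrate on the arm with the highest payout probability, while the other arms are still sampled occasionally.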
Embedding intelligence can make an attack more effective. You can account for many things, for example an environment that is non-stationary and always changing. In theory, you can acc
Targeting and surveillance
Can have a specific target and monitor it until an opening occurs, or perform long/at-scale recon.
New indicators can be reversed. Indicator of Monitoring, Detection/Recovery. Invest in telemetry!
Automated and reusable attack planning
The pivoting scenario demonstrated that routes were precomputed and extracted as moves were made. The sequence of actions can also be captured and reapplied.
For example: “Before earnings, I want all files under a specific folder.”
Small Footprint
Many of the steps are done “offline”, where no events/auditing occur. Actions are surgical and leave a minimal trace. This also makes cleanup easier.
Controlled Exposure
The traces left are clearer, and only actions were disclosed. Other capabilities/routes are hidden. Dumb bots receiving orders.
Flexible
We can introduce new tools/capabilities very easily
Improve IP retention
There are many tricks/TTPs/ideas that are difficult to retain. Red teamers/pen testers are rare, and it is difficult to retain them. They may also tire of going after the same target.
Assume Breach
Promote/adopt assume breach
Don’t assume you know how it will happen (SDL/TM); assume it will happen. Invest in detection/anomalies and recovery efficiency.
Accelerate growth
If you don’t work out, you don’t build muscle. Perform war game exercises. Start with tabletop exercises and engage red team/pentest on scenarios that are considered “solid”. Also use threat intel to prioritize war games.
Moving Target
Reduce the shelf life of recon data. Force attackers to come back, and detect them when they come back with old data.
Keep it simple: investments like JIT
Level 1 - Single user elevated
Level 2 (after red team) – Single use user
Pivoting
Reducing pivoting opportunities drastically reduces exposure and simplifies mitigation. Efforts like reducing the number of administrators are one way to reduce pivoting. Proper isolation between prod/test is another.
Sharing
Sharing is caring. More IOC sharing is occurring, and it is forcing actors to change / create TTPs faster.