17. Windows Security Events Data
On average, an online service in O365 produces 30 billion sessions/day; 82 TB/day
Data: Sequences of Windows security event IDs from user sessions
• Examples: user logs into a machine, process start, credential switch, etc.
• 367 unique security event IDs
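Before a model can consume these sequences, the raw event IDs are typically mapped to a compact vocabulary (367 entries for the IDs above). The sketch below assumes each session arrives as a Python list of integer event IDs; `build_vocab` and `encode` are hypothetical helper names and the example session is invented. The IDs themselves are real Windows Security event IDs: 4624 (successful logon), 4648 (logon with explicit credentials), 4672 (special privileges assigned to a new logon), 4688 (new process created), 4740 (account locked out).

```python
# Hypothetical session; the event IDs are real Windows Security events,
# but this particular sequence is invented for illustration.
SESSION = [4624, 4672, 4688, 4688, 4648]


def build_vocab(sessions):
    """Map each distinct event ID to a dense integer index, in sorted order."""
    ids = sorted({eid for s in sessions for eid in s})
    return {eid: i for i, eid in enumerate(ids)}


def encode(session, vocab, unk=-1):
    """Replace event IDs with vocabulary indices; IDs never seen map to `unk`."""
    return [vocab.get(eid, unk) for eid in session]


vocab = build_vocab([SESSION])
print(vocab)                        # {4624: 0, 4648: 1, 4672: 2, 4688: 3}
print(encode(SESSION, vocab))       # [0, 2, 3, 3, 1]
print(encode([4624, 4740], vocab))  # [0, -1]
```

Reserving an explicit "unknown" index matters in practice, since new or rare event IDs can appear after the vocabulary is built.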
18. - Our goal: detect compromised accounts/machines; we built separate models to do so
- Each model independently assesses whether the account is acting suspiciously
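Independent models usually score accounts on different scales (a probability from one, a count from another), so their outputs need a common footing before they can be compared. One simple option, sketched below as an assumption rather than the method used here, is to convert each model's scores to ranks in [0, 1] and average them per account. The function names and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each account's raw score to its rank in [0, 1] (1 = most suspicious).
    Ties are broken by sort order, which is acceptable for a sketch."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    return {acct: i / (n - 1) for i, acct in enumerate(ordered)}


def combine(model_scores: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Average rank-normalized scores across models; an account missing
    from a model's output is treated as least suspicious (0.0) for that model."""
    normalized = [rank_normalize(s) for s in model_scores]
    accounts = set().union(*normalized)
    combined = {
        acct: sum(m.get(acct, 0.0) for m in normalized) / len(normalized)
        for acct in accounts
    }
    # Highest combined score first; account name breaks ties deterministically.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


model_a = {"alice": 0.1, "bob": 0.9, "carol": 0.5}  # probability-like scores
model_b = {"alice": 3, "bob": 40, "carol": 7}       # count-like scores
print(combine([model_a, model_b]))  # [('bob', 1.0), ('carol', 0.5), ('alice', 0.0)]
```

Rank normalization discards how far apart the raw scores are, which keeps one badly calibrated model from dominating the combined list.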
19. [Diagram: probability of logging sequences of events; credential elevation; auto-generated]
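The diagram refers to the probability of logging a given sequence of events. A minimal way to estimate such a probability, assumed here purely for illustration, is a first-order Markov chain over event IDs with add-one smoothing: rare transitions (for example, a logon followed by an explicit-credential logon and a privilege assignment) get a low log-probability. The class name and training data are hypothetical; the vocabulary size of 367 is taken from the data description.

```python
import math
from collections import Counter, defaultdict


class MarkovSequenceModel:
    """First-order Markov chain over event IDs with add-alpha smoothing.
    A hypothetical stand-in for a sequence-probability model."""

    def __init__(self, vocab_size: int, alpha: float = 1.0):
        self.vocab_size = vocab_size
        self.alpha = alpha
        self.transitions = defaultdict(Counter)  # prev event -> counts of next event
        self.totals = Counter()                  # prev event -> outgoing transitions

    def fit(self, sessions):
        for s in sessions:
            for prev, cur in zip(s, s[1:]):
                self.transitions[prev][cur] += 1
                self.totals[prev] += 1
        return self

    def log_prob(self, session):
        """Sum of log P(cur | prev) over every transition in the session."""
        lp = 0.0
        for prev, cur in zip(session, session[1:]):
            count = self.transitions[prev][cur] if prev in self.transitions else 0
            num = count + self.alpha
            den = self.totals[prev] + self.alpha * self.vocab_size
            lp += math.log(num / den)
        return lp


normal = [4624, 4688, 4688]    # logon, then two process starts
unusual = [4624, 4648, 4672]   # logon, explicit-credential logon, privileges assigned
model = MarkovSequenceModel(vocab_size=367).fit([normal] * 100)
print(model.log_prob(normal) > model.log_prob(unusual))  # True
```

Because the score is a sum over transitions, longer sessions score lower by construction; dividing by the number of transitions is a common adjustment before comparing sessions of different lengths.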
25. Testing the system
• Wargame with the red team
• Blind experiment
• 8 of the 12 top-ranked sessions on day 1, out of ~28 billion sessions, were pen testers: precision at 12 is 8/12 ≈ 67%
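Precision at k is the fraction of the top k ranked items that are true positives, which makes it a natural metric when analysts only look at the head of a ranked list. A minimal sketch follows; the labels in the example are hypothetical, with `True` marking a session that belongs to the red team.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked items that are true positives.
    `ranked_labels` is ordered from highest to lowest model score."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_labels[:k]
    return sum(1 for is_positive in top if is_positive) / k


# Hypothetical ranking: True = confirmed red-team session.
labels = [True, True, False, True, False]
print(precision_at_k(labels, 2))  # 1.0
print(precision_at_k(labels, 5))  # 0.6
```

Dividing by k (not by the number of items actually returned) follows the usual definition, so a list shorter than k is penalized.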
30. Reality
Constantly changing environment…
…but you can account for it during training and by adding metadata
In the beginning, there will be false positives…
…but you will reduce your attack surface
No labelled data…
…but you can get away with a good red team
32. Takeaways
Combine alert streams
Make your alerts interpretable
Capture feedback and close the last mile
Check out ranking algorithms: they are powerful!
Editor's Notes
Lateral movement
“After the lateral stage, attackers are now virtually undetected by traditional security methods”
- Connecting APT Dots [1]
[1] http://about-threats.trendmicro.com/cloud-content/us/ent-primers/pdf/tlp_lateral_movement.pdf
Single detections rarely indicate anything of security interest on their own
Malware detected… what next?
Suspicious process launched… what next?
Unusual logins… what next?
-> O365 has 150 detections in the pipeline; ArcSight has 100 or so detections that come out of the box.
N detections, each surfacing its top k alerts = N*k alerts for the analyst to triage
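One way to avoid handing the analyst N*k independent alerts is to pivot every alert onto the entity (host or account) it concerns, so that single detections which co-occur on the same entity surface together. The sketch below assumes alerts arrive as `(detection_name, entity)` pairs; the detection and host names are hypothetical, and ranking by the number of distinct detections is just one simple choice.

```python
from collections import defaultdict


def combine_alert_streams(alerts):
    """Group (detection_name, entity) alerts by entity and rank entities by
    how many distinct detections fired on them; ties are broken by name."""
    fired = defaultdict(set)
    for detection, entity in alerts:
        fired[entity].add(detection)  # duplicates from one detection count once
    return sorted(
        ((entity, sorted(dets)) for entity, dets in fired.items()),
        key=lambda kv: (-len(kv[1]), kv[0]),
    )


alerts = [
    ("malware_detected", "host-a"),
    ("suspicious_process", "host-a"),
    ("unusual_login", "host-a"),
    ("unusual_login", "host-b"),
    ("malware_detected", "host-c"),
    ("malware_detected", "host-c"),
]
for entity, detections in combine_alert_streams(alerts):
    print(entity, detections)
# host-a ['malware_detected', 'suspicious_process', 'unusual_login']
# host-b ['unusual_login']
# host-c ['malware_detected']
```

Here six raw alerts collapse to three entities, and the one entity where malware, a suspicious process, and an unusual login coincide rises to the top.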
Hand-written rules might be useful, but watch the volume of alerts they produce.
Interpretability: rules might be more interpretable. Aim to be interpretable while also keeping noise low.
Classification probably isn’t the right way to approach ad hoc IR:
Classification problems: Map to an unordered set of classes
Regression problems: Map to a real value
Ordinal regression problems: Map to an ordered set of classes
A fairly obscure sub-branch of statistics, but what we want here
This formulation gives extra power:
Relations between relevance levels are modeled
A document is judged good relative to other documents for the query, given the collection; there is no absolute scale of goodness
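The pairwise view above can be made concrete with a small learning-to-rank sketch. It assumes a linear scoring function and a RankNet-style pairwise logistic loss, which is one common ranking formulation and not necessarily the one used here. Only pairs with different ordinal labels contribute, so the model learns relative order rather than an absolute score. The features and labels (0 = benign, 1 = suspicious, 2 = compromised) are hypothetical.

```python
import math
from itertools import combinations


def score(w, x):
    """Linear score w·x; only the ordering it induces matters."""
    return sum(wk * xk for wk, xk in zip(w, x))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_train(X, y, epochs=200, lr=0.1):
    """Minimize sum of log(1 + exp(-(s_hi - s_lo))) over pairs with y_hi > y_lo."""
    w = [0.0] * len(X[0])
    # Keep only pairs whose labels differ, ordered as (preferred, other).
    pairs = [(i, j) if y[i] > y[j] else (j, i)
             for i, j in combinations(range(len(X)), 2) if y[i] != y[j]]
    for _ in range(epochs):
        for hi, lo in pairs:
            diff = [a - b for a, b in zip(X[hi], X[lo])]
            margin = score(w, diff)
            # Gradient of log(1 + exp(-margin)) w.r.t. w is -sigmoid(-margin) * diff,
            # so a gradient-descent step adds lr * sigmoid(-margin) * diff.
            g = sigmoid(-margin)
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


# Hypothetical features: [distinct detections fired, rarity score]
X = [[0, 0.1], [1, 0.2], [2, 0.9], [3, 0.8]]
y = [0, 1, 2, 2]
w = pairwise_train(X, y)
ranking = sorted(range(len(X)), key=lambda i: score(w, X[i]), reverse=True)
print(ranking)
```

Items sharing a label (the last two here) generate no training pair, so their relative order is left to the features, which mirrors the point that goodness is only defined relative to other documents.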