For most federal agencies dealing with increased security threats, limiting machine-data collection is not an option. But faced with finite IT budgets, few agencies can continue to absorb the high costs of scaling high-end network attached storage (NAS) or moving to and expanding a block-based storage footprint. During this webcast, you’ll learn about more cost-effective solutions to support large-scale machine-data ingestion and fast data access for security analytics.
You’ll learn about:
- The common challenges organizations face when scaling security workflows
- How a high-performance cache solves these issues
- How to integrate the cloud into processing and storage for additional scalability and efficiency
6. Security Analysis Workflow
• Acquire and Aggregate Inputs
• Normalize Data
• Archive Raw / Archive Normalized Data
• Analyze for Patterns
• Alert or Remediate
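The five stages can be sketched end to end as a toy pipeline. All function names and sample events below are illustrative placeholders, not any product's API:

```python
# Toy sketch of the workflow: acquire -> normalize -> archive -> analyze -> alert.
# Function names and sample data are illustrative only.

def acquire(sources):
    """Acquire and aggregate raw events from all input sources."""
    for source in sources:
        yield from source

def normalize(event):
    """Normalize a raw 'host: message' event into a common schema."""
    host, _, message = event.partition(": ")
    return {"host": host, "message": message}

def analyze(record):
    """Analyze for patterns; here, a trivial failed-login check."""
    return "failed login" in record["message"]

raw_archive, normalized_archive, alerts = [], [], []
sources = [
    ["fw1: failed login from 10.0.0.5", "fw1: session opened"],
    ["router1: link up on ge-0/0/1"],
]

for event in acquire(sources):
    raw_archive.append(event)              # archive raw
    record = normalize(event)
    normalized_archive.append(record)      # archive normalized
    if analyze(record):
        alerts.append(record)              # alert or remediate

print(len(raw_archive), len(alerts))       # 3 events archived, 1 alert
```

At scale, each stage becomes its own tier of machines, which is where the storage pressure described in the following slides comes from.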
7. Types of Inputs
• Network Equipment: routers, switches, firewalls, VPN appliances, etc.
• IT: physical servers, VM infrastructure, virtual machines, directory services, end-user desktops/laptops
• Application Layer: log files, access logs for applications, web servers
• Miscellaneous: sensor data
9. Typical Ingest Workflow
Ingest Node(s) → Normalize/Filter → Storage
I/O at scale can slow down storage, backing up the entire workflow.
10. Typical Ingest Workflow
Ingest Node(s) → Normalize/Filter → Storage → Analyze Data → Report/Alert
If analysis is not co-located, latency can impede it.
11. Typical Analysis Workflow
Ingest Node(s) → Normalize/Filter → Storage → Analyze Data → Report/Alert
If analysis is not co-located, latency can impede it.
12. The Meta-Problem: Log File Ingest and Processing
All router, firewall, and server log files accumulate from the beginning of time -- well, from when we started gathering them -- growing along both the time and volume axes.
NET: A lot of historical data over time, applying pressure in terms of both storage and processing.
15. 5 Challenges When Engineering Security Workflows
1. Ingest Latency and Throughput
2. Vendor Lock-In
3. Life Cycle Management
4. Data Availability and Redundancy
5. Cloud Integration
16. Challenge 1: Ingest Latency and Throughput
Ingest Node → Normalize/Filter → Storage
I/O at scale can slow down storage, backing up the entire workflow.
17. Ingest Latency Scales Too
The volume of inputs drives the number of ingest nodes required, and IoT devices are increasing the amount of log data being generated and ingested.
The scale forces multiple storage sites, and on some products requires a replication mechanism, introducing more cost, overhead, and latency.
18. Challenge 2: Vendor Lock-In
As the solution scales:
1. Additional demand for storage increases costs and lowers performance.
2. Deficiencies in the current storage solution amplify as the deployment grows, causing longer upgrade outages.
3. Vendors often limit interoperability with other products when it comes to replication and tiering.
19. Vendor Lock-In
Transferring data to a new solution is difficult:
• Business Continuity -- how do you stop ingesting and processing logs?
• Interoperability -- how do you ensure that your new/proposed storage solution will work well in a high-performance environment?
  - Ingest performance
  - Read performance
  - Scale
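One way to de-risk the interoperability question is a quick ingest/read smoke test against a candidate mount point before committing to it. A rough sketch (the probe path and sizes are arbitrary; a single sequential stream is not a substitute for real load testing):

```python
# Rough throughput probe for a candidate storage mount.
# Single sequential stream: a smoke test, not a real benchmark.
import os
import tempfile
import time

def probe_throughput(path, size_mb=8):
    """Return (write_MBps, read_MBps) for a sequential stream at `path`."""
    chunk = os.urandom(1024 * 1024)
    fname = os.path.join(path, "ingest-probe.bin")
    t0 = time.perf_counter()
    with open(fname, "wb") as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())               # force data to the backend
    write_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    with open(fname, "rb") as f:
        while f.read(1024 * 1024):
            pass
    read_s = time.perf_counter() - t0
    os.remove(fname)
    return size_mb / write_s, size_mb / read_s

w, r = probe_throughput(tempfile.gettempdir())
print(f"write {w:.0f} MB/s, read {r:.0f} MB/s")
```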
20. Challenge 3: Data Lifecycle Management
The value of data changes over time: warm data lives on more expensive storage, while cold data moves to cheaper storage.
1. Lowering the cost to store means you can store more and derive greater value.
2. New analytic methods/tools bring a fresh round of analysis and burst workloads.
3. How can we begin to build AI-based workloads?
21. Lifecycle Management
Storage Performance
• Ingest and analysis require higher-performance storage = more expensive storage
• Over time there is simply too much data to store in the performance tier
• Is deletion of older data possible?
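The usual answer is an age-based placement policy; a minimal sketch, where the tier names and day thresholds are illustrative rather than product defaults:

```python
# Minimal age-based tiering policy sketch; thresholds are illustrative.
FAST_TIER_DAYS = 30     # hot data: active ingest and analysis
CHEAP_TIER_DAYS = 365   # warm/cold data: archive tier

def place(age_days):
    """Pick a storage tier for data of a given age."""
    if age_days <= FAST_TIER_DAYS:
        return "performance-tier"
    if age_days <= CHEAP_TIER_DAYS:
        return "archive-tier"
    return "delete-or-deep-archive"

for age in (7, 90, 400):
    print(age, place(age))   # 7 -> performance, 90 -> archive, 400 -> delete/deep
```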
22. Challenge 4: Data Availability and Redundancy
Multiple ingest nodes (each running Normalize/Filter) feed multiple storage sites, which in turn serve multiple reporting and processing nodes.
23. Data Availability and Redundancy
• Performance at scale requires distributing the reporting/analysis
• The geographical location of ingest may also be distributed
• Critical data: you can't lose it, so avoid single points of failure
• Large data sets with streaming data are extremely difficult and cost-prohibitive to back up with traditional methods
24. Challenge 5: Cloud Integration: Storage
1. How do you start archiving data to cloud storage in order to lower cost?
2. How can businesses leverage cloud-based AI workloads against the same data they have today?
25. Cloud Integration: Storage
Cloud Storage Pros
• Cloud storage can be very inexpensive
• Reduces the need to own and maintain additional IT assets
• Public clouds have built-in redundancy
Cloud Storage Cons
• Cloud storage is eventually consistent...a query immediately after a write may not succeed
• The lowest-cost storage is object storage, which requires S3-compliant application access
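The eventual-consistency con typically has to be absorbed in application code with read-after-write retries. A sketch, using a stub in place of a real S3-style GET (the stub only "becomes consistent" on the third attempt):

```python
# Read-after-write retry sketch for eventually consistent object storage.
# `flaky_fetch` stubs an S3-style GET; a real client would issue the request.
import time

def read_with_retry(fetch, key, attempts=5, delay=0.01):
    """Retry a fetch with exponential backoff until the object is visible."""
    for i in range(attempts):
        value = fetch(key)
        if value is not None:
            return value
        time.sleep(delay * (2 ** i))       # exponential backoff
    raise TimeoutError(f"{key} not visible after {attempts} attempts")

calls = {"n": 0}
def flaky_fetch(key):
    calls["n"] += 1
    return "log-bytes" if calls["n"] >= 3 else None

print(read_with_retry(flaky_fetch, "logs/fw1/archive.gz"))  # succeeds on try 3
```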
27. How Can Avere Address Those Challenges?
• Ingest Latency and Throughput: Avere write caching, Avere FXT NVRAM, 10Gb+ bandwidth
• Vendor Lock-In: Global Namespace, Flash Move & Mirror
• Life Cycle Management: Flash Move & Mirror
• Data Availability and Redundancy: HA, clustering, Flash Move & Mirror
• Cloud Integration: Avere vFXT compute-based appliance, Avere FlashCloud for AWS S3 and GCS
28. The Power of High-Performance Caching
Speed ingest via write-behind caching:
• Gather writes (acknowledging clients immediately) and flush them in parallel
• Hardware: NVRAM for write protection and caching
• A clustered caching solution distributes writes across multiple nodes
Accelerate read performance with distributed read-ahead caching:
• Read ahead of a request (read a bit more than what was requested)
• Cache requests for other readers (typical in analytic workloads)
• Writes are cached as written, speeding analysis workloads
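As a mental model (a toy, not the Avere FXT implementation), write-behind caching acknowledges each write into a fast buffer immediately and flushes to slow backing storage asynchronously:

```python
# Toy write-behind cache: ack writes immediately, flush asynchronously.
# A mental model only, not the Avere FXT implementation.
import queue
import threading

class WriteBehindCache:
    def __init__(self, backing_store):
        self.backing = backing_store       # slow storage (a dict here)
        self.pending = queue.Queue()       # writes awaiting flush
        self.cache = {}                    # recent writes, readable at once
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def write(self, key, value):
        self.cache[key] = value            # client is acknowledged here
        self.pending.put((key, value))     # real write happens later

    def read(self, key):
        if key in self.cache:              # near-term writes served from cache
            return self.cache[key]
        return self.backing.get(key)

    def _flush_loop(self):
        while True:
            key, value = self.pending.get()
            self.backing[key] = value      # flush to slow storage
            self.pending.task_done()

store = {}
cache = WriteBehindCache(store)
cache.write("router1.log", "line 1")
print(cache.read("router1.log"))           # available immediately from cache
cache.pending.join()                       # wait until the flush lands
print(store["router1.log"])                # now on backing storage too
```

In a real cluster the in-memory buffer is NVRAM-protected and HA-mirrored to a peer node, which is what makes the early acknowledgment safe.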
29. Caching Basic Architecture
• Ingest nodes write to NFS mount points distributed across a caching layer (the Avere FXT cluster)
• Writes are flushed to storage over time, smoothing the ingest
• Writes are protected within the cluster via HA mirroring
• Reporting/analysis nodes access data via the cluster
• Reads are cached and eventually aged out
• Recently written data is cached and available
30. Data Placement
• Avere can mirror data to cloud storage for longer-term archiving
• Data is accessible through the cluster as though it were on the primary storage
• Ingest nodes write through the cluster; reporting/analysis nodes read through the same cluster
32. Avere Security Workflow
• Streaming data (syslog/NetFlow/...) flows from the DMZ network to the FXT cluster, with no direct access to the core filers
• Container-based applications, under central control, normalize the data
• Splunk is configured to consume data from a separate vServer, isolating its traffic
• Splunk data consumers use web access to visualize the data ingested and analyzed by Splunk
• Data on the core filers can be mirrored/migrated to cloud core filers
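For syslog-style streams, the normalize step usually starts by decoding the RFC 3164 `<PRI>` prefix. A simplified sketch of such a container's parsing logic (the output field names are illustrative):

```python
# Simplified normalize step for syslog lines: decode the RFC 3164 <PRI>
# prefix into facility/severity. Output field names are illustrative.
import re

SYSLOG_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.S)

def normalize(line):
    m = SYSLOG_RE.match(line)
    if not m:                              # pass through unparseable lines
        return {"facility": None, "severity": None, "message": line}
    pri = int(m.group(1))                  # PRI = facility * 8 + severity
    return {"facility": pri // 8, "severity": pri % 8, "message": m.group(2)}

rec = normalize("<134>Jan  1 00:00:00 fw1 denied tcp 10.0.0.5 -> 10.0.1.9")
print(rec["facility"], rec["severity"])    # 16 6 (local0, informational)
```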
34. Contact Us!
Keith Ober
Systems Engineer
Avere Systems
kober@averesystems.com
Bernie Behn
Principal Product Engineer
Avere Systems
bbehn@averesystems.com
AvereSystems.com
888.88 AVERE
askavere@averesystems.com
Twitter: @AvereSystems