10. Data Source
Event bus for business events
Log scraping
Transaction log scraping
(Oracle GoldenGate, MySQL binlog, MongoDB oplog, Postgres BottledWater, SQL Server fn_dblog)
Change Data Capture
Application messaging/JMS
Micro-batching
(high watermark, change tracking)
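The high-watermark approach above can be sketched as follows. This is an illustrative simulation, not a real connector: the in-memory `table`, the `updated_at` field, and the `extract_batch` helper are all assumed names for the pattern of pulling only rows changed since the last watermark.

```python
def extract_batch(rows, last_watermark):
    """Pull only rows modified since the previous watermark.

    `rows` stands in for a source table with an `updated_at` column;
    in a real pipeline this filter would be pushed down as SQL, e.g.
    SELECT * FROM t WHERE updated_at > :last_watermark.
    """
    batch = [r for r in rows if r["updated_at"] > last_watermark]
    # Advance the watermark to the newest timestamp seen in this batch.
    new_watermark = max((r["updated_at"] for r in batch), default=last_watermark)
    return batch, new_watermark

# Simulated source table.
table = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": 30},
]

batch, wm = extract_batch(table, last_watermark=15)    # picks up ids 2 and 3
batch2, _ = extract_batch(table, last_watermark=wm)    # nothing new yet
```

Each micro-batch only scans forward from the stored watermark, which is why this style needs a reliable monotonic change column on the source side.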
11. Kafka - Flow Management
No-nonsense logging
~100K msgs/s throughput vs ~20K/s for RabbitMQ
Log compaction
Durable persistence
Partition tolerance
Replication
Best in class integration with Spark
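Log compaction, mentioned above, can be illustrated with a small simulation. This is a sketch of the idea only (Kafka does this per partition, in the background, on its own log segments): keep at least the latest value for each key, and treat a `None` value as a tombstone that deletes the key.

```python
def compact(log):
    """Keep only the most recent record per key, preserving offset order.

    `log` is a list of (key, value) records in offset order, standing in
    for a compacted Kafka topic partition. A value of None is a
    tombstone: after compaction the key disappears entirely.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)          # later offsets win
    # Re-emit surviving records in the order of their last offset.
    ordered = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (_, value) in ordered if value is not None]

log = [("user1", "a"), ("user2", "b"), ("user1", "c"), ("user2", None)]
compact(log)  # [("user1", "c")]
```

This is what lets a compacted topic serve as a durable, replayable snapshot of "current state per key" rather than an ever-growing event history.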
12. Columnar Storage
Optimized for analytic query performance.
Vertical partitioning
Column Projection
Compression
Loosely coupled schema.
HBase
AWS Redshift
Parquet
ORC
Postgres (Citus)
SAP HANA
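The three properties listed above (vertical partitioning, column projection, compression) can be shown on a toy table. This is a minimal sketch of the storage idea, not any particular engine's format; the run-length encoder below is one simple example of the column-friendly compression schemes formats like Parquet and ORC use.

```python
# Row-oriented layout: whole records stored together.
rows = [
    ("2024-01-01", "US", 10),
    ("2024-01-01", "US", 12),
    ("2024-01-02", "DE", 7),
]

# Vertical partitioning: each column stored contiguously.
columns = {
    "date":    [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount":  [r[2] for r in rows],
}

# Column projection: SUM(amount) reads one column, not every full row.
total = sum(columns["amount"])  # 29

def rle(values):
    """Run-length encode a column: (value, repeat_count) pairs.
    Sorted, low-cardinality columns compress extremely well this way."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

rle(columns["country"])  # [("US", 2), ("DE", 1)]
```

Both effects compound: an analytic query scans only the projected columns, and those columns are far smaller on disk than the equivalent rows.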
14. Spark Streaming
What? A data-processing framework for building scalable, fault-tolerant streaming applications.
Why? It lets you reuse the same code for batch processing, join streams against historical data, or run ad-hoc queries on stream state.
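The "same code for batch and streaming" point comes from Spark Streaming's discretized-stream model: the input is chopped into micro-batches and ordinary batch logic runs on each one. The sketch below is a plain-Python toy of that model (no Spark), with a running word count standing in for stateful operators like `updateStateByKey`.

```python
from collections import Counter

def process_stream(micro_batches, state=None):
    """Toy discretized-stream model: each micro-batch is processed by
    the same batch-style code (a word count here), folding results into
    running state across batches.
    """
    state = Counter() if state is None else state
    for batch in micro_batches:
        # Identical logic would work on a single large batch of lines.
        state.update(word for line in batch for word in line.split())
    return state

counts = process_stream([["spark streams", "spark"], ["kafka streams"]])
counts["spark"]  # 2
```

Because each micro-batch is just a small batch job, the batch and streaming code paths genuinely share logic, which is the reuse claim made above.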
21. Can Spark Streaming survive Chaos Monkey?
http://techblog.netflix.com/2015/03/can-spark-streaming-survive-chaos-monkey.html
22. Lambda Architecture
Lambda architecture is a data-processing pattern designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.
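The layering described above can be sketched in a few lines. This is an illustrative skeleton of the pattern, not a production design: the function names (`batch_view`, `speed_view`, `query`) are assumed labels for the batch, speed, and serving layers.

```python
def batch_view(master_dataset):
    """Batch layer: recompute an exact view over all historical data."""
    view = {}
    for user, amount in master_dataset:
        view[user] = view.get(user, 0) + amount
    return view

def speed_view(recent_events):
    """Speed layer: incrementally aggregate events that arrived after
    the last batch recomputation, trading some precision for latency."""
    view = {}
    for user, amount in recent_events:
        view[user] = view.get(user, 0) + amount
    return view

def query(user, batch, speed):
    """Serving layer: merge batch and real-time views at query time."""
    return batch.get(user, 0) + speed.get(user, 0)

historical = [("alice", 10), ("bob", 5), ("alice", 3)]
recent = [("alice", 2)]
query("alice", batch_view(historical), speed_view(recent))  # 15
```

Each batch run absorbs the events the speed layer was covering, so the real-time view stays small while the batch view stays authoritative.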