5. (c) 2015-16, No reproduction without permission
Data Value Chain
Big Data Architecture
3 Major Pieces
● A scalable and available storage mechanism,
such as a distributed filesystem or database
● A distributed compute engine, for processing
and querying the data at scale
● Tools to manage the resources and services
used to implement these systems
CAP theorem
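The CAP theorem states that while a network partition is in
progress, a distributed data store must choose between
consistency and availability. For the always-on Internet,
it often made sense to accept eventual consistency in
exchange for greater availability.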
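To make "eventual consistency in exchange for availability" concrete, here is a minimal sketch, assuming a last-writer-wins policy (one common way to converge, not the only one). `Replica`, `put`, and `merge` are hypothetical names. Both replicas keep accepting writes during a partition, disagree for a while, and converge once they exchange state. Ties on the logical clock are broken by replica name, so one of two concurrent writes is silently discarded. That loss is the price of this simple policy.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

@dataclass
class Replica:
    """One copy of a hypothetical key-value store that stays writable during a partition."""
    name: str
    clock: int = 0
    # key -> (logical timestamp, writer name, value); (timestamp, writer) orders writes
    data: Dict[str, Tuple[int, str, Any]] = field(default_factory=dict)

    def put(self, key: str, value: Any) -> None:
        # Accept the write locally even if peers are unreachable (choose availability).
        self.clock += 1
        self.data[key] = (self.clock, self.name, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self.data.get(key)
        return entry[2] if entry else None

    def merge(self, other: "Replica") -> None:
        # Anti-entropy exchange: keep the newest version of every key (last-writer-wins).
        for key, entry in other.data.items():
            mine = self.data.get(key)
            if mine is None or entry[:2] > mine[:2]:
                self.data[key] = entry
        self.clock = max(self.clock, other.clock)


a, b = Replica("a"), Replica("b")
a.put("headline", "v1")
b.merge(a)

# Network partition: both sides keep accepting writes, so they temporarily disagree.
a.put("headline", "v2-from-a")
b.put("headline", "v2-from-b")
diverged = (a.get("headline"), b.get("headline"))

# Partition heals: replicas exchange state and converge on the same value.
a.merge(b)
b.merge(a)
print(diverged, a.get("headline"), b.get("headline"))
```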
Classic Batch Architecture
Batch Mode Characteristics
Challenge
Batch mode cannot meet today's streaming needs.
Example: breaking news on Google, which must appear
in search results as it happens, not after the next batch run.
Fast Data
Fast data arrives in data systems as streams;
they are fire hoses.
These streams consist of observations, log records,
interactions, sensor readings, clicks, game play, etc.,
arriving hundreds to millions of times a second.
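Unlike a batch job, a fast-data consumer has to produce results while the fire hose is still running. The sketch below is a minimal example of that style: it counts events per type in fixed, non-overlapping (tumbling) time windows, handling one event at a time. It assumes the events arrive in timestamp order and ignores late data. The function name `tumbling_counts` and the `(time, type)` event shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (event time in seconds, event type such as "click")

def tumbling_counts(events: Iterable[Event],
                    window_s: float) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Count events per type in fixed, non-overlapping time windows.

    Processes one event at a time, so it also works on an unbounded stream.
    Assumes events arrive in time order (no handling of late data).
    """
    current_start = None
    counts: Counter = Counter()
    for ts, kind in events:
        window_start = (ts // window_s) * window_s
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # An event from a later window arrived, so the current window is complete.
            yield current_start, dict(counts)
            counts.clear()
            current_start = window_start
        counts[kind] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last, possibly partial, window


stream = [(0.5, "click"), (1.2, "view"), (4.9, "click"), (5.1, "click"), (12.0, "view")]
results = list(tumbling_counts(stream, window_s=5.0))
print(results)
```

Because `tumbling_counts` is a generator, each window's result is emitted as soon as the window closes, rather than after the whole input has been read.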
Advantages of Fast Data
● Better insight
● Better personalization
● Better fraud detection
● Better customer engagement
● Better freemium conversion
● Better game play interaction
● Better alerting and interaction
Fast Data Architecture
Parts
1. Streams of data: IoT, firehoses, etc.
2. REST calls
3. Microservices, Reactive Platform
4. ZooKeeper for consensus and state management
5. Kafka Connect for persistence
6. Low-latency stream processing, with Flink or
Gearpump as runners for Apache Beam
7. Stream processing results persisted back
9. Pseudo-streaming (micro-batches) with Spark
10. Batch-mode processing and interactive
analytics
11. Deployed to Mesos, YARN, or the cloud
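"Pseudo-streaming" refers to the micro-batch model used by Spark Streaming (DStreams): incoming records are cut into small batches at a fixed interval, and an ordinary batch job runs on each one. The plain-Python emulation below sketches that idea; it is not Spark's API. `micro_batches` and its parameters are hypothetical, and records are assumed to be `(arrival time, payload)` pairs in arrival order.

```python
from itertools import groupby
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def micro_batches(records: Iterable[Tuple[float, T]], interval_s: float,
                  batch_job: Callable[[List[T]], R]) -> List[Tuple[float, R]]:
    """Emulate streaming by cutting the input into fixed intervals of arrival time
    and running an ordinary batch job on each chunk (the micro-batch model).

    Assumes records are (arrival time in seconds, payload) pairs in arrival order.
    """
    results: List[Tuple[float, R]] = []
    for batch_start, group in groupby(records, key=lambda r: (r[0] // interval_s) * interval_s):
        payloads = [payload for _, payload in group]
        results.append((batch_start, batch_job(payloads)))
    return results


# Sensor readings arriving over three seconds, summed per 1-second micro-batch.
readings = [(0.1, 3), (0.7, 4), (1.2, 10), (2.5, 1), (2.9, 2)]
batches = micro_batches(readings, interval_s=1.0, batch_job=sum)
print(batches)
```

The design trade-off is visible in the signature: any existing batch function (here the built-in `sum`) can be reused unchanged, but no result is available until its interval ends, so latency is bounded below by the batch interval.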
Stream Data Characteristics
Core Principles
● Event logs: almost everything is an event, e.g.
database CRUD transactions, telemetry from IoT
devices, clickstreams, etc.
● Event logs enable Event Sourcing (ES) and Command
Query Responsibility Segregation (CQRS)
● Message queues are the core integration tool
● Message delivery guarantees: at most once, at
least once, and exactly once
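Event Sourcing stores every change as an immutable event and derives current state by replaying the log. CQRS separates the write (command) side from one or more read (query) models, which can be rebuilt from the same log. The sketch below illustrates both on a hypothetical bank-account domain. `Event`, `EventLog`, `withdraw`, and `BalanceView` are illustrative names, and an in-memory list stands in for a durable log.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Event:
    kind: str      # "Deposited" or "Withdrawn" (illustrative event types)
    account: str
    amount: int

def delta(e: Event) -> int:
    return e.amount if e.kind == "Deposited" else -e.amount

class EventLog:
    """Append-only event log: the single source of truth (event sourcing)."""
    def __init__(self) -> None:
        self.events: List[Event] = []

    def append(self, event: Event) -> None:
        self.events.append(event)

def balance(log: EventLog, account: str) -> int:
    # Write-side state is never stored directly; it is rebuilt by replaying events.
    return sum(delta(e) for e in log.events if e.account == account)

def deposit(log: EventLog, account: str, amount: int) -> None:
    log.append(Event("Deposited", account, amount))

def withdraw(log: EventLog, account: str, amount: int) -> None:
    # Command: validate against the replayed state, then record the outcome as an event.
    if balance(log, account) < amount:
        raise ValueError("insufficient funds")
    log.append(Event("Withdrawn", account, amount))

class BalanceView:
    """Query side (CQRS): a read model kept up to date from the same events."""
    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.position = 0  # number of log entries already applied

    def catch_up(self, log: EventLog) -> None:
        for e in log.events[self.position:]:
            self.balances[e.account] = self.balances.get(e.account, 0) + delta(e)
        self.position = len(log.events)


log = EventLog()
deposit(log, "alice", 100)
withdraw(log, "alice", 30)
view = BalanceView()
view.catch_up(log)
print(view.balances, len(log.events))
```

Because the view only remembers how far into the log it has read, a new or corrupted read model can always be rebuilt by starting again from position 0.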
Kafka
● Provides the benefits of the event log
● Does not delete messages once they have
been read; messages are retained for a configurable
period, so consumers can re-read (replay) them
● Partitions are replicated across brokers
for durability and resiliency
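The toy model below sketches why reading does not delete anything in Kafka. Each consumer group only records an offset, and where a consumer commits that offset relative to processing determines its delivery guarantee. Committing after processing gives at-least-once delivery; committing before gives at-most-once. `Partition`, `poll`, and `commit` here are hypothetical stand-ins, not the Kafka client API. Replication across brokers is not modeled. Exactly-once needs further machinery (Kafka provides idempotent producers and transactions), which is also not shown.

```python
from typing import Callable, Dict, List, Tuple

class Partition:
    """Toy stand-in for one Kafka partition: an append-only list of messages.
    Reading never removes anything; each consumer group tracks its own offset."""
    def __init__(self) -> None:
        self.messages: List[str] = []
        self.committed: Dict[str, int] = {}  # consumer group -> next offset to read

    def produce(self, msg: str) -> int:
        self.messages.append(msg)
        return len(self.messages) - 1  # offset of the new message

    def poll(self, group: str, max_records: int = 10) -> List[Tuple[int, str]]:
        start = self.committed.get(group, 0)
        return list(enumerate(self.messages[start:start + max_records], start))

    def commit(self, group: str, next_offset: int) -> None:
        self.committed[group] = next_offset

def consume_at_least_once(p: Partition, group: str, handle: Callable[[str], None]) -> None:
    # Process first, commit afterwards: a crash before the commit means the message
    # is delivered again (possible duplicates, no loss).
    for offset, msg in p.poll(group):
        handle(msg)
        p.commit(group, offset + 1)

def consume_at_most_once(p: Partition, group: str, handle: Callable[[str], None]) -> None:
    # Commit first, process afterwards: a crash after the commit skips the batch
    # (no duplicates, possible loss).
    batch = p.poll(group)
    if batch:
        p.commit(group, batch[-1][0] + 1)
    for _, msg in batch:
        handle(msg)


p = Partition()
for m in ["a", "b", "c"]:
    p.produce(m)

billing: List[str] = []
audit: List[str] = []
consume_at_least_once(p, "billing", billing.append)
consume_at_most_once(p, "audit", audit.append)

# Nothing was deleted by reading, so a group can rewind its offset and replay.
p.commit("billing", 0)
consume_at_least_once(p, "billing", billing.append)
print(billing, audit, len(p.messages))
```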
Reactive Systems
● Responsive: The system can always respond in a
timely manner, even when it must report that full
service isn’t available due to some failure.
● Resilient: The system is resilient against failure of
any one component, such as server crashes, hard
drive failures, network partitions, etc. Leveraging
replication prevents data loss and enables a service
to keep going using the remaining instances.
Leveraging isolation prevents cascading failures.
● Elastic: You can expect the load to vary
considerably over the lifetime of a service. It’s
essential to implement dynamic, automatic
scalability, both up and down, based on load.
● Message driven: While fast data architectures
are obviously focused on data, here we mean
that all services respond to directed
commands and queries.
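Two of the traits above reduce to small, testable rules. The sketch below shows one simple policy for each. For resilience, it fails over across replicas so that a crashed instance does not take the service down. For elasticity, it sizes the service as ceil(load / per-instance capacity), clamped to a minimum and maximum. Both functions and all parameter names (`capacity_per_instance_rps`, `min_instances`, `max_instances`) are hypothetical; real platforms add health checks, timeouts, and cooldown periods that are not modeled here.

```python
import math
from typing import Callable, List, Sequence

def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Resilience via replication: try each replica in turn and return the first
    successful answer, so one crashed instance does not take the service down."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # contain the failure and move on to the next replica
            errors.append(exc)
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")

def desired_instances(load_rps: float, capacity_per_instance_rps: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Elasticity: size the service to the current load, scaling up and down.

    The instance count is ceil(load / per-instance capacity), clamped to [min, max]."""
    needed = math.ceil(load_rps / capacity_per_instance_rps)
    return max(min_instances, min(max_instances, needed))


def crashed_node() -> str:
    raise ConnectionError("node down")  # simulated server crash

def healthy_node() -> str:
    return "ok"

answer = call_with_failover([crashed_node, healthy_node])
sizes = [desired_instances(load, capacity_per_instance_rps=100) for load in (0, 30, 950, 5000)]
print(answer, sizes)
```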
Lambda Architecture
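The Lambda Architecture combines three layers. A batch layer periodically recomputes views from the complete, immutable dataset. A speed layer covers only the data that arrived since the last batch run, with low latency. A serving layer merges the two views at query time. The sketch below shows that split for hypothetical page-view counts. `batch_view`, `SpeedLayer`, and `query` are illustrative names, and in-memory counters stand in for real storage.

```python
from collections import Counter
from typing import Iterable

def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the view from the full, immutable dataset."""
    return Counter(master_dataset)

class SpeedLayer:
    """Speed layer: incrementally count only events since the last batch run."""
    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def on_event(self, page: str) -> None:
        self.recent[page] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events.
        self.recent.clear()

def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the complete-but-stale batch view with the
    fresh-but-partial real-time view."""
    return batch[page] + speed.recent[page]


history = ["home", "news", "home"]   # events already in the master dataset
batch = batch_view(history)
speed = SpeedLayer()
for page in ["news", "news"]:        # events since the last batch run
    speed.on_event(page)
before = (query("home", batch, speed), query("news", batch, speed))

# The next batch run absorbs the recent events; the speed layer starts over.
history += ["news", "news"]
batch = batch_view(history)
speed.reset()
after = (query("home", batch, speed), query("news", batch, speed))
print(before, after)
```

Queries return the same answer before and after the batch run, which is the point of the design: the speed layer only has to be correct for the short window the batch layer has not yet covered.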
Sample Application
Copyright (c) 2016-17 Knoldus Software LLP
This training material is only intended to be used by
people attending the Knoldus training. Unauthorized
reproduction, redistribution, or use
of this material is strictly prohibited.