This presentation applies a microservice architecture approach to big data pipelines. It outlines the benefits of microservices, namely modularity, scalable development, painless deployment, and QA friendliness, and then shows how the same principles carry over to big data pipelines built from independent, interchangeable "data modules". Examples illustrate how data modules can be containerized and combined like microservices to process streaming data. The conclusion is that a microservices-style architecture built on data modules brings most of these benefits to big data pipelines, with shared state as the main caveat.
4. www.realimpactanalytics.com
1. Challenge:
Big Data in Production
2. Zen of Micro Services
3. Data Modules
4. Conclusion
Modularity is Imperative for On-Premise Deployments:
Diagram: RealImpact, with three Product / Client pairs
Micro Services: Maximal Modularity
1. No shared state
2. Minimal coupling
3. Separation of concerns
4. Mix & match
Micro Services: Scalable Development
1. Team responsibility
2. Less code = faster ramp up
3. Technology independence
Micro Services: Painless Deployment
1. Reproducible environments
2. Versioned APIs
3. Installation = docker-compose up
Prod
Dev
Micro Services: QA Friendly
1. Three levels of testing
• Class / function level
• Service level
• Integration level
2. Staging is no big deal
Translation to Big Data Pipelines…
TrendingAnalysis
Twitter Data
TopTweeters
Recommend
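The boxes above can be read as modules that are ordered purely by the datasets they consume and produce. The following is a minimal sketch of that idea, not the presenters' implementation: the slide names the four boxes but not their exact connections, so the wiring, the `reads`/`writes` keys, and the dataset names `top-tweeters` and `recommendations` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: each module lists the datasets it reads and writes.
# Only the module names and the "twitter" source come from the slides.
MODULES = {
    "TrendingAnalysis": {"reads": ["twitter"], "writes": ["daily-trends"]},
    "TopTweeters": {"reads": ["twitter"], "writes": ["top-tweeters"]},
    "Recommend": {
        "reads": ["daily-trends", "top-tweeters"],
        "writes": ["recommendations"],
    },
}


def execution_order(modules):
    """Return module names so that every producer runs before its consumers."""
    # Map each dataset to the module that produces it.
    producer = {ds: name for name, m in modules.items() for ds in m["writes"]}
    # A module depends on the producers of the datasets it reads.
    # External sources such as "twitter" have no producer and add no edge.
    graph = {
        name: {producer[ds] for ds in m["reads"] if ds in producer}
        for name, m in modules.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(MODULES)
```

Because modules only refer to datasets, never to each other, any module can be swapped for another that writes the same dataset without touching its neighbours.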
container
Translation to Big Data Pipelines…
TrendingAnalysis
manifest.yaml:
  datasources:
    - twitter
  outputs:
    - id: daily-trends
      fields:
        - name: keyword
          type: string
        - name: relevance
          type: integer
  parameters:
    …
run.sh
jar
runtime
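A manifest like this acts as the module's versioned contract, so one thing it enables is checking a module's output against its declared fields. The sketch below shows one possible check. It is an assumption-laden illustration: the manifest is written as a Python dict to avoid a YAML dependency, the elided parameters section is left empty rather than guessed, and the `TYPES` mapping and `record_errors` helper are hypothetical names.

```python
# The manifest from the slide as a Python dict.
# The "parameters" section is elided in the source, so it stays empty here.
MANIFEST = {
    "datasources": ["twitter"],
    "outputs": [
        {
            "id": "daily-trends",
            "fields": [
                {"name": "keyword", "type": "string"},
                {"name": "relevance", "type": "integer"},
            ],
        }
    ],
    "parameters": {},
}

# Assumed mapping from manifest type names to Python types.
TYPES = {"string": str, "integer": int}


def record_errors(manifest, output_id, record):
    """Return a list of problems; an empty list means the record conforms."""
    outputs = {o["id"]: o for o in manifest["outputs"]}
    if output_id not in outputs:
        return [f"unknown output: {output_id}"]
    declared = {f["name"]: TYPES[f["type"]] for f in outputs[output_id]["fields"]}
    errors = []
    for name, expected in declared.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            errors.append(f"wrong type for {name}")
    for name in sorted(record.keys() - declared.keys()):
        errors.append(f"undeclared field: {name}")
    return errors


ok = record_errors(MANIFEST, "daily-trends", {"keyword": "python", "relevance": 7})
bad = record_errors(MANIFEST, "daily-trends", {"keyword": "python", "relevance": "high"})
```

Here `ok` is an empty list, while `bad` reports the mistyped `relevance` field.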
Translation to Big Data Pipelines…
TrendingAnalysis
HDFS
Input Data
Result
Parameters
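Seen this way, a data module behaves like a function from input data and parameters to a result. The sketch below illustrates that shape with a toy stand-in for TrendingAnalysis. The hashtag-counting logic, the `top_n` parameter, and the use of a raw count as `relevance` are assumptions for illustration; the slides do not describe the actual analysis, and a real module would read from and write to HDFS rather than Python lists.

```python
from collections import Counter


def trending_analysis(tweets, top_n=3):
    """Count hashtags across tweets and return the top_n as daily-trends rows."""
    counts = Counter(
        word.lower()
        for text in tweets
        for word in text.split()
        if word.startswith("#") and len(word) > 1
    )
    # Each row matches the manifest fields: keyword (string), relevance (integer).
    return [{"keyword": k, "relevance": c} for k, c in counts.most_common(top_n)]


tweets = ["Loving #spark today", "#Spark and #hadoop", "#spark #hadoop #docker"]
result = trending_analysis(tweets, top_n=2)
```

Since the module keeps no state of its own, running it twice on the same input and parameters yields the same result, which is what makes it easy to test and to rerun in staging.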
Data Modules: QA Friendly?
1. Three levels of testing ✔
• Class / function level
• Module level
• Integration level
2. Staging is no big deal ✔
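The first two testing levels can be sketched with the standard `unittest` module. This is only an illustration under stated assumptions: `normalize` and `run_module` are hypothetical stand-ins for a module's internals, and a temporary local directory stands in for the storage a real module would use. Integration-level tests, which run several modules together, are not shown.

```python
import json
import tempfile
import unittest
from pathlib import Path


def normalize(keyword):
    """Function-level unit: lower-case a keyword and drop leading '#' characters."""
    return keyword.lower().lstrip("#")


def run_module(input_path, output_path):
    """Module-level unit: read one keyword per line, write normalized JSON."""
    lines = Path(input_path).read_text().splitlines()
    keywords = [normalize(line) for line in lines if line.strip()]
    Path(output_path).write_text(json.dumps(keywords))


class FunctionLevel(unittest.TestCase):
    def test_normalize(self):
        self.assertEqual(normalize("#Spark"), "spark")


class ModuleLevel(unittest.TestCase):
    def test_run_module(self):
        # Exercise the module only through its input and output files.
        with tempfile.TemporaryDirectory() as tmp:
            src, dst = Path(tmp) / "in.txt", Path(tmp) / "out.json"
            src.write_text("#Spark\n\n#Hadoop\n")
            run_module(src, dst)
            self.assertEqual(json.loads(dst.read_text()), ["spark", "hadoop"])
```

Saved as a file, these tests can be run with `python -m unittest`. The module-level test never looks inside the module, so it keeps passing if the implementation is rewritten in another technology, as long as the input and output contract holds.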
Data Modules: Painless Deployment?
1. Reproducible environments (✔)
2. Versioned APIs ✔
3. Installation = docker-compose up (✔)
Prod
Dev
Data Modules: Scalable Development?
1. Team responsibility ✔
2. Less code = faster ramp up ✔
3. Technology independence ✔
Data Modules: Modularity?
1. No shared state (well…)
2. Minimal coupling ✔
3. Separation of concerns ✔
4. Mix & match ✔
Brussels Office
5, Place du Champ de Mars
1050 Brussels
Belgium
Cape Town Office
Sovereign Quay, 34 Somerset Road
8005, Green Point, Cape Town
South Africa
São Paulo Office
93, Rua Doutor Andrade Pertence
Vila Olímpia, São Paulo
Brazil
Luxembourg Office
691, rue de Neudorf
2220 Luxembourg
Grand Duché du Luxembourg
www.realimpactanalytics.com
Kuala Lumpur Office
28-01, Integra Tower 348 Jalan
Tun Razak, 50400 Kuala Lumpur
Malaysia