Case Studies
on
Big-Data Processing and Data Streaming
By: Amir Sedighi
LinkedIn: http://linkedin.com/in/amirsedighi
Twitter: @amirsedighi
JUG - A.Sedighi - 2015 2 / 48
Background
● BS and MS degrees in Software Engineering
● Senior Software Engineer
– 20+ Years of Programming Experience
● Cross-platform Software Development
– 4+ Years of Big-Data Processing and Machine-Learning Experience
● Log Management and Forensics
● Big-Data Visualization
● Data Warehouse using Big-Data Technologies
● Recommender Systems
● Analytical Real-Time Search Engines
● Integrating Fedora Digital Library with HDFS
● Next Generation Event Processing
● Online Resume
– http://linkedin.com/in/amirsedighi
Outline
● An Introduction to Big-Data Processing
● Big-Data Processing and Data Streaming
– Data Processing
1. +TB Scale Data Warehouse
2. Analytical Real-Time Search Solution and BI
3. Scalable Recommender System
4. Integrating Fedora Digital Library with HDFS
– Stream and Event Processing
1. Super Fast Scalable Log Management, Forensics and BI
2. Super Fast Scalable Fraud Detection
What Is Big-Data?
● Every 2 Days, Humans Create as Much Information as We Did
Up to 2003 - Eric Schmidt
Big-Data Characteristics
● Volume
● Variety
● Velocity
You're a Part of It Every Day
● We have the ability to store anything
● Companies and people are generating data like
never before in history
– Social Networks
– Online Web Portals
– Log Writers - Our Digital Footprint!
You're a Part of It Every Day
● Big-Data is whatever people do in the digital world,
including the footprint (logs) of what people, companies,
devices and services do, as well as traditional
tabular data stores.
As a Manager, You're Still a Part of It
● “Over half of the business leaders today realize they
don't have access to the insights they need to do their
job.” - IBM
Vertical or Horizontal?
Scale Up or Scale Out
Linear Scalability
Big-Data Processing Solutions
Q: How to scale linearly on commodity machines?
A: MapReduce
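The MapReduce idea can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce phases on a single machine, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example. Because every line is mapped independently and every key is reduced independently, the work could in principle be spread over more machines as the data grows.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different machine; here it is a plain loop.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Each key could likewise be reduced on a different machine.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big clusters process big data"]))
```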
Q: How to store big data on commodity machines?
A: Distributed File System
Replication → Fault Tolerance
Replication → Data Locality → Utilization
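A toy model may help show why replication gives both fault tolerance and data locality. The sketch below places each block on several nodes in simple round-robin order, which is an assumption made for brevity and not how HDFS actually chooses replica locations. The node names and the helper functions `place_blocks`, `readable_blocks` and `pick_local_node` are hypothetical.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Fault tolerance: blocks that still have at least one live replica."""
    failed = set(failed_nodes)
    return [
        block for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


def pick_local_node(placement, block, idle_nodes):
    """Data locality: prefer an idle node that already stores the block."""
    for node in placement[block]:
        if node in idle_nodes:
            return node
    return None  # no local replica is idle; the block would be read remotely


nodes = ["n1", "n2", "n3", "n4"]
placement = place_blocks(4, nodes, replication=3)
print(readable_blocks(placement, failed_nodes=["n1", "n2"]))
print(pick_local_node(placement, block=1, idle_nodes={"n4"}))
```

With three replicas every block survives the loss of two nodes in this layout, and a scheduler has three candidate machines on which a task can read the block locally, which is where the utilization gain comes from.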
Big-Data Processing, Most Popular
Technologies
● Apache Hadoop Ecosystem
● NoSQL Databases
– HBase
– Cassandra
– MongoDB
– Neo4j
● Elasticsearch
– Lucene
– Solr
● Java
1. +TB Scale Data Warehouse
DW Solution
● SQL
● ETL
– RDBMS
– NoSQL
– File System
● REST API
REST Admin Panel
Features
● Extendable Capacity for Data Warehousing
● Making Very Big Integrated Databases Based on Different
Technologies/Schemas
– DB2, Oracle, MS-SQL …
– Different Schemas Such as HRMS, Banking, Sales...
– Making Small Dense Integrated RDBMSs
● SQL Language Interface
● Linear Scalability
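The integration idea above (extract from different systems, load into one store, query with SQL) can be illustrated at miniature scale. The sketch below uses Python's built-in `sqlite3` as a stand-in for the warehouse; it is not Sqoop or Hive, and the two sources, their rows and the table names are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: rows as they might come out of a relational HR system.
hr_rows = [(1, "Customer A", "Tehran"), (2, "Customer B", "Shiraz")]

# Hypothetical source 2: a CSV export from a sales system on a file system.
sales_csv = "customer_id,amount\n1,120.5\n2,80\n1,40\n"


def load_warehouse(conn):
    """Extract from both sources, normalise types, and load into one schema."""
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", hr_rows)
    reader = csv.DictReader(io.StringIO(sales_csv))
    sales = [(int(row["customer_id"]), float(row["amount"])) for row in reader]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)


def total_sales_by_city(conn):
    """One SQL query over data that originated in two different systems."""
    query = """
        SELECT c.city, SUM(s.amount)
        FROM sales s JOIN customers c ON c.id = s.customer_id
        GROUP BY c.city
        ORDER BY c.city
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load_warehouse(conn)
print(total_sales_by_city(conn))
```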
Main Technologies and Frameworks
● Apache Hadoop
– Sqoop
– YARN/HDFS
– Hive or Drill or Impala
● Microservices Architecture
– Java 1.7
– Spring Boot
2. Analytical Real-Time Scalable Search Solution and BI
+TB Scale RT Searching
● Indexing Incoming Data on-the-fly
● Highly Scalable and Reliable
● Simple or Complex Queries
● REST API
● Schema Agnostic
● Customizable GUI and BI
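Lucene-based search engines are built around an inverted index, and a tiny in-memory version shows how on-the-fly, schema-agnostic indexing can work. This is a sketch under simplifying assumptions (no ranking, no persistence, AND-only queries); the class `TinyIndex` and its methods are hypothetical and unrelated to the Elasticsearch API.

```python
import re
from collections import defaultdict


class TinyIndex:
    """In-memory inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def index(self, doc_id, doc):
        """Index a document (any dict of fields) as soon as it arrives."""
        self.docs[doc_id] = doc
        # Schema agnostic: every field value is tokenised the same way.
        for value in doc.values():
            for term in re.findall(r"\w+", str(value).lower()):
                self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term of the query."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


idx = TinyIndex()
idx.index(1, {"msg": "Login failed", "user": "amir"})
idx.index(2, {"event": "login ok", "code": 200})
print(idx.search("login"))
```

A document becomes searchable immediately after `index` returns, and the two documents above share no field names, which is the sense in which the index does not need a schema.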
Business Intelligence
Rich GUI
Main Technologies and Frameworks
● Elasticsearch
– Apache Lucene
– REST
● Kibana
3. Scalable Recommender System
Recommender System
● Value-added Service (Loyalty Services)
● Machine-Learning
– Clustering Across Thousands of Nodes
● Apache Mahout
● Super Fast
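To give a feel for what a recommender computes, here is a minimal sketch based on item co-occurrence. Note that this is a different and simpler technique than the clustering mentioned above, it does not use Apache Mahout, and it is not presented as the method used in this case study. The user ids, item names and function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of items appears for the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def recommend(user, baskets, counts, top_n=3):
    """Score unseen items by their co-occurrence with the user's items."""
    seen = set(baskets[user])
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Sort by score descending, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


baskets = {
    "u1": ["a", "b"],
    "u2": ["a", "b", "c"],
    "u3": ["a", "c"],
    "u4": ["b", "d"],
}
counts = build_cooccurrence(baskets)
print(recommend("u1", baskets, counts))
```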
How Does It Work?
Technologies and Frameworks
● Microservices Architecture
● Java 1.6
● Apache Mahout
● Redis
4. Fedora Digital Library and HDFS Integration
Migrating from Expensive Servers to Commodity
Machines
● Using HDFS as the Fedora Digital Library Storage
– Research and Development
– Hadoop 1.2, Later Hadoop YARN 2.2
– Integrating with Solr over HDFS
● Java 1.7
● Fedora
– Islandora
– GSearch
Data Streaming
Big-Data Streaming, Most Popular Technologies
● Piping and Messaging
– Kafka, Flume, Fluentd and ZeroMQ
● Stream Processing
– Storm, Samza and Spark
● Machine Learning
– MLlib and Mahout
● Persisting
– NoSQL DBs
– HDFS
1. Log Management, Forensics and BI
Log Management, Forensics and BI
● Everything Digital Writes to Log Files
– Log Files Are Streams of Data
– Log Files Are Messy
– Log Files Come Very Fast, in an Unpredictable Manner
– Log Files Are About Everything within Your Business
● Log Files Are Full of Insight for Those
– Who Can Hold Them For a Reasonable Period of Time
– Who Can Search Them Rapidly
– Who Can Visualize Them Easily (BI)
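Turning messy log lines into structured events is the first step before they can be searched or visualized. The sketch below parses one assumed line format with a regular expression and skips lines that do not match; the format, the sample lines and the function names are hypothetical, and a real pipeline would typically delegate this step to a log shipper.

```python
import re
from collections import Counter

# Assumed line format: "<date> <time> <LEVEL> <service>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)$"
)


def parse_line(line):
    """Turn one raw log line into a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def level_counts(lines):
    """Parse a stream of lines and count events per level, skipping junk."""
    events = (parse_line(line) for line in lines)
    return Counter(event["level"] for event in events if event is not None)


sample = [
    "2015-05-01 10:00:00 INFO auth: user logged in",
    "garbage ###",
    "2015-05-01 10:00:01 ERROR payment-api: timeout after 30s",
    "2015-05-01 10:00:02 INFO auth: user logged out",
]
print(level_counts(sample))
```

Because `level_counts` consumes a generator, the lines are processed one at a time as they arrive instead of being loaded in full first.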
Network Topology
[Figure: network topology with LB, Masters and Data nodes]
Main Technologies and Frameworks
● Logstash
– Flume
● Elasticsearch
● Kibana
Snapshot
2. Fraud Detection
Inputs & Outputs
● Inputs: One or multiple sources generate data continuously, in
real time
– Sensor Networks
– Transaction Logs
– Text Streams such as News
– Network Traffic Analysis
● Outputs: Up-to-date Answers generated continuously or
periodically
Data Processing
● Transient Query
– Issued once, then forgotten
● Persistent Data
– Stored until deleted by user or apps
Stream Processing
● Transient Data
– Deleted as the window slides forward
● Persistent Queries
– Generate up-to-date answers as time goes on
● Window Types
– Time-based
– Count-based
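The two window types can be sketched with a `deque`. In both classes below the data is transient (old events are dropped as the window moves) while the query, here a running average, stays in place and produces a fresh answer for every new event. The class names are hypothetical, and the time-based version assumes timestamps arrive in non-decreasing order and that the window length is positive.

```python
from collections import deque


class CountWindow:
    """Count-based window: keeps only the last `size` events."""

    def __init__(self, size):
        self.events = deque(maxlen=size)

    def add(self, value):
        self.events.append(value)  # the oldest event is dropped automatically
        return sum(self.events) / len(self.events)  # up-to-date average


class TimeWindow:
    """Time-based window: keeps only events from the last `seconds` seconds."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Slide the window forward: expire events that are too old.
        while self.events[0][0] <= timestamp - self.seconds:
            self.events.popleft()
        return sum(v for _, v in self.events) / len(self.events)


cw = CountWindow(2)
print([cw.add(v) for v in (10, 20, 40)])

tw = TimeWindow(10)
print([tw.add(t, v) for t, v in ((0, 10), (5, 20), (12, 60))])
```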
Features
● Scalability
● Real-Time (1-Second Delay at Most)
● Super Fast Decision Making
● Implementing Complex Fraud Scenarios as Easy as Defining
Queries
● Uniform API for Processing Old or Early Events
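As an illustration of a fraud scenario expressed as a standing query, the sketch below flags a card that makes too many transactions within a short time span. The scenario, the thresholds and the class name `VelocityRule` are invented for the example and are not taken from the system described here; in production such a rule would run on a stream processor rather than in a single Python process. Timestamps are assumed to arrive in non-decreasing order per card.

```python
from collections import defaultdict, deque


class VelocityRule:
    """Flags a card that makes more than `max_count` transactions
    within any `window_seconds`-second span (a hypothetical scenario)."""

    def __init__(self, max_count, window_seconds):
        self.max_count = max_count
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # card id -> recent timestamps

    def on_event(self, card, timestamp):
        """Process one transaction; return True if it should raise an alert."""
        times = self.recent[card]
        times.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) > self.max_count


rule = VelocityRule(max_count=3, window_seconds=60)
events = [("A", 0), ("A", 10), ("B", 15), ("A", 20), ("A", 30), ("A", 100)]
print([rule.on_event(card, ts) for card, ts in events])
```

The decision is made per event using only the state inside the window, which keeps the cost of each check small and independent of how much history has already passed through.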
Main Technologies and Frameworks
● Java 1.7, Scala 2.11
● Apache Flume
● Apache Kafka
● Apache Spark
Where To Start?
● You need a Big Amount of Data
● You need to change your mindset
– Rack Space and Number of Servers, IO and Process Limitations
● You need To Understand Fundamentals
– Linux (Bash Script)
– Java is a Must, Python works and Scala is an advantage
– SQL and ETL
– MapReduce, Resource Management and Serialization Frameworks
– Apache Hadoop Ecosystem and Successors
Thank You! Questions?
http://slideshare.net/amirsedighi

Professional Resume Template for Software DevelopersVinodh Ram
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 

Recently uploaded (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Case Studies on Big-Data Processing and Streaming - Iranian Java User Group

  • 1. Case Studies on Big-Data Processing and Data Streaming By: Amir Sedighi LinkedIn: http://linkedin.com/in/amirsedighi Twitter: @amirsedighi
  • 2. JUG - A.Sedighi - 2015 2 / 48 Background ● BS and MS degrees in Software Engineering ● Senior Software Engineer – +20 Years of Programming Experience ● Cross-platform Software Development – +4 Years of Big-Data Processing and Machine-Learning Experience ● Log Management and Forensic ● Big-Data Visualization ● Data Warehouse using Big-Data Technologies ● Recommender Systems ● Analytical Real-Time Search Engines ● Integrating Fedora Digital Library with HDFS ● Next Generation Event Processing ● Online Resume – http://linkedin.com/in/amirsedighi
  • 3. Outline ● An Introduction to Big-Data Processing ● Big-Data Processing and Data Streaming – Data Processing 1. +TB Scale Data Warehouse 2. Analytical Real-Time Search Solution and BI 3. Scalable Recommender System 4. Integrating Fedora Digital Library with HDFS – Stream and Event Processing 1. Super Fast Scalable Log Management, Forensic and BI 2. Super Fast Scalable Fraud Detection
  • 4. What Is Big-Data?
  • 5. ● Every 2 Days Humans Create As Much Information As We Did Up To 2003 - Eric Schmidt
  • 6. Big-Data Characteristics ● Volume ● Variety ● Velocity
  • 7. You're a Part of It Every Day ● We have the ability to store anything ● Companies and people are generating data like never before in history – Social Networks – Online Web Portals – Log Writers - Our Digital Footprint!
  • 8. You're a Part of It Every Day ● Big-Data is whatever people do in the digital world, including the footprint of what people, companies, devices and services do (logs) and traditional tabular data stores.
  • 9. As a Manager, You're Still a Part of It ● “Over half of the business leaders today, realize they don't have access to the insights they need to do their job.” - IBM
  • 10. Vertical or Horizontal?
  • 11. Scale Up or Scale Out
  • 12. Linear Scalability
  • 13. Big-Data Processing Solutions
  • 14. Q: How to Be Linearly Scalable on Commodity Machines? A: MapReduce
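The MapReduce answer on slide 14 can be illustrated with a minimal, single-process sketch in Python. This is not Hadoop code and not taken from the talk: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the word-count task is only an assumed example. It shows why the model scales out: each line is mapped independently, and each key is reduced independently, so both steps could be spread over many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different machine; here it runs sequentially.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Each key could likewise be reduced on a different machine.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big streams of data"]))
```

Adding machines in this model means adding more mappers and reducers rather than buying a bigger server, which is the scale-out idea from the preceding slides.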
  • 15. Q: How to store big data on commodity machines? A: Distributed File System
  • 16. Replication → Fault Tolerance; Replication → Data Locality → Utilization
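The two chains on slide 16 can be made concrete with a toy model in Python. It is only a sketch under stated assumptions: the round-robin placement below is a simplification chosen for illustration, not the actual block placement policy of HDFS, and all names (`place_replicas`, `readable_blocks`, `pick_local_node`, the node and block labels) are hypothetical.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Fault tolerance: blocks that still have a replica on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


def pick_local_node(placement, block, idle_nodes):
    """Data locality: prefer an idle node that already stores the block."""
    for node in placement[block]:
        if node in idle_nodes:
            return node
    return None  # a real scheduler would fall back to a remote read


if __name__ == "__main__":
    placement = place_replicas(["b0", "b1", "b2", "b3"], ["n1", "n2", "n3", "n4"])
    print(readable_blocks(placement, ["n1", "n2"]))
    print(pick_local_node(placement, "b1", {"n1", "n4"}))
```

With three replicas per block, losing two of the four nodes still leaves every block readable, and a task for a block can usually be sent to a machine that already holds the data instead of moving the data over the network.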
  • 17. Big-Data Processing, Most Popular Technologies ● Apache Hadoop Ecosystem ● NoSQL Databases – HBase – Cassandra – MongoDB – Neo4j ● Elasticsearch – Lucene – Solr ● Java
  • 18. 1. +TB Scale Data Warehouse
  • 19. DW Solution ● SQL ● ETL – RDBMS – NoSQL – File System ● REST API
  • 20. REST Admin Panel
  • 21. Features ● Extendable Capacity for Data Warehousing ● Making Very Big Integrated Databases Based on Different Technologies/Schemas – DB2, Oracle, MS-SQL … – Different Schemas Such as HRMS, Banking, Sales... – Making Small Dense Integrated RDBMSs ● SQL Language Interface ● Linear Scalability
  • 22. Main Technologies and Frameworks ● Apache Hadoop – Sqoop – YARN/HDFS – Hive or Drill or Impala ● Microservices Architecture – Java 1.7 – Spring Boot
  • 23. 2. Analytical Real-Time Scalable Search Solution and BI
  • 24. +TB Scale RT Searching ● Indexing Incoming Data on-the-fly ● Highly Scalable and Reliable ● Simple or Complex Queries ● REST API ● Schema Agnostic ● Customizable GUI and BI
  • 25. Business Intelligence
  • 26. Rich GUI
  • 27. Main Technologies and Frameworks ● Elasticsearch – Apache Lucene – REST ● Kibana
  • 28. 3. Scalable Recommender System
  • 29. Recommender System ● Value-added Service (Loyalty Services) ● Machine-Learning – Clustering Through Thousands of Nodes ● Apache Mahout ● Super Fast
  • 30. How Does It Work?
  • 31. Technologies and Frameworks ● Microservices Architecture ● Java 1.6 ● Apache Mahout ● Redis
  • 32. 4. Fedora Digital Library and HDFS Integration
  • 33. Migrating from Expensive Servers to Commodity Machines ● Making HDFS as Fedora Digital Library Storage – Research and Development – Hadoop 1.2, Later Hadoop YARN 2.2 – Integrating with Solr over HDFS ● Java 1.7 ● Fedora – Islandora – GSearch
  • 34. Data Streaming
  • 35. Big-Data Streaming, Most Popular Technologies ● Piping and Messaging – Kafka, Flume, Fluentd and ZeroMQ ● Stream Processing – Storm, Samza and Spark ● Machine Learning – MLlib and Mahout ● Persisting – NoSQL DBs – HDFS
  • 36. 1. Log Management, Forensic and BI
  • 37. Log Management, Forensic and BI ● Everything Digital Writes to Log Files – Log Files Are Streams of Data – Log Files Are Messy – Log Files Come Very Fast, in an Unpredictable Manner – Log Files Are About Everything within Your Business ● Log Files Are Full of Insight – Who Can Hold Them For a Reasonable Period of Time? – Who Can Search Them Rapidly? – Who Can Visualize Them Easily (BI)?
  • 38. Network Topology (diagram labels: LB, Masters, Data)
  • 39. Main Technologies and Frameworks ● Logstash – Flume ● Elasticsearch ● Kibana
  • 40. Snapshot
  • 41. 2. Fraud Detection
  • 42. Inputs & Outputs ● Inputs: One or multiple sources generate data continuously, in real time – Sensor Networks – Transaction Logs – Text Streams such as News – Network Traffic Analysis ● Outputs: Up-to-date Answers generated continuously or periodically
  • 43. Data Processing ● Transient Query – Issued once, then forgotten ● Persistent Data – Stored until deleted by user or apps
  • 44. Stream Processing ● Transient Data – Deleted as the window slides forward ● Persistent Queries – Generate up-to-date answers as time goes on ● Windows – Time-Based – Count-Based
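The time-based and count-based windows named on slide 44 can be sketched in a few lines of Python. This is an illustration only, not the Spark or Kafka API used in the case study: the class names `CountWindow` and `TimeWindow` are hypothetical, the running sum stands in for an arbitrary persistent query, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class CountWindow:
    """Count-based window: keep only the most recent `size` events."""

    def __init__(self, size):
        self.events = deque(maxlen=size)  # the deque drops the oldest item itself

    def add(self, value):
        self.events.append(value)
        # The persistent query: re-evaluated every time a new event arrives.
        return sum(self.events)


class TimeWindow:
    """Time-based window: keep events newer than `span` seconds before the latest one."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, value) pairs in arrival order

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Transient data: evict everything that slid out of the window.
        while self.events and self.events[0][0] <= timestamp - self.span:
            self.events.popleft()
        return sum(v for _, v in self.events)


if __name__ == "__main__":
    last_two = CountWindow(size=2)
    for amount in (1, 2, 3):
        print(last_two.add(amount))

    last_ten_seconds = TimeWindow(span=10)
    for ts, amount in ((0, 5), (4, 1), (10, 2), (25, 7)):
        print(last_ten_seconds.add(ts, amount))
```

In both classes the query stays fixed while the data passes through and is discarded, which is the reverse of the stored-data, one-off-query model on the previous slide.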
  • 45. Features ● Scalability ● Real-Time (1 Second Delay at Most) ● Super Fast Decision Making ● Implementing Complex Fraud Scenarios As Easy as Defining Queries ● Uniform API for Processing Old or Early Events
  • 46. Main Technologies and Frameworks ● Java 1.7, Scala 2.11 ● Apache Flume ● Apache Kafka ● Apache Spark
  • 47. Where To Start? ● You need a Big Amount of Data ● You need to change your mind – Rack Space and Number of Servers, IO and Process Limitations ● You need To Understand Fundamentals – Linux (Bash Script) – Java is a Must, Python works and Scala is an advantage – SQL and ETL – MapReduce, Resource Management and Serialization Frameworks – Apache Hadoop Ecosystem and Successors
  • 48. Thank You! Questions? http://slideshare.net/amirsedighi