SlideShare a Scribd company logo
1 of 63
Big Data Technologies You Didn’t Know
About
About Us
• Emerging technology firm focused on helping enterprises build breakthrough
software solutions
• Building software solutions powered by disruptive enterprise software trends
-Machine learning and data science
-Cyber-security
-Enterprise IOT
-Powered by Cloud and Mobile
• Bringing innovation from startups and academic institutions to the enterprise
• Award winning agencies: Inc 500, American Business Awards, International
Business Awards
• Big data technologies you didn’t know about
• Apache Flink
• Apache Samza
• Google Cloud Data Flow
• StreamSets
• Tensor Flow
• Apache NiFi
• Druid
• LinkedIn WhereHows
• Microsoft Cognitive Services
Agenda
Two Goals…
Think Beyond Traditional Big Data Stacks
Learn from Companies Building Big Data Pipelines at Scale
Big Data pipelines in the enterprise
Areas of a Big Data Pipeline
Big Data
Pipeline
Data
Processing
Stream Data
Ingestion
Data
transformati
ons
Cognitive
Computing
Machine
Learning
High
Performance
Data Access
Data Processing
Technology Stacks You Know
But You Probably Didn’t Know About….
Apache Flink
• Apache Flink, like Apache Hadoop and Apache Spark, is a community-
driven open source framework for distributed Big Data Analytics.
• Apache Flink engine exploits data streaming and in-memory processing
and iteration operators to improve performance.
• Apache Flink has its origins in a research project called Stratosphere of
which the idea was conceived in 2008 by professor Volker Markl from the
Technische Universität Berlin in Germany.
• In German, Flink means agile or swift. Flink joined the Apache incubator
in April 2014 and graduated as an Apache Top Level Project (TLP) in
December 2014.
Apache Flink
• Declarativity
• Query optimization
• Efficient parallel in-
memory and out-of-
core algorithms
• Massive scale-out
• User Defined
Functions
• Complex data types
• Schema on read
• Streaming
• Iterations
• Advanced
Dataflows
• General APIs
Draws on concepts from
MPP Database
Technology
Draws on concepts from
Hadoop MapReduce
TechnologyAdd
Apache Flink
Apache Flink: An Example
case class Word (word: String, frequency: Int)
val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap {line => line.split(" ")
.map(word => Word(word,1))}
.window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS))
.groupBy("word").sum("frequency")
.print()
val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap {line => line.split(" ")
.map(word => Word(word,1))}
.groupBy("word").sum("frequency")
.print()
DataSet API (batch):
DataStream API (streaming):
Stream Data Processing
Technology Stacks You Know
But You Probably Didn’t Know About….
Apache Samza
• Created by LinkedIn to address extend the capabilities of Apache Kafka
• Simple API
• Managed state
• Fault Tolerant
• Durable messaging
• Scalable
• Extensible
• Processor Isolation
Apache Samza: Overview
• Samza code runs as a Yarn job
• You implement the StreamTask
interface, which defines a
process() call.
• StreamTask runs inside a task
instance, which itself is inside a
Yarn container.
Apache Samza: Operators
• Filter records matching condition
• Map record ⇒ func(record)
• Join two/more datasets by key
• Group records with the same value in field
• Aggregate records within the same group
• Pipe job 1’s output ⇒ job 2’s input
• MapReduce assumes fixed dataset.
Can we adapt this to unbounded streams?
Apache Samza: Sample Code
Data Transformation
Technology Stacks You Know
But You Probably Didn’t Know About….
Google Cloud Data Flow
• Native Google Cloud data processing service
• Simple programming model for batch and
streamed data processing tasks
• Provides a data flow managed service to
control the execution of data processing
jobs
• Data processing jobs can be authored using
the Data Flow SDKs (Apache Beam)
Google Cloud Data Flow : Details
• A pipeline encapsulates an entire series of
computations that accepts some input data
from external sources, transforms that
data produces some output data.
• A PCollection abstracts a data unit in a
pipeline
• Sources and Sink abstract read and write
operations in a pipeline
• Google Data Flow provides management,
monitoring and security capabilities in
data pipelines
Google Cloud Data Flow is Based on Apache Beam
• 1. Portable - You can use the same code with
different runners (abstraction) and backends on
premise, in the cloud, or locally
• 2. Unified - Same unified model for batch and
stream processing
• 3. Advanced features - Event windowing,
triggering, watermarking, lateless, etc.
• 4. Extensible model and SDK - Extensible API;
can define custom sources to read and write in
parallel
But You Probably Didn’t Know About….
StreamSets Data Collector
• Data processing platform optimized for data
in motion
• Visual data flow authoring model
• Open source distribution model
• On-premise and cloud distributions
• Rich monitoring and management
interfaces
StreamSets Data Collector: Details
• Data collectors streams and process data
in real time using data pipelines
• A pipeline describes a data flow from
origin to destination
• A pipeline is composed of origins,
destinations and processors
• Extensibility model based on JavaScript
and Jython
• The lifecycle of a data collector can be
controlled via the administration console
Machine Learning
Technology Stacks You Know
But You Probably Didn’t Know About….
TensorFlow
• Second generation Machine Learning system, followed by DistBelief
• TensorFlow grew out of a project at Google, called Google Brain, aimed at
applying various kinds of neural network machine learning to products and
services across the company.
• An open source software library for numerical computation using data flow graphs
• Used in following projects at Google
1. DeepDream
2. RankBrain
3. Smart Reply
And many more..
TensorFlow: Details
• Data flow graphs describe mathematical computation
with a directed graph of nodes & edges
• Nodes in the graph represent mathematical
operations.
• Edges represent the multidimensional data arrays
(tensors) communicated between them.
• Edges describe the input/output relationships between
nodes.
• The flow of tensors through the graph is where
TensorFlow gets its name.
TensorFlow
• Tensor
• Variable
• Operation
• Session
• Placeholder
• TensorBoard
Fast Data Access
Technology Stacks You Know
But You Probably Didn’t Know About….
Druid
• Druid was started in 2011
• ‣ Power interactive data applications
• ‣ Multi-tenancy: lots of concurrent users
• ‣ Scalability: trillions events/day, sub-second queries
• ‣ Real-time analysis
• Key Features
• LOW LATENCY INGESTION
• FAST AGGREGATIONS
• ARBITRARY SLICE-N-DICE CAPABILITIES
• HIGHLY AVAILABLE
• APPROXIMATE & EXACT CALCULATIONS
Druid: Details
• Realtime Node
• Historical Node
• Broker Node
• Coordinator Node
• Indexing Service
Druid: Details
• Realtime Node
• Historical Node
• Broker Node
• Coordinator Node
• Indexing Service
• JSON based query language
Low Latency Data Flows
Technology Stacks You Know
But You Probably Didn’t Know About….
Apache NiFi
• Powerful and reliable system to process and
distribute data
• Directed graphs of data routing and
transformation
• Web-based User Interface for creating,
monitoring, & controlling data flows
• Highly configurable - modify data flow at
runtime, dynamically prioritize data
• Data Provenance tracks data through entire
system
• Easily extensible through development of
custom components
Apache NiFi: Architecture
Apache NiFi: Concepts
• FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
• Processor
• Performs the work, can access FlowFiles
• Connection
• Links between processors
• Queues that can be dynamically
prioritized
• Process Group
• Set of processors and their connections
• Receive data via input ports, send data
via output ports
Data Discovery
Technology Stacks You Know
But You Probably Didn’t Know About….
WhereHows
Linkedin WhereHows
• Where is my data? How did it get there?
• Enterprise data catalog
• Metadata search
• Collaboration
• Data lineage analysis
• Connectivity to many data sources and
ETL tools
• Powering Linkedin data discovery layer
Linkedin WhereHows: Architecture
• Web interface for data discovery
• API enabled
• Backend server that controls the metadata
crawling and integration with other
systems
Linkedin WhereHows: Data Lineage
• Collects metadata from ETL platforms and
scripts
• Sources include
• Pig
• MapReduce
• Informatica
• Teradata
• Visualizes the lineage information
associated with a data source
Cognitive Computing
Technology Stacks You Know
But You Probably Didn’t Know About….
Microsoft Cognitive Services
• Based on Project Oxford and Bing
• Offers 22 cognitive computing APIs
• Main categories include:
• Vision
• Speech
• Language
• Knowledge
• Search
• Integrated with Cortana Intelligence Suite
Microsoft Cognitive Services
Microsoft Cognitive Services: Developer Experience
• 22 different REST APIs
that abstract cognitive
capabilities
• SDKs for Windows, IOS,
Android and Python
• Open source
Summary
• The big data ecosystem is constantly evolving
• There are a lot of relevant new technologies beyond the traditional Hadoop-Spark stacks
• Big internet companies are leading innovation in the space
Thanks
http://Tellago.com
Info@Tellago.com

More Related Content

What's hot

Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data ArchitecturesLynn Langit
 
Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?Guido Schmutz
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
Data Pipelines With Streamsets
Data Pipelines With Streamsets Data Pipelines With Streamsets
Data Pipelines With Streamsets Jowanza Joseph
 
Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, SparkReactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, SparkTodd Fritz
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azureDavid Giard
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsPat Patterson
 
a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...DataWorks Summit
 
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)Jun Rao
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedInAmy W. Tang
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentHostedbyConfluent
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectPrecisely
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasDataWorks Summit
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidDataWorks Summit
 
Continus sql with sql stream builder
Continus sql with sql stream builderContinus sql with sql stream builder
Continus sql with sql stream builderTimothy Spann
 
Sidecars and a Microservices Mesh
Sidecars and a Microservices MeshSidecars and a Microservices Mesh
Sidecars and a Microservices MeshRed Hat Developers
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platformhadooparchbook
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists jlacefie
 

What's hot (20)

Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?Big Data - in the cloud or rather on-premises?
Big Data - in the cloud or rather on-premises?
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
Data Pipelines With Streamsets
Data Pipelines With Streamsets Data Pipelines With Streamsets
Data Pipelines With Streamsets
 
Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, SparkReactive Fast Data & the Data Lake with Akka, Kafka, Spark
Reactive Fast Data & the Data Lake with Akka, Kafka, Spark
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...a Real-time Processing System based on Spark streaming int he field of Teleco...
a Real-time Processing System based on Spark streaming int he field of Teleco...
 
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with Connect
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and Druid
 
Continus sql with sql stream builder
Continus sql with sql stream builderContinus sql with sql stream builder
Continus sql with sql stream builder
 
Sidecars and a Microservices Mesh
Sidecars and a Microservices MeshSidecars and a Microservices Mesh
Sidecars and a Microservices Mesh
 
Architecting a Next Generation Data Platform
Architecting a Next Generation Data PlatformArchitecting a Next Generation Data Platform
Architecting a Next Generation Data Platform
 
Data Engineering for Data Scientists
Data Engineering for Data Scientists Data Engineering for Data Scientists
Data Engineering for Data Scientists
 

Viewers also liked

Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing FrameworksSirKetchup
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastoreKishore Gopalakrishna
 
Recent advances in ablation technology 2014 slideshare
Recent advances in ablation technology 2014 slideshareRecent advances in ablation technology 2014 slideshare
Recent advances in ablation technology 2014 slideshareTroy Watts
 
modern security risks for big data and mobile applications
modern security risks for big data and mobile applicationsmodern security risks for big data and mobile applications
modern security risks for big data and mobile applicationsTrivadis
 
VO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomyVO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomyJoint ALMA Observatory
 
A Practical Guide for Selecting an Enterprise Messaging Platforms
A Practical Guide for Selecting an Enterprise Messaging PlatformsA Practical Guide for Selecting an Enterprise Messaging Platforms
A Practical Guide for Selecting an Enterprise Messaging PlatformsJesus Rodriguez
 
Microservices the Good Bad and the Ugly
Microservices the Good Bad and the UglyMicroservices the Good Bad and the Ugly
Microservices the Good Bad and the UglyAdrian Cockcroft
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
 
Deploying a Governed Data Lake
Deploying a Governed Data LakeDeploying a Governed Data Lake
Deploying a Governed Data LakeWaterlineData
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidDataWorks Summit
 
Addressing Big Data Security Challenges: The Right Tools for Smart Protection...
Addressing Big Data Security Challenges: The Right Tools for Smart Protection...Addressing Big Data Security Challenges: The Right Tools for Smart Protection...
Addressing Big Data Security Challenges: The Right Tools for Smart Protection...Information Security Awareness Group
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsSnapLogic
 
Microservices in the Enterprise: A Research Study and Reference Architecture
Microservices in the Enterprise: A Research Study and Reference ArchitectureMicroservices in the Enterprise: A Research Study and Reference Architecture
Microservices in the Enterprise: A Research Study and Reference ArchitectureJesus Rodriguez
 
Big data security challenges and recommendations!
Big data security challenges and recommendations!Big data security challenges and recommendations!
Big data security challenges and recommendations!cisoplatform
 
Big data and cyber security legal risks and challenges
Big data and cyber security legal risks and challengesBig data and cyber security legal risks and challenges
Big data and cyber security legal risks and challengesKapil Mehrotra
 
Growth Hacking - 10 Key Checklist
Growth Hacking - 10 Key Checklist Growth Hacking - 10 Key Checklist
Growth Hacking - 10 Key Checklist Bryan Ferguson
 
Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...
Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...
Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...Lightbend
 
WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...
WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...
WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...WSO2
 

Viewers also liked (20)

Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing Frameworks
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
 
Recent advances in ablation technology 2014 slideshare
Recent advances in ablation technology 2014 slideshareRecent advances in ablation technology 2014 slideshare
Recent advances in ablation technology 2014 slideshare
 
modern security risks for big data and mobile applications
modern security risks for big data and mobile applicationsmodern security risks for big data and mobile applications
modern security risks for big data and mobile applications
 
VO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomyVO Course 10: Big data challenges in astronomy
VO Course 10: Big data challenges in astronomy
 
A Practical Guide for Selecting an Enterprise Messaging Platforms
A Practical Guide for Selecting an Enterprise Messaging PlatformsA Practical Guide for Selecting an Enterprise Messaging Platforms
A Practical Guide for Selecting an Enterprise Messaging Platforms
 
Microservices the Good Bad and the Ugly
Microservices the Good Bad and the UglyMicroservices the Good Bad and the Ugly
Microservices the Good Bad and the Ugly
 
QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing system
 
Deploying a Governed Data Lake
Deploying a Governed Data LakeDeploying a Governed Data Lake
Deploying a Governed Data Lake
 
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and DruidOpen Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
Open Source Lambda Architecture with Hadoop, Kafka, Samza and Druid
 
Addressing Big Data Security Challenges: The Right Tools for Smart Protection...
Addressing Big Data Security Challenges: The Right Tools for Smart Protection...Addressing Big Data Security Challenges: The Right Tools for Smart Protection...
Addressing Big Data Security Challenges: The Right Tools for Smart Protection...
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Microservices in the Enterprise: A Research Study and Reference Architecture
Microservices in the Enterprise: A Research Study and Reference ArchitectureMicroservices in the Enterprise: A Research Study and Reference Architecture
Microservices in the Enterprise: A Research Study and Reference Architecture
 
Bots in the Enterprise
Bots in the Enterprise Bots in the Enterprise
Bots in the Enterprise
 
Big data security challenges and recommendations!
Big data security challenges and recommendations!Big data security challenges and recommendations!
Big data security challenges and recommendations!
 
Big data and cyber security legal risks and challenges
Big data and cyber security legal risks and challengesBig data and cyber security legal risks and challenges
Big data and cyber security legal risks and challenges
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Growth Hacking - 10 Key Checklist
Growth Hacking - 10 Key Checklist Growth Hacking - 10 Key Checklist
Growth Hacking - 10 Key Checklist
 
Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...
Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...
Using the Actor Model with Domain-Driven Design (DDD) in Reactive Systems - w...
 
WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...
WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...
WSO2Con USA 2017: Geospatial Big Data – Location Intelligence in Digital Tran...
 

Similar to 10 Big Data Technologies you Didn't Know About

Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Perficient, Inc.
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewRiccardo Zamana
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Value Association
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - OptumUltralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - OptumData Driven Innovation
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0Amr Kamel Deklel
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 

Similar to 10 Big Data Technologies you Didn't Know About (20)

Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Data saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overviewData saturday malta - ADX Azure Data Explorer overview
Data saturday malta - ADX Azure Data Explorer overview
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - OptumUltralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
Ultralight data movement for IoT with SDC Edge. Guglielmo Iozzia - Optum
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
CC -Unit4.pptx
CC -Unit4.pptxCC -Unit4.pptx
CC -Unit4.pptx
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Big data analytics and machine intelligence v5.0
Big data analytics and machine intelligence   v5.0Big data analytics and machine intelligence   v5.0
Big data analytics and machine intelligence v5.0
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Apache drill
Apache drillApache drill
Apache drill
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 

More from Jesus Rodriguez

The Emergence of DeFi Micro-Primitives
The Emergence of DeFi Micro-PrimitivesThe Emergence of DeFi Micro-Primitives
The Emergence of DeFi Micro-PrimitivesJesus Rodriguez
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxJesus Rodriguez
 
DeFi Opportunities and Challenges in the Current Crypto Market
DeFi Opportunities and Challenges in the Current Crypto MarketDeFi Opportunities and Challenges in the Current Crypto Market
DeFi Opportunities and Challenges in the Current Crypto MarketJesus Rodriguez
 
The Polygon Blockchain by the Numbers
The Polygon Blockchain by the NumbersThe Polygon Blockchain by the Numbers
The Polygon Blockchain by the NumbersJesus Rodriguez
 
Social Analytics for Cryptocurrencies
Social Analytics for Cryptocurrencies Social Analytics for Cryptocurrencies
Social Analytics for Cryptocurrencies Jesus Rodriguez
 
DeFi Quant Yield-Generating Strategies
DeFi Quant Yield-Generating StrategiesDeFi Quant Yield-Generating Strategies
DeFi Quant Yield-Generating StrategiesJesus Rodriguez
 
High Frequency Trading and DeFi
High Frequency Trading and DeFiHigh Frequency Trading and DeFi
High Frequency Trading and DeFiJesus Rodriguez
 
Simple DeFi Analytics Any Crypto-Investor Should Know About
Simple DeFi Analytics Any Crypto-Investor Should Know About Simple DeFi Analytics Any Crypto-Investor Should Know About
Simple DeFi Analytics Any Crypto-Investor Should Know About Jesus Rodriguez
 
15 Minutes of DeFi Analytics
15 Minutes of DeFi Analytics15 Minutes of DeFi Analytics
15 Minutes of DeFi AnalyticsJesus Rodriguez
 
DeFi Trading Strategies: Opportunities and Challenges
DeFi Trading Strategies: Opportunities and ChallengesDeFi Trading Strategies: Opportunities and Challenges
DeFi Trading Strategies: Opportunities and ChallengesJesus Rodriguez
 
Practical Crypto Asset Predictions rev
Practical Crypto Asset Predictions revPractical Crypto Asset Predictions rev
Practical Crypto Asset Predictions revJesus Rodriguez
 
Better Technical Analysis with Blockchain Indicators
Better Technical Analysis with Blockchain IndicatorsBetter Technical Analysis with Blockchain Indicators
Better Technical Analysis with Blockchain IndicatorsJesus Rodriguez
 
Price Predictions for Cryptocurrencies
Price Predictions for CryptocurrenciesPrice Predictions for Cryptocurrencies
Price Predictions for CryptocurrenciesJesus Rodriguez
 
Fascinating Metrics and Analytics About Cryptocurrencies
Fascinating Metrics and Analytics About CryptocurrenciesFascinating Metrics and Analytics About Cryptocurrencies
Fascinating Metrics and Analytics About CryptocurrenciesJesus Rodriguez
 
Price PRedictions for Crypto-Assets Using Deep Learning
Price PRedictions for Crypto-Assets Using Deep LearningPrice PRedictions for Crypto-Assets Using Deep Learning
Price PRedictions for Crypto-Assets Using Deep LearningJesus Rodriguez
 
Demystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data ScienceDemystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data ScienceJesus Rodriguez
 
Crypto assets are a data science heaven rev
Crypto assets are a data science heaven revCrypto assets are a data science heaven rev
Crypto assets are a data science heaven revJesus Rodriguez
 
Implementing Machine Learning in the Real World
Implementing Machine Learning in the Real WorldImplementing Machine Learning in the Real World
Implementing Machine Learning in the Real WorldJesus Rodriguez
 

More from Jesus Rodriguez (20)

The Emergence of DeFi Micro-Primitives
The Emergence of DeFi Micro-PrimitivesThe Emergence of DeFi Micro-Primitives
The Emergence of DeFi Micro-Primitives
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptx
 
DeFi Opportunities and Challenges in the Current Crypto Market
DeFi Opportunities and Challenges in the Current Crypto MarketDeFi Opportunities and Challenges in the Current Crypto Market
DeFi Opportunities and Challenges in the Current Crypto Market
 
MEV Deep Dive .pptx
MEV Deep Dive .pptxMEV Deep Dive .pptx
MEV Deep Dive .pptx
 
Quant in Crypto Land
Quant in Crypto LandQuant in Crypto Land
Quant in Crypto Land
 
The Polygon Blockchain by the Numbers
The Polygon Blockchain by the NumbersThe Polygon Blockchain by the Numbers
The Polygon Blockchain by the Numbers
 
Social Analytics for Cryptocurrencies
Social Analytics for Cryptocurrencies Social Analytics for Cryptocurrencies
Social Analytics for Cryptocurrencies
 
DeFi Quant Yield-Generating Strategies
DeFi Quant Yield-Generating StrategiesDeFi Quant Yield-Generating Strategies
DeFi Quant Yield-Generating Strategies
 
High Frequency Trading and DeFi
High Frequency Trading and DeFiHigh Frequency Trading and DeFi
High Frequency Trading and DeFi
 
Simple DeFi Analytics Any Crypto-Investor Should Know About
Simple DeFi Analytics Any Crypto-Investor Should Know About Simple DeFi Analytics Any Crypto-Investor Should Know About
Simple DeFi Analytics Any Crypto-Investor Should Know About
 
15 Minutes of DeFi Analytics
15 Minutes of DeFi Analytics15 Minutes of DeFi Analytics
15 Minutes of DeFi Analytics
 
DeFi Trading Strategies: Opportunities and Challenges
DeFi Trading Strategies: Opportunities and ChallengesDeFi Trading Strategies: Opportunities and Challenges
DeFi Trading Strategies: Opportunities and Challenges
 
Practical Crypto Asset Predictions rev
Practical Crypto Asset Predictions revPractical Crypto Asset Predictions rev
Practical Crypto Asset Predictions rev
 
Better Technical Analysis with Blockchain Indicators
Better Technical Analysis with Blockchain IndicatorsBetter Technical Analysis with Blockchain Indicators
Better Technical Analysis with Blockchain Indicators
 
Price Predictions for Cryptocurrencies
Price Predictions for CryptocurrenciesPrice Predictions for Cryptocurrencies
Price Predictions for Cryptocurrencies
 
Fascinating Metrics and Analytics About Cryptocurrencies
Fascinating Metrics and Analytics About CryptocurrenciesFascinating Metrics and Analytics About Cryptocurrencies
Fascinating Metrics and Analytics About Cryptocurrencies
 
Price PRedictions for Crypto-Assets Using Deep Learning
Price PRedictions for Crypto-Assets Using Deep LearningPrice PRedictions for Crypto-Assets Using Deep Learning
Price PRedictions for Crypto-Assets Using Deep Learning
 
Demystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data ScienceDemystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data Science
 
Crypto assets are a data science heaven rev
Crypto assets are a data science heaven revCrypto assets are a data science heaven rev
Crypto assets are a data science heaven rev
 
Implementing Machine Learning in the Real World
Implementing Machine Learning in the Real WorldImplementing Machine Learning in the Real World
Implementing Machine Learning in the Real World
 

Recently uploaded

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Recently uploaded (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

10 Big Data Technologies you Didn't Know About

  • 1. Big Data Technologies You Didn’t Know About
  • 2. About Us • Emerging technology firm focused on helping enterprises build breakthrough software solutions • Building software solutions powered by disruptive enterprise software trends -Machine learning and data science -Cyber-security -Enterprise IOT -Powered by Cloud and Mobile • Bringing innovation from startups and academic institutions to the enterprise • Award winning agencies: Inc 500, American Business Awards, International Business Awards
  • 3. • Big data technologies you didn’t know about • Apache Flink • Apache Samza • Google Cloud Data Flow • StreamSets • Tensor Flow • Apache NiFi • Druid • LinkedIn WhereHows • Microsoft Cognitive Services Agenda
  • 5. Think Beyond Traditional Big Data Stacks
  • 6. Learn from Companies Building Big Data Pipelines at Scale
  • 7. Big Data pipelines in the enterprise
  • 8. Areas of a Big Data Pipeline Big Data Pipeline Data Processing Stream Data Ingestion Data transformati ons Cognitive Computing Machine Learning High Performance Data Access
  • 11. But You Probably Didn’t Know About….
  • 12. Apache Flink • Apache Flink, like Apache Hadoop and Apache Spark, is a community- driven open source framework for distributed Big Data Analytics. • Apache Flink engine exploits data streaming and in-memory processing and iteration operators to improve performance. • Apache Flink has its origins in a research project called Stratosphere of which the idea was conceived in 2008 by professor Volker Markl from the Technische Universität Berlin in Germany. • In German, Flink means agile or swift. Flink joined the Apache incubator in April 2014 and graduated as an Apache Top Level Project (TLP) in December 2014.
  • 13. Apache Flink • Declarativity • Query optimization • Efficient parallel in- memory and out-of- core algorithms • Massive scale-out • User Defined Functions • Complex data types • Schema on read • Streaming • Iterations • Advanced Dataflows • General APIs Draws on concepts from MPP Database Technology Draws on concepts from Hadoop MapReduce TechnologyAdd
  • 15. Apache Flink: An Example case class Word (word: String, frequency: Int) val lines: DataStream[String] = env.fromSocketStream(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS)) .groupBy("word").sum("frequency") .print() val lines: DataSet[String] = env.readTextFile(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .groupBy("word").sum("frequency") .print() DataSet API (batch): DataStream API (streaming):
  • 18. But You Probably Didn’t Know About….
  • 19. Apache Samza • Created by LinkedIn to address extend the capabilities of Apache Kafka • Simple API • Managed state • Fault Tolerant • Durable messaging • Scalable • Extensible • Processor Isolation
  • 20. Apache Samza: Overview • Samza code runs as a Yarn job • You implement the StreamTask interface, which defines a process() call. • StreamTask runs inside a task instance, which itself is inside a Yarn container.
  • 21. Apache Samza: Operators • Filter records matching condition • Map record ⇒ func(record) • Join two/more datasets by key • Group records with the same value in field • Aggregate records within the same group • Pipe job 1’s output ⇒ job 2’s input • MapReduce assumes fixed dataset. Can we adapt this to unbounded streams?
  • 25. But You Probably Didn’t Know About….
  • 26. Google Cloud Data Flow • Native Google Cloud data processing service • Simple programming model for batch and streamed data processing tasks • Provides a data flow managed service to control the execution of data processing jobs • Data processing jobs can be authored using the Data Flow SDKs (Apache Beam)
  • 27. Google Cloud Data Flow : Details • A pipeline encapsulates an entire series of computations that accepts some input data from external sources, transforms that data produces some output data. • A PCollection abstracts a data unit in a pipeline • Sources and Sink abstract read and write operations in a pipeline • Google Data Flow provides management, monitoring and security capabilities in data pipelines
  • 28. Google Cloud Data Flow is Based on Apache Beam • 1. Portable - You can use the same code with different runners (abstraction) and backends on premise, in the cloud, or locally • 2. Unified - Same unified model for batch and stream processing • 3. Advanced features - Event windowing, triggering, watermarking, lateless, etc. • 4. Extensible model and SDK - Extensible API; can define custom sources to read and write in parallel
  • 29. But You Probably Didn’t Know About….
  • 30. StreamSets Data Collector • Data processing platform optimized for data in motion • Visual data flow authoring model • Open source distribution model • On-premise and cloud distributions • Rich monitoring and management interfaces
  • 31. StreamSets Data Collector: Details • Data collectors streams and process data in real time using data pipelines • A pipeline describes a data flow from origin to destination • A pipeline is composed of origins, destinations and processors • Extensibility model based on JavaScript and Jython • The lifecycle of a data collector can be controlled via the administration console
  • 34. But You Probably Didn’t Know About….
  • 35. TensorFlow • Second generation Machine Learning system, followed by DistBelief • TensorFlow grew out of a project at Google, called Google Brain, aimed at applying various kinds of neural network machine learning to products and services across the company. • An open source software library for numerical computation using data flow graphs • Used in following projects at Google 1. DeepDream 2. RankBrain 3. Smart Reply And many more..
  • 36. TensorFlow: Details • Data flow graphs describe mathematical computation with a directed graph of nodes & edges • Nodes in the graph represent mathematical operations. • Edges represent the multidimensional data arrays (tensors) communicated between them. • Edges describe the input/output relationships between nodes. • The flow of tensors through the graph is where TensorFlow gets its name.
  • 37. TensorFlow • Tensor • Variable • Operation • Session • Placeholder • TensorBoard
  • 40. But You Probably Didn’t Know About….
  • 41. Druid • Druid was started in 2011 • ‣ Power interactive data applications • ‣ Multi-tenancy: lots of concurrent users • ‣ Scalability: trillions events/day, sub-second queries • ‣ Real-time analysis • Key Features • LOW LATENCY INGESTION • FAST AGGREGATIONS • ARBITRARY SLICE-N-DICE CAPABILITIES • HIGHLY AVAILABLE • APPROXIMATE & EXACT CALCULATIONS
  • 42. Druid: Details • Realtime Node • Historical Node • Broker Node • Coordinator Node • Indexing Service
  • 43. Druid: Details • Realtime Node • Historical Node • Broker Node • Coordinator Node • Indexing Service • JSON based query language
  • 46. But You Probably Didn’t Know About….
  • 47. Apache NiFi • Powerful and reliable system to process and distribute data • Directed graphs of data routing and transformation • Web-based User Interface for creating, monitoring, & controlling data flows • Highly configurable - modify data flow at runtime, dynamically prioritize data • Data Provenance tracks data through entire system • Easily extensible through development of custom components
  • 49. Apache NiFi: Concepts • FlowFile • Unit of data moving through the system • Content + Attributes (key/value pairs) • Processor • Performs the work, can access FlowFiles • Connection • Links between processors • Queues that can be dynamically prioritized • Process Group • Set of processors and their connections • Receive data via input ports, send data via output ports
  • 52. But You Probably Didn’t Know About…. WhereHows
  • 53. Linkedin WhereHows • Where is my data? How did it get there? • Enterprise data catalog • Metadata search • Collaboration • Data lineage analysis • Connectivity to many data sources and ETL tools • Powering Linkedin data discovery layer
  • 54. Linkedin WhereHows: Architecture • Web interface for data discovery • API enabled • Backend server that controls the metadata crawling and integration with other systems
  • 55. Linkedin WhereHows: Data Lineage • Collects metadata from ETL platforms and scripts • Sources include • Pig • MapReduce • Informatica • Teradata • Visualizes the lineage information associated with a data source
  • 58. But You Probably Didn’t Know About….
  • 59. Microsoft Cognitive Services • Based on Project Oxford and Bing • Offers 22 cognitive computing APIs • Main categories include: • Vision • Speech • Language • Knowledge • Search • Integrated with Cortana Intelligence Suite
  • 61. Microsoft Cognitive Services: Developer Experience • 22 different REST APIs that abstract cognitive capabilities • SDKs for Windows, IOS, Android and Python • Open source
  • 62. Summary • The big data ecosystem is constantly evolving • There are a lot of relevant new technologies beyond the traditional Hadoop-Spark stacks • Big internet companies are leading innovation in the space