SlideShare a Scribd company logo
1 of 31
© 2015 MapR Technologies© 2015 MapR Technologies
Exploring Enterprise Networks with Familiar BI Tools
© 2015 MapR Technologies
On the Menu
• Discovery: why Hadoop + BI tools for analyzing networks?
• Network analysis in a BI context
• Apache Drill
• Connecting BI tools to network data
• Practical examples with Drill and BI
– Querying packets with Tableau
– Troubleshooting with SAP Lumira
– Gaining insight into customer experience across multiple sources
– Using built-in Drill features for faster analysis
• Summary, conclusions, more resources
© 2015 MapR Technologies
Topics not covered in detail…
• Packet capture architectures
• Ways to capture packets effectively
• Large-scale packet processing – others have done this
• Comparison of BI tools
• Survey of the best SQL-on-Hadoop technology
© 2015 MapR Technologies
There’s a lot happening in your network…
• Packets, logs, interconnections
• Many layers (L1-L7), “L8”
• Network data is multi-faceted…
– It’s serialized and highly structured
– It facilitates communication between heterogeneous devices via
common protocols
– But it’s not structured to be stored and analyzed
– The application often doesn’t care
– Consequently, specialized tooling and software is required
© 2015 MapR Technologies
Why Hadoop + BI tools?
• What does Hadoop enable that makes it a powerful tool for
network analytics?
• What’s new that wasn’t previous possible/desirable?
• How does it augment existing solutions?
• It’s many things:
– New ways of accessing semi-structured data from the network
– Offloading of existing data warehouses and tools
– Combining, joining, blending network captures with other sources
– Many network tools cannot answer questions about your business and
customers
– You can use SQL to get a lot of the answers you need
© 2015 MapR Technologies
New Data Sources Unlock New Insights & Apps
Existing structured data
• Well-defined and well-
understood schema
– OLTP data
– Data warehouse data
– End user data stores (e.g.,
Excel)
New multi-structured data
• Typically un-modeled,
different in format
– Network data
– Clickstream data
– Sensor data
– Rich media (e.g., audio, video)
– Documents
… both types are needed today for deeper insights
© 2015 MapR Technologies
1980 2000 20101990 2020
Fixed schema
DBA controls structure
Dynamic / Flexible schema
Application controls structure
NON-RELATIONAL DATASTORESRELATIONAL DATABASES
GBs-TBs TBs-PBsVolume
Database
Network data, like other data, is increasingly Stored in Non-
Relational Datastores
Structure
Development
Structured Structured, semi-structured and unstructured
Planned (release cycle = months-years) Iterative (release cycle = days-weeks)
© 2015 MapR Technologies
Apache Drill Brings Flexibility & Performance
Access to any data type, any data source
• Relational
• Nested data
• Schema-less
Rapid time to insights
• Query data in-situ
• No Schemas required
• Easy to get started
Integration with existing tools
• ANSI SQL
• BI tool integration
Scale in all dimensions
• TB-PB of scale
• 1000’s of users
• 1000’s of nodes
Granular Security
• Authentication
• Row/column level controls
• De-centralized
© 2015 MapR Technologies
Granular security permissions through Drill views
Name City State Credit Card #
Dave San Jose CA 1374-7914-3865-4817
John Boulder CO 1374-9735-1794-9711
Raw File (/raw/cards.csv)
Owner
Admins
Permission
Admins
Business Analyst Data Scientist
Name City State Credit Card #
Dave San Jose CA 1374-1111-1111-1111
John Boulder CO 1374-1111-1111-1111
Data Scientist View (/views/maskedcards.csv)
Not a physical data copy
Name City State
Dave San Jose CA
John Boulder CO
Business Analyst View
Owner
Admins
Permission
Business
Analysts
Owner
Admins
Permission
Data
Scientists
© 2015 MapR Technologies
Self-Service Data Exploration
Direct access to Hadoop data from familiar BI / Analytics tools- ANSI SQL compatible
Ad-hoc
Reporting
Queries
Raw Data
Exploration
Day Zero
queries
…
© 2015 MapR Technologies
Drill is a Distributed SQL query engine
drillbit
DataNode/Regi
onServer
drillbit
DataNode/Regi
onServer
drillbit
DataNode/Regi
onServer
ZooKeeper
ZooKeeper
ZooKeeper
…
 Scale out
 Columnar and Vectorized execution
 Optimistic and pipelined execution (no MR, Spark, Tez)
 Late binding
 Extensible
© 2015 MapR Technologies
- Sub-directory
- HBase namespace
- Hive database
Run SQL on Captures Directly
SELECT * FROM dfs.router1.`captures.json`
Workspace
- Pathnames
- Hive table
- HBase table
Table
- DFS (Text, Parquet, JSON)
- HBase/MapR-DB
- Hive Metastore/HCatalog
- Easy API to go beyond Hadoop
Storage plugin instance
© 2015 MapR Technologies
Network Analytics in a BI Context
• Getting results from BI tools requires SQL expertise
– Analytic techniques, visualizations, dashboarding
– Proprietary information about your operations
– Making sense of sources quickly
• New SQL-on-Hadoop (like Drill) technologies enable leveraging
this:
– To find new areas to gain value from combining your own proprietary
data with network sources
– Augment the analysis you’re doing now via use cases for packet data
you’re already storing in Hadoop
– Use data in real-time that’s too large to fit into memory and/or hits BI
tool limitations for analysis directly
© 2015 MapR Technologies
Hadoop Packet Processing Ecosystem
• Translating to various formats
– JSON
– CSV
– Parquet, others
• Packet ingestion
– Flume tcpdump source
– Direct from hardware vendors
• Northbound APIs
– Openstack and opendaylight
• More open source tools
– Packet processing in Pig, etc.
© 2015 MapR Technologies
Network Data Sources
• Data sources in the network are growing, changing
– Existing: tcpdump, SPAN, pcap
– New and more: SDN, NFV, REST APIs
• Often not suitable for analysis directly
– Requires building a schema
– ETL
– Structure is changing and evolving  ongoing management
– Large size, too big for memory
© 2015 MapR Technologies
REST APIs and JSON
• Self-describing data is common with REST APIs
– JSON
• Northbound APIs on almost everything in the network
– Enables access to many operational views
– But requires development work to pull it together
• SQL queries directly on the data is difficult
• Requires transformations, scripting, parsing
© 2015 MapR Technologies
View Drillbits information
in the cluster
© 2015 MapR Technologies
Manage storage plugin
instances through Web UI
© 2015 MapR Technologies
Monitor and
manage Drill queries
© 2015 MapR Technologies
See details of the query
© 2015 MapR Technologies
SAP Lumira and Wireshark Example -- Scenario
• Overview:
– Sensor data in JSON format being gathered multiple times daily from
remote locations
– Done over an IP network, each sensor has an IP address
• Problem
– One sensor is experiencing reading failures
– Network connectivity issues are suspected
• Solution Approach
– Take packet captures where we are reading sensors (central location) –
CSV-formatted Wireshark file
– Observe whether there are many TCP retransmissions happening
between the source and destination
– Ultimately, determine if the network is the problem and take action
© 2015 MapR Technologies
© 2015 MapR Technologies
Summary
• Using Drill from SAP Lumira, and the JDBC driver
– We compared data across multiple sources
• Notice we didn’t do any ETL
– Or define any schema for the network data
• Using existing ANSI SQL knowledge to query the data without
transformations
– Not just on the network data, but combined with other sources
• Self-service
© 2015 MapR Technologies
© 2015 MapR Technologies
Network Routing, OpenStack, JSON
• Link-state routing protocols (OSPF, IS-IS, Trill)
– Each participating node knows the topology of the entire network
– A dump of the database shows all nodes and adjacencies
– Physical and logical topology
– Other information (MPLS, etc.)
• OpenStack: pull networks, subnets, ports via REST API
– Use Drill Explorer to build a view
– Combine the data with device or customer information
• Enables visualizing the entire network quickly
© 2015 MapR Technologies
OpenStack Networking APIs Example
• JSON formatted responses
• Run queries without any data preparation
• Use of FLATTEN() for arbitrary maps
© 2015 MapR Technologies
FLATTEN()
• FLATTEN() is useful for exploration of data that is repeated
• Used on arrays
• Columns are repeated as necessary to maintain association with
each element of the array
• Example:
“host routes”: [
{
“destination” : “0.0.0.0/0”,
“nexthop”: “10.10.10.1”
},
{
“destination” : “192.168.10.0/24”,
“nexthop”: “192.168.0.1”,
},
…
]
© 2015 MapR Technologies
© 2015 MapR Technologies
TCP Round-Trip Times Example
• TCP RTT can affect customer experience in many ways
– Not just loading pages
– Also interactive, AJAX, forms, etc.
• Much of this can be calculated with other tools, then visualized
– Complex to calculate on your own
• Only a part of overall performance story, but helpful
– Example: switching network providers, adding caches or optimizers
© 2015 MapR Technologies
© 2015 MapR Technologies
Summary and Conclusions
• New SQL-on-Hadoop technologies enable network analysis in a BI
context
– Less time making schema, fewer requirements
– Easily supplement existing analysis
– Less need for specialized tools
• Apache Drill reduces the time required to get answers
– JSON analysis in place – interactive
– Queries and dashboards
– Integrated with BI tools out of the box
• Tableau, MicroStrategy, Qlikview, others
• More examples on github
– mapr-demos

More Related Content

What's hot

Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidDataWorks Summit
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at UberDataWorks Summit
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended CutWes McKinney
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaDataWorks Summit
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Wes McKinney
 
Getting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsGetting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsairisData
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Wes McKinney
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scaledatamantra
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookDatabricks
 
Enterprise Metadata Integration
Enterprise Metadata IntegrationEnterprise Metadata Integration
Enterprise Metadata IntegrationDr. Mirko Kämpf
 

What's hot (20)

Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and Druid
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 
Securing Data in Hadoop at Uber
Securing Data in Hadoop at UberSecuring Data in Hadoop at Uber
Securing Data in Hadoop at Uber
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended Cut
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
Getting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsGetting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analytics
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Spark Technology Center IBM
Spark Technology Center IBMSpark Technology Center IBM
Spark Technology Center IBM
 
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at FacebookTangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
 
Enterprise Metadata Integration
Enterprise Metadata IntegrationEnterprise Metadata Integration
Enterprise Metadata Integration
 

Viewers also liked

Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRclive boulton
 
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsNYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsJason Shao
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
 
SQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache DrillSQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache DrillMapR Technologies
 
Which data should you move to Hadoop?
Which data should you move to Hadoop?Which data should you move to Hadoop?
Which data should you move to Hadoop?Attunity
 
Design Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in KafkaDesign Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in KafkaIan Downard
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Meetup Python Nantes - les tests en python
Meetup Python Nantes - les tests en pythonMeetup Python Nantes - les tests en python
Meetup Python Nantes - les tests en pythonArthur Lutz
 
Java OOP Programming language (Part 1) - Introduction to Java
Java OOP Programming language (Part 1) - Introduction to JavaJava OOP Programming language (Part 1) - Introduction to Java
Java OOP Programming language (Part 1) - Introduction to JavaOUM SAOKOSAL
 
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Matt Harrison
 
Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Chia-Chi Chang
 
PyCon 2013 : Scripting to PyPi to GitHub and More
PyCon 2013 : Scripting to PyPi to GitHub and MorePyCon 2013 : Scripting to PyPi to GitHub and More
PyCon 2013 : Scripting to PyPi to GitHub and MoreMatt Harrison
 
Operator Overloading
Operator Overloading  Operator Overloading
Operator Overloading Sardar Alam
 
Installing Python on Mac
Installing Python on MacInstalling Python on Mac
Installing Python on MacWei-Wen Hsu
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to pythonYi-Fan Chu
 
Lesson1 python an introduction
Lesson1 python an introductionLesson1 python an introduction
Lesson1 python an introductionArulalan T
 

Viewers also liked (20)

Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapR
 
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsNYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
SQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache DrillSQL-on-Hadoop with Apache Drill
SQL-on-Hadoop with Apache Drill
 
Which data should you move to Hadoop?
Which data should you move to Hadoop?Which data should you move to Hadoop?
Which data should you move to Hadoop?
 
Design Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in KafkaDesign Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in Kafka
 
Big Data Paris
Big Data ParisBig Data Paris
Big Data Paris
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Meetup Python Nantes - les tests en python
Meetup Python Nantes - les tests en pythonMeetup Python Nantes - les tests en python
Meetup Python Nantes - les tests en python
 
Java OOP Programming language (Part 1) - Introduction to Java
Java OOP Programming language (Part 1) - Introduction to JavaJava OOP Programming language (Part 1) - Introduction to Java
Java OOP Programming language (Part 1) - Introduction to Java
 
Python - Lecture 1
Python - Lecture 1Python - Lecture 1
Python - Lecture 1
 
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
Analysis of Fatal Utah Avalanches with Python. From Scraping, Analysis, to In...
 
Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)Learning notes of r for python programmer (Temp1)
Learning notes of r for python programmer (Temp1)
 
Introduction to Advanced Javascript
Introduction to Advanced JavascriptIntroduction to Advanced Javascript
Introduction to Advanced Javascript
 
PyCon 2013 : Scripting to PyPi to GitHub and More
PyCon 2013 : Scripting to PyPi to GitHub and MorePyCon 2013 : Scripting to PyPi to GitHub and More
PyCon 2013 : Scripting to PyPi to GitHub and More
 
Operator Overloading
Operator Overloading  Operator Overloading
Operator Overloading
 
Python for All
Python for All Python for All
Python for All
 
Installing Python on Mac
Installing Python on MacInstalling Python on Mac
Installing Python on Mac
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
Lesson1 python an introduction
Lesson1 python an introductionLesson1 python an introduction
Lesson1 python an introduction
 

Similar to Exploring Enterprise Networks with Familiar BI Tools Using Apache Drill

Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Dataconomy Media
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Mats Uddenfeldt
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...ssuserd3a367
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About Jesus Rodriguez
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Datafreshdatabos
 
IBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilitiesIBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilitiesIBM_Info_Management
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentBlueData, Inc.
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014John Berns
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR Technologies
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
 

Similar to Exploring Enterprise Networks with Familiar BI Tools Using Apache Drill (20)

Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Design patternsforiot
Design patternsforiotDesign patternsforiot
Design patternsforiot
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Apache drill
Apache drillApache drill
Apache drill
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
AnilKumarT_Resume_latest
AnilKumarT_Resume_latestAnilKumarT_Resume_latest
AnilKumarT_Resume_latest
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Building Fast Applications for Streaming Data
Building Fast Applications for Streaming DataBuilding Fast Applications for Streaming Data
Building Fast Applications for Streaming Data
 
IBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilitiesIBM Internet-of-Things architecture and capabilities
IBM Internet-of-Things architecture and capabilities
 
How to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environmentHow to deploy Apache Spark in a multi-tenant, on-premises environment
How to deploy Apache Spark in a multi-tenant, on-premises environment
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 

More from MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Exploring Enterprise Networks with Familiar BI Tools Using Apache Drill

  • 1. © 2015 MapR Technologies© 2015 MapR Technologies Exploring Enterprise Networks with Familiar BI Tools
  • 2. © 2015 MapR Technologies On the Menu • Discovery: why Hadoop + BI tools for analyzing networks? • Network analysis in a BI context • Apache Drill • Connecting BI tools to network data • Practical examples with Drill and BI – Querying packets with Tableau – Troubleshooting with SAP Lumira – Gaining insight into customer experience across multiple sources – Using built-in Drill features for faster analysis • Summary, conclusions, more resources
  • 3. © 2015 MapR Technologies Topics not covered in detail… • Packet capture architectures • Ways to capture packets effectively • Large-scale packet processing – others have done this • Comparison of BI tools • Survey of the best SQL-on-Hadoop technology
  • 4. © 2015 MapR Technologies There’s a lot happening in your network… • Packets, logs, interconnections • Many layers (L1-L7), “L8” • Network data is multi-faceted… – It’s serialized and highly structured – It facilitates communication between heterogeneous devices via common protocols – But it’s not structured to be stored and analyzed – The application often doesn’t care – Consequently, specialized tooling and software is required
  • 5. © 2015 MapR Technologies Why Hadoop + BI tools? • What does Hadoop enable that makes it a powerful tool for network analytics? • What’s new that wasn’t previous possible/desirable? • How does it augment existing solutions? • It’s many things: – New ways of accessing semi-structured data from the network – Offloading of existing data warehouses and tools – Combining, joining, blending network captures with other sources – Many network tools cannot answer questions about your business and customers – You can use SQL to get a lot of the answers you need
  • 6. © 2015 MapR Technologies New Data Sources Unlock New Insights & Apps Existing structured data • Well-defined and well- understood schema – OLTP data – Data warehouse data – End user data stores (e.g., Excel) New multi-structured data • Typically un-modeled, different in format – Network data – Clickstream data – Sensor data – Rich media (e.g., audio, video) – Documents … both types are needed today for deeper insights
  • 7. © 2015 MapR Technologies 1980 2000 20101990 2020 Fixed schema DBA controls structure Dynamic / Flexible schema Application controls structure NON-RELATIONAL DATASTORESRELATIONAL DATABASES GBs-TBs TBs-PBsVolume Database Network data, like other data, is increasingly Stored in Non- Relational Datastores Structure Development Structured Structured, semi-structured and unstructured Planned (release cycle = months-years) Iterative (release cycle = days-weeks)
  • 8. © 2015 MapR Technologies Apache Drill Brings Flexibility & Performance Access to any data type, any data source • Relational • Nested data • Schema-less Rapid time to insights • Query data in-situ • No Schemas required • Easy to get started Integration with existing tools • ANSI SQL • BI tool integration Scale in all dimensions • TB-PB of scale • 1000’s of users • 1000’s of nodes Granular Security • Authentication • Row/column level controls • De-centralized
  • 9. © 2015 MapR Technologies Granular security permissions through Drill views Name City State Credit Card # Dave San Jose CA 1374-7914-3865-4817 John Boulder CO 1374-9735-1794-9711 Raw File (/raw/cards.csv) Owner Admins Permission Admins Business Analyst Data Scientist Name City State Credit Card # Dave San Jose CA 1374-1111-1111-1111 John Boulder CO 1374-1111-1111-1111 Data Scientist View (/views/maskedcards.csv) Not a physical data copy Name City State Dave San Jose CA John Boulder CO Business Analyst View Owner Admins Permission Business Analysts Owner Admins Permission Data Scientists
  • 10. © 2015 MapR Technologies Self-Service Data Exploration Direct access to Hadoop data from familiar BI / Analytics tools- ANSI SQL compatible Ad-hoc Reporting Queries Raw Data Exploration Day Zero queries …
  • 11. © 2015 MapR Technologies Drill is a Distributed SQL query engine drillbit DataNode/Regi onServer drillbit DataNode/Regi onServer drillbit DataNode/Regi onServer ZooKeeper ZooKeeper ZooKeeper …  Scale out  Columnar and Vectorized execution  Optimistic and pipelined execution (no MR, Spark, Tez)  Late binding  Extensible
  • 12. © 2015 MapR Technologies - Sub-directory - HBase namespace - Hive database Run SQL on Captures Directly SELECT * FROM dfs.router1.`captures.json` Workspace - Pathnames - Hive table - HBase table Table - DFS (Text, Parquet, JSON) - HBase/MapR-DB - Hive Metastore/HCatalog - Easy API to go beyond Hadoop Storage plugin instance
  • 13. © 2015 MapR Technologies Network Analytics in a BI Context • Getting results from BI tools requires SQL expertise – Analytic techniques, visualizations, dashboarding – Proprietary information about your operations – Making sense of sources quickly • New SQL-on-Hadoop (like Drill) technologies enable leveraging this: – To find new areas to gain value from combining your own proprietary data with network sources – Augment the analysis you’re doing now via use cases for packet data you’re already storing in Hadoop – Use data in real-time that’s too large to fit into memory and/or hits BI tool limitations for analysis directly
  • 14. © 2015 MapR Technologies Hadoop Packet Processing Ecosystem • Translating to various formats – JSON – CSV – Parquet, others • Packet ingestion – Flume tcpdump source – Direct from hardware vendors • Northbound APIs – Openstack and opendaylight • More open source tools – Packet processing in Pig, etc.
  • 15. © 2015 MapR Technologies Network Data Sources • Data sources in the network are growing, changing – Existing: tcpdump, SPAN, pcap – New and more: SDN, NFV, REST APIs • Often not suitable for analysis directly – Requires building a schema – ETL – Structure is changing and evolving  ongoing management – Large size, too big for memory
  • 16. © 2015 MapR Technologies REST APIs and JSON • Self-describing data is common with REST APIs – JSON • Northbound APIs on almost everything in the network – Enables access to many operational views – But requires development work to pull it together • SQL queries directly on the data is difficult • Requires transformations, scripting, parsing
  • 17. © 2015 MapR Technologies View Drillbits information in the cluster
  • 18. © 2015 MapR Technologies Manage storage plugin instances through Web UI
  • 19. © 2015 MapR Technologies Monitor and manage Drill queries
  • 20. © 2015 MapR Technologies See details of the query
  • 21. © 2015 MapR Technologies SAP Lumira and Wireshark Example -- Scenario • Overview: – Sensor data in JSON format being gathered multiple times daily from remote locations – Done over an IP network, each sensor has an IP address • Problem – One sensor is experiencing reading failures – Network connectivity issues are suspected • Solution Approach – Take packet captures where we are reading sensors (central location) – CSV-formatted Wireshark file – Observe whether there are many TCP retransmissions happening between the source and destination – Ultimately, determine if the network is the problem and take action
  • 22. © 2015 MapR Technologies
  • 23. © 2015 MapR Technologies Summary • Using Drill from SAP Lumira, and the JDBC driver – We compared data across multiple sources • Notice we didn’t do any ETL – Or define any schema for the network data • Using existing ANSI SQL knowledge to query the data without transformations – Not just on the network data, but combined with other sources • Self-service
  • 24. © 2015 MapR Technologies
  • 25. © 2015 MapR Technologies Network Routing, OpenStack, JSON • Link-state routing protocols (OSPF, IS-IS, Trill) – Each participating node knows the topology of the entire network – A dump of the database shows all nodes and adjacencies – Physical and logical topology – Other information (MPLS, etc.) • OpenStack: pull networks, subnets, ports via REST API – Use Drill Explorer to build a view – Combine the data with device or customer information • Enables visualizing the entire network quickly
  • 26. © 2015 MapR Technologies OpenStack Networking APIs Example • JSON formatted responses • Run queries without any data preparation • Use of FLATTEN() for arbitrary maps
  • 27. © 2015 MapR Technologies FLATTEN() • FLATTEN() is useful for exploration of data that is repeated • Used on arrays • Columns are repeated as necessary to maintain association with each element of the array • Example: “host routes”: [ { “destination” : “0.0.0.0/0”, “nexthop”: “10.10.10.1” }, { “destination” : “192.168.10.0/24”, “nexthop”: “192.168.0.1”, }, … ]
  • 28. © 2015 MapR Technologies
  • 29. © 2015 MapR Technologies TCP Round-Trip Times Example • TCP RTT can affect customer experience in many ways – Not just loading pages – Also interactive, AJAX, forms, etc. • Much of this can be calculated with other tools, then visualized – Complex to calculate on your own • Only a part of overall performance story, but helpful – Example: switching network providers, adding caches or optimizers
  • 30. © 2015 MapR Technologies
  • 31. © 2015 MapR Technologies Summary and Conclusions • New SQL-on-Hadoop technologies enable network analysis in a BI context – Less time making schema, fewer requirements – Easily supplement existing analysis – Less need for specialized tools • Apache Drill reduces the time required to get answers – JSON analysis in place – interactive – Queries and dashboards – Integrated with BI tools out of the box • Tableau, MicroStrategy, Qlikview, others • More examples on github – mapr-demos