SlideShare a Scribd company logo
1 of 40
© 2016 MapR Technologies© 2016 MapR TechnologiesMapR Confidential
© 2016 MapR Technologies
1
SQL-on-Hadoop with
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 3
Ingest
Store
Process
Consume
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 4
MapR Converged Data Platform
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 6
Agenda
• Why Drill?
• Some Drilling with Open Data
• How does it work?
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 7
1980 2000 20101990 2020
Fixed schema
DBA controls structure
Dynamic / Flexible schema
Application controls structure
NON-RELATIONAL DATASTORESRELATIONAL DATABASES
GBs-TBs TBs-PBsVolume
Database
Data Increasingly Stored in Non-Relational Datastores
Structure
Development
Structured Structured, semi-structured and unstructured
Planned (release cycle = months-years) Iterative (release cycle = days-weeks)
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 8
How To Bring SQL to Non-Relational Data Stores?
Familiarity of SQL Agility of NoSQL
• ANSI SQL semantics
• Low latency
• Integrated with Tools/Applications
• No schema management
– HDFS (Parquet, JSON, etc.)
– HBase
– …
• No transformation
– No silos of data
• Ease of use
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 9
Drill Supports Schema Discovery On-The-Fly
• Fixed schema
• Leverage schema in centralized
repository (Hive Metastore)
• Fixed schema, evolving schema or
schema-less
• Leverage schema in centralized
repository or self-describing data
2Schema Discovered On-The-FlySchema Declared In Advance
SCHEMA ON
WRITE
SCHEMA BEFORE
READ
SCHEMA ON THE
FLY
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 10
• Pioneering Data Agility for Hadoop
• Apache open source project
• Scale-out execution engine for low-latency queries
• Unified SQL-based API for analytics & operational applications
50+ contributors
150+ years of experience building
databases and distributed systems
Contributing to Apache Drill
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 11
- Sub-directory
- HBase namespace
- Hive database
- Database
Drill Enables ‘SQL-on-Everything’
SELECT * FROM dfs.yelp.`business.json`
Workspace
- Pathnames
- Hive table
- HBase table
- Table
Table
- DFS (Text, Parquet, JSON, XML)
- HBase/MapR-DB
- Hive Metastore/HCatalog
- RDBMS using JDBC
- Easy API to go beyond Hadoop
Storage plugin instance
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 12
Drill’s Data Model is Flexible
JSON
BSON
HBase
Parquet
Avro
CSV
TSV
Dynamic
schema
Fixed schema
Complex
Flat
Flexibility
Name Gender Age
Michael M 6
Jennifer F 3
RDBMS/SQL-on-Hadoop table
Flexibility
Apache Drill table
{
name: {
first: Michael,
last: Smith
},
hobbies: [ski, soccer],
district: Los Altos
}
{
name: {
first: Jennifer,
last: Gates
},
hobbies: [sing],
preschool: CCLC
}
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 13
Drilling into Data
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 14
Business dataset {
"business_id": "4bEjOyTaDG24SY5TxsaUNQ",
"full_address": "3655 Las Vegas Blvd SnThe StripnLas Vegas, NV 89109",
"hours": {
"Monday": {"close": "23:00", "open": "07:00"},
"Tuesday": {"close": "23:00", "open": "07:00"},
"Friday": {"close": "00:00", "open": "07:00"},
"Wednesday": {"close": "23:00", "open": "07:00"},
"Thursday": {"close": "23:00", "open": "07:00"},
"Sunday": {"close": "23:00", "open": "07:00"},
"Saturday": {"close": "00:00", "open": "07:00"}
},
"open": true,
"categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"],
"city": "Las Vegas",
"review_count": 4084,
"name": "Mon Ami Gabi",
"neighborhoods": ["The Strip"],
"longitude": -115.172588519464,
"state": "NV",
"stars": 4.0,
"attributes": {
"Alcohol": "full_bar”,
"Noise Level": "average",
"Has TV": false,
"Attire": "casual",
"Ambience": {
"romantic": true,
"intimate": false,
"touristy": false,
"hipster": false,
"classy": true,
"trendy": false,
"casual": false
},
"Good For": {"dessert": false, "latenight": false, "lunch": false,
"dinner": true, "breakfast": false, "brunch": false},
}
}
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 15
Reviews dataset
{
"votes": {"funny": 0, "useful": 2, "cool": 1},
"user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
"review_id": "15SdjuK7DmYqUAj6rjGowg",
"stars": 5,
"date": "2007-05-17",
"text": "dr. goldberg offers everything ...",
"type": "review",
"business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 16
Interfaces and Tools
WebUIDrill Explorer
ODBC/JDBC
Sqlline
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 17
`
$ tar -xvzf apache-drill-1.4.0.tar.gz
$ bin/sqlline -u jdbc:drill:zk=local
$ bin/drill-embedded
> SELECT state, city, count(*) AS businesses
FROM dfs.yelp.`business.json.gz`
GROUP BY state, city
ORDER BY businesses DESC LIMIT 10;
+------------+------------+-------------+
| state | city | businesses |
+------------+------------+-------------+
| NV | Las Vegas | 12021 |
| AZ | Phoenix | 7499 |
| AZ | Scottsdale | 3605 |
| EDH | Edinburgh | 2804 |
| AZ | Mesa | 2041 |
| AZ | Tempe | 2025 |
| NV | Henderson | 1914 |
| AZ | Chandler | 1637 |
| WI | Madison | 1630 |
| AZ | Glendale | 1196 |
+------------+------------+-------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 18
Intuitive SQL Access to Complex Data
// It’s Friday 10pm in Vegas and looking for Hummus
> SELECT name, stars, b.hours.Friday friday, categories
FROM dfs.yelp.`business.json` b
WHERE b.hours.Friday.`open` < '22:00' AND
b.hours.Friday.`close` > '22:00' AND
REPEATED_CONTAINS(categories, 'Mediterranean') AND
city = 'Las Vegas'
ORDER BY stars DESC
LIMIT 2;
+------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
| name | stars | friday | categories |
+------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
| Khoury's Mediterranean Restaurant | 4.0 | {"close":"23:00","open":"11:00"} | ["Greek","Mediterranean","Middle Eastern","Restaurants"] |
| Olives | 4.0 | {"close":"22:30","open":"11:00"} | ["Bars","Mediterranean","Nightlife","Restaurants"] |
+------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 19
ANSI SQL Compatibility
//Get top cool rated businesses
 SELECT b.name from dfs.yelp.`business.json` b
WHERE b.business_id IN
(SELECT r.business_id FROM dfs.yelp.`review.json` r
GROUP BY r.business_id HAVING SUM(r.votes.cool) > 2000 ORDER BY
SUM(r.votes.cool) DESC);
+--------------------------------+
| name |
+--------------------------------+
| Earl of Sandwich |
| XS Nightclub |
| The Cosmopolitan of Las Vegas |
| Wicked Spoon |
| Bacchanal Buffet |
+--------------------------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 20
Logical Views
//Create a view combining business and reviews datasets
> CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS
SELECT b.name, b.stars, r.votes.funny,
r.votes.useful, r.votes.cool, r.`date`
FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
WHERE r.business_id = b.business_id;
+-------+-------------------------------------------------------------------+
| ok | summary |
+-------+-------------------------------------------------------------------+
| true | View 'BusinessReviews' replaced successfully in 'dfs.tmp' schema |
+-------+-------------------------------------------------------------------+
> SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews;
+------------+
| Total |
+------------+
| 1125458 |
+------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 21
Materialized Views AKA Tables
> ALTER SESSION SET `store.format` = 'parquet';
> CREATE TABLE dfs.yelp.BusinessReviewsTbl AS
SELECT b.name, b.stars, r.votes.funny funny,
r.votes.useful useful, r.votes.cool cool, r.`date`
FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r
WHERE r.business_id = b.business_id;
+------------+---------------------------+
| Fragment | Number of records written |
+------------+---------------------------+
| 1_0 | 176448 |
| 1_1 | 192439 |
| 1_2 | 198625 |
| 1_3 | 200863 |
| 1_4 | 181420 |
| 1_5 | 175663 |
+------------+---------------------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 22
Repeated Values Support
// Flatten repeated categories
> SELECT name, categories
FROM dfs.yelp.`business.json` LIMIT 3;
+---------------------------+-------------------------------------+
| name | categories |
+---------------------------+-------------------------------------+
| Eric Goldberg, MD | ["Doctors","Health & Medical"] |
| Clancy's Pub | ["Nightlife"] |
| Cool Springs Golf Center | ["Active Life","Mini Golf","Golf"] |
+—————————————+-------------------------------------+
> SELECT name, FLATTEN(categories) AS categories
FROM dfs.yelp.`business.json` LIMIT 5;
+---------------------------+-------------------+
| name | categories |
+---------------------------+-------------------+
| Eric Goldberg, MD | Doctors |
| Eric Goldberg, MD | Health & Medical |
| Clancy's Pub | Nightlife |
| Cool Springs Golf Center | Active Life |
| Cool Springs Golf Center | Mini Golf |
+---------------------------+-------------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 23
Extensions to ANSI SQL to work with repeated values
// Get most common business categories
>SELECT category, count(*) AS categorycount
FROM (SELECT name, FLATTEN(categories) AS category
FROM dfs.yelp.`business.json`) c
GROUP BY category ORDER BY categorycount DESC;
+-------------------------------+----------------+
| category | categorycount |
+-------------------------------+----------------+
| Restaurants | 21892 |
| Shopping | 8919 |
| Automotive | 2965 |
… … … … … … … … … … … … … … … … … … … … … … … … … … …
| Oriental | 1 |
| High Fidelity Audio Equipment | 1 |
+--------------------------------+---------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 24
Checkins dataset {
"checkin_info":{
"3-4":1,
"13-5":1,
"6-6":1,
"14-5":1,
"14-6":1,
"14-2":1,
"14-3":1,
"19-0":1,
"11-5":1,
"13-2":1,
"11-6":2,
"11-3":1,
"12-6":1,
"6-5":1,
"5-5":1,
"9-2":1,
"9-5":1,
"9-6":1,
"5-2":1,
"7-6":1,
"7-5":1,
"7-4":1,
"17-5":1,
"8-5":1,
"10-2":1,
"10-5":1,
"10-6":1
},
"type":"checkin",
"business_id":"JwUE5GmEO-sH1FuwJgKBlQ"
}
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 25
Supports Dynamic / Unknown Columns
> SELECT KVGEN(checkin_info) checkins
FROM dfs.yelp.`checkin.json` LIMIT 1;
+------------+
| checkins |
+------------+
| [{"key":"3-4","value":1},{"key":"13-5","value":1},{"key":"6-6","value":1},{"key":"14-
5","value":1},{"key":"14-6","value":1},{"key":"14-2","value":1},{"key":"14-3","value":1},{"key":"19-
0","value":1},{"key":"11-5","value":1},{"key":"13-2","value":1},{"key":"11-6","value":2},{"key":"11-
3","value":1},{"key":"12-6","value":1},{"key":"6-5","value":1},{"key":"5-5","value":1},{"key":"9-
2","value":1},{"key":"9-5","value":1},{"key":"9-6","value":1},{"key":"5-2","value":1},{"key":"7-
6","value":1},{"key":"7-5","value":1},{"key":"7-4","value":1},{"key":"17-5","value":1},{"key":"8-
5","value":1},{"key":"10-2","value":1},{"key":"10-5","value":1},{"key":"10-6","value":1}] |
+------------+
> SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM
dfs.yelp.`checkin.json` limit 6;
+---------------------------+
| checkins |
+---------------------------+
| {"key":"9-5","value":1} |
| {"key":"7-5","value":1} |
| {"key":"13-3","value":1} |
| {"key":"17-6","value":1} |
| {"key":"13-0","value":1} |
| {"key":"17-3","value":1} |
+---------------------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 26
Makes it easy to work with dynamic/unknown columns
// Count total number of checkins on Sunday midnight
> SELECT SUM(checkintbl.checkins.`value`) as SundayMidnightCheckins FROM
(SELECT FLATTEN(KVGEN(checkin_info)) checkins
FROM dfs.yelp.checkin.json`) checkintbl
WHERE checkintbl.checkins.key='23-0';
+------------------------+
| SundayMidnightCheckins |
+------------------------+
| 8575 |
+------------------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 27
Federated Queries
// Join JSON File, Parquet and MongoDB collection
> SELECT u.name, b.category, count(1) nb_review
FROM mongo.yelp.`user` u , dfs.yelp.`review.parquet` r, (select business_id, flatten(categories) category from
dfs.yelp.`business.json` ) b
WHERE u.user_id = r.user_id
AND b.business_id = r.business_id
GROUP BY u.user_id, u.name, b.category
ORDER BY nb_review DESC
LIMIT 10;
+-----------+--------------+------------+
| name | category | nb_review |
+-----------+--------------+------------+
| Rand | Restaurants | 1086 |
| J | Restaurants | 661 |
| Jennifer | Restaurants | 657 |
| Brad | Restaurants | 638 |
| Jade | Restaurants | 586 |
… … … … … … … … … .. .. ..
| Emily | Restaurants | 560 |
| Norm | Restaurants | 543 |
| Aileen | Restaurants | 499 |
| Michael | Restaurants | 496 |
+-----------+--------------+------------+
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 28
Drill Data Sources
JDBC
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 29
Directories are implicit partitions
select dir0, count(1)
from dfs.data.`/logs/`
where dir1 in (1,2,3)
group by dir0
logs
├── 2014
│ ├── 1
│ ├── 2
│ ├── 3
│ └── 4
└── 2015
└── 1
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 30
How does it work?
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 31
Everything is a “Drillbit”
• “Drillbits” run on each node
• In-memory columnar execution
• Coordination, Planning,
Execution
• Networked or not
• Exposes JDBC, ODBC, REST
• Built In Web UI and CLI
• Extensible
• Custom Functions
• Data Sources
Drillbit
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 32
Data Locality
HBase HBase
Mac
HDFS HDFS HDFS
HDFS HDFS HDFS
mongod mongod
HBase
Windows
Clusters DesktopClusters
HDFS & HBase cluster
HDFS cluster
MongoDB cluster
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 33
Run drillbits close to your data!
Drillbit
HBase HBase
Mac
HDFS HDFS HDFS
HDFS HDFS HDFS
mongod mongod
HBase
Windows
Clusters DesktopClusters
HDFS & HBase cluster
HDFS cluster
MongoDB cluster
Drillbit Drillbit
Drillbit Drillbit Drillbit
Drillbit Drillbit
Drillbit
Drillbit
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 34
Query Execution
Client
Drillbit Drillbit Drillbit
• Client connects to “a” Drillbit
• This Drillbit becomes the foreman
• Foreman generates execution plan
• cost base query optimisation
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 35
Query Execution
Client
Drillbit Drillbit Drillbit
• Execution fragment are farmed to other Drillbits
• Drillbits exchange data when necessary
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 36
Query Execution
Client
Drillbit Drillbit Drillbit
• Results are returned to the user through foreman
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 37
Granular Security via Drill Views
Name City State Credit Card #
Dave San Jose CA 1374-7914-3865-4817
John Boulder CO 1374-9735-1794-9711
Raw File (/raw/cards.csv)
Owner
Admins
Permission Admins
Business Analyst Data Scientist
Name City State Credit Card #
Dave San Jose CA 1374-1111-1111-1111
John Boulder CO 1374-1111-1111-1111
Data Scientist View (/views/maskedcards.csv)
Not a physical data copy
Name City State
Dave San Jose CA
John Boulder CO
Business Analyst View
Owner
Admins
Permission
Business
Analysts
Owner
Admins
Permission
Data
Scientists
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 38
Extensibility
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 39
Extending Apache Drill
• File Format
• ORC
• …
• Data Sources
• NoSQL Databases
• Search Engines
• REST
• ….
• Custom Functions
https://github.com/mapr-demos/simple-drill-functions
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 40
Recommendations for Getting Started with Drill
New to Drill?
– Get started with Free MapR On Demand training
– Test Drive Drill on cloud with AWS
– Learn how to use Drill with Hadoop using MapR sandbox
– Workshop : https://github.com/tgrall/drill-workshop
Ready to play with your data?
– Try out Apache Drill in 10 mins guide on your desktop
– Download Drill for your cluster and start exploration
– Comprehensive tutorials and documentation available
Ask questions
– user@drill.apache.org
– dev@drill.apache.com
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 41
© 2016 MapR Technologies© 2016 MapR Technologies@tgrall 42
Q&A
@tgrall maprtech
tug@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

More Related Content

What's hot

Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Gera Shegalov
 
Introduction to Apache Drill - NYC Apache Drill Meetup
Introduction to Apache Drill - NYC Apache Drill MeetupIntroduction to Apache Drill - NYC Apache Drill Meetup
Introduction to Apache Drill - NYC Apache Drill MeetupVince Gonzalez
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Vince Gonzalez
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleMapR Technologies
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseNag Arvind Gudiseva
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill Carol McDonald
 
Drilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache DrillDrilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache DrillCharles Givre
 
Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillMapR Technologies
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesDataWorks Summit/Hadoop Summit
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperDataWorks Summit
 
Apache Drill Workshop
Apache Drill WorkshopApache Drill Workshop
Apache Drill WorkshopCharles Givre
 

What's hot (20)

Apache Drill
Apache DrillApache Drill
Apache Drill
 
Using Apache Drill
Using Apache DrillUsing Apache Drill
Using Apache Drill
 
Apache drill
Apache drillApache drill
Apache drill
 
Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013Apache Drill @ PJUG, Jan 15, 2013
Apache Drill @ PJUG, Jan 15, 2013
 
Introduction to Apache Drill - NYC Apache Drill Meetup
Introduction to Apache Drill - NYC Apache Drill MeetupIntroduction to Apache Drill - NYC Apache Drill Meetup
Introduction to Apache Drill - NYC Apache Drill Meetup
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0Working with Delimited Data in Apache Drill 1.6.0
Working with Delimited Data in Apache Drill 1.6.0
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBase
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
Drilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache DrillDrilling Cyber Security Data With Apache Drill
Drilling Cyber Security Data With Apache Drill
 
Hadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache DrillHadoop User Group - Status Apache Drill
Hadoop User Group - Status Apache Drill
 
Spark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different RulesSpark SQL versus Apache Drill: Different Tools with Different Rules
Spark SQL versus Apache Drill: Different Tools with Different Rules
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS Developer
 
Apache Drill Workshop
Apache Drill WorkshopApache Drill Workshop
Apache Drill Workshop
 

Viewers also liked

Designing and Implementing your IOT Solutions with Open Source
Designing and Implementing your IOT Solutions with Open SourceDesigning and Implementing your IOT Solutions with Open Source
Designing and Implementing your IOT Solutions with Open SourceDataWorks Summit/Hadoop Summit
 
Processing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaProcessing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaMatthew Howlett
 
Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRclive boulton
 
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsNYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsJason Shao
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningMapR Technologies
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache DrillDataWorks Summit
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
 
Which data should you move to Hadoop?
Which data should you move to Hadoop?Which data should you move to Hadoop?
Which data should you move to Hadoop?Attunity
 
Design Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in KafkaDesign Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in KafkaIan Downard
 
Scaling MQTT With Apache Kafka
Scaling MQTT With Apache KafkaScaling MQTT With Apache Kafka
Scaling MQTT With Apache Kafkakellogh
 
The Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureThe Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureDataWorks Summit/Hadoop Summit
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?DataWorks Summit
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Charles Givre
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014John Berns
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1Charles Givre
 

Viewers also liked (17)

IoT Use Cases with MapR
IoT Use Cases with MapRIoT Use Cases with MapR
IoT Use Cases with MapR
 
Designing and Implementing your IOT Solutions with Open Source
Designing and Implementing your IOT Solutions with Open SourceDesigning and Implementing your IOT Solutions with Open Source
Designing and Implementing your IOT Solutions with Open Source
 
Processing IoT Data with Apache Kafka
Processing IoT Data with Apache KafkaProcessing IoT Data with Apache Kafka
Processing IoT Data with Apache Kafka
 
Seattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapRSeattle Scalability Meetup - Ted Dunning - MapR
Seattle Scalability Meetup - Ted Dunning - MapR
 
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and ApplicationsNYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
NYC Hadoop Meetup - MapR, Architecture, Philosophy and Applications
 
Deep Learning vs. Cheap Learning
Deep Learning vs. Cheap LearningDeep Learning vs. Cheap Learning
Deep Learning vs. Cheap Learning
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Which data should you move to Hadoop?
Which data should you move to Hadoop?Which data should you move to Hadoop?
Which data should you move to Hadoop?
 
Design Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in KafkaDesign Patterns for working with Fast Data in Kafka
Design Patterns for working with Fast Data in Kafka
 
Scaling MQTT With Apache Kafka
Scaling MQTT With Apache KafkaScaling MQTT With Apache Kafka
Scaling MQTT With Apache Kafka
 
The Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data ArchitectureThe Stream is the Database - Revolutionizing Healthcare Data Architecture
The Stream is the Database - Revolutionizing Healthcare Data Architecture
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2
 
IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014IoT and Big Data - Iot Asia 2014
IoT and Big Data - Iot Asia 2014
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1
 

Similar to SQL-on-Hadoop with Apache Drill

Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallSpark Summit
 
What and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual GrallWhat and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual Gralldistributed matters
 
Apache drill self service data exploration (113)
Apache drill   self service data exploration (113)Apache drill   self service data exploration (113)
Apache drill self service data exploration (113)MapR Technologies
 
Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)
Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)
Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)Spark Summit
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistHUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistSpagoWorld
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...Facultad de Informática UCM
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillTomer Shiran
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosSpark Summit
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesCarol McDonald
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkMapR Technologies
 
Self service data exploration with apache drill
Self service data exploration with apache drillSelf service data exploration with apache drill
Self service data exploration with apache drillMapR Technologies
 
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMRVancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMRAllice Shandler
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaData Science Thailand
 
Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data PipelinesMapR Technologies
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Carol McDonald
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamskyData Con LA
 

Similar to SQL-on-Hadoop with Apache Drill (20)

Adding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug GrallAdding Complex Data to Spark Stack by Tug Grall
Adding Complex Data to Spark Stack by Tug Grall
 
What and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual GrallWhat and Why and How: Apache Drill ! - Tugdual Grall
What and Why and How: Apache Drill ! - Tugdual Grall
 
Apache drill self service data exploration (113)
Apache drill   self service data exploration (113)Apache drill   self service data exploration (113)
Apache drill self service data exploration (113)
 
Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)
Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)
Adding Complex Data to Spark Stack-(Neeraja Rentachintala, MapR)
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical EvangelistHUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
HUG Italy meet-up with Tugdual Grall, MapR Technical Evangelist
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
 
Analyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache DrillAnalyzing Real-World Data with Apache Drill
Analyzing Real-World Data with Apache Drill
 
Drill 1.0
Drill 1.0Drill 1.0
Drill 1.0
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
 
Presentation
PresentationPresentation
Presentation
 
Self service data exploration with apache drill
Self service data exploration with apache drillSelf service data exploration with apache drill
Self service data exploration with apache drill
 
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMRVancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
Vancouver AWS Meetup Slides 11-20-2018 Apache Spark with Amazon EMR
 
Microsoft R Server for Data Sciencea
Microsoft R Server for Data ScienceaMicrosoft R Server for Data Sciencea
Microsoft R Server for Data Sciencea
 
Spark Streaming Data Pipelines
Spark Streaming Data PipelinesSpark Streaming Data Pipelines
Spark Streaming Data Pipelines
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
 

More from MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 

Recently uploaded

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Recently uploaded (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

SQL-on-Hadoop with Apache Drill

  • 1. © 2016 MapR Technologies© 2016 MapR TechnologiesMapR Confidential © 2016 MapR Technologies 1 SQL-on-Hadoop with
  • 2. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 3 Ingest Store Process Consume
  • 3. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 4 MapR Converged Data Platform
  • 4. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 6 Agenda • Why Drill? • Some Drilling with Open Data • How does it work?
  • 5. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 7 1980 2000 20101990 2020 Fixed schema DBA controls structure Dynamic / Flexible schema Application controls structure NON-RELATIONAL DATASTORESRELATIONAL DATABASES GBs-TBs TBs-PBsVolume Database Data Increasingly Stored in Non-Relational Datastores Structure Development Structured Structured, semi-structured and unstructured Planned (release cycle = months-years) Iterative (release cycle = days-weeks)
  • 6. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 8 How To Bring SQL to Non-Relational Data Stores? Familiarity of SQL Agility of NoSQL • ANSI SQL semantics • Low latency • Integrated with Tools/Applications • No schema management – HDFS (Parquet, JSON, etc.) – HBase – … • No transformation – No silos of data • Ease of use
  • 7. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 9 Drill Supports Schema Discovery On-The-Fly • Fixed schema • Leverage schema in centralized repository (Hive Metastore) • Fixed schema, evolving schema or schema-less • Leverage schema in centralized repository or self-describing data 2Schema Discovered On-The-FlySchema Declared In Advance SCHEMA ON WRITE SCHEMA BEFORE READ SCHEMA ON THE FLY
  • 8. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 10 • Pioneering Data Agility for Hadoop • Apache open source project • Scale-out execution engine for low-latency queries • Unified SQL-based API for analytics & operational applications 50+ contributors 150+ years of experience building databases and distributed systems Contributing to Apache Drill
  • 9. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 11 - Sub-directory - HBase namespace - Hive database - Database Drill Enables ‘SQL-on-Everything’ SELECT * FROM dfs.yelp.`business.json` Workspace - Pathnames - Hive table - HBase table - Table Table - DFS (Text, Parquet, JSON, XML) - HBase/MapR-DB - Hive Metastore/HCatalog - RDBMS using JDBC - Easy API to go beyond Hadoop Storage plugin instance
  • 10. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 12 Drill’s Data Model is Flexible JSON BSON HBase Parquet Avro CSV TSV Dynamic schema Fixed schema Complex Flat Flexibility Name Gender Age Michael M 6 Jennifer F 3 RDBMS/SQL-on-Hadoop table Flexibility Apache Drill table { name: { first: Michael, last: Smith }, hobbies: [ski, soccer], district: Los Altos } { name: { first: Jennifer, last: Gates }, hobbies: [sing], preschool: CCLC }
  • 11. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 13 Drilling into Data
  • 12. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 14 Business dataset { "business_id": "4bEjOyTaDG24SY5TxsaUNQ", "full_address": "3655 Las Vegas Blvd SnThe StripnLas Vegas, NV 89109", "hours": { "Monday": {"close": "23:00", "open": "07:00"}, "Tuesday": {"close": "23:00", "open": "07:00"}, "Friday": {"close": "00:00", "open": "07:00"}, "Wednesday": {"close": "23:00", "open": "07:00"}, "Thursday": {"close": "23:00", "open": "07:00"}, "Sunday": {"close": "23:00", "open": "07:00"}, "Saturday": {"close": "00:00", "open": "07:00"} }, "open": true, "categories": ["Breakfast & Brunch", "Steakhouses", "French", "Restaurants"], "city": "Las Vegas", "review_count": 4084, "name": "Mon Ami Gabi", "neighborhoods": ["The Strip"], "longitude": -115.172588519464, "state": "NV", "stars": 4.0, "attributes": { "Alcohol": "full_bar”, "Noise Level": "average", "Has TV": false, "Attire": "casual", "Ambience": { "romantic": true, "intimate": false, "touristy": false, "hipster": false, "classy": true, "trendy": false, "casual": false }, "Good For": {"dessert": false, "latenight": false, "lunch": false, "dinner": true, "breakfast": false, "brunch": false}, } }
  • 13. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 15 Reviews dataset { "votes": {"funny": 0, "useful": 2, "cool": 1}, "user_id": "Xqd0DzHaiyRqVH3WRG7hzg", "review_id": "15SdjuK7DmYqUAj6rjGowg", "stars": 5, "date": "2007-05-17", "text": "dr. goldberg offers everything ...", "type": "review", "business_id": "vcNAWiLM4dR7D2nwwJ7nCA" }
  • 14. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 16 Interfaces and Tools WebUIDrill Explorer ODBC/JDBC Sqlline
  • 15. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 17 ` $ tar -xvzf apache-drill-1.4.0.tar.gz $ bin/sqlline -u jdbc:drill:zk=local $ bin/drill-embedded > SELECT state, city, count(*) AS businesses FROM dfs.yelp.`business.json.gz` GROUP BY state, city ORDER BY businesses DESC LIMIT 10; +------------+------------+-------------+ | state | city | businesses | +------------+------------+-------------+ | NV | Las Vegas | 12021 | | AZ | Phoenix | 7499 | | AZ | Scottsdale | 3605 | | EDH | Edinburgh | 2804 | | AZ | Mesa | 2041 | | AZ | Tempe | 2025 | | NV | Henderson | 1914 | | AZ | Chandler | 1637 | | WI | Madison | 1630 | | AZ | Glendale | 1196 | +------------+------------+-------------+
  • 16. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 18 Intuitive SQL Access to Complex Data // It’s Friday 10pm in Vegas and looking for Hummus > SELECT name, stars, b.hours.Friday friday, categories FROM dfs.yelp.`business.json` b WHERE b.hours.Friday.`open` < '22:00' AND b.hours.Friday.`close` > '22:00' AND REPEATED_CONTAINS(categories, 'Mediterranean') AND city = 'Las Vegas' ORDER BY stars DESC LIMIT 2; +------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+ | name | stars | friday | categories | +------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+ | Khoury's Mediterranean Restaurant | 4.0 | {"close":"23:00","open":"11:00"} | ["Greek","Mediterranean","Middle Eastern","Restaurants"] | | Olives | 4.0 | {"close":"22:30","open":"11:00"} | ["Bars","Mediterranean","Nightlife","Restaurants"] | +------------------------------------+--------+-----------------------------------+-----------------------------------------------------------+
  • 17. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 19 ANSI SQL Compatibility //Get top cool rated businesses  SELECT b.name from dfs.yelp.`business.json` b WHERE b.business_id IN (SELECT r.business_id FROM dfs.yelp.`review.json` r GROUP BY r.business_id HAVING SUM(r.votes.cool) > 2000 ORDER BY SUM(r.votes.cool) DESC); +--------------------------------+ | name | +--------------------------------+ | Earl of Sandwich | | XS Nightclub | | The Cosmopolitan of Las Vegas | | Wicked Spoon | | Bacchanal Buffet | +--------------------------------+
  • 18. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 20 Logical Views //Create a view combining business and reviews datasets > CREATE OR REPLACE VIEW dfs.tmp.BusinessReviews AS SELECT b.name, b.stars, r.votes.funny, r.votes.useful, r.votes.cool, r.`date` FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r WHERE r.business_id = b.business_id; +-------+-------------------------------------------------------------------+ | ok | summary | +-------+-------------------------------------------------------------------+ | true | View 'BusinessReviews' replaced successfully in 'dfs.tmp' schema | +-------+-------------------------------------------------------------------+ > SELECT COUNT(*) AS Total FROM dfs.tmp.BusinessReviews; +------------+ | Total | +------------+ | 1125458 | +------------+
  • 19. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 21 Materialized Views AKA Tables > ALTER SESSION SET `store.format` = 'parquet'; > CREATE TABLE dfs.yelp.BusinessReviewsTbl AS SELECT b.name, b.stars, r.votes.funny funny, r.votes.useful useful, r.votes.cool cool, r.`date` FROM dfs.yelp.`business.json` b, dfs.yelp.`review.json` r WHERE r.business_id = b.business_id; +------------+---------------------------+ | Fragment | Number of records written | +------------+---------------------------+ | 1_0 | 176448 | | 1_1 | 192439 | | 1_2 | 198625 | | 1_3 | 200863 | | 1_4 | 181420 | | 1_5 | 175663 | +------------+---------------------------+
  • 20. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 22 Repeated Values Support // Flatten repeated categories > SELECT name, categories FROM dfs.yelp.`business.json` LIMIT 3; +---------------------------+-------------------------------------+ | name | categories | +---------------------------+-------------------------------------+ | Eric Goldberg, MD | ["Doctors","Health & Medical"] | | Clancy's Pub | ["Nightlife"] | | Cool Springs Golf Center | ["Active Life","Mini Golf","Golf"] | +—————————————+-------------------------------------+ > SELECT name, FLATTEN(categories) AS categories FROM dfs.yelp.`business.json` LIMIT 5; +---------------------------+-------------------+ | name | categories | +---------------------------+-------------------+ | Eric Goldberg, MD | Doctors | | Eric Goldberg, MD | Health & Medical | | Clancy's Pub | Nightlife | | Cool Springs Golf Center | Active Life | | Cool Springs Golf Center | Mini Golf | +---------------------------+-------------------+
  • 21. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 23 Extensions to ANSI SQL to work with repeated values // Get most common business categories >SELECT category, count(*) AS categorycount FROM (SELECT name, FLATTEN(categories) AS category FROM dfs.yelp.`business.json`) c GROUP BY category ORDER BY categorycount DESC; +-------------------------------+----------------+ | category | categorycount | +-------------------------------+----------------+ | Restaurants | 21892 | | Shopping | 8919 | | Automotive | 2965 | … … … … … … … … … … … … … … … … … … … … … … … … … … … | Oriental | 1 | | High Fidelity Audio Equipment | 1 | +--------------------------------+---------------+
  • 22. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 24 Checkins dataset { "checkin_info":{ "3-4":1, "13-5":1, "6-6":1, "14-5":1, "14-6":1, "14-2":1, "14-3":1, "19-0":1, "11-5":1, "13-2":1, "11-6":2, "11-3":1, "12-6":1, "6-5":1, "5-5":1, "9-2":1, "9-5":1, "9-6":1, "5-2":1, "7-6":1, "7-5":1, "7-4":1, "17-5":1, "8-5":1, "10-2":1, "10-5":1, "10-6":1 }, "type":"checkin", "business_id":"JwUE5GmEO-sH1FuwJgKBlQ" }
  • 23. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 25 Supports Dynamic / Unknown Columns > SELECT KVGEN(checkin_info) checkins FROM dfs.yelp.`checkin.json` LIMIT 1; +------------+ | checkins | +------------+ | [{"key":"3-4","value":1},{"key":"13-5","value":1},{"key":"6-6","value":1},{"key":"14- 5","value":1},{"key":"14-6","value":1},{"key":"14-2","value":1},{"key":"14-3","value":1},{"key":"19- 0","value":1},{"key":"11-5","value":1},{"key":"13-2","value":1},{"key":"11-6","value":2},{"key":"11- 3","value":1},{"key":"12-6","value":1},{"key":"6-5","value":1},{"key":"5-5","value":1},{"key":"9- 2","value":1},{"key":"9-5","value":1},{"key":"9-6","value":1},{"key":"5-2","value":1},{"key":"7- 6","value":1},{"key":"7-5","value":1},{"key":"7-4","value":1},{"key":"17-5","value":1},{"key":"8- 5","value":1},{"key":"10-2","value":1},{"key":"10-5","value":1},{"key":"10-6","value":1}] | +------------+ > SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM dfs.yelp.`checkin.json` limit 6; +---------------------------+ | checkins | +---------------------------+ | {"key":"9-5","value":1} | | {"key":"7-5","value":1} | | {"key":"13-3","value":1} | | {"key":"17-6","value":1} | | {"key":"13-0","value":1} | | {"key":"17-3","value":1} | +---------------------------+
  • 24. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 26 Makes it easy to work with dynamic/unknown columns // Count total number of checkins on Sunday midnight > SELECT SUM(checkintbl.checkins.`value`) as SundayMidnightCheckins FROM (SELECT FLATTEN(KVGEN(checkin_info)) checkins FROM dfs.yelp.checkin.json`) checkintbl WHERE checkintbl.checkins.key='23-0'; +------------------------+ | SundayMidnightCheckins | +------------------------+ | 8575 | +------------------------+
  • 25. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 27 Federated Queries // Join JSON File, Parquet and MongoDB collection > SELECT u.name, b.category, count(1) nb_review FROM mongo.yelp.`user` u , dfs.yelp.`review.parquet` r, (select business_id, flatten(categories) category from dfs.yelp.`business.json` ) b WHERE u.user_id = r.user_id AND b.business_id = r.business_id GROUP BY u.user_id, u.name, b.category ORDER BY nb_review DESC LIMIT 10; +-----------+--------------+------------+ | name | category | nb_review | +-----------+--------------+------------+ | Rand | Restaurants | 1086 | | J | Restaurants | 661 | | Jennifer | Restaurants | 657 | | Brad | Restaurants | 638 | | Jade | Restaurants | 586 | … … … … … … … … … .. .. .. | Emily | Restaurants | 560 | | Norm | Restaurants | 543 | | Aileen | Restaurants | 499 | | Michael | Restaurants | 496 | +-----------+--------------+------------+
  • 26. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 28 Drill Data Sources JDBC
  • 27. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 29 Directories are implicit partitions select dir0, count(1) from dfs.data.`/logs/` where dir1 in (1,2,3) group by dir0 logs ├── 2014 │ ├── 1 │ ├── 2 │ ├── 3 │ └── 4 └── 2015 └── 1
  • 28. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 30 How does it work?
  • 29. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 31 Everything is a “Drillbit” • “Drillbits” run on each node • In-memory columnar execution • Coordination, Planning, Execution • Networked or not • Exposes JDBC, ODBC, REST • Built In Web UI and CLI • Extensible • Custom Functions • Data Sources Drillbit
  • 30. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 32 Data Locality HBase HBase Mac HDFS HDFS HDFS HDFS HDFS HDFS mongod mongod HBase Windows Clusters DesktopClusters HDFS & HBase cluster HDFS cluster MongoDB cluster
  • 31. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 33 Run drillbits close to your data! Drillbit HBase HBase Mac HDFS HDFS HDFS HDFS HDFS HDFS mongod mongod HBase Windows Clusters DesktopClusters HDFS & HBase cluster HDFS cluster MongoDB cluster Drillbit Drillbit Drillbit Drillbit Drillbit Drillbit Drillbit Drillbit Drillbit
  • 32. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 34 Query Execution Client Drillbit Drillbit Drillbit • Client connects to “a” Drillbit • This Drillbit becomes the foreman • Foreman generates execution plan • cost base query optimisation
  • 33. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 35 Query Execution Client Drillbit Drillbit Drillbit • Execution fragment are farmed to other Drillbits • Drillbits exchange data when necessary
  • 34. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 36 Query Execution Client Drillbit Drillbit Drillbit • Results are returned to the user through foreman
  • 35. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 37 Granular Security via Drill Views Name City State Credit Card # Dave San Jose CA 1374-7914-3865-4817 John Boulder CO 1374-9735-1794-9711 Raw File (/raw/cards.csv) Owner Admins Permission Admins Business Analyst Data Scientist Name City State Credit Card # Dave San Jose CA 1374-1111-1111-1111 John Boulder CO 1374-1111-1111-1111 Data Scientist View (/views/maskedcards.csv) Not a physical data copy Name City State Dave San Jose CA John Boulder CO Business Analyst View Owner Admins Permission Business Analysts Owner Admins Permission Data Scientists
  • 36. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 38 Extensibility
  • 37. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 39 Extending Apache Drill • File Format • ORC • … • Data Sources • NoSQL Databases • Search Engines • REST • …. • Custom Functions https://github.com/mapr-demos/simple-drill-functions
  • 38. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 40 Recommendations for Getting Started with Drill New to Drill? – Get started with Free MapR On Demand training – Test Drive Drill on cloud with AWS – Learn how to use Drill with Hadoop using MapR sandbox – Workshop : https://github.com/tgrall/drill-workshop Ready to play with your data? – Try out Apache Drill in 10 mins guide on your desktop – Download Drill for your cluster and start exploration – Comprehensive tutorials and documentation available Ask questions – user@drill.apache.org – dev@drill.apache.com
  • 39. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 41
  • 40. © 2016 MapR Technologies© 2016 MapR Technologies@tgrall 42 Q&A @tgrall maprtech tug@mapr.com Engage with us! MapR maprtech mapr-technologies

Editor's Notes

  1. Change copyright to 2016
  2. The power of MapR begins with the power of open source innovation and community participation. In some cases MapR leads the community in projects like Apache Mahout (machine learning) or Apache Drill (SQL on Hadoop) In other areas, MapR contributes, integrates Apache and other open source software (OSS) projects into the MapR distribution, delivering a more reliable and performant system with lower overall TCO and easier system management. MapR releases a new version with the latest OSS innovations on a monthly basis. We add 2-4 new Apache projects annually as new projects become production ready and based on customer demand.
  3. The database/datastore landscape is evolving to meet the new requirements. 2009 was the inflection point. NoSchema systems in which applications control structure. Developers are being empowered and they are voting for the agility offered by these systems. In the early days if this revolution we sacrificed the query language, and we eliminated the ability to leverage the knowledge and tools available to millions of people. We’re changing that by a distributed SQL engine. But when we do that, we have to keep in mind that this transition to a NoSchema world happened for a reason, and we don’t want to reintroduce the centralized, DBA-managed schema.
  4. The database/datastore landscape is evolving to meet the new requirements. 2009 was the inflection point. NoSchema systems in which applications control structure. Developers are being empowered and they are voting for the agility offered by these systems. In the early days if this revolution we sacrificed the query language, and we eliminated the ability to leverage the knowledge and tools available to millions of people. We’re changing that by a distributed SQL engine. But when we do that, we have to keep in mind that this transition to a NoSchema world happened for a reason, and we don’t want to reintroduce the centralized, DBA-managed schema.
  5. All SQL engines (traditional or SQL-on-Hadoop) view tables as spreadsheet-like data structures with rows and columns. All records have the same structure, and there is no support for nested data or repeating fields. Drill views tables conceptually as collections of JSON (with additional types) documents. Each record can have a different structure (hence, schema-less). This is revolutionary and has never been done before. If you consider the four data models shown in the 2x2, all models can be represented by the complex, no schema model (JSON) because it is the most flexible. However, no other data model can be represented by the flat, fixed schema model. Therefore, when using any SQL engine except Drill, the data has to be transformed before it can be available to queries.
  6. CSV vs JSON formatting
  7. CSV vs JSON formatting
  8. It’s 10pm in Vegas and I Want Good Hummus!
  9. CSV vs JSON formatting
  10. User Connect to the Drill bit That drillbit becomes the FOREMAN Foreman generate execution plan cost based query optimisation and locality
  11. Execution fragment are farmed to other drillbits Drillbits exange fata as necessary
  12. Results are returned to the user through foreman
  13. Follow me on twitter to get updates for documentation on the Zeta Architecture