Apache Drill in the toolbox
Naoki Takezoe
@takezoen
BizReach, Inc
A lot of JSON in the world
● Configuration
● Data
● Log
We want to query or analyze them.
How?
Solutions for searching JSON
We ♥ SQL
What is Apache Drill?
Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage
● Storage
○ Classpath, Local file system / HDFS / S3, HBase, Hive, MongoDB, JDBC
● File format
○ JSON, Parquet, CSV / TSV / PSV
Let's begin!!
Installation
1. Download and expand Drill distribution
2. cd apache-drill-1.6.0/bin
3. ./drill-embedded
Web console: http://localhost:8047/
Query local JSON files
{"name": "suzuki", "dept": "sales"}
{"name": "yamada", "dept": "development"}
{"name": "sato", "dept": "development"}
...
SELECT * FROM dfs.`/tmp/users.json` T1
WHERE T1.name = 'takezoe'
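The same query can also be submitted from a script. The port used by Drill's web console also serves a REST endpoint; the sketch below assumes it is reachable at `http://localhost:8047/query.json`, i.e. the embedded instance from the installation step. `build_query_request` and `run_query` are hypothetical helper names for illustration, not part of Drill.

```python
import json
import urllib.request

# Assumption: the Drill web console from the installation step listens here.
DRILL_URL = "http://localhost:8047/query.json"

# The query from the slide above.
SQL = "SELECT * FROM dfs.`/tmp/users.json` T1 WHERE T1.name = 'takezoe'"


def build_query_request(sql, url=DRILL_URL):
    """Build a POST request that submits one SQL statement to Drill."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=DRILL_URL):
    """Send the query and return the result rows (a list of dicts)."""
    with urllib.request.urlopen(build_query_request(sql, url)) as response:
        return json.load(response).get("rows", [])
```

With drill-embedded running, `run_query(SQL)` should return the matching records; building the request is kept separate so the payload can be inspected without a server.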
Access to RDB tables
Configure jdbc storage plugin at the web console:
{
  "type": "jdbc",
  "driver": "org.h2.Driver",
  "url": "jdbc:h2:~/.gitbucket/data",
  "username": "sa",
  "password": "sa",
  "enabled": true
}
Join JSON and RDB
SELECT
T1.`user`.name AS name,
T2.MAIL_ADDRESS AS mail
FROM dfs.`/tmp/users.json` T1
INNER JOIN h2.DATA.PUBLIC.ACCOUNT T2
ON T1.`user`.name = T2.USER_NAME
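To see what this join computes, here is a rough sketch of doing the same thing by hand, without Drill: the JSON records are first copied into a database table and then joined there. An in-memory SQLite database stands in for the H2 database, and the sample records, mail addresses, and the function name are made up for illustration. The copy step is what Drill lets us skip.

```python
import json
import sqlite3

# Hypothetical sample data: nested user records, one JSON object per line.
USERS_JSON = """\
{"user": {"name": "suzuki", "dept": "sales"}}
{"user": {"name": "yamada", "dept": "development"}}
"""

# Hypothetical (USER_NAME, MAIL_ADDRESS) rows standing in for the ACCOUNT table.
ACCOUNTS = [("yamada", "yamada@example.com"), ("sato", "sato@example.com")]


def join_users_with_accounts(json_lines, accounts):
    """Inner-join JSON user records with account rows on the user name."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ACCOUNT (USER_NAME TEXT, MAIL_ADDRESS TEXT)")
    db.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", accounts)
    # Manual load step: copy the JSON names into a table so SQL can see them.
    db.execute("CREATE TABLE USERS (name TEXT)")
    names = [(json.loads(line)["user"]["name"],)
             for line in json_lines.splitlines() if line.strip()]
    db.executemany("INSERT INTO USERS VALUES (?)", names)
    rows = db.execute(
        "SELECT T1.name, T2.MAIL_ADDRESS FROM USERS T1 "
        "INNER JOIN ACCOUNT T2 ON T1.name = T2.USER_NAME "
        "ORDER BY T1.name"
    ).fetchall()
    db.close()
    return rows


print(join_users_with_accounts(USERS_JSON, ACCOUNTS))
# → [('yamada', 'yamada@example.com')]
```

Only users present on both sides survive the inner join, which is the same result shape the Drill query returns directly from the file and the RDB table.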
Connect to Drill via JDBC
We can use any JDBC frontend or BI tool with Drill
JDBC access requires ZooKeeper
Connect to Drill via JDBC
Setup ZooKeeper
$ tar xvzf zookeeper-3.4.8.tar.gz
$ cd zookeeper-3.4.8
$ mv conf/zoo_sample.cfg conf/zoo.cfg
$ cd bin
$ ./zkServer.sh start
Run drillbit
$ cd apache-drill-1.6.0/bin
$ ./drillbit.sh start
Connect to Drill via JDBC
● JDBC Driver
○ DRILL_HOME/jars/jdbc-driver/drill-jdbc-all-1.6.0.jar
● Class
○ org.apache.drill.jdbc.Driver
● URL
○ jdbc:drill:drillbit=localhost
Handling nested JSON
Query nested JSON
{"user": {"name": "suzuki", "dept": "sales"}}
{"user": {"name": "yamada", "dept": "development"}}
{"user": {"name": "sato", "dept": "development"}}
...
SELECT
T.`user`.name AS name,
T.`user`.dept AS dept
FROM dfs.`/tmp/users.json` T
WHERE T.`user`.name = 'yamada';
Extract JSON property as column
Expand nested JSON property to records
{"user": {
"name": "yamada",
"experience": [ {"lang": "Java"}, {"lang": "Scala"} ]
}}
SELECT
T2.name AS name,
T2.experience.lang AS lang
FROM (
SELECT
T1.`user`.name AS name,
FLATTEN(T1.`user`.experience) AS experience
FROM dfs.`/tmp/users.json` T1
) T2
Expand nested array as individual table
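As a rough illustration of what FLATTEN does in the query above, the following plain-Python sketch produces one output row per element of the `experience` array, repeating the user's name on each row. It only mimics the result shape and is not how Drill implements it; `flatten_experience` is a hypothetical helper name, and the record is written on one line for brevity.

```python
import json

# The sample record from the slide, written on a single line.
RECORD = (
    '{"user": {"name": "yamada", '
    '"experience": [{"lang": "Java"}, {"lang": "Scala"}]}}'
)


def flatten_experience(json_lines):
    """Yield one (name, lang) row per element of user.experience."""
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        user = json.loads(line)["user"]
        # A record without an experience array contributes no rows.
        for item in user.get("experience", []):
            yield user["name"], item["lang"]


print(list(flatten_experience(RECORD)))
# → [('yamada', 'Java'), ('yamada', 'Scala')]
```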
In the case of jq
$ cat users.json | jq '.user | select(.name == "yamada")'
Nested JSON in Drill brings complexity.
Maybe jq is better for simple queries?
Use cases
Action log
● Store action logs in local files as JSON
● We can query them with Drill when necessary
Data warehouse
● Aggregate various data sources in Drill
● No data synchronization is needed
e.g. Access Elasticsearch through Hive
● elasticsearch-hadoop supports Hive
● Drill supports Hive
http://takezoe.hatenablog.com/entry/20150524/p1
Can we access Elasticsearch from Drill?
Conclusion
Apache Drill is
● a good tool for querying various datasets
● easy to set up and user-friendly
● usable without up-front investment
● useful for small data, not only big data
Put Apache Drill into your toolbox!
