Rakuten Technology Conference 2017
A Distributed SQL Database
For Data Analysis, Astra Project
2017-10-28
Yosuke Hara (原 陽亮)

Rakuten Institute of Technology

Rakuten, Inc. rev. 1.0.5
R&D Projects
- Skylab: A Microservices Framework
- LeoFS: A Distributed Storage
- Astra: A Distributed SQL Database For Data Analytics
Introducing Astra
* “Astra” is the code name of a product under development
One Of The Backgrounds
More “Connected Things” In The World
Consumer Applications to Represent 63% of Total IoT Applications in 2017
IoT Units Installed Base by Category (Millions of Units)
 Category                    |  2016   |  2017   |  2018   |  2020
-----------------------------+---------+---------+---------+----------
 Consumer                    | 3,963   | 5,244.3 | 7,036.3 | 12,863
 Business: Cross-Industry    | 1,102.1 | 1,501   | 2,132.6 | 4,381.4
 Business: Vertical-Specific | 1,316.6 | 1,635.4 | 2,027.7 | 3,171
 Total                       | 6.4B    | 8.4B    | 11.2B   | 20.4B
2017 share by category: Consumer 63%, Business: Cross-Industry 18%, Business: Vertical-Specific 19%
Growth from 2016 to 2017: +31%
Source: Gartner (January 2017)
Providing A Database With Which Anyone Can Analyze Data
Initial Concept
Provides Components of DataLake as a Service
- Data Science + DataLake
- Data Governance, Job Scheduler + Distributed Computing, Data Store
- Astra, Skylab, Spark, Hadoop
- Self-Service Analytics
Current Concept
Advanced Data Analysis In Semi-Realtime At Low Cost
- Streaming Data, Un/Semi-Structured Data
- Store Data Into Astra
- Aggregate and Analyze Data
- Find Insights
- Data / Intelligence / Action
- Tools / Apps, Automated Systems
Current Concept: Depends On A Single Source Of Truth
Astra’s Components
- Self-Service Analytics
- Data Governance
- Distributed Computing For Massive-Parallel Processing
- Distributed Database For Aggregation and Analysis
- Distributed Storage (DataLake Store)
In-place Analysis
Features
Unified Components
- Database: SQL Engine
- Data Science: Analysis Functions On Distributed Computing
- Data Store
- Reliability, Scalability, and Massive Parallel Processing
- Ad-hoc Queries On Various Data Without Limit
The Features - ANSI SQL99 Standard
Conforms To The ANSI SQL99 Standard
• Communication With Any BI / Data Visualization Tools, and Apps
• Able To Call All Of Astra’s Functions, UDFs and ML With SQL
astra:test> SELECT workclass, COUNT(income)
-> AS income_count
-> FROM adult_income
-> WHERE income = '<=50K'
-> GROUP BY workclass
-> ORDER BY workclass;
    workclass     | income_count
------------------+--------------
 ?                |         2534
 Federal-gov      |          871
 Local-gov        |         2209
 Never-worked     |           10
 Private          |        26519
 Self-emp-inc     |          757
 Self-emp-not-inc |         2785
 State-gov        |         1451
 Without-pay      |           19
(9 rows)
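Because the query above is plain standard SQL, it can be tried outside Astra. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for AstraSQL; the sample rows and the function name are hypothetical and are not taken from the real adult_income table.

```python
import sqlite3

# Hypothetical sample rows; the real adult_income table is much larger.
SAMPLE_ROWS = [
    ("Private", "<=50K"),
    ("Private", "<=50K"),
    ("Private", ">50K"),
    ("State-gov", "<=50K"),
    ("Local-gov", ">50K"),
]

# Same statement as on the slide, without the CLI prompt.
QUERY = """
SELECT workclass, COUNT(income) AS income_count
FROM adult_income
WHERE income = '<=50K'
GROUP BY workclass
ORDER BY workclass
"""


def count_low_income_by_workclass(rows):
    """Load rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE adult_income (workclass TEXT, income TEXT)")
        conn.executemany("INSERT INTO adult_income VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for workclass, income_count in count_low_income_by_workclass(SAMPLE_ROWS):
        print(f"{workclass:<16} | {income_count}")
```

Any client that speaks standard SQL, including BI tools connected over ODBC/JDBC, would issue the same statement text.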
The Features - Advanced Data Analytics
Advanced Data Analytics On Distributed Computing With Massive-Parallel Processing
• Built-In Analysis Functions and UDF
• Machine Learning
Figure: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, Feedback
Able To Repeat Trial And Error Without Limit
The Features - Availability and Scalability
High Availability
• Automated Data Replication And Recovery, and Failover
High Scalability
• An Elastic Cluster - Nodes That Can Flexibly Attach And Detach
Figure: cluster diagram with the labels Clients, Request, Response, HTTP, Coordinator(s), Worker (x5), Message with Gossip Protocol, Monitoring Resources, Scheduling Jobs, Requesting Jobs, Circuit Breaker
Figure: Akka Circuit breaker
* Circuit Breaker: martinfowler.com/bliki/CircuitBreaker.html
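The circuit breaker pattern referenced on this slide can be sketched in a few lines. This is an illustrative sketch only, not Astra's or Akka's implementation; the class name, the thresholds, and the injectable clock are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_failures=3, reset_timeout=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the sketch is testable
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"      # one trial call is allowed through
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("target temporarily disabled")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Trip after too many failures, or at once if a trial call fails.
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In a coordinator/worker cluster, a coordinator could wrap each job request to a worker in such a breaker so that an unresponsive worker is skipped for a while instead of being retried on every request.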
Architecture
High-level Architecture
Layers: SQL Engine (AstraSQL), Database Layer (AstraBase: Multi-Coordinator, Workers), DataStore Layer (Astra DataStore)
Clients: Astra CLI (Original Data Load, Operate Astra), SQL over ODBC/JDBC
- Original Data
- Semi-Structured Data
- Cold Data
- Columnar Tables
- Metadata Store
- Record Operation
- Record Set Cache (Hot Data)
- Distributed Computing
- Data Analysis
- Data Converter
- Semi-Structured Data To Columnar Table
LeoFS For Astra DataStore
LeoFS is software-defined storage (SDS) for DataLake and Web
LeoFS is an enterprise open source storage system: a highly available, distributed, eventually consistent object/blob store
Goals:
- High Availability
- High Cost Performance Ratio
- High Scalability
Processing Flow - Store a CSV file, Then Query Data
[Store a CSV File]
1-1. Put Original Data w/AstraCLI
1-2. Create Metadata
2. Store the Data and Metadata
[Execute Query]
3. Query Data For Aggregation Or Data Analysis
[Convert Data Format At Async]
4. Request Converting Data Format of a Table
5. Convert Data Format of a Table and Change Table’s Metadata
6. Store Converted Data
Figure: AstraCLI, AstraSQL, AstraBase Coordinator(s), AstraBase Workers, Resource Monitor + Scheduler, Astra DataStore (LeoFS); interfaces: REST-API, O/JDBC, gRPC, S3-API
Processing Flow - Query for Advanced Analysis
[Execute Query]
1. Execute SQL For Data Analysis
2-1. Request Data Analysis to AstraBase
2-2. Request Message to AstraBase’s Workers
[Retrieve Records]
3-1. Retrieve Target Records from the Cache
3-2. Retrieve Target Records From LeoFS (Cache Miss)
4. Process Data Analysis in Parallel
[Reply]
5. Reply To AstraBase Coordinator, Then Summarize the Result on the Coordinator
Figure: AstraSQL, AstraBase Coordinator(s), AstraBase Workers, Resource Monitor + Scheduler, Astra DataStore (LeoFS); interfaces: O/JDBC, gRPC, S3-API
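The query flow on this slide (fan out to workers, read from the cache or fall back to the data store, process in parallel, summarize on the coordinator) can be sketched as below. This is a minimal single-process sketch under stated assumptions: threads stand in for worker nodes, a dict stands in for LeoFS, and every name here (DATA_STORE, Worker, coordinate) is hypothetical rather than part of Astra.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the data store: chunk id -> (workclass, income).
DATA_STORE = {
    "chunk-0": [("Private", "<=50K"), ("Private", ">50K")],
    "chunk-1": [("State-gov", "<=50K"), ("Private", "<=50K")],
}


class Worker:
    def __init__(self, store):
        self.store = store
        self.cache = {}             # record set cache (hot data)
        self.cache_misses = 0
        self._lock = threading.Lock()

    def load(self, chunk_id):
        # Step 3-1: try the cache. Step 3-2: on a miss, read the store.
        with self._lock:
            if chunk_id not in self.cache:
                self.cache_misses += 1
                self.cache[chunk_id] = self.store[chunk_id]
            return self.cache[chunk_id]

    def count_by_workclass(self, chunk_id, income):
        # Step 4: each worker computes a partial result for its chunk.
        return Counter(w for w, i in self.load(chunk_id) if i == income)


def coordinate(workers, chunk_ids, income):
    # Step 2-2: assign chunks to workers round-robin.
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [
            pool.submit(workers[n % len(workers)].count_by_workclass, cid, income)
            for n, cid in enumerate(chunk_ids)
        ]
        # Step 5: merge the partial counts on the coordinator.
        total = Counter()
        for future in futures:
            total.update(future.result())
    return dict(total)
```

Running the same query twice against the same workers reads from the data store only the first time; the second run is served from each worker's cache.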
To Handle Multiple Data Formats In A Table
1. Data Load
- Store Files Into Astra (Original Data, Semi-Structured Files)
- Data Validation
- Data Verification
- Data Type Inference
- Partition Into Multiple Chunks (data is partitioned by a condition on a specified column)
- Store Chunks and Metadata
2. Data Conversion At Async
- CSV / TSV / JSON To Parquet / CarbonData SerDes
- Able To Do Self-Service Data Analytics Even During Data Conversion
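The data load steps above (validation, type inference, partitioning by a column, and metadata creation) can be sketched with the Python standard library. This is only an illustration: the type names, the metadata layout, and the function names are assumptions, since the slides do not show Astra's actual rules or metadata format.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample file contents.
SAMPLE_CSV = (
    "age,workclass,income\n"
    "39,State-gov,<=50K\n"
    "50,Private,<=50K\n"
    "38,Private,>50K\n"
)


def infer_type(values):
    """Return 'integer', 'double' or 'varchar' for a column of strings."""
    for type_name, cast in (("integer", int), ("double", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "varchar"


def load_csv(text, partition_column):
    """Validate rows, infer column types, and split rows into chunks."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        raise ValueError("no data rows")
    # Validation: every row must have exactly the header's fields.
    for line_no, row in enumerate(rows, start=2):
        if None in row or None in row.values():
            raise ValueError(f"line {line_no}: wrong number of fields")
    columns = list(rows[0].keys())
    schema = [
        {"name": c, "type": infer_type([r[c] for r in rows])} for c in columns
    ]
    # Partition: one chunk per distinct value of the chosen column.
    chunks = defaultdict(list)
    for row in rows:
        chunks[row[partition_column]].append(row)
    metadata = {
        "format": "csv",
        "columns": schema,
        "partition_column": partition_column,
        "chunks": sorted(chunks),
    }
    return dict(chunks), metadata


if __name__ == "__main__":
    _, meta = load_csv(SAMPLE_CSV, "workclass")
    print(json.dumps(meta, indent=2))
```

Keeping the format in the metadata is what would let a later asynchronous step convert each chunk to a columnar format and update the metadata without changing how the table is queried.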
Data Storage
Supports Data Formats and SerDes
- CSV, TSV, and Custom Delimiter Files
- JSON
- RegEx SerDes for Unstructured Data
- Parquet SerDes (A Columnar Storage Format)
- CarbonData SerDes (A Columnar Storage Format)
Supports Compression Methods
- SNAPPY
- ZLIB
- GZIP
- LZO
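A RegEx SerDe turns each line of an unstructured file into a row by matching it against a pattern with one capture group per column. The sketch below shows the idea with Python's re module; the log layout, the pattern, and the function name are hypothetical and do not describe Astra's SerDe configuration.

```python
import re

# Hypothetical access-log layout: host [time] "METHOD path" status
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)" (?P<status>\d{3})'
)


def deserialize(line):
    """Map one log line to a row dict, or None if it does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])  # typed column
    return row


if __name__ == "__main__":
    print(deserialize('10.0.0.1 [2017-10-28T10:00:00] "GET /index.html" 200'))
```

Lines that do not match return None here; a real loader would have to decide whether to skip, reject, or quarantine such lines.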
Supports Multiple Data Formats And SerDes
Figure: An Example of METADATA as JSON (Table Schema, CSV Format, Parquet Format)
- Data Type Inference
- Stores Each File Into Astra DataStore, LeoFS
Machine Learning on Astra - Modeling
[Request A Modeling]
1. Request A Modeling To An Initiator Of AstraBase
[Create A Model, Then Store It]
2. Generate Tasks From A Job On A Coordinator
3. Request A Task To Workers
4-1. Execute Function(s) In Parallel On Each Worker
4-2. Load Data From The Data Store If It Is Not In The Cache
5. Summarize The Result On A Coordinator, Then Store The Model Into The Cluster To Reuse
Figure: AstraSQL, AstraBase Coordinator(s), AstraBase Workers, Resource Monitor + Scheduler, Astra DataStore (LeoFS); interfaces: O/JDBC, gRPC, S3-API
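The modeling flow (split a job into tasks, run them in parallel on workers, summarize on a coordinator, store the model for reuse) can be illustrated with a simple least-squares line fit. The choice of model, the in-memory MODEL_STORE, and all function names are assumptions made for this sketch; the slides do not say which algorithms Astra provides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the cluster's model storage.
MODEL_STORE = {}


def partial_sums(chunk):
    """Worker task: sufficient statistics for one chunk of (x, y) pairs."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxy = sum(x * y for x, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    return n, sx, sy, sxy, sxx


def fit_linear_model(chunks):
    """Coordinator: run one task per chunk, then merge and solve."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sums, chunks))
    # Summing the per-chunk statistics gives the statistics of all data.
    n, sx, sy, sxy, sxx = (sum(column) for column in zip(*partials))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values are constant; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return {"slope": slope, "intercept": intercept}


def create_model(name, chunks):
    """Fit the model and keep it under a name so it can be reused."""
    MODEL_STORE[name] = fit_linear_model(chunks)
    return MODEL_STORE[name]
```

The point of the sketch is that workers return only small partial results, so the coordinator can summarize them without ever holding the full data set.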
Integration With BI Tools
Integration With Tableau (BI Tool)
astra:test> DESCRIBE adult_income
-> ;
     Column      |  Type   | Extra | Comment
-----------------+---------+-------+---------
 age             | integer |       |
 workclass       | varchar |       |
 fnlwgt          | integer |       |
 education       | varchar |       |
 educational-num | integer |       |
 marital-status  | varchar |       |
 occupation      | varchar |       |
 relationship    | varchar |       |
 race            | varchar |       |
 gender          | varchar |       |
 capital-gain    | integer |       |
 capital-loss    | integer |       |
 hours-per-week  | varchar |       |
 native-country  | varchar |       |
 income          | varchar |       |
(15 rows)
astra:test> SELECT workclass, COUNT(income)
-> AS income_count
-> FROM adult_income
-> WHERE income = '<=50K'
-> GROUP BY workclass
-> ORDER BY workclass;
    workclass     | income_count
------------------+--------------
 ?                |         2534
 Federal-gov      |          871
 Local-gov        |         2209
 Never-worked     |           10
 Private          |        26519
 Self-emp-inc     |          757
 Self-emp-not-inc |         2785
 State-gov        |         1451
 Without-pay      |           19
(9 rows)
Visualizing Data With 3rd Party Tools
Communicates With Data Visualization And BI Tools
- Dundas BI
- Qlik Sense
- Microsoft PowerBI
Future Plans
Timeline: By end of Oct 2017 | Nov 2017 - end of June 2018 | Q3 2018
Milestones: Alpha | 1st Beta | 2nd Beta | Publish It
- Alpha
  - Un/Semi-Structured Data and Parquet SerDes Support
  - BI Tools and Visualization Tools Integration
- 1st Beta, Step-Growth Phase
  - Record Set Cache
  - Distributed Computing For UDF and ML
  - Other SerDes Support
THANK YOU

More Related Content

What's hot

Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Databricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowDatabricks
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaDatabricks
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn confluent
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Databricks
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Databricks
 
Databricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Databricks
 
Accelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on DatabricksAccelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on DatabricksDatabricks
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...
Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...
Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...Spark Summit
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueDatabricks
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...Spark Summit
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 

What's hot (20)

Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Data Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflowData Versioning and Reproducible ML with DVC and MLflow
Data Versioning and Reproducible ML with DVC and MLflow
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
 
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
 
Databricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog FoodDatabricks: What We Have Learned by Eating Our Dog Food
Databricks: What We Have Learned by Eating Our Dog Food
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
Next Generation Workshop Car Diagnostics at BMW Powered by Apache Spark with ...
 
Accelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on DatabricksAccelerating Data Science with Better Data Engineering on Databricks
Accelerating Data Science with Better Data Engineering on Databricks
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...
Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...
Accelerating Apache Spark-based Analytics on Intel Architecture-(Michael Gree...
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Superworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and FugueSuperworkflow of Graph Neural Networks with K8S and Fugue
Superworkflow of Graph Neural Networks with K8S and Fugue
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 

Viewers also liked

Challenge for statup's cto from big company nagaaki hoshi
Challenge for statup's cto from big company nagaaki hoshiChallenge for statup's cto from big company nagaaki hoshi
Challenge for statup's cto from big company nagaaki hoshiRakuten Group, Inc.
 
WannaEat: A computer vision-based, multi-platform restaurant lookup app
WannaEat: A computer vision-based, multi-platform restaurant lookup appWannaEat: A computer vision-based, multi-platform restaurant lookup app
WannaEat: A computer vision-based, multi-platform restaurant lookup appRakuten Group, Inc.
 
Rakuten app productivity initiative for developers marcus saw
Rakuten app productivity initiative for developers marcus sawRakuten app productivity initiative for developers marcus saw
Rakuten app productivity initiative for developers marcus sawRakuten Group, Inc.
 
Life of an enginner in rakuten osaka diarmaid lindsay
Life of an enginner in rakuten osaka diarmaid lindsayLife of an enginner in rakuten osaka diarmaid lindsay
Life of an enginner in rakuten osaka diarmaid lindsayRakuten Group, Inc.
 
時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroya
時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroya時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroya
時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroyaRakuten Group, Inc.
 
はてなのインフラの歴史、そしてMackerelへ至る道とこれから
はてなのインフラの歴史、そしてMackerelへ至る道とこれから はてなのインフラの歴史、そしてMackerelへ至る道とこれから
はてなのインフラの歴史、そしてMackerelへ至る道とこれから Rakuten Group, Inc.
 
AI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XV
AI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XVAI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XV
AI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XVRakuten Group, Inc.
 
トラブルシューティングのあれこれ Yoshihiko kamata
トラブルシューティングのあれこれ Yoshihiko kamataトラブルシューティングのあれこれ Yoshihiko kamata
トラブルシューティングのあれこれ Yoshihiko kamataRakuten Group, Inc.
 
Value Delivery through RakutenBig Data Intelligence Ecosystem and Technology
Value Delivery through RakutenBig Data Intelligence Ecosystem  and  TechnologyValue Delivery through RakutenBig Data Intelligence Ecosystem  and  Technology
Value Delivery through RakutenBig Data Intelligence Ecosystem and TechnologyRakuten Group, Inc.
 
What i learned from translation of the sre ryuji tamagawa
What i learned from translation of the sre ryuji tamagawaWhat i learned from translation of the sre ryuji tamagawa
What i learned from translation of the sre ryuji tamagawaRakuten Group, Inc.
 
Rakutenとsreと私 yanagimoto koichi
Rakutenとsreと私 yanagimoto koichiRakutenとsreと私 yanagimoto koichi
Rakutenとsreと私 yanagimoto koichiRakuten Group, Inc.
 
AI based language learning tools
AI based language learning toolsAI based language learning tools
AI based language learning toolsRakuten Group, Inc.
 
Predictions and Hard Problems With AI
Predictions and Hard Problems With AIPredictions and Hard Problems With AI
Predictions and Hard Problems With AIRakuten Group, Inc.
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data PlatformRakuten Group, Inc.
 
Change the engineer life by batch system renewal
Change the engineer life by batch system renewalChange the engineer life by batch system renewal
Change the engineer life by batch system renewalRakuten Group, Inc.
 
Building your own static site Using Hugo
Building your own static site Using HugoBuilding your own static site Using Hugo
Building your own static site Using HugoRakuten Group, Inc.
 

Viewers also liked (20)

One Hundred Languages
One Hundred LanguagesOne Hundred Languages
One Hundred Languages
 
Don't manage too hard!
Don't manage too hard! Don't manage too hard!
Don't manage too hard!
 
Challenge for statup's cto from big company nagaaki hoshi
Challenge for statup's cto from big company nagaaki hoshiChallenge for statup's cto from big company nagaaki hoshi
Challenge for statup's cto from big company nagaaki hoshi
 
WannaEat: A computer vision-based, multi-platform restaurant lookup app
WannaEat: A computer vision-based, multi-platform restaurant lookup appWannaEat: A computer vision-based, multi-platform restaurant lookup app
WannaEat: A computer vision-based, multi-platform restaurant lookup app
 
Rakuten app productivity initiative for developers marcus saw
Rakuten app productivity initiative for developers marcus sawRakuten app productivity initiative for developers marcus saw
Rakuten app productivity initiative for developers marcus saw
 
Life of an enginner in rakuten osaka diarmaid lindsay
Life of an enginner in rakuten osaka diarmaid lindsayLife of an enginner in rakuten osaka diarmaid lindsay
Life of an enginner in rakuten osaka diarmaid lindsay
 
時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroya
時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroya時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroya
時間がないといって、オペレーション改善を怠るな~オペレーション改善奮闘記~ Emi muroya
 
はてなのインフラの歴史、そしてMackerelへ至る道とこれから
はてなのインフラの歴史、そしてMackerelへ至る道とこれから はてなのインフラの歴史、そしてMackerelへ至る道とこれから
はてなのインフラの歴史、そしてMackerelへ至る道とこれから
 
AI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XV
AI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XVAI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XV
AI AND FUNDAMENTAL GAME TECHNOLOGIESIN FINAL FANTASY XV
 
トラブルシューティングのあれこれ Yoshihiko kamata
トラブルシューティングのあれこれ Yoshihiko kamataトラブルシューティングのあれこれ Yoshihiko kamata
トラブルシューティングのあれこれ Yoshihiko kamata
 
Value Delivery through RakutenBig Data Intelligence Ecosystem and Technology
Value Delivery through RakutenBig Data Intelligence Ecosystem  and  TechnologyValue Delivery through RakutenBig Data Intelligence Ecosystem  and  Technology
Value Delivery through RakutenBig Data Intelligence Ecosystem and Technology
 
What i learned from translation of the sre ryuji tamagawa
What i learned from translation of the sre ryuji tamagawaWhat i learned from translation of the sre ryuji tamagawa
What i learned from translation of the sre ryuji tamagawa
 
Rakutenとsreと私 yanagimoto koichi
Rakutenとsreと私 yanagimoto koichiRakutenとsreと私 yanagimoto koichi
Rakutenとsreと私 yanagimoto koichi
 
AI based language learning tools
AI based language learning toolsAI based language learning tools
AI based language learning tools
 
Predictions and Hard Problems With AI
Predictions and Hard Problems With AIPredictions and Hard Problems With AI
Predictions and Hard Problems With AI
 
Human-Centric Machine Learning
Human-Centric Machine LearningHuman-Centric Machine Learning
Human-Centric Machine Learning
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
 
Change the engineer life by batch system renewal
Change the engineer life by batch system renewalChange the engineer life by batch system renewal
Change the engineer life by batch system renewal
 
Realizing AI Conversational Bot
Realizing AI Conversational BotRealizing AI Conversational Bot
Realizing AI Conversational Bot
 
Building your own static site Using Hugo
Building your own static site Using HugoBuilding your own static site Using Hugo
Building your own static site Using Hugo
 

Similar to Rakuten Technology Conference 2017 A Distributed SQL Database For Data Analysis, Astra Project

Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j Vision and Roadmap
Neo4j Vision and Roadmap Neo4j
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerSambit Banerjee
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriChetan Khatri
 
Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
 
Streaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micStreaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micBas van Oudenaarde
 
How Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsHow Kafka and Modern Databases Benefit Apps and Analytics
How Kafka and Modern Databases Benefit Apps and AnalyticsSingleStore
 
Cisco UCS with NetApp Storage for SAP HANA Solution
Cisco UCS with NetApp Storage for SAP HANA Solution Cisco UCS with NetApp Storage for SAP HANA Solution
Cisco UCS with NetApp Storage for SAP HANA Solution NetApp
 
DIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL LeeDIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL LeeMyNOG
 
Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Real time analytics at uber @ strata data 2019
Real time analytics at uber @ strata data 2019Zhenxiao Luo
 
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...
Evolution of Performance Management: Oracle 12c adaptive optimizations - ukou...Nelson Calero
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger AnalyticsItzhak Kameli
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
[PLCUG] Splunk - complete Citrix environment monitoring
[PLCUG] Splunk - complete Citrix environment monitoring[PLCUG] Splunk - complete Citrix environment monitoring
[PLCUG] Splunk - complete Citrix environment monitoringJaroslaw Sobel
 
Real-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsReal-time Analytics for Data-Driven Applications
Real-time Analytics for Data-Driven ApplicationsVMware Tanzu
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongCeph Community
 

Rakuten Technology Conference 2017 A Distributed SQL Database For Data Analysis, Astra Project

  • 1. Rakuten Technology Conference 2017 A Distributed SQL Database For Data Analysis, Astra Project 2017-10-28 Yosuke Hara (原 陽亮)
 Rakuten Institute of Technology
 Rakuten, Inc. rev. 1.0.5
  • 2. R&D Projects: Skylab (A Microservices Framework), LeoFS (A Distributed Storage), Astra (A Distributed SQL Database For Data Analytics)
  • 3. Introducing Astra (* “Astra” is the code name of a product under development)
  • 4. One of the Backgrounds: More “Connected Things” In The World. Consumer Applications to Represent 63% of Total IoT Applications in 2017.
    IoT Units Installed Base by Category (Millions of Units). Source: Gartner (January 2017)
     Category                    | 2016    | 2017    | 2018    | 2020
    -----------------------------+---------+---------+---------+----------
     Consumer                    | 3,963   | 5,244.3 | 7,036.3 | 12,863
     Business: Cross-Industry    | 1,102.1 | 1,501   | 2,132.6 | 4,381.4
     Business: Vertical-Specific | 1,316.6 | 1,635.4 | 2,027.7 | 3,171
    Totals: 6.4B (2016), 8.4B (2017, +31%), 11.2B (2018), 20.4B (2020). 2017 shares: Consumer 63%, Business: Cross-Industry 18%, Business: Vertical-Specific 19%.
  • 5. Providing A Database With Which Anyone Can Analyze Data
  • 6. Initial Concept: Provides Components of DataLake as a Service. Diagram labels: Data Science + DataLake; Data Governance; Job Scheduler + Distributed Computing; Data Store; Astra; Skylab; Spark, Hadoop; Self-Service Analytics
  • 7. Current Concept: Advanced Data Analysis In Semi-Realtime At Low Cost. Stages: Data, Intelligence, Action. Diagram labels: Streaming Data; Un/Semi-Structured Data; Store Data Into Astra; Aggregate and Analyze Data; Find Insights; Tools / Apps; Automated Systems
  • 8. Current Concept: Depends on Single Source Of Truth. Astra’s Components: Distributed Computing For Massive-Parallel Processing + Distributed Database For Aggregation and Analysis + Distributed Storage (DataLake Store). Other diagram labels: Self-Service Analytics; Data Governance; In-place Analysis
  • 10. Unified Components. Diagram labels: Database; SQL Engine; Data Science; Analysis Functions On The Distributed Computing; Reliability, Scalability, and Massive Parallel Processing; Ad-hoc Query; Various Data Without Limit; Data Store
  • 11. The Features - ANSI SQL99 Standard: Conforms To ANSI SQL99 Standard
    • Communication With Any BI / Data Visualization Tools, and Apps
    • Able To Call All Astra’s Functions, UDFs and ML With SQL
    astra:test> SELECT workclass, COUNT(income)
             -> AS income_count
             -> FROM adult_income
             -> WHERE income = '<=50K'
             -> GROUP BY workclass
             -> ORDER BY workclass;
        workclass     | income_count
    ------------------+--------------
     ?                |         2534
     Federal-gov      |          871
     Local-gov        |         2209
     Never-worked     |           10
     Private          |        26519
     Self-emp-inc     |          757
     Self-emp-not-inc |         2785
     State-gov        |         1451
     Without-pay      |           19
    (9 rows)
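The query on slide 11 uses only standard SQL aggregation, so it can be tried on any SQL engine. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Astra (an assumption made so the example runs anywhere). The table and column names follow the slide; the six rows are made-up sample data, not the real adult_income dataset.

```python
import sqlite3

# Hypothetical sample rows; the real adult_income table is far larger.
SAMPLE_ROWS = [
    ("Private", "<=50K"),
    ("Private", "<=50K"),
    ("Private", ">50K"),
    ("State-gov", "<=50K"),
    ("Local-gov", ">50K"),
    ("Local-gov", "<=50K"),
]

# Same shape as the query shown on the slide.
QUERY = """
SELECT workclass, COUNT(income) AS income_count
  FROM adult_income
 WHERE income = '<=50K'
 GROUP BY workclass
 ORDER BY workclass
"""


def count_low_income_by_workclass(rows):
    """Load rows into an in-memory table and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE adult_income (workclass TEXT, income TEXT)")
        conn.executemany("INSERT INTO adult_income VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for workclass, income_count in count_low_income_by_workclass(SAMPLE_ROWS):
        print(f"{workclass:<16} | {income_count}")
```

Against Astra itself the same statement would presumably be sent over ODBC/JDBC, as the architecture slides indicate; the driver details are not given in the deck.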
  • 12. The Features - Advanced Data Analytics: Advanced Data Analytics On The Distributed Computing, Massive-Parallel Processing
    • Built-In Analysis Functions and UDF
    • Machine Learning
    Analysis cycle: Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment, with Feedback. Able To Repeat Trial And Error w/o Limit
  • 13. The Features - Availability and Scalability
    High Availability
    • Automated Data Replication And Recovery, and Failover
    High Scalability
    • An Elastic Cluster - Nodes That Can Flexibly Attach And Detach
    Diagram labels: Clients; Request; Response; Coordinator(s); Workers; HTTP Message with Gossip Protocol; Monitoring Resources; Scheduling Jobs; Requesting Jobs; Circuit Breaker
    * Circuit Breaker: martinfowler.com/bliki/CircuitBreaker.html (Figure: Akka Circuit breaker)
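Slide 13 cites the circuit breaker pattern without showing how Astra applies it. As a rough illustration of the general pattern only (not Astra's or Akka's actual implementation), the sketch below trips open after a number of consecutive failures, rejects calls while open, and allows a trial call once a timeout has passed. The class name, thresholds, and injectable clock are all hypothetical.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker: closed -> open -> half-open."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests need not sleep
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call in half-open state, or too many failures
            # in closed state, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A coordinator could wrap each request to a worker in `breaker.call(...)` so that a failing worker is skipped quickly instead of being retried on every job; whether Astra does exactly this is not stated in the slides.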
  • 15. High-level Architecture. Layers: SQL Engine; Database Layer; DataStore Layer. Components: Astra CLI; Clients (SQL over ODBC/JDBC); AstraSQL; AstraBase; Workers; Astra Multi-Coordinator; Astra DataStore. Diagram labels: Original Data; Semi-Structured Data; Cold Data; Columnar Tables; Metadata Store; Record Operation; Record Set Cache (Hot Data); Distributed Computing; Data Analysis; Data Converter (Semi-Structured Data To Columnar Table); Original Data Load; Operate
  • 16. LeoFS For Astra DataStore
    LeoFS is a software defined storage (SDS) for DataLake and Web.
    LeoFS is an Enterprise Open Source Storage, and it is a highly available, distributed, eventually consistent object/blob store.
    Goals:
    - High Availability
    - High Cost Performance Ratio
    - High Scalability
  • 17. Processing Flow - Store a CSV file, Then Query Data
    [Store a CSV File]
    1-1. Put Original Data w/AstraCLI
    1-2. Create Metadata
    2. Store the Data and Metadata
    [Execute Query]
    3. Query Data For Aggregation Or Data Analysis
    [Convert Data Format Asynchronously]
    4. Request Converting Data Format of a Table
    5. Convert Data Format of a Table and Change Table’s Metadata
    6. Store Converted Data
    Diagram labels: AstraCLI; AstraSQL; AstraBase Coordinator(s); AstraBase Workers; Resource Monitor + Scheduler; Astra DataStore (LeoFS); REST-API; gRPC; S3-API; O/JDBC
  • 18. Processing Flow - Query for Advanced Analysis
    [Execute Query]
    1. Execute SQL For Data Analysis
    2-1. Request Data Analysis to AstraBase
    2-2. Request Message to AstraBase’s Workers
    [Retrieve Records]
    3-1. Retrieve Target Records from the Cache
    3-2. Retrieve Target Records From LeoFS (Cache Miss)
    4. Process Data Analysis in Parallel
    [Reply]
    5. Reply To AstraBase Coordinator, Then Summarize the Result on the Coordinator
    Diagram labels: AstraSQL; AstraBase Coordinator(s); AstraBase Workers; Resource Monitor + Scheduler; Astra DataStore (LeoFS); O/JDBC; gRPC; S3-API
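The flow on slide 18 is a scatter-gather with a cache in front of the data store. The sketch below imitates it in a single process: a thread pool stands in for the AstraBase workers, one dict stands in for LeoFS, and another for the record cache. All names and the sample chunks are hypothetical, and the real system communicates over gRPC and the S3 API rather than function calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the data store (LeoFS in the slides).
DATA_STORE = {
    "chunk-0": ["Private", "Private", "State-gov"],
    "chunk-1": ["Local-gov", "Private"],
    "chunk-2": ["State-gov", "Local-gov", "Local-gov"],
}


def load_records(chunk_id, cache, store):
    """Steps 3-1 / 3-2: read from the cache, fall back to the store on a miss."""
    if chunk_id not in cache:
        cache[chunk_id] = store[chunk_id]
    return cache[chunk_id]


def worker_task(chunk_id, cache, store):
    """Step 4: each worker computes a partial result for its own chunk."""
    return Counter(load_records(chunk_id, cache, store))


def coordinator_query(chunk_ids, cache, store, max_workers=3):
    """Steps 2-2 and 5: fan tasks out to workers, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda cid: worker_task(cid, cache, store), chunk_ids)
        total = Counter()
        for partial in partials:
            total.update(partial)
    return dict(total)


if __name__ == "__main__":
    cache = {}
    print(coordinator_query(sorted(DATA_STORE), cache, DATA_STORE))
    print("cached chunks:", sorted(cache))
```

After the first query every chunk sits in the cache, so a repeated query would take the step 3-1 path for all chunks.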
  • 19. 1. Data Load: Store Files Into Astra (Original Data, Semi-Structured Files)
    - Data Validation
    - Data Verification
    - Data Type Inference
    - Store Chunks and Metadata
    2. Data Conversion (Asynchronous)
    - Partition Into Multiple Chunks (data is partitioned by a condition on a specified column)
    - CSV / TSV / JSON To Parquet / CarbonData SerDes
    To Handle Multiple Data Formats In A Table
    Able To Run Self-Service Data Analytics Even During Data Conversion
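Slide 19 says data is partitioned by a condition on a specified column and split into chunks. A minimal sketch of that idea follows, assuming the condition is simple equality on the column value and that chunks are capped by row count; both are assumptions, as are the function name and the sample rows.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample input in the shape of the adult_income table.
SAMPLE_CSV = """age,workclass,income
39,State-gov,<=50K
50,Private,<=50K
38,Private,>50K
53,Private,<=50K
"""


def partition_by_column(csv_text, column, max_rows_per_chunk=2):
    """Group CSV rows by the value of `column`, then split each group into chunks.

    Returns {partition_value: [chunk, ...]}; each chunk is a list of row dicts.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[column]].append(row)

    partitions = {}
    for value, rows in groups.items():
        partitions[value] = [
            rows[i:i + max_rows_per_chunk]
            for i in range(0, len(rows), max_rows_per_chunk)
        ]
    return partitions


if __name__ == "__main__":
    for value, chunks in partition_by_column(SAMPLE_CSV, "workclass").items():
        print(value, [len(chunk) for chunk in chunks])
```

In the real pipeline each chunk would then be converted to a columnar format such as Parquet or CarbonData; that step is omitted here because it needs third-party libraries.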
  • 20. Data Storage: Supports Multiple Data Formats And SerDes
    Supports Data Format and SerDes
    - CSV, TSV, and Custom Delimiter Files
    - JSON
    - RegEx SerDes for Unstructured Data
    - Parquet SerDes (A Columnar Storage Format)
    - CarbonData SerDes (A Columnar Storage Format)
    Supports Compression Methods
    - SNAPPY
    - ZLIB
    - GZIP
    - LZO
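Slide 20 lists a RegEx SerDes for unstructured data. The general idea of such a deserializer is to map capture groups of a regular expression to table columns. The sketch below shows that idea with Python's re module; the log layout, pattern, and column names are hypothetical and are not taken from Astra.

```python
import re

# Hypothetical log layout: "<timestamp> <LEVEL> <message>".
LOG_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def deserialize(line, pattern=LOG_PATTERN):
    """Turn one unstructured text line into a column dict, or None if it does not match."""
    match = pattern.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(deserialize("2017-10-28T10:00:00 ERROR disk full"))
    print(deserialize("garbage"))
```

Returning None for non-matching lines is one possible policy; a loader could equally reject the file or route such lines to a separate table.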
  • 21. Data Type Inference: Stores Each File Into Astra Data Store, LeoFS. Figure labels: CSV Format; Table Schema; Parquet Format; An Example of METADATA as JSON
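The deck does not describe how data type inference works. One simple approach, sketched below under that caveat, is to try parsing every sampled value of a column as an integer and fall back to a string type otherwise. The type names integer and varchar match the DESCRIBE output on slide 24; the functions and the rule itself are assumptions.

```python
def infer_column_type(values):
    """Return "integer" if every non-empty value parses as an int, else "varchar"."""
    seen_value = False
    for value in values:
        text = value.strip()
        if not text:
            continue  # treat blanks as missing, not as evidence of a type
        seen_value = True
        try:
            int(text)
        except ValueError:
            return "varchar"
    return "integer" if seen_value else "varchar"


def infer_schema(header, rows):
    """Infer a type per column from sample rows (a list of lists of strings)."""
    columns = list(zip(*rows)) if rows else [() for _ in header]
    return {name: infer_column_type(col) for name, col in zip(header, columns)}


if __name__ == "__main__":
    header = ["age", "workclass"]
    rows = [["39", "State-gov"], ["50", "Private"]]
    print(infer_schema(header, rows))
```

A production implementation would also need to handle floats, dates, and conflicting samples, none of which are covered here.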
  • 22. Machine Learning on Astra - Modeling
    [Request A Modeling]
    1. Request A Modeling To An Initiator Of AstraBase
    [Create A Model, Then Store It]
    2. Generate Tasks From A Job On A Coordinator
    3. Request A Task To Workers
    4-1. Execute Function(s) In Parallel On Each Worker
    4-2. Load Data From Data Store If It Is Not In The Cache
    5. Summarize The Result On A Coordinator, Then Store The Model Into The Cluster To Reuse
    Diagram labels: AstraSQL; AstraBase Coordinator(s); AstraBase Workers; Resource Monitor + Scheduler; Astra DataStore (LeoFS); O/JDBC; gRPC; S3-API
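Steps 4-1 and 5 on slide 22 describe workers computing in parallel and a coordinator summarizing their results into a model. The slides do not say which algorithms Astra supports, so the sketch below picks simple least-squares line fitting only because its per-partition results can be merged by addition. The function names, the thread pool standing in for workers, and the sample points are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(points):
    """Worker side: sufficient statistics for y = slope*x + intercept on one partition."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return n, sx, sy, sxx, sxy


def fit_on_coordinator(partitions):
    """Coordinator side: run workers in parallel, merge statistics, solve for the line."""
    with ThreadPoolExecutor() as pool:
        stats = list(pool.map(partial_stats, partitions))
    # Element-wise sum of the per-partition tuples.
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*stats))
    denominator = n * sxx - sx * sx
    if denominator == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = (n * sxy - sx * sy) / denominator
    intercept = (sy - slope * sx) / n
    return {"slope": slope, "intercept": intercept}


if __name__ == "__main__":
    # Two hypothetical partitions of points lying on y = 2x + 1.
    partitions = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
    print(fit_on_coordinator(partitions))
```

The returned dict plays the role of the stored model; persisting it for reuse, as step 5 describes, is left out.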
  • 24. Integration With Tableau (BI Tool)
    astra:test> DESCRIBE adult_income
             -> ;
         Column      |  Type   | Extra | Comment
    -----------------+---------+-------+---------
     age             | integer |       |
     workclass       | varchar |       |
     fnlwgt          | integer |       |
     education       | varchar |       |
     educational-num | integer |       |
     marital-status  | varchar |       |
     occupation      | varchar |       |
     relationship    | varchar |       |
     race            | varchar |       |
     gender          | varchar |       |
     capital-gain    | integer |       |
     capital-loss    | integer |       |
     hours-per-week  | varchar |       |
     native-country  | varchar |       |
     income          | varchar |       |
    (15 rows)
    astra:test> SELECT workclass, COUNT(income)
             -> as income_count
             -> FROM adult_income
             -> WHERE income = '<=50K'
             -> GROUP BY workclass
             -> ORDER BY workclass;
        workclass     | income_count
    ------------------+--------------
     ?                |         2534
     Federal-gov      |          871
     Local-gov        |         2209
     Never-worked     |           10
     Private          |        26519
     Self-emp-inc     |          757
     Self-emp-not-inc |         2785
     State-gov        |         1451
     Without-pay      |           19
    (9 rows)
  • 25. Visualizing Data With 3rd Party Tools: Communicates With Data Visualization And BI Tools (Dundas BI, Qlik Sense, Microsoft PowerBI)
  • 27. Future Plans
    Timeline: By Oct/E, 2017; Nov, 2017 - June/E, 2018; Q3 2018
    Milestones: Alpha; 1st Beta; 2nd Beta; Publish It
    Alpha
    - Un/Semi-Structured Data and Parquet SerDes Support
    - BI Tools and Visualization Tools Integration
    1st Beta, Step-Growth Phase
    - Record Set Cache
    - Distributed Computing For UDF and ML
    - Other SerDes Support