Better (rather than Best) Practice for Cassandra
Data modeling basics for data that is ill-suited to Cassandra (Data Modeling concepts for NoSQL weak point)
Hayato Tsutsumi
Works Applications
Speaker
Hayato Tsutsumi (堤 勇人)
Site Reliability Engineering Div., Works Applications Co., Ltd.
Cassandra experience: 7 years (since ver. 0.6)
Certification: Apache Cassandra Administrator
Data size: about 40 TB
Nodes: about 40 (expected to increase soon)
Twitter: 2t3
Target
● Mid-range systems
  Data size: 1 TB to 1 PB
  Data amount: 10 million to 100 billion records
  Plus high-speed processing
What is better?
Best practice = the right tool in the right place
Putting the right tool in the right place is certainly best.
[Figure: the data that is suitable for Cassandra]
But not all data is suitable
So what about the part that is not a best fit?
[Figure: our system holds both data suitable for Cassandra and data unsuitable for Cassandra]
Use both Cassandra and an RDB?
[Figure: suitable data, alongside an RDB (O* or M*)]
Then you might as well use only the RDB...
[Figure: an RDB (O* or M*), and the suitable data]
Another way: manage all of the data with Cassandra alone
[Figure: suitable data]
3 models that do not suit NoSQL
● Historical data
● Tree structure
● Summarized data
Prerequisite: how Cassandra reads data, in 3 minutes
Partition key & Clustering key
CREATE TABLE test_table (
  pkA text,
  pkB text,
  ckA text,
  ckB text,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB)  // partition key: (pkA, pkB); clustering key: ckA, ckB
);
On the table:
pkA  | pkB  | ckA  | ckB  | v  | w
pkA1 | pkB1 | ckA1 | ckB1 | v1 | w1
pkA1 | pkB1 | ckA1 | ckB2 | v2 | w2
pkA1 | pkB2 | ckA3 | ckB3 | v3 | w3
On Cassandra (one entry per partition, keyed by the hash of the partition key):
hash(pkA1, pkB1): ckA1:ckB1:v = v1, ckA1:ckB1:w = w1, ckA1:ckB2:v = v2, ckA1:ckB2:w = w2
hash(pkA1, pkB2): ckA3:ckB3:v = v3, ckA3:ckB3:w = w3
The clustering key values become part of the stored column name, so Cassandra can search by column name.
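The mapping from table rows to the stored layout can be sketched in a few lines of Python. This is only an illustration of the slide's diagram, not Cassandra's actual storage engine: the tuple `(pkA, pkB)` stands in for `hash(pkA, pkB)`, and the function name `to_partitions` is made up for this example.

```python
from collections import defaultdict

# Rows as they appear "on the table" in the slide: (pkA, pkB, ckA, ckB, v, w)
rows = [
    ("pkA1", "pkB1", "ckA1", "ckB1", "v1", "w1"),
    ("pkA1", "pkB1", "ckA1", "ckB2", "v2", "w2"),
    ("pkA1", "pkB2", "ckA3", "ckB3", "v3", "w3"),
]


def to_partitions(rows):
    """Group rows by partition key and flatten each row into named cells."""
    partitions = defaultdict(dict)
    for pk_a, pk_b, ck_a, ck_b, v, w in rows:
        cells = partitions[(pk_a, pk_b)]  # stands in for hash(pkA, pkB)
        # The clustering key values are embedded in the column name.
        cells[f"{ck_a}:{ck_b}:v"] = v
        cells[f"{ck_a}:{ck_b}:w"] = w
    # Inside a partition, cells are kept ordered by column name.
    return {pk: dict(sorted(cells.items())) for pk, cells in partitions.items()}


partitions = to_partitions(rows)
for pk, cells in partitions.items():
    print(pk, cells)
```

Because the cells of one partition are sorted by name, a lookup by a clustering key prefix or range is a scan over a contiguous slice, which is what the WHERE clause rules rely on.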
Partition key & Clustering key (pkB, ckA and ckB as int)
CREATE TABLE test_table (
  pkA text,
  pkB int,
  ckA int,
  ckB int,
  v text,
  w text,
  PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
The table view and the layout on Cassandra are the same as for the previous definition.
Partition key & Clustering key: which WHERE clauses work
For test_table with PRIMARY KEY ((pkA, pkB), ckA, ckB), where pkB, ckA and ckB are int:
where pkA = 'pkA1';                                        // NG: partition key is incomplete
where pkA = 'pkA1' and pkB = 1;                            // OK
where pkA = 'pkA1' and pkB = 1 and ckA = 1;                // OK
where pkA = 'pkA1' and pkB = 1 and ckA = 1 and ckB = 1;    // OK
where pkA = 'pkA1' and ckA = 1;                            // NG: partition key is incomplete
where pkA = 'pkA1' and pkB = 1 and ckB = 1;                // NG: ckA is skipped
where pkA = 'pkA1' and pkB >= 1;                           // NG: range on a partition key column
where pkA = 'pkA1' and pkB = 1 and ckA >= 1;               // OK
where pkA = 'pkA1' and pkB = 1 and ckA = 1 and ckB >= 1;   // OK
where pkA = 'pkA1' and pkB = 1 and ckA >= 1 and ckB = 1;   // NG: ckB is restricted after a range on ckA
where pkA = 'pkA1' and pkB = 1 and ckA >= 1 and ckA < 2;   // OK
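The OK/NG pattern above follows two rules, which the Python sketch below encodes. It is a simplified model written for this deck's example only: the function name `is_allowed` is hypothetical, and the model deliberately ignores IN, ALLOW FILTERING, secondary indexes and multi-column relations.

```python
PARTITION_KEY = ["pkA", "pkB"]
CLUSTERING_KEY = ["ckA", "ckB"]


def is_allowed(restrictions):
    """Check a WHERE clause given as a list of (column, operator) pairs."""
    ops = {}
    for column, op in restrictions:
        ops.setdefault(column, set()).add(op)

    # Only key columns can be restricted in this simplified model.
    if any(c not in PARTITION_KEY + CLUSTERING_KEY for c in ops):
        return False

    # Rule 1: every partition key column must be restricted with '='.
    for column in PARTITION_KEY:
        if ops.get(column) != {"="}:
            return False

    # Rule 2: clustering columns are restricted in order, without gaps,
    # and nothing may follow a column that is skipped or range-restricted.
    closed = False
    for column in CLUSTERING_KEY:
        column_ops = ops.get(column)
        if column_ops is None:
            closed = True
            continue
        if closed:
            return False
        if column_ops != {"="}:
            closed = True  # a range (<, <=, >, >=) ends the usable prefix
    return True


print(is_allowed([("pkA", "="), ("pkB", "="), ("ckA", ">=")]))              # OK case
print(is_allowed([("pkA", "="), ("pkB", "="), ("ckA", ">="), ("ckB", "=")]))  # NG case
```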
Historical data
photo by Bryan Wright
(https://secure.flickr.com/photos/spidermandragon5/2922128673/)
Employee transfer history
[Figure: timeline of division assignments per employee]
emp001: A Div. → B Div. → C Div.
emp002: A Div. → C Div. → D Div.
emp003: D Div. → E Div.
Dates marked on the timeline: 2/1, 2/21, 3/11, 4/1, 4/16, 5/1
Employee transfer history: snapshots at a given date
As of 3/25:
emp001 | emp002 | emp003
B Div. | A Div. | E Div.
As of 4/25:
emp001 | emp002 | emp003
C Div. | D Div. | E Div.
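emp_history table
CREATE TABLE emp_history (
  id text,
  no text,
  s date,   // start of the assignment period
  e date,   // end of the assignment period
  div text,
  PRIMARY KEY (id, s, e, no)
);
select * from emp_history
where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; // NG: e cannot be restricted after a range on s
So what can be done?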
emp_history table, with a custom index
CREATE TABLE emp_history (
  id text,
  no text,
  s date,
  e date,
  div text,
  PRIMARY KEY (id, s, no)
);
CREATE CUSTOM INDEX fn_e ON emp_history (e)
  USING 'org.apache.cassandra.index.sasi.SASIIndex';
select * from emp_history
where id = 'test' and s <= '2017-03-25' and e > '2017-03-25'; // OK
Take e out of the primary key and use a custom (SASI) index on it.
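What the query asks for is the row whose period contains the given day. The Python sketch below shows that condition on in-memory rows. The rows and their boundary dates are assumptions made up for illustration (the figure does not state which date belongs to which transfer), and `division_on` is a hypothetical helper, not part of any driver.

```python
from datetime import date

# Hypothetical rows: (employee no, start s, end e, division).
# An open-ended period is modelled with a far-future end date (an assumption).
FAR_FUTURE = date(9999, 12, 31)
history = [
    ("emp001", date(2017, 2, 1), date(2017, 3, 11), "A Div."),
    ("emp001", date(2017, 3, 11), date(2017, 4, 16), "B Div."),
    ("emp001", date(2017, 4, 16), FAR_FUTURE, "C Div."),
]


def division_on(history, emp_no, day):
    """Return the division whose half-open period [s, e) contains `day`."""
    for no, s, e, div in history:
        # Same condition as the CQL: s <= day and e > day.
        if no == emp_no and s <= day < e:
            return div
    return None


print(division_on(history, "emp001", date(2017, 3, 25)))
print(division_on(history, "emp001", date(2017, 4, 25)))
```

Treating the period as half-open means a transfer day belongs to the new division only, so two rows never match the same day.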
Tree structure
Organization tree
A Div.
  a Dept.
    1 Sec.
    2 Sec.
  b Dept.
    3 Sec.
    4 Sec.
    5 Sec.
Well known models
● Adjacency list
● Path enumeration
● Nested set
● Closure table
Criteria
● No JOIN and no recursive queries are available
● Consistency has to be ensured by some other means anyway
● Jaywalking (several values packed into one column) and denormalization are the norm
Requirements for the tree model
● Show ancestors: every parent from a node up to the root
● Show children: expand one level down
● Show descendants: expand all levels down
● Show siblings of a node
Well known models for the organization tree
● Adjacency list
● Path enumeration
● Nested set
● Closure table
Worth considering! (The slides that follow examine path enumeration and the closure table.)
Path enumeration
CREATE TABLE pathenum (
  id text,
  fqdn text,
  child text,
  code text,
  PRIMARY KEY (id, fqdn)
);
id   | fqdn  | child   | code
test | A     | [a,b]   | A
test | A:a   | [1,2]   | a
test | A:b   | [3,4,5] | b
test | A:a:1 |         | 1
test | A:a:2 |         | 2
test | A:b:3 |         | 3
test | A:b:4 |         | 4
test | A:b:5 |         | 5
Path enumeration: reading a subtree needs a LIKE-style prefix search
select * from pathenum
where id = 'test' and fqdn like 'A:%'; // NG: LIKE is not available on this column
select * from pathenum
where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; // OK: a range on the clustering key
':' is U+003A and ';' is U+003B, the next code point, so the range from 'A:' (inclusive) to 'A;' (exclusive) covers exactly the keys that start with 'A:'.
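The upper bound of such a range can be computed rather than typed by hand. The following is a minimal Python sketch, assuming the prefix ends with an ASCII character (as the ':' separator does); `prefix_range` is a name made up for this example.

```python
def prefix_range(prefix):
    """Return (lower, upper) such that lower <= key < upper matches the prefix."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    last = prefix[-1]
    if ord(last) >= 0x7F:
        # Assumption of this sketch: only ASCII separators are handled.
        raise ValueError("last character must be ASCII below U+007F")
    # Replace the last character with the next code point (':' becomes ';').
    upper = prefix[:-1] + chr(ord(last) + 1)
    return prefix, upper


keys = ["A", "A:a", "A:a:1", "A:b", "A:b:5", "B"]
lower, upper = prefix_range("A:")
descendants = [k for k in keys if lower <= k < upper]
print(lower, upper, descendants)
```

The node 'A' itself falls outside the range because it is shorter than the lower bound 'A:', which is the desired behaviour for a descendants query.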
Path enumeration: queries
// show ancestors: split the node's fqdn on the application side
fqdn.split(":");
// show children of a
select child from pathenum where id = 'test' and fqdn = 'A:a';
// show descendants of A
select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;';
// show siblings of a: read the children of a's parent, A
select child from pathenum where id = 'test' and fqdn = 'A';
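The application-side part of these lookups is plain string handling. A small Python sketch, with hypothetical helper names `ancestors` and `parent`, shows how the ancestor paths and the parent path (needed for the siblings query) fall out of the fqdn:

```python
def ancestors(fqdn, sep=":"):
    """Return the paths of all ancestors, from the root down to the parent."""
    parts = fqdn.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts))]


def parent(fqdn, sep=":"):
    """Return the parent's path, or None for the root."""
    head, found, _ = fqdn.rpartition(sep)
    return head if found else None


print(ancestors("A:a:1"))
print(parent("A:a"))
```

The parent path returned for 'A:a' is the fqdn to use in the siblings query, whose child column then lists a and its siblings.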
Path enumeration: pros & cons
pros
- one access per request
cons
- hot spot (the whole tree shares one partition key, id)
- relies on range slices
- updates are complex
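Closure table
CREATE TABLE closure_main (
  id text,
  v text,
  PRIMARY KEY (id)
);
CREATE TABLE closure_path (
  p text,  // ancestor side of the path
  c text,  // descendant side of the path
  d int,   // distance between p and c (0 = the node itself)
  PRIMARY KEY (p, d, c)
);
closure_main:
id | v
A  | A Div.
a  | a Dept.
b  | b Dept.
1  | 1 Sec.
2  | 2 Sec.
3  | 3 Sec.
4  | 4 Sec.
5  | 5 Sec.
closure_path:
p | c | d
A | A | 0
A | a | 1
A | b | 1
A | 1 | 2
A | 2 | 2
A | 3 | 2
A | 4 | 2
A | 5 | 2
a | a | 0
a | 1 | 1
a | 2 | 1
1 | 1 | 0
2 | 2 | 0
b | b | 0
b | 3 | 1
b | 4 | 1
b | 5 | 1
3 | 3 | 0
4 | 4 | 0
5 | 5 | 0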
Closure table: index and queries
CREATE CUSTOM INDEX fn_c ON test.closure_path (c)
  USING 'org.apache.cassandra.index.sasi.SASIIndex';
// show ancestors of 1
select p from closure_path where c = '1';
select * from closure_main where id in ?; // bind the ids returned by the previous query
// show children of a
select c from closure_path where p = 'a' and d = 1;
select * from closure_main where id in ?;
// show descendants of A
select c from closure_path where p = 'A';
select * from closure_main where id in ?;
// show siblings of a
// first load a's parent (the row with d = 1), which is A
select * from closure_path where c = 'a';
select c from closure_path where p = 'A' and d = 1;
select * from closure_main where id in ?;
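The logic of the four lookups can be tried out on the closure_path rows from the slide. This Python sketch works on an in-memory list instead of Cassandra, so it only illustrates which rows each query selects. Unlike the CQL above, it drops the d = 0 self rows from the ancestor and descendant results and leaves the node itself out of its siblings; that filtering is a choice made for this example.

```python
# (p, c, d) rows copied from the closure_path table on the slide.
CLOSURE_PATH = [
    ("A", "A", 0), ("A", "a", 1), ("A", "b", 1),
    ("A", "1", 2), ("A", "2", 2), ("A", "3", 2), ("A", "4", 2), ("A", "5", 2),
    ("a", "a", 0), ("a", "1", 1), ("a", "2", 1),
    ("b", "b", 0), ("b", "3", 1), ("b", "4", 1), ("b", "5", 1),
    ("1", "1", 0), ("2", "2", 0), ("3", "3", 0), ("4", "4", 0), ("5", "5", 0),
]


def ancestors(node):
    """Rows found through the index on c, without the self row."""
    return [p for p, c, d in CLOSURE_PATH if c == node and d > 0]


def children(node):
    """Rows in partition p = node with distance 1."""
    return [c for p, c, d in CLOSURE_PATH if p == node and d == 1]


def descendants(node):
    """All rows in partition p = node, without the self row."""
    return [c for p, c, d in CLOSURE_PATH if p == node and d > 0]


def siblings(node):
    """Two steps: find the parent (d = 1), then read the parent's children."""
    parents = [p for p, c, d in CLOSURE_PATH if c == node and d == 1]
    if not parents:
        return []  # the root has no siblings
    return [c for c in children(parents[0]) if c != node]


print(ancestors("1"), children("a"), descendants("A"), siblings("a"))
```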
Closure table: pros & cons
pros
- distributed (paths are spread over many partitions)
- reads are key lookups (get access)
cons
- needs an index
- 2 to 3 accesses per request
- more data to store
- updates are complex
Closure table: how much does the data grow?
Each node stores one closure_path row per ancestor plus one row for itself, so a node at depth k has k + 1 rows. For a tree with n children per node and depth d, the number of rows per node therefore grows in proportion to d.
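The growth can be checked by generating the closure rows from a parent map. The Python sketch below uses the organization tree from the slides; the parent map layout and the names `closure_rows` and `complete_tree_rows` are assumptions made for this example.

```python
# Parent of each node in the sample organization tree (None marks the root).
PARENT = {
    "A": None,
    "a": "A", "b": "A",
    "1": "a", "2": "a",
    "3": "b", "4": "b", "5": "b",
}


def closure_rows(parent):
    """Build (p, c, d) rows: one per ancestor of each node, plus the self row."""
    rows = []
    for node in parent:
        ancestor, distance = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, distance))
            ancestor, distance = parent[ancestor], distance + 1
    return rows


def complete_tree_rows(n, d):
    """Row count for a complete tree: n**k nodes at depth k, k + 1 rows each."""
    return sum((k + 1) * n**k for k in range(d + 1))


rows = closure_rows(PARENT)
print(len(PARENT), "nodes ->", len(rows), "closure rows")
print(complete_tree_rows(2, 2), complete_tree_rows(2, 10))
```

For the 8-node sample tree this yields the 20 rows listed in the closure_path table, which is 2.5 rows per node for a tree of depth 2.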
Summarized data
Aggregation of slips
Dr.   | Cr.
A 200 | B 50
      | C 150
Aggregation of slips
[Figure: aggregation fed by parallel batch processing and by online streaming]
Requirements
● A miscalculation is fatal
● Parallel batch processing and online streaming processing are both needed
● High-speed processing is needed
Requirements
Taken together, these requirements come down to one thing: consistency!
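Summarized data: use a counter?
CREATE TABLE countup (
  id text PRIMARY KEY,
  v counter
);
UPDATE countup SET v = v + 1 WHERE id = 'test';
Use a counter...? No. Counter updates are not idempotent, so a write that times out and is retried can leave the total wrong.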
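Summarized data: update with LWT
CREATE TABLE countup (
  id text PRIMARY KEY,
  v int
);
UPDATE countup SET v = 101 WHERE id = 'test' IF v = 100;
Use an update with a lightweight transaction (LWT): the write is applied only if v still holds the value that was read.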
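On the application side, a conditional update is usually wrapped in a read, compare-and-set, retry loop. The Python sketch below shows that loop against an in-memory stand-in. `FakeTable`, `update_if` and `add` are hypothetical names written for this example and are not a Cassandra driver API; the retry limit is likewise an assumption.

```python
class FakeTable:
    """In-memory stand-in for the countup table; not a Cassandra driver."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows.get(key)

    def update_if(self, key, new_value, expected):
        """Mimic: UPDATE countup SET v = new_value WHERE id = key IF v = expected."""
        if self.rows.get(key) != expected:
            return False  # someone else changed v; the write is not applied
        self.rows[key] = new_value
        return True


def add(table, key, amount, max_retries=10):
    """Add `amount` to an existing row, retrying when the condition fails."""
    for _ in range(max_retries):
        current = table.read(key)
        if current is None:
            raise KeyError(key)
        if table.update_if(key, current + amount, expected=current):
            return current + amount
    raise RuntimeError("too much contention, giving up")


table = FakeTable()
table.rows["test"] = 100
print(add(table, "test", 1))
```

Because a failed condition leaves the row untouched, a retry after a timeout cannot apply the same increment twice, which is the property the counter type lacks. The price is an extra read and a conditional write per update.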
What is the best?
Thanks!

More Related Content

Similar to Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point

Building and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning CBuilding and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning CDavid Wheeler
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)Jerome Eteve
 
Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Workhorse Computing
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsJan Aerts
 
Advanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfAdvanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfssuser785ce21
 
Sydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plansSydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution planspaulguerin
 
Perly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsPerly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsWorkhorse Computing
 
An OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parserAn OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parserKiwamu Okabe
 
Drupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary EditionDrupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary Editionddiers
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介Masayuki Matsushita
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scalefabxc
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PgDay.Seoul
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoPostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoMark Wong
 
Advanced Perl Techniques
Advanced Perl TechniquesAdvanced Perl Techniques
Advanced Perl TechniquesDave Cross
 
Perforce Object and Record Model
Perforce Object and Record Model  Perforce Object and Record Model
Perforce Object and Record Model Perforce
 
2016年のPerl (Long version)
2016年のPerl (Long version)2016年のPerl (Long version)
2016年のPerl (Long version)charsbar
 
Deeply Declarative Data Pipelines
Deeply Declarative Data PipelinesDeeply Declarative Data Pipelines
Deeply Declarative Data PipelinesHostedbyConfluent
 
PostgreSQL Open SV 2018
PostgreSQL Open SV 2018PostgreSQL Open SV 2018
PostgreSQL Open SV 2018artgillespie
 
flowr streamlining computing workflows
flowr streamlining computing workflowsflowr streamlining computing workflows
flowr streamlining computing workflowssahil seth
 

Similar to Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point (20)

Building and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning CBuilding and Distributing PostgreSQL Extensions Without Learning C
Building and Distributing PostgreSQL Extensions Without Learning C
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!Hypers and Gathers and Takes! Oh my!
Hypers and Gathers and Takes! Oh my!
 
Csql Cache Presentation
Csql Cache PresentationCsql Cache Presentation
Csql Cache Presentation
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPs
 
Advanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdfAdvanced_Research_Techniques.pdf
Advanced_Research_Techniques.pdf
 
Sydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plansSydney Oracle Meetup - execution plans
Sydney Oracle Meetup - execution plans
 
Perly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data RecordsPerly Parallel Processing of Fixed Width Data Records
Perly Parallel Processing of Fixed Width Data Records
 
An OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parserAn OCaml newbie meets Camlp4 parser
An OCaml newbie meets Camlp4 parser
 
Drupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary EditionDrupal - dbtng 25th Anniversary Edition
Drupal - dbtng 25th Anniversary Edition
 
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
クラウドDWHとしても進化を続けるPivotal Greenplumご紹介
 
Storing 16 Bytes at Scale
Storing 16 Bytes at ScaleStoring 16 Bytes at Scale
Storing 16 Bytes at Scale
 
PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개PostgreSQL 9.6 새 기능 소개
PostgreSQL 9.6 새 기능 소개
 
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 HowtoPostgreSQL Portland Performance Practice Project - Database Test 2 Howto
PostgreSQL Portland Performance Practice Project - Database Test 2 Howto
 
Advanced Perl Techniques
Advanced Perl TechniquesAdvanced Perl Techniques
Advanced Perl Techniques
 
Perforce Object and Record Model
Perforce Object and Record Model  Perforce Object and Record Model
Perforce Object and Record Model
 
2016年のPerl (Long version)
2016年のPerl (Long version)2016年のPerl (Long version)
2016年のPerl (Long version)
 
Deeply Declarative Data Pipelines
Deeply Declarative Data PipelinesDeeply Declarative Data Pipelines
Deeply Declarative Data Pipelines
 
PostgreSQL Open SV 2018
PostgreSQL Open SV 2018PostgreSQL Open SV 2018
PostgreSQL Open SV 2018
 
flowr streamlining computing workflows
flowr streamlining computing workflowsflowr streamlining computing workflows
flowr streamlining computing workflows
 

More from Works Applications

Gitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れるGitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れるWorks Applications
 
Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器Works Applications
 
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫Works Applications
 
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...Works Applications
 
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善Works Applications
 
Enterprise UI/UX - design as code
Enterprise UI/UX - design as codeEnterprise UI/UX - design as code
Enterprise UI/UX - design as codeWorks Applications
 
Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)Works Applications
 
Global Innovation Nights - Spark
Global Innovation Nights - SparkGlobal Innovation Nights - Spark
Global Innovation Nights - SparkWorks Applications
 

More from Works Applications (11)

Gitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れるGitで安定マスターブランチを手に入れる
Gitで安定マスターブランチを手に入れる
 
Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器Javaでつくる本格形態素解析器
Javaでつくる本格形態素解析器
 
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
新入社員が多い中で効果的なレビューを行うための方法 レビューの準備からフィードバックまでの工夫
 
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
 
形態素解析
形態素解析形態素解析
形態素解析
 
Erpと自然言語処理
Erpと自然言語処理Erpと自然言語処理
Erpと自然言語処理
 
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善SpotBugs(FindBugs)による 大規模ERPのコード品質改善
SpotBugs(FindBugs)による 大規模ERPのコード品質改善
 
Enterprise UI/UX - design as code
Enterprise UI/UX - design as codeEnterprise UI/UX - design as code
Enterprise UI/UX - design as code
 
Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)Kubernetesにまつわるエトセトラ(主に苦労話)
Kubernetesにまつわるエトセトラ(主に苦労話)
 
Demystifying kubernetes
Demystifying kubernetesDemystifying kubernetes
Demystifying kubernetes
 
Global Innovation Nights - Spark
Global Innovation Nights - SparkGlobal Innovation Nights - Spark
Global Innovation Nights - Spark
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point

  • 1. Best Better practice of Cassandra Cassandraに不向きなCassandraデータモデリング基礎 Hayato Tsutsumi Works Applications
  • 2. Hayato Tsutsumi 堤 勇人 Cassandra experience : 7 years (from ver.0.6) Certification for Apache Cassandra Administarator Data size : about 40TB Nodes:about 40 (would increase soon...) Twitter : 2t3 Site Reliability Engineering Div. Works Applications Co., Ltd 自己紹介 Speaker
  • 3. Target ● Mid-range System Data size 1TB ~ 1PB Data amount 10 Mil ~ 100 Bil + high speed processing
  • 5. Best practice = Right people, right place 適材適所は確かにベスト Suitable Data
  • 6. But not all data is suitable じゃあベストじゃない部分は? Our system Suitable Data Un-suitable data for Cassandra
  • 7. Use both Cassandra and RDB? O* or M* Suitable Data
  • 8. You may use only RDB... O* or M* Suitable Data
  • 9. Another way : Manage data only with Cassandra Suitable Data
  • 10. 3 models unlike NoSQL Historical data 履歴管理データ Tree structure ツリー構造 Summarized data 計上データ
  • 11. How Cassandra read data in 3mins 前 提
  • 12. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB text, ckA text, ckB text, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key hash(pkA1 ,pkB1) Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w Value v1 w1 v2 w2 hash(pkA1 ,pkB2) Column ckA3:ckB3:v ckA3:ckB3:w Value v3 w3 Column pkA pkB ckA ckB v w Value pkA1 pkB1 ckA1 ckB1 v1 w1 pkA1 pkB1 ckA1 ckB2 v2 w2 pkA1 pkB2 ckA3 ckB3 v3 w3 on Table on Cassandra Cassandra can search Column name
  • 13. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB int, ckA int, ckB int, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key hash(pkA1 ,pkB1) Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w Value v1 w1 v2 w2 hash(pkA1 ,pkB2) Column ckA3:ckB3:v ckA3:ckB3:w Value v3 w3 Column pkA pkB ckA ckB v w Value pkA1 pkB1 ckA1 ckB1 v1 w1 pkA1 pkB1 ckA1 ckB2 v2 w2 pkA1 pkB2 ckA3 ckB3 v3 w3 on Table on Cassandra Cassandra can search Column name
  • 14. Partition key & Clustering key CREATE TABLE test_table ( pkA text, pkB int, ckA int, ckB int, v text, w text, PRIMARY KEY ((pkA, pkB), ckA, ckB) ); Partition Key Clustering Key where pkA = "pkA1"; //NG where pkA = "pkA1" and pkB = "pkB1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1" and ckB = "ckB1"; //OK where pkA = "pkA1" and ckA = "ckA1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckB = "ckB1"; //NG where pkA = "pkA1" and pkB >= "pkB1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1" and ckB >= "ckB1"; //OK where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1" and ckB = "ckB1"; //NG where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1" and ckA < "ckA2"; //OK
  • 15. Historical data 履歴管理データ photo by Bryan Wright (https://secure.flickr.com/photos/spidermandragon5/2922128673/)
  • 16. 社員の異動情報 Employee transfer history A Div. B Div. C Div. A Div. C Div. D Div. emp001 emp002 D Div. E Div.emp003 4/1 4/16 5/13/112/1 2/21
  • 17. 社員の異動情報 Employee transfer history A Div. B Div. C Div. A Div. C Div. D Div. emp001 emp002 D Div. E Div.emp003 4/1 4/16 5/13/112/1 2/21 at 3/25 emp001 emp002 emp003 B Div. A Div. E Div. at 4/25 emp001 emp002 emp003 C Div. D Div. E Div.
  • 18. emp_history table CREATE TABLE emp_history ( id text, no text, s date, e date, div text, PRIMARY KEY (id, s, e, no) ); select * from emp_history where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //NG ?
  • 19. emp_history table CREATE TABLE emp_history ( id text, no text, s date, e date, div text, PRIMARY KEY (id, s, no) ); CREATE CUSTOM INDEX fn_e ON emp_history (e) USING 'org.apache.cassandra.index.sasi. SASIIndex'; select * from emp_history where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //OK use custom index
  • 21. 組織構造 Organization tree A Div. a Dept. b Dept. 1 Sec. 2 Sec. 3 Sec. 4 Sec. 5 Sec. Well known models ● Adjacency list ● Path Enumeration ● Nested set ● Closure table
  • 22. 判断のポイント Criteria ● No join, recursive query ● Anyway need consistency ● Jaywalk or denormalization is Natural ● JOIN、再帰問い合わせ 不可 ● 整合性はどの道別の方 法で取る必要がある ● ジェイウォーク、非正規化 も当たり前
  • 23. ツリー構造への要求 Requirement to tree model ● show ancestors ● show children ● show descendants ● show sibilings of a ● あるノードからルートまで の全ての親を取得 ● 子供を1段展開 ● 子供を全て展開 ● 兄弟を取得
  • 24. 組織構造 Organization tree A Div. a Dept. b Dept. 1 Sec. 2 Sec. 3 Sec. 4 Sec. 5 Sec. Well known models ● Adjacency list ● Path Enumeration ● Nested set ● Closure table Worth considering!
  • 25. 経路列挙 Path enumeration CREATE TABLE pathenum ( id text, fqdn text, child text, code text, PRIMARY KEY (id, fqdn) ); id fqdn child code test A [a,b] A test A:a [1,2] a test A:b [3,4,5] b test A:a:1 1 test A:a:2 2 test A:b:3 3 test A:b:4 4 test A:b:5 5 A a b 1 2 3 4 5
  • 26. Path enumeration (pathenum table as on slide 25)
      select * from pathenum where id = 'test' and fqdn like 'A:%'; //NG
      Fetching the descendants of A needs a 'like' (prefix) search.
  • 27. Path enumeration (pathenum table as on slide 25)
      select * from pathenum where id = 'test' and fqdn like 'A:%'; //NG
      select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //OK
      ':' is U+003A and ';' is U+003B, the next code point, so the range matches exactly the fqdn values that start with 'A:'. It replaces the 'like' search.
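The trick of turning a prefix search into a range can be sketched as follows. It is a minimal in-memory illustration that assumes ASCII paths (so Python's string ordering matches the byte ordering of a text clustering column); `prefix_range`, `descendants` and the `FQDNS` list are names introduced here for illustration only.

```python
from bisect import bisect_left


def prefix_range(prefix):
    """Return (low, high) such that low <= s < high holds exactly for strings starting with prefix.

    Assumes a non-empty prefix whose last character is not the maximum code point.
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    # Bump the last character by one code point: ':' (U+003A) becomes ';' (U+003B).
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


# fqdn values from the pathenum slide, kept sorted like a clustering column.
FQDNS = sorted(["A", "A:a", "A:a:1", "A:a:2", "A:b", "A:b:3", "A:b:4", "A:b:5"])


def descendants(sorted_fqdns, node):
    """Range slice over the sorted values, like fqdn >= 'A:' AND fqdn < 'A;'."""
    low, high = prefix_range(node + ":")
    return sorted_fqdns[bisect_left(sorted_fqdns, low):bisect_left(sorted_fqdns, high)]


print(prefix_range("A:"))         # ('A:', 'A;')
print(descendants(FQDNS, "A:a"))  # ['A:a:1', 'A:a:2']
```

The node itself ('A') falls outside the range because it sorts before 'A:', so the slice returns descendants only.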
  • 28. Path enumeration (pathenum table as on slide 25)
      //show ancestors (done in the application, from the fqdn itself)
      fqdn.split(":");
      //show children of a
      select child from pathenum where id = 'test' and fqdn = 'A:a';
      //show descendants of A
      select * from pathenum where id = 'test' and fqdn >= 'A:' and fqdn < 'A;';
      //show siblings of a (read the child list of a's parent, A)
      select child from pathenum where id = 'test' and fqdn = 'A';
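The application-side part of these operations can be sketched in Python. This is an illustration under assumptions: the `CHILD` dictionary stands in for the `child` column of the pathenum rows, and `ancestors` and `siblings` are hypothetical helper names.

```python
# Stand-in for the child column of the pathenum rows (fqdn -> child list).
CHILD = {"A": ["a", "b"], "A:a": ["1", "2"], "A:b": ["3", "4", "5"]}


def ancestors(fqdn, sep=":"):
    """Derive every ancestor path from the fqdn alone, root first; no query needed."""
    parts = fqdn.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts))]


def siblings(fqdn, sep=":"):
    """Read the parent's child list and drop the node itself."""
    parents = ancestors(fqdn, sep)
    if not parents:
        return []  # the root has no siblings
    name = fqdn.split(sep)[-1]
    return [c for c in CHILD.get(parents[-1], []) if c != name]


print(ancestors("A:a:1"))  # ['A', 'A:a']
print(siblings("A:a"))     # ['b']
```

Only the sibling lookup touches stored data, and it reads a single row (the parent's), which is what the "one access" claim on the next slide refers to.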
  • 29. Path enumeration: pros & cons (pathenum table as on slide 25)
      pros
      - one access (each operation is a single query)
      cons
      - hot spot (the whole tree lives in one partition)
      - range slice (descendants need a range query)
      - complex process when updating
  • 30. Closure table
      CREATE TABLE closure_main (
        id text, v text,
        PRIMARY KEY (id)
      );
      CREATE TABLE closure_path (
        p text, c text, d int,
        PRIMARY KEY (p, d, c)
      );
      closure_main
      id  v
      A   A Div.
      a   a Dept.
      b   b Dept.
      1   1 Sec.
      2   2 Sec.
      3   3 Sec.
      4   4 Sec.
      5   5 Sec.
      closure_path (p = ancestor, c = descendant, d = distance)
      p  c  d
      A  A  0
      A  a  1
      A  b  1
      A  1  2
      A  2  2
      A  3  2
      A  4  2
      A  5  2
      a  a  0
      a  1  1
      a  2  1
      1  1  0
      2  2  0
      b  b  0
      b  3  1
      b  4  1
      b  5  1
      3  3  0
      4  4  0
      5  5  0
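How the closure_path rows follow from the tree can be sketched by walking up from every node. This is a minimal illustration, assuming the tree is available as a child-to-parent map; `PARENT` and `closure_rows` are names introduced here.

```python
# Child -> parent map for the organization tree on the slides (the root has no parent).
PARENT = {"A": None, "a": "A", "b": "A", "1": "a", "2": "a", "3": "b", "4": "b", "5": "b"}


def closure_rows(parent):
    """Return (p, c, d) rows: one per node for itself (d = 0) and one per ancestor."""
    rows = []
    for node in parent:
        ancestor, distance = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, distance))
            ancestor = parent[ancestor]
            distance += 1
    return rows


rows = closure_rows(PARENT)
print(len(rows))               # 20
print(("A", "1", 2) in rows)   # True
```

The 20 rows match the closure_path table above: 8 self rows, 7 parent rows and 5 grandparent rows.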
  • 31. Closure table (closure_main and closure_path tables and data as on slide 30)
      CREATE CUSTOM INDEX fn_c ON test.closure_path (c) USING 'org.apache.cassandra.index.sasi.SASIIndex';
      //show ancestors
      select p from closure_path where c = '1';
      select * from closure_main where id in ?; // bind the list of ids from the previous query
      //show children of a
      select c from closure_path where p = 'a' and d = 1;
      select * from closure_main where id in ?;
      //show descendants of A
      select c from closure_path where p = 'A';
      select * from closure_main where id in ?;
      //show siblings of a
      //load a's parent = A (the row with d = 1)
      select * from closure_path where c = 'a';
      select c from closure_path where p = 'A' and d = 1;
      select * from closure_main where id in ?;
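The four lookups can be sketched over the closure_path rows in memory. This is an illustration only: the list comprehensions stand in for the queries, the function names are hypothetical, and unlike the queries on the slide the helpers drop the d = 0 self rows.

```python
# (p, c, d) rows copied from the closure_path table on the slides.
ROWS = [
    ("A", "A", 0), ("A", "a", 1), ("A", "b", 1), ("A", "1", 2), ("A", "2", 2),
    ("A", "3", 2), ("A", "4", 2), ("A", "5", 2), ("a", "a", 0), ("a", "1", 1),
    ("a", "2", 1), ("1", "1", 0), ("2", "2", 0), ("b", "b", 0), ("b", "3", 1),
    ("b", "4", 1), ("b", "5", 1), ("3", "3", 0), ("4", "4", 0), ("5", "5", 0),
]


def ancestors(node):
    # Corresponds to: where c = node (served by the index on c)
    return [p for p, c, d in ROWS if c == node and d > 0]


def children(node):
    # Corresponds to: where p = node and d = 1
    return [c for p, c, d in ROWS if p == node and d == 1]


def descendants(node):
    # Corresponds to: where p = node
    return [c for p, c, d in ROWS if p == node and d > 0]


def siblings(node):
    # First access finds the parent (d = 1), second access reads its children.
    parents = [p for p, c, d in ROWS if c == node and d == 1]
    if not parents:
        return []
    return [c for c in children(parents[0]) if c != node]


print(sorted(ancestors("1")))  # ['A', 'a']
print(children("a"))           # ['1', '2']
print(siblings("a"))           # ['b']
```

Each helper returns ids only; fetching the display values from closure_main is the extra access counted among the cons on the next slide.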
  • 32. Closure table: pros & cons (tables and fn_c index as on slide 31)
      pros
      - distributed
      - get access (key lookups instead of range slices)
      cons
      - needs an index
      - 2 ~ 3 accesses per operation
      - more data
      - complex process when updating
  • 33. Closure table: how much does the data increase? (pros & cons as on slide 32) Assuming n children per node and a tree of depth d, every node stores one closure_path row for itself and one per ancestor, so the number of rows is roughly proportional to d times the number of nodes.
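The growth can be checked with a small calculation. The sketch assumes a full tree with the root at level 0 and leaves at level d (one possible reading of "d-depth"); `node_count` and `closure_row_count` are names introduced here.

```python
def node_count(n, d):
    """Nodes in a full n-ary tree with levels 0..d."""
    return sum(n ** k for k in range(d + 1))


def closure_row_count(n, d):
    """closure_path rows: a node at level k has k ancestors plus its own d = 0 row."""
    return sum((k + 1) * n ** k for k in range(d + 1))


for depth in (2, 5, 8):
    nodes = node_count(10, depth)
    rows = closure_row_count(10, depth)
    print(depth, nodes, rows, round(rows / nodes, 2))
```

With n = 10 the rows-per-node ratio stays close to the depth, because most nodes are leaves and each leaf carries d + 1 rows.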
  • 36. Aggregation of slips. [Figure: slips are aggregated by parallel batch processing and by online streaming.]
  • 37. Requirements
      ● miscalculation = critical
      ● need parallel batch processing and online streaming processing
      ● need high speed processing
  • 38. Requirements (the same three points as slide 37) = Consistency!
  • 39. Summarized data
      CREATE TABLE countup (
        id text PRIMARY KEY,
        v counter
      );
      UPDATE countup SET v = v + 1 WHERE id = 'test';
      Use a counter...? No.
  • 40. Summarized data
      CREATE TABLE countup (
        id text PRIMARY KEY,
        v int
      );
      UPDATE countup SET v = 101 WHERE id = 'test' IF v = 100;
      Use UPDATE with a lightweight transaction (LWT): the write is applied only if v is still 100.
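The read, compare and retry pattern behind the conditional update can be sketched in memory. This is not a Cassandra client: `FakeTable` is a hypothetical stand-in whose `update_if` mimics `UPDATE ... IF v = expected`, and it treats a missing row as 0, which is a simplification made for this sketch.

```python
import threading


class FakeTable:
    """In-memory stand-in for the countup table (illustration only)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._rows.get(key, 0)

    def update_if(self, key, expected, new):
        """Apply the write only if the stored value still equals `expected`."""
        with self._lock:
            if self._rows.get(key, 0) != expected:
                return False  # another writer got there first
            self._rows[key] = new
            return True


def increment(table, key, amount=1):
    """Retry the conditional update until it is applied."""
    while True:
        current = table.read(key)
        if table.update_if(key, current, current + amount):
            return current + amount


def run(workers=8, per_worker=200):
    table = FakeTable()

    def worker():
        for _ in range(per_worker):
            increment(table, "test")

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table.read("test")


print(run())  # 1600
```

Every increment is applied exactly once, because a writer that lost the race sees the condition fail and retries with the fresh value, at the cost of extra round trips under contention.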
  • 41. What is the best?