SlideShare a Scribd company logo
1 of 20
Download to read offline
11
Sheepdog Status Report
Sheepdog Summit 2015
Liu Yuan
22
Agenda
Introduction - Sheepdog Overview
Past and Now - Sheepdog Community
Working In Progress – Problems and Solutions
33
Sheepdog Overview
Introduction
44
• Distributed Object Storage System In User Space
– Manage Disks and Nodes
• Aggregate the capacity and the power (IOPS + throughput)
• Hide the failure of hardware
• Dynamically grow or shrink the scale
– Secure Data
• Provide redundancy mechanisms (replication and erasure code) for high-
availability
• Secure the data with auto-healing and auto-rebalanced mechanisms
– Provide Interfaces (in a single cluster)
• Virtual volume for QEMU VM, iSCSI TGT (Best supported)
• RESTful container (Openstack Swift and Amazon S3 Compatible, in progress)
• Storage for Openstack Cinder, Glance, Nova (in progress)
• POSIX file via NFS (in progress)
• Linux Block Device
What is Sheepdog
55
Gateway
Store
1TB 1TB
1TB
Gateway
Store
1TB 1TB
2TB
Gateway
Store
1TB 2TB
X
Private Hash Ring: Local Rebalance
Global Consistent Hash Ring and P2P Global Rebalance
No meta servers!Zookeeper: membership management and message queue
4TB Hot-plugged Auto unplugged on EIO
Disks and Nodes Management
66
Data Management
Sheep Sheep Sheep
Full Replication
Sheep Sheep Sheep Sheep Sheep Sheep
Erasure Coding
Parity
77
Sheep Sheep Sheep Sheep
Object LUN
Volume
File
Openstack
NFS HTTP iSCSI
Glance
Nova
Cinder
Block
SBD
Interfaces
QEMU
Sheepdog
88
Use Patterns
SD VM SD VM
SD VM SD VM
VM running inside
Sheepdog Cluster
SD SD
SD SD
SD
SD
HTTP
HTTP object storage
SD SD
SD SD
SD
SD
LUN device pool
iSCSI backend
Nginx
99
Sheepdog Community
Past and Now
1010
Peoples
Kazutaka Morita 2009.9
People from Taobao 2011.9
Christph Hellwig from Nebula 2012.4
More production uses from the world
People from Intel 2014
People from China Mobile 2015
Stayed for around half the year
Valerio, Andy, startups at China and Japan
Add isa-l for Erasure code
Open sourced the Sheepdog
Add features, bug fixing, redesign
Make sheepdog better
1111
Patches
2009 2010 2011 2012 2013 2014 2015
0
200
400
600
800
1000
1200
Patches Per Year
●
Culminate at 2012 and 2013,
suffer a decline recently.
●
It is always easier to open
source the code, but build a
community is really difficult.
●
China Mobile is committed to
release all its patches to the
community.
1212
Comparison with Ceph and GlusterFS
Pros:
The simplicity is the biggest advantage for Sheepdog
Sheepdog: 20k+ lines in user space
Ceph: 400k+ lines in user space and 20k+ in kernel
GlusterFS: 330K+ lines in user space
Cons:
●
No company behind
●
inactive community
●
few users and few developers
But Sheepdog is not technically inferior! Simplicity doesn't mean bad!
1313
Sheepdog-ng
Why?
We forked it at May because of endless crashes, panics by our stressing test. I
discussed with NTT guys with the redesign idea to remove shared states between
sheep nodes. They asked me to fork Sheepdog instead simply because they don't use
zookeeper as they always replied to a user with some features they don't use (e.g.,
object cache)
http://lists.wpkg.org/pipermail/sheepdog/2015-May/067736.html
The technical reason:
Share nothing or share more and more state with overwhelming complexity.
The non-technical reason:
Community is not as friendly and open as before. We want to build a real community-
based project.
Subscribe the list: send email to sheepdog-ng+subscribe@googlegroups.com
1414
Problems and Solutions
Working In
Progress
1515
iSCSI Target Scalability
LUN1 LUN2
STGT
sheep
Main thread
Max req == nr of workers
Sync
LUN1 LUN2
New Target
sheep
Unlimted!
Async
Thread per LUN
Problems:
●
OS tends to issue more and
more request (blk-mp, scsi-mp)
●
A single LUN can saturate stgt,
not scale at all
●
STGT take too much resource
●
Multipath is not so good
Solution – Rewrite
●
from sync to async, less threads
and Fds
●
Tailored for sheepdog
●
Add io rebalance and cache
support New target
1616
Performance Degradation
X
IO hang
IO Resume
Problem with default Dynamic Hash Ring
●
If object is in recovery, we need to wait!
●
What make it worse , recovery IO will
complete with user IO for bandwidth, CPU
●
Neither slow nor fast recovery is satisfied
Solution – Static Hash Ring
Failure of node won't change the hash ring.Trade
data reliability for performance! We don't recover
object if some of redundancy data are missing.
Useful for small cluster with mostly deal with single
node event.
X
Drop this IOSHR
DHR
1717
Live Patching
A ----> B ----> C
A B C
B`
After Patching
B` is loaded by Linux's
dynamic loader on the fly
Sheep tracer
Similar to Linux's Ftrace, virtually add
constructor and destructor to every function.
This mechanism relies on the 5 bytes space
(A.K.A mcount) injected by GCC beforehand.
Based on the tracer, we can replace any function
in the sheep daemon on the fly.
Useful for one-liner bug fixing but is limited on
function level.
1818
NFS Server
Current status:
Just a toy with file size < 4M, NFSv3 is not fully supported and virtually no file
system code (need implement inode, dentry and free space management)
Todos
- finish stubs
- add extent to file allocation
- add btree or hash based kv store to manage dentries
- implement a multi-threaded SUNRPC to take place of poor performance
glibc RPC
- implement NFS v4
1919
Cinder - Block Storage
– Support since day 1
Glance - Image Storage
– Support merged at Havana version
Nova - Ephemeral Storage
– Not yet started
Swift - Object Storage
– Swift API compatible In progress
Final Goal - Unified Storage
– Copy-On-Write anywhere ?
– Data dedup ?
Sheep Sheep Sheep Sheep
Cinder Glance
Unified Storage
NovaSwift
Openstack
Plan to rewrite the driver with libsheepdog.so
2020
Enjoy yourself in Suzhou

More Related Content

What's hot

Ceph Day Bring Ceph To Enterprise
Ceph Day Bring Ceph To EnterpriseCeph Day Bring Ceph To Enterprise
Ceph Day Bring Ceph To EnterpriseAlex Lau
 
Compute 101 - OpenStack Summit Vancouver 2015
Compute 101 - OpenStack Summit Vancouver 2015Compute 101 - OpenStack Summit Vancouver 2015
Compute 101 - OpenStack Summit Vancouver 2015Stephen Gordon
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
Kubernetes networking
Kubernetes networkingKubernetes networking
Kubernetes networkingSim Janghoon
 
TechDay - Toronto 2016 - Hyperconvergence and OpenNebula
TechDay - Toronto 2016 - Hyperconvergence and OpenNebulaTechDay - Toronto 2016 - Hyperconvergence and OpenNebula
TechDay - Toronto 2016 - Hyperconvergence and OpenNebulaOpenNebula Project
 
Ceph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph Community
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephDanielle Womboldt
 
Ceph Day Tokyo - Bring Ceph to Enterprise
Ceph Day Tokyo - Bring Ceph to Enterprise Ceph Day Tokyo - Bring Ceph to Enterprise
Ceph Day Tokyo - Bring Ceph to Enterprise Ceph Community
 
Disaster recovery of OpenStack Cinder using DRBD
Disaster recovery of OpenStack Cinder using DRBDDisaster recovery of OpenStack Cinder using DRBD
Disaster recovery of OpenStack Cinder using DRBDViswesuwara Nathan
 
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebula Project
 
Cloud foundry on kubernetes
Cloud foundry on kubernetesCloud foundry on kubernetes
Cloud foundry on kubernetes상준 윤
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
Performance analysis with_ceph
Performance analysis with_cephPerformance analysis with_ceph
Performance analysis with_cephAlex Lau
 
Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH )  Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH ) Alex Lau
 
Introduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDIntroduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDPGConf APAC
 
Managing ceph through_oVirt_using_Cinder
Managing ceph through_oVirt_using_CinderManaging ceph through_oVirt_using_Cinder
Managing ceph through_oVirt_using_CinderMaor Lipchuk
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntuSim Janghoon
 
SUSE Enterprise Storage on ThunderX
SUSE Enterprise Storage on ThunderXSUSE Enterprise Storage on ThunderX
SUSE Enterprise Storage on ThunderXAlex Lau
 
Build an affordable Cloud Stroage
Build an affordable Cloud StroageBuild an affordable Cloud Stroage
Build an affordable Cloud StroageAlex Lau
 

What's hot (20)

Ceph Day Bring Ceph To Enterprise
Ceph Day Bring Ceph To EnterpriseCeph Day Bring Ceph To Enterprise
Ceph Day Bring Ceph To Enterprise
 
Compute 101 - OpenStack Summit Vancouver 2015
Compute 101 - OpenStack Summit Vancouver 2015Compute 101 - OpenStack Summit Vancouver 2015
Compute 101 - OpenStack Summit Vancouver 2015
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
Kubernetes networking
Kubernetes networkingKubernetes networking
Kubernetes networking
 
TechDay - Toronto 2016 - Hyperconvergence and OpenNebula
TechDay - Toronto 2016 - Hyperconvergence and OpenNebulaTechDay - Toronto 2016 - Hyperconvergence and OpenNebula
TechDay - Toronto 2016 - Hyperconvergence and OpenNebula
 
Ceph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-Gene
 
Ceph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for CephCeph Day Beijing - SPDK for Ceph
Ceph Day Beijing - SPDK for Ceph
 
Ceph Day Tokyo - Bring Ceph to Enterprise
Ceph Day Tokyo - Bring Ceph to Enterprise Ceph Day Tokyo - Bring Ceph to Enterprise
Ceph Day Tokyo - Bring Ceph to Enterprise
 
Disaster recovery of OpenStack Cinder using DRBD
Disaster recovery of OpenStack Cinder using DRBDDisaster recovery of OpenStack Cinder using DRBD
Disaster recovery of OpenStack Cinder using DRBD
 
ceph-barcelona-v-1.2
ceph-barcelona-v-1.2ceph-barcelona-v-1.2
ceph-barcelona-v-1.2
 
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
 
Cloud foundry on kubernetes
Cloud foundry on kubernetesCloud foundry on kubernetes
Cloud foundry on kubernetes
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Performance analysis with_ceph
Performance analysis with_cephPerformance analysis with_ceph
Performance analysis with_ceph
 
Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH )  Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH )
 
Introduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XIDIntroduction to Vacuum Freezing and XID
Introduction to Vacuum Freezing and XID
 
Managing ceph through_oVirt_using_Cinder
Managing ceph through_oVirt_using_CinderManaging ceph through_oVirt_using_Cinder
Managing ceph through_oVirt_using_Cinder
 
Kvm performance optimization for ubuntu
Kvm performance optimization for ubuntuKvm performance optimization for ubuntu
Kvm performance optimization for ubuntu
 
SUSE Enterprise Storage on ThunderX
SUSE Enterprise Storage on ThunderXSUSE Enterprise Storage on ThunderX
SUSE Enterprise Storage on ThunderX
 
Build an affordable Cloud Stroage
Build an affordable Cloud StroageBuild an affordable Cloud Stroage
Build an affordable Cloud Stroage
 

Viewers also liked

Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)
Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)
Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)Masahiro Tsuji
 
Sheepdog内部实现机制
Sheepdog内部实现机制Sheepdog内部实现机制
Sheepdog内部实现机制Liu Yuan
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterRed_Hat_Storage
 
分散ストレージ技術Cephの最新情報
分散ストレージ技術Cephの最新情報分散ストレージ技術Cephの最新情報
分散ストレージ技術Cephの最新情報Emma Haruka Iwao
 
Qemu & KVM Guide #1 (intro & basic)
Qemu & KVM Guide #1 (intro & basic)Qemu & KVM Guide #1 (intro & basic)
Qemu & KVM Guide #1 (intro & basic)JungIn Jung
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksMarian Marinov
 
Ceph アーキテクチャ概説
Ceph アーキテクチャ概説Ceph アーキテクチャ概説
Ceph アーキテクチャ概説Emma Haruka Iwao
 
[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...
[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...
[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...Insight Technology, Inc.
 
[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...
[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...
[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...Insight Technology, Inc.
 

Viewers also liked (9)

Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)
Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)
Sheepdogを使ってみて分かったこと(第六回ストレージ研究会発表資料)
 
Sheepdog内部实现机制
Sheepdog内部实现机制Sheepdog内部实现机制
Sheepdog内部实现机制
 
Erasure codes and storage tiers on gluster
Erasure codes and storage tiers on glusterErasure codes and storage tiers on gluster
Erasure codes and storage tiers on gluster
 
分散ストレージ技術Cephの最新情報
分散ストレージ技術Cephの最新情報分散ストレージ技術Cephの最新情報
分散ストレージ技術Cephの最新情報
 
Qemu & KVM Guide #1 (intro & basic)
Qemu & KVM Guide #1 (intro & basic)Qemu & KVM Guide #1 (intro & basic)
Qemu & KVM Guide #1 (intro & basic)
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
 
Ceph アーキテクチャ概説
Ceph アーキテクチャ概説Ceph アーキテクチャ概説
Ceph アーキテクチャ概説
 
[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...
[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...
[db tech showcase Tokyo 2016] D15: データベース フラッシュソリューション徹底解説! 安価にデータベースを高速にする方法...
 
[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...
[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...
[db tech showcase Tokyo 2016] A12: フラッシュストレージのその先へ ~不揮発性メモリNVDIMMが拓くデータベースの世界...
 

Similar to Sheepdog Status Report

What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0ScyllaDB
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...Fred de Villamil
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureScyllaDB
 
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksAnne Nicolas
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Community
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Dave Holland
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2Yutaka Kawai
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
 
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...OpenEBS
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the Worldjhugg
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 

Similar to Sheepdog Status Report (20)

GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
SUE 2018 - Migrating a 130TB Cluster from Elasticsearch 2 to 5 in 20 Hours Wi...
 
10Gbps transfers
10Gbps transfers10Gbps transfers
10Gbps transfers
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database Architecture
 
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecksKernel Recipes 2015: Solving the Linux storage scalability bottlenecks
Kernel Recipes 2015: Solving the Linux storage scalability bottlenecks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
Ceph Day Seoul - AFCeph: SKT Scale Out Storage Ceph
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Containers > VMs
Containers > VMsContainers > VMs
Containers > VMs
 
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
Container Attached Storage (CAS) with OpenEBS - Berlin Kubernetes Meetup - Ma...
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 

Recently uploaded

SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....santhyamuthu1
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS Bahzad5
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxKISHAN KUMAR
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationMohsinKhanA
 
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecGuardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecTrupti Shiralkar, CISSP
 
Test of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxTest of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxHome
 
Design of Clutches and Brakes in Design of Machine Elements.pptx
Design of Clutches and Brakes in Design of Machine Elements.pptxDesign of Clutches and Brakes in Design of Machine Elements.pptx
Design of Clutches and Brakes in Design of Machine Elements.pptxYogeshKumarKJMIT
 
The relationship between iot and communication technology
The relationship between iot and communication technologyThe relationship between iot and communication technology
The relationship between iot and communication technologyabdulkadirmukarram03
 
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Amil baba
 
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
Engineering Mechanics  Chapter 5  Equilibrium of a Rigid BodyEngineering Mechanics  Chapter 5  Equilibrium of a Rigid Body
Engineering Mechanics Chapter 5 Equilibrium of a Rigid BodyAhmadHajasad2
 
Basic Principle of Electrochemical Sensor
Basic Principle of  Electrochemical SensorBasic Principle of  Electrochemical Sensor
Basic Principle of Electrochemical SensorTanvir Moin
 
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratoryدليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide LaboratoryBahzad5
 
EPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptxEPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptxJoseeMusabyimana
 
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxUNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxrealme6igamerr
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabusViolet Violet
 
Graphics Primitives and CG Display Devices
Graphics Primitives and CG Display DevicesGraphics Primitives and CG Display Devices
Graphics Primitives and CG Display DevicesDIPIKA83
 
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...Amil baba
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptxMUKULKUMAR210
 

Recently uploaded (20)

SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
SATELITE COMMUNICATION UNIT 1 CEC352 REGULATION 2021 PPT BASICS OF SATELITE ....
 
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS GENERAL CONDITIONS  FOR  CONTRACTS OF CIVIL ENGINEERING WORKS
GENERAL CONDITIONS FOR CONTRACTS OF CIVIL ENGINEERING WORKS
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptx
 
Lecture 2 .pptx
Lecture 2                            .pptxLecture 2                            .pptx
Lecture 2 .pptx
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software Simulation
 
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecGuardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
 
Test of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxTest of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptx
 
Design of Clutches and Brakes in Design of Machine Elements.pptx
Design of Clutches and Brakes in Design of Machine Elements.pptxDesign of Clutches and Brakes in Design of Machine Elements.pptx
Design of Clutches and Brakes in Design of Machine Elements.pptx
 
The relationship between iot and communication technology
The relationship between iot and communication technologyThe relationship between iot and communication technology
The relationship between iot and communication technology
 
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
Popular-NO1 Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialis...
 
計劃趕得上變化
計劃趕得上變化計劃趕得上變化
計劃趕得上變化
 
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
Engineering Mechanics  Chapter 5  Equilibrium of a Rigid BodyEngineering Mechanics  Chapter 5  Equilibrium of a Rigid Body
Engineering Mechanics Chapter 5 Equilibrium of a Rigid Body
 
Basic Principle of Electrochemical Sensor
Basic Principle of  Electrochemical SensorBasic Principle of  Electrochemical Sensor
Basic Principle of Electrochemical Sensor
 
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratoryدليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
دليل تجارب الاسفلت المختبرية - Asphalt Experiments Guide Laboratory
 
EPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptxEPE3163_Hydro power stations_Unit2_Lect2.pptx
EPE3163_Hydro power stations_Unit2_Lect2.pptx
 
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptxUNIT4_ESD_wfffffggggggggggggith_ARM.pptx
UNIT4_ESD_wfffffggggggggggggith_ARM.pptx
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabus
 
Graphics Primitives and CG Display Devices
Graphics Primitives and CG Display DevicesGraphics Primitives and CG Display Devices
Graphics Primitives and CG Display Devices
 
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
Best-NO1 Best Rohani Amil In Lahore Kala Ilam In Lahore Kala Jadu Amil In Lah...
 
Power System electrical and electronics .pptx
Power System electrical and electronics .pptxPower System electrical and electronics .pptx
Power System electrical and electronics .pptx
 

Sheepdog Status Report

  • 1. 11 Sheepdog Status Report Sheepdog Summit 2015 Liu Yuan
  • 2. 22 Agenda Introduction - Sheepdog Overview Past and Now - Sheepdog Community Working In Progress – Problems and Solutions
  • 4. 44 • Distributed Object Storage System In User Space – Manage Disks and Nodes • Aggregate the capacity and the power (IOPS + throughput) • Hide the failure of hardware • Dynamically grow or shrink the scale – Secure Data • Provide redundancy mechanisms (replication and erasure code) for high- availability • Secure the data with auto-healing and auto-rebalanced mechanisms – Provide Interfaces (in a single cluster) • Virtual volume for QEMU VM, iSCSI TGT (Best supported) • RESTful container (Openstack Swift and Amazon S3 Compatible, in progress) • Storage for Openstack Cinder, Glance, Nova (in progress) • POSIX file via NFS (in progress) • Linux Block Device What is Sheepdog
  • 5. 55 Gateway Store 1TB 1TB 1TB Gateway Store 1TB 1TB 2TB Gateway Store 1TB 2TB X Private Hash Ring: Local Rebalance Global Consistent Hash Ring and P2P Global Rebalance No meta servers!Zookeeper: membership management and message queue 4TB Hot-plugged Auto unplugged on EIO Disks and Nodes Management
  • 6. 66 Data Management Sheep Sheep Sheep Full Replication Sheep Sheep Sheep Sheep Sheep Sheep Erasure Coding Parity
  • 7. 77 Sheep Sheep Sheep Sheep Object LUN Volume File Openstack NFS HTTP iSCSI Glance Nova Cinder Block SBD Interfaces QEMU Sheepdog
  • 8. 88 Use Patterns SD VM SD VM SD VM SD VM VM running inside Sheepdog Cluster SD SD SD SD SD SD HTTP HTTP object storage SD SD SD SD SD SD LUN device pool iSCSI backend Nginx
  • 10. 1010 Peoples Kazutaka Morita 2009.9 People from Taobao 2011.9 Christph Hellwig from Nebula 2012.4 More production uses from the world People from Intel 2014 People from China Mobile 2015 Stayed for around half the year Valerio, Andy, startups at China and Japan Add isa-l for Erasure code Open sourced the Sheepdog Add features, bug fixing, redesign Make sheepdog better
  • 11. 1111 Patches 2009 2010 2011 2012 2013 2014 2015 0 200 400 600 800 1000 1200 Patches Per Year ● Culminate at 2012 and 2013, suffer a decline recently. ● It is always easier to open source the code, but build a community is really difficult. ● China Mobile is committed to release all its patches to the community.
  • 12. 1212 Comparison with Ceph and GlusterFS Pros: The simplicity is the biggest advantage for Sheepdog Sheepdog: 20k+ lines in user space Ceph: 400k+ lines in user space and 20k+ in kernel GlusterFS: 330K+ lines in user space Cons: ● No company behind ● inactive community ● few users and few developers But Sheepdog is not technically inferior! Simplicity doesn't mean bad!
  • 13. 1313 Sheepdog-ng Why? We forked it at May because of endless crashes, panics by our stressing test. I discussed with NTT guys with the redesign idea to remove shared states between sheep nodes. They asked me to fork Sheepdog instead simply because they don't use zookeeper as they always replied to a user with some features they don't use (e.g., object cache) http://lists.wpkg.org/pipermail/sheepdog/2015-May/067736.html The technical reason: Share nothing or share more and more state with overwhelming complexity. The non-technical reason: Community is not as friendly and open as before. We want to build a real community- based project. Subscribe the list: send email to sheepdog-ng+subscribe@googlegroups.com
  • 15. 1515 iSCSI Target Scalability LUN1 LUN2 STGT sheep Main thread Max req == nr of workers Sync LUN1 LUN2 New Target sheep Unlimted! Async Thread per LUN Problems: ● OS tends to issue more and more request (blk-mp, scsi-mp) ● A single LUN can saturate stgt, not scale at all ● STGT take too much resource ● Multipath is not so good Solution – Rewrite ● from sync to async, less threads and Fds ● Tailored for sheepdog ● Add io rebalance and cache support New target
  • 16. 1616 Performance Degradation X IO hang IO Resume Problem with default Dynamic Hash Ring ● If object is in recovery, we need to wait! ● What make it worse , recovery IO will complete with user IO for bandwidth, CPU ● Neither slow nor fast recovery is satisfied Solution – Static Hash Ring Failure of node won't change the hash ring.Trade data reliability for performance! We don't recover object if some of redundancy data are missing. Useful for small cluster with mostly deal with single node event. X Drop this IOSHR DHR
  • 17. 1717 Live Patching A ----> B ----> C A B C B` After Patching B` is loaded by Linux's dynamic loader on the fly Sheep tracer Similar to Linux's Ftrace, virtually add constructor and destructor to every function. This mechanism relies on the 5 bytes space (A.K.A mcount) injected by GCC beforehand. Based on the tracer, we can replace any function in the sheep daemon on the fly. Useful for one-liner bug fixing but is limited on function level.
  • 18. 1818 NFS Server Current status: Just a toy with file size < 4M, NFSv3 is not fully supported and virtually no file system code (need implement inode, dentry and free space management) Todos - finish stubs - add extent to file allocation - add btree or hash based kv store to manage dentries - implement a multi-threaded SUNRPC to take place of poor performance glibc RPC - implement NFS v4
  • 19. 1919 Cinder - Block Storage – Support since day 1 Glance - Image Storage – Support merged at Havana version Nova - Ephemeral Storage – Not yet started Swift - Object Storage – Swift API compatible In progress Final Goal - Unified Storage – Copy-On-Write anywhere ? – Data dedup ? Sheep Sheep Sheep Sheep Cinder Glance Unified Storage NovaSwift Openstack Plan to rewrite the driver with libsheepdog.so