THE STATE OF CEPH, MANILA,
AND CONTAINERS IN OPENSTACK
SAGE WEIL
OPENSTACK SUMMIT TOKYO - 2015.10.28
2
OUTLINE
● CephFS
● CephFS status update
● Current Manila landscape
● CephFS native driver
● Better FS plumbing to VMs
● Manila vs Nova responsibilities
● Manila and containers
● Summary
CEPHFS
4
WHY USE FILE IN THE CLOUD?
Why file?
● File-based applications aren't
going away
– POSIX is the lingua franca
● Interoperability with other
storage systems and data sets
● Container “volumes” are file
systems
– probably just directories
● Permissions and directories are
useful concepts
Why not block?
● Block is not useful for sharing data
between hosts
– ext4, XFS, etc assume exclusive
access
● Block devices are not very elastic
– File volumes can grow or shrink
without administrative resizing
5
WHY CEPH?
● All components scale horizontally
● No single point of failure
● Hardware agnostic, commodity hardware
● Self-manage whenever possible
● Open source (LGPL)
● Move beyond legacy approaches
– client/cluster instead of client/server
– avoid ad hoc approaches to HA
6
CEPH COMPONENTS
RGW
A web services gateway
for object storage,
compatible with S3 and
Swift
LIBRADOS
A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
A software-based, reliable, autonomous, distributed object store comprised of
self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD
A reliable, fully-distributed
block device with cloud
platform integration
CEPHFS
A distributed file system
with POSIX semantics and
scale-out metadata
management
OBJECT BLOCK FILE
7
CEPHFS DISTRIBUTED FILE SYSTEM
[Diagram: a Linux host running the CephFS kernel module exchanges data and metadata with a RADOS cluster]
● CephFS
– scalable data: files are stored
directly in RADOS
– scalable metadata: cluster of
metadata servers (MDS)
– POSIX: drop-in replacement for
any local or network file system
● Multiple clients
– Linux kernel
– ceph-fuse
– libcephfs.so
● Samba, Ganesha, Hadoop
8
CEPHFS DYNAMIC SUBTREE PARTITIONING
[Diagram: the directory tree is partitioned across MDS 0-4, with a busy directory fragmented across many MDSs]
● Scalable
– Arbitrarily partition hierarchy
– 10s to 100s of MDSs
● Adaptive
– Move load from busy to idle servers
– Replicate hot metadata on multiple nodes
9
OTHER GOODIES
● Strongly consistent / coherent client caches
● Recursive accounting
– a directory's size is the amount of data stored beneath it
● Snapshots
– on any directory
● Directory quotas (libcephfs/ceph-fuse only; see the xattr sketch after this list)
– limit by bytes or file count
● xattrs
● ACLs
● Client-side persistent cache (Kernel client only)
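Both the recursive accounting and the directory quotas above are exposed through CephFS virtual extended attributes on a mounted file system. The short Python sketch below reads the recursive statistics and sets a byte quota; the mount point and share path are assumptions for illustration, and quota enforcement follows the libcephfs/ceph-fuse caveat noted above.

```python
import os

CEPHFS_MOUNT = "/mnt/cephfs"                    # assumed CephFS mount point
share = os.path.join(CEPHFS_MOUNT, "volumes/foo")  # assumed share directory

# Recursive accounting: the rbytes/rfiles virtual xattrs report the total
# data and file count stored beneath a directory, with no crawl required.
rbytes = int(os.getxattr(share, "ceph.dir.rbytes"))
rfiles = int(os.getxattr(share, "ceph.dir.rfiles"))
print(f"{share}: {rbytes} bytes in {rfiles} files")

# Directory quota: limit the subtree to 10 GiB by bytes (a file-count limit
# would use ceph.quota.max_files instead).
os.setxattr(share, "ceph.quota.max_bytes", str(10 * 2**30).encode())
```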
CEPHFS STATUS UPDATE
11
ROAD TO PRODUCTION
● Focus on resilience
– handle errors gracefully
– detect and report issues
– provide recovery tools
● Achieve this first with a single-MDS configuration
● CephFS as dog food
– use CephFS internally to run our QA infrastructure
– have found (and fixed) several hard to reproduce client bugs
● “Production-ready” CephFS with Jewel release (Q1 2016)
12
WHAT IS NEW, WORK IN PROGRESS
● Improved health checks for diagnosing problems
– misbehaving clients, OSDs
● Diagnostic tools
– visibility into MDS request processing
– client session metadata (who is mounting what from where)
● Full space handling
● Client management
– evict misbehaving or dead clients
● Continuous verification
– online scrubbing of metadata
13
FSCK AND REPAIR
● Repair tools
– loss of data objects (which files are damaged)
– loss (or corruption) of metadata objects (which subtrees are damaged)
● cephfs-journal-tool
– disaster recovery for damaged MDS journal
– repair damaged journal
– recover metadata from damaged or partial journal
● cephfs-table-tool
– adjust/repair/reset session, inode, snap metadata
● cephfs-data-scan
– rebuild metadata (directory hierarchy) from data objects (disaster recovery)
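For context, a worst-case repair run strings these tools together roughly as follows. This is only a sketch of the sequence described in the upstream disaster-recovery documentation, driven from Python for illustration; exact arguments vary by release, and the data pool name is a placeholder. It is not something to run on a healthy file system.

```python
import subprocess

DATA_POOL = "cephfs_data"   # placeholder: your CephFS data pool name

def run(*cmd):
    """Run a recovery command, echoing it first so each step is auditable."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Always export a backup of the journal before attempting repair.
run("cephfs-journal-tool", "journal", "export", "backup.bin")

# Salvage whatever dentries the damaged journal still contains, then reset it.
run("cephfs-journal-tool", "event", "recover_dentries", "summary")
run("cephfs-journal-tool", "journal", "reset")

# Clear stale client sessions recorded in the session table.
run("cephfs-table-tool", "all", "reset", "session")

# Worst case: rebuild the directory hierarchy from the data objects themselves.
run("cephfs-data-scan", "scan_extents", DATA_POOL)
run("cephfs-data-scan", "scan_inodes", DATA_POOL)
```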
14
ACCESS CONTROL
● Path-based (see the sketch after this list)
– restrict client mount to a subdirectory (e.g., /volumes/foo or /home/user)
– implemented in MDS
– integration with RADOS namespaces is WIP (targeting Jewel)
– (Jashan from GSoC)
● User-based
– mount file system as a single user (or small set of users)
– UID and GID based
– implement Unix-like permission checks at the MDS
– eventual integration with external authentication/authorization frameworks (e.g., Kerberos/AD)
– (Nishtha from Outreachy)
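The path-based restriction above is expressed as a cephx capability on the client identity handed to a tenant. A minimal sketch, assuming the then-new MDS path cap syntax and using placeholder names for the client, share path, and data pool:

```python
import subprocess

# Hypothetical names for illustration only.
share_path = "/volumes/foo"
client = "client.manila-foo"

cmd = [
    "ceph", "auth", "get-or-create", client,
    "mon", "allow r",
    # The new path-based MDS cap: the client can only mount/use this subtree.
    "mds", f"allow rw path={share_path}",
    # Until the RADOS-namespace integration lands (targeting Jewel), the OSD
    # cap is still pool-wide, which is part of what that work addresses.
    "osd", "allow rw pool=cephfs_data",
]
keyring = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
print(keyring)   # the resulting key is what gets delivered to the tenant
```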
THE CURRENT MANILA LANDSCAPE
16
MANILA FILE STORAGE
● Manila manages file volumes (“shares”)
– create/delete, share/unshare
– file server network connectivity
– snapshot management
● Caveats, awkward bits
– Manila also manages (only) part of the connectivity problem
● somewhat limited view of options (network file protocols only)
● manages “share networks” via Neutron
– User has responsibility for the “last mile”
● user must attach guest to share network
● user must mount the share (mount -t …)
● mount mechanism varies with storage type and/or hypervisor (NFS or CIFS)
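To make the "last mile" concrete, the sketch below drives the manila CLI for the common NFS case from Python; the share name, size, and guest IP are placeholders, credentials are assumed to be in the environment, and the final mount still has to happen inside the guest.

```python
import subprocess

def manila(*args):
    """Thin wrapper over the manila CLI (assumes OpenStack credentials are set)."""
    return subprocess.run(("manila",) + args, check=True,
                          capture_output=True, text=True).stdout

# Create a 1 GB NFS share and allow the guest's IP (placeholder values).
manila("create", "NFS", "1", "--name", "myshare")
manila("access-allow", "myshare", "ip", "10.0.0.23")

# The "last mile" is still on the user: read the export location from the
# share details, attach the guest to the share network, then run something
# like `mount -t nfs <export_location> /mnt/share` inside the guest.
print(manila("show", "myshare"))
```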
17
APPLIANCE DRIVERS
● Appliance drivers
– tell an appliance to export NFS to guest IP
– map appliance IP into tenant network
(Neutron)
– boring (closed, proprietary, expensive, etc.)
● Status
– several drivers from usual suspects
– security punted to vendor
[Diagram: Manila directs the appliance; the VM's NFS client mounts the appliance's export over the Neutron network]
18
GENERIC SHARE DRIVER
● Model
– Cinder volume attached to service VM
– local file system (XFS, ext4, btrfs, ...)
– Ganesha NFS server
– Neutron network shared with tenant
● Pros
– built from existing components
– tenant isolation, security
● Cons
– extra hop → higher latency
– service VM consumes resources
– service VM is SPoF
● Status
– reference driver
[Diagram: Manila and Cinder attach an RBD-backed (librbd) volume with XFS to a Ganesha service VM, which speaks native Ceph to the storage network; the tenant VM's NFS client connects over the Neutron network]
19
GANESHA + LIBCEPHFS
● Model
– existing Ganesha driver toolkit, currently
used by GlusterFS
– Ganesha's libcephfs FSAL
● Pros
– simple, existing model
– security
● Cons
– extra hop → higher latency
– service VM is SPoF
– service VM consumes resources
● Status
– Manila Ganesha toolkit exists
– used for GlusterFS
– not yet integrated with CephFS
[Diagram: Manila manages a service VM running Ganesha + libcephfs, which speaks native Ceph to the RADOS cluster; the tenant VM's NFS client connects over the Neutron network]
20
THE PROBLEM WITH SERVICE VMS
● Architecture is limited
– slow: extra hop
– expensive: extra VM
● Current implementation is not highly-available
– need service monitoring, failover
– possibly load balancing
– Manila code assumes a single service endpoint/proxy
● It's a big TODO list. Is it the right endpoint?
NATIVE CEPHFS MANILA DRIVER
22
CEPH NATIVE DRIVER
● Model
– allow tenant access to storage network
– mount CephFS directly from tenant VM
● Pros
– best performance
– access to full CephFS feature set
– simple
● Cons
– guest must have modern distro/kernel
– exposes tenant to Ceph cluster
– networking currently left to user
– must deliver mount secret to client
[Diagram: the tenant VM mounts CephFS with ceph.ko and speaks native Ceph directly to the RADOS cluster; Manila manages the share]
23
CEPHFS-VOLUME-MANAGER.PY
● cephfs-volume-manager.py → libcephfs.py → libcephfs.so → CephFS
– will be packaged as part of Ceph (with python-cephfs)
● Manila volumes/shares and consistency groups are just CephFS directories
– e.g., /manila/$cg/$volume
● Capture useful CephFS volume management tasks (sketched in code below)
– create – mkdir /manila/$cg/$volume
– delete (async) – mv /manila/$cg/$volume /manila/.trash
– snapshot volume – mkdir /manila/$cg/$volume/.snapshot/$snapname
– snapshot consistency group – mkdir /manila/$cg/.snapshot/$snapname
– promote snapshot to new volume
● read/write – cp -r ...
● read only – ln -s /manila/$cg/$vol/.snapshot/$snap /manila/$cg/$newvol
● Result is very simple Manila driver
– ~250 lines of code for native driver
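The operations listed above map almost one-to-one onto directory manipulation. The sketch below mimics that mapping with plain os calls against a locally mounted CephFS tree; the real cephfs-volume-manager.py goes through libcephfs.py rather than a kernel mount, and the mount point and snapshot directory name follow the slide's conventions, so treat this as an illustration of the idea rather than the driver's actual code.

```python
import os
import shutil

ROOT = "/mnt/cephfs/manila"        # assumed CephFS mount + Manila prefix

def create(cg, volume):
    os.makedirs(os.path.join(ROOT, cg, volume))

def delete(cg, volume):
    # Asynchronous delete: just move the tree into a trash directory;
    # a background task can reap it later.
    os.makedirs(os.path.join(ROOT, ".trash"), exist_ok=True)
    os.rename(os.path.join(ROOT, cg, volume),
              os.path.join(ROOT, ".trash", f"{cg}-{volume}"))

def snapshot_volume(cg, volume, snap):
    # CephFS snapshots: mkdir inside the (configurable) snapshot directory.
    os.mkdir(os.path.join(ROOT, cg, volume, ".snapshot", snap))

def snapshot_cg(cg, snap):
    os.mkdir(os.path.join(ROOT, cg, ".snapshot", snap))

def promote_rw(cg, volume, snap, new_volume):
    # Read/write promotion is a full copy of the snapshot contents.
    shutil.copytree(os.path.join(ROOT, cg, volume, ".snapshot", snap),
                    os.path.join(ROOT, cg, new_volume))

def promote_ro(cg, volume, snap, new_volume):
    # Read-only promotion is just a symlink into the snapshot.
    os.symlink(os.path.join(ROOT, cg, volume, ".snapshot", snap),
               os.path.join(ROOT, cg, new_volume))
```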
24
SECURITY
● Tenant has access to the storage network
– Ceph and CephFS are responsible for security isolation between tenants
● Client authentication has been there for years
– modeled after Kerberos (mutual client/server authentication)
● New CephFS path-based authorization
– new in the MDS; now upstream
– still missing: CephFS support for RADOS namespaces
● needed to restrict client access to CephFS objects using librados API
– will be in Jewel (Q1 2016)
● Is that enough?
– Ceph's security is the only barrier
– DoS potential against cluster
– it depends on the environment...
BETTER FS PLUMBING
26
WE WANT
● Better security
– ...like we get with block storage
● Simplicity of configuration and deployment
– ...like with Qemu and librbd
● Good performance
– ...like with CephFS native
27
KVM + 9P/VIRTFS + LIBCEPHFS.SO
● Model
– link libcephfs.so into qemu virtfs layer
– guest OS (Linux) mounts via 9P2000.L
● Pros
– security: tenant remains isolated from storage
net + locked inside a directory
– extremely simple deployment
● Cons
– requires (modern) Linux guests
– not supported on some distros
● 9p (kernel) and virtfs (qemu) code quality
– 9P isn't the greatest file protocol
● Status
– Prototype from Jevon Qiao, Haomai Wang, et al
● Qemu virtfs + libcephfs
● Manila driver + Nova mods
[Diagram: the guest mounts over 9P from KVM's virtfs layer, which links libcephfs and speaks native Ceph to the RADOS cluster]
28
KVM + NFS + NFSD/GANESHA + CEPHFS
● Model
– mount CephFS on host
● or Ganesha + libcephfs on host
– export NFS to guest over private net
● Pros
– security: tenant remains isolated from
storage net + locked inside a directory
– NFS is well supported everywhere
– reliable: same HW failure domain as guest
– works for any FS, not just CephFS
● Cons
– NFS has weak caching consistency
– protocol translation will slow us down some
– awkward and/or insecure networking...
[Diagram: the host mounts CephFS with ceph.ko (native Ceph to the RADOS cluster) and exports NFS to the guest over a private TCP/IP network]
29
NFS TO HOST: PROBLEMS WITH TCP/IP
● Slightly awkward networking
– add dedicated network device to VM
– configure local subnet and assign IPs on host and guest
– configure NFS export on hypervisor
– mount export from the VM
● Tricky to automate special-purpose network interfaces
● Guest networking infrastructure can disrupt file sharing
– firewalld
– networking restart
– “What is this weird network and interface doing here?”
● Other services on host may inadvertently be exposed to guest
– anything binding to INADDR_ANY (e.g., sshd)
30
AF_VSOCK
● VMware vSockets: a new(-ish) address family / socket type
– designed for communication between VMs and hosts
– stream-based or connectionless datagrams (just like IP)
– address is a simple integer (e.g., vsock:2)
– supported in Linux kernel since v3.9 (2013)
● Zero configuration simplicity
– hypervisor is always address vsock:1
– hypervisor assigns an address >1 to each VM
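Python exposes AF_VSOCK on Linux (3.7+ with a vsock-capable kernel), which makes the zero-configuration addressing easy to see. In the sketch below, the host CID follows the deck's vsock:1 convention and the port (2049, the NFS port) is just an illustrative choice; neither is prescribed here.

```python
import socket

HOST_CID = 1     # the deck's convention for the hypervisor's vsock address
PORT = 2049      # illustrative: the NFS port

# AF_VSOCK needs no interface, subnet, or IP configuration: the address is
# simply a (CID, port) tuple.
with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as sock:
    sock.connect((HOST_CID, PORT))
    print("connected to the host over vsock")
```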
31
NFS TO HOST: VSOCK
● NFS v4.1 only
– older NFS versions have awkward legacy connectivity/addressing requirements (e.g.,
lockd)
– v4.1 consolidates protocol into a single connection
● Easy to support
– mostly boilerplate to add new address type, as with IPv6 (e.g., parsing)
● Linux kernel NFS client and server
– patches from Stefan Hajnoczi are under review
● Ganesha
– patches from Matt Benjamin are under review
● nfs-utils
– patches from Matt Benjamin are pending review
32
KVM + NFS (VSOCK) + NFSD + CEPHFS.KO
● Model
– mount CephFS on host, knfsd
● or Ganesha + libcephfs on host
– export to VM's VSOCK address
● Pros
– NFSv4.1 is well supported
– security is better...
– simpler configuration...
– more reliable...
● Cons
– VSOCK support for Qemu and NFS
is shiny and new
[Diagram: the host mounts CephFS with ceph.ko and re-exports it via knfsd; the guest mounts NFS over AF_VSOCK]
33
WE LIKE THE VSOCK-BASED MODEL
● Security
– tenant remains isolated from storage network
– no shared IP network between guest and host
● avoid INADDR_ANY problem (e.g., by sshd on host)
● Simplicity
– no network configuration beyond VM VSOCK address assignment (on host)
– treats VM as a black box
– no software-defined networking
● Reliability
– no gateway in a separate hardware failure domain
– fewer network traversals
● Performance
– clear win over a service VM
– possible win over TCP/IP to host (but not currently optimized with this in mind!)
34
VSOCK CHALLENGES
● New hotness
– need to get code upstream and into supported distros/products
– Qemu, Linux kernel, Ganesha, nfs-utils
● Host configuration
– someone needs to assign VSOCK addresses to VMs
– someone needs to mount CephFS (or other FS) on the host and reexport NFS
to the guest
● User experience and the last mile
– How does a consumer of the Manila API know how to mount this thing?
– Do they need intimate knowledge of which Manila driver is in use, and what
attachment mechanism is supported by this particular OpenStack instance?
– Can they choose?
MANILA VS NOVA RESPONSIBILITIES
36
MANILA VS NOVA
● Manila manages shares/volumes
● Nova manages the VMs
● By analogy with block storage:
– Cinder manages block volumes
– Nova manages VMs
– Nova attaches Cinder volumes to VMs
● mechanism is dependent on the Nova driver (KVM vs Xen vs lxd vs ...)
37
NOVA: ATTACH/DETACH FS API
● Attach or detach a file system
– hypervisor mediates access to Manila shares/volumes
– networking?
● attach to Neutron network
● assign VSOCK address
– gateway/proxy?
● knfsd or Ganesha
– containers?
● Fetch access metadata (e.g., mount command inputs)
– mount protocol and options depend on Nova instance type and share
type
● Now Nova...
– can reattach after reboot
– manage live migration
38
NOVA: ATTACH/DETACH FS
For each FS access mode, the meaning of attach/detach and of the access metadata returned to the user:
● KVM, NFS from guest (e.g., to NetApp)
– attach/detach: attach guest to Manila share's network
– access metadata: typical NFS mount command: mount -t nfs $filervip:/ ...
● KVM, VSOCK, Ganesha, libcephfs/libgfapi
– attach/detach: write share definition to local Ganesha config file for guest's VSOCK addr; start Ganesha
– access metadata: NFS VSOCK mount command: mount -t nfs vsock://1/ ...
● KVM, VSOCK, knfsd, cephfs.ko mount
– attach/detach: mount CephFS; write share definition to /etc/exports for guest's VSOCK addr; exportfs -a
– access metadata: NFS VSOCK mount command: mount -t nfs vsock://1/ ...
● KVM, NFS to generic share driver
– attach/detach: attach guest to Manila share's network
– access metadata: NFS IP mount command: mount -t nfs $filerip:/ ...
● KVM, NFS to Ganesha service VM
– attach/detach: attach guest to Manila share's network
– access metadata: NFS IP mount command: mount -t nfs $filerip:/ ...
● KVM or Ironic, native CephFS
– attach/detach: no-op (or, attach guest to storage network)
– access metadata: CephFS mount requires a secret: mount -t ceph $monip:/ -o secret=X ...
● Nova container (lxc, lxd)
– attach/detach: mount remote FS on host; mount --bind share to guest's /dev/manila/$shareid
– access metadata: bind mount to desired location: mount --bind /dev/manila/$shareid ...
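As a concrete illustration of the "KVM, VSOCK, knfsd, cephfs.ko" row, a hypothetical host-side attach hook might look like the sketch below. Nothing here is an existing Nova or Manila API: the function name, the paths, and in particular the /etc/exports syntax for a VSOCK client are assumptions layered on top of the pending kernel/nfs-utils patches.

```python
import subprocess

CEPHFS_MOUNT = "/srv/manila-cephfs"    # assumed: where the host keeps CephFS mounted

def attach_share_vsock(share_path, guest_cid):
    """Hypothetical host-side attach: export one CephFS subtree to one guest.

    share_path is the share's directory under the host's CephFS mount and
    guest_cid is the VM's VSOCK address assigned by Nova.  The exports line
    below guesses at how the patched nfs-utils might spell a VSOCK client;
    the real syntax depends on those patches.
    """
    export_dir = f"{CEPHFS_MOUNT}/{share_path}"

    # Append an export restricted to this guest's VSOCK address...
    with open("/etc/exports", "a") as exports:
        exports.write(f"{export_dir} vsock:{guest_cid}(rw,no_root_squash)\n")

    # ...and tell knfsd to pick it up.
    subprocess.run(["exportfs", "-a"], check=True)

    # The access metadata handed back to the user is then just the mount
    # command from the table above.
    return f"mount -t nfs vsock://1{export_dir} /mnt"
```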
WHAT ABOUT CONTAINERS
40
(LXC, LXD) + CEPHFS.KO
● Model
– host mounts CephFS (or whatever)
directly
– mount --bind share into container
namespace (/dev/manila/$shareid)
– user does mount --bind to final
location
● Pros
– best performance
– full CephFS semantics
● Cons
– rely on container for security
– need Nova attach/detach API
[Diagram: the host mounts CephFS with ceph.ko (native Ceph to the RADOS cluster); Nova bind-mounts the share into the container, with Manila managing the share]
42
SUMMARY
● Ceph native driver should land soon
– and Ceph Jewel (Q1 2016) will have production-ready CephFS!
● Current Manila models are appliance-centric or limited
● NFS over VSOCK to the host is promising
– simplicity, reliability, security, performance
– either kernel NFS server or Ganesha
● We need to sort out the Nova vs Manila interaction
– Nova APIs would help enable
● non-KVM users for Manila (containers, Ironic)
● NFS over VSOCK to a host gateway
THANK YOU!
Sage Weil
CEPH PRINCIPAL ARCHITECT
sage@redhat.com
@liewegas
44
FOR MORE INFORMATION
● Ceph
– http://ceph.com
– http://github.com/ceph
– http://tracker.ceph.com
● Mailing lists
– ceph-users@ceph.com
– ceph-devel@vger.kernel.org
● irc.oftc.net
– #ceph
– #ceph-devel
● Twitter
– @ceph
● Qemu + libcephfs, w/ Nova and Manila support
– https://github.com/JevonQ/qemu/commit/3c5d09149b59735905388ed51861c018c7737e7e
– https://github.com/yuyuyu101/nova/tree/bp/manila-virtfs-support
● Qemu virtio-vsock
– https://lwn.net/Articles/646365/
– https://github.com/stefanha/qemu/commits/vsock
● Linux NFS client/server VSOCK support
– https://github.com/stefanha/linux/commits/vsock-nfs
– https://copr.fedoraproject.org/coprs/jspray/vsock-nfs/builds/
● Ganesha VSOCK support
– https://github.com/linuxbox2/nfs-ganesha/tree/vsock
● Ceph native manila driver
– https://github.com/jcsp/manila/commits/ceph
● cephfs-volume-manager.py
– https://github.com/ceph/ceph/pull/6205