The Evolution of Storage on Linux
Lenz Grimmer <lenz.grimmer@it-novum.com>
FrOSCON 2015, Sankt Augustin
22. August 2015
Agenda
 A trip down memory lane (pun intended)
 Overview of how storage on Linux has evolved
 Local file systems and related concepts/technologies
 Network Services
 Distributed / Cluster filesystems
Introduction
 40+ file systems in /fs/
 Focus on the most popular/widely used systems
 Primary focus on the software side
 High-level Descriptions only
Noteworthy Observations / Conclusions
 The role of SourceForge.net today
 Distribution kernels vs. mainline Linux
 Honorable mention: Christoph Hellwig
 Don't miss his talk about the Linux Storage Stack tomorrow (14:00, HS6)
 Big Thanks to: LWN, Kernelnewbies.org, Thorsten Leemhuis
(Heise) and Wikipedia
The early days
MINIX file system
 While developing Linux in 1991, Linus required some form of
persistent storage
 A Minix-compatible file system was the canonical choice:
 Well-documented, robust
 Exchange data with the host OS (and vice versa)
 Severely limited
 Max. file/filesystem size: 64MB (16bit block addresses)
 14 char file names
 Only one time stamp (mtime)
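The 64MB ceiling follows from the address width. A minimal sketch of the arithmetic, assuming the classic 1 KiB MINIX block size (the slide states only the 16-bit address width, not the block size):

```python
# Assumption: 1 KiB blocks, as in the classic MINIX file system.
BLOCK_SIZE = 1024   # bytes per block (assumed)
ADDRESS_BITS = 16   # block address width, from the slide


def max_fs_bytes(address_bits: int, block_size: int) -> int:
    """Largest addressable size: number of block addresses times block size."""
    return (2 ** address_bits) * block_size


limit = max_fs_bytes(ADDRESS_BITS, BLOCK_SIZE)
print(limit // 2 ** 20, "MiB")  # prints: 64 MiB
```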
Virtual File System Switch (VFS)
 Abstraction / indirection layer to route file oriented system calls to
necessary functions in the physical filesystem code to do the I/O
 Eased the addition of new file systems
 Initially written by Chris Provenzano
 Integrated into Linux 0.96
 Defines a set of functions that every filesystem has to implement
 Three kinds of objects: filesystems, inodes, and open files
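The idea of a switch layer can be illustrated with a toy model. The sketch below is not kernel code: every class and function name is hypothetical, and it only shows how a generic open/read call can be routed to whichever filesystem owns the inode, using the three kinds of objects named above.

```python
# Illustrative only: names are hypothetical, not Linux kernel interfaces.
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The set of functions every filesystem has to implement."""

    @abstractmethod
    def lookup(self, path: str) -> "Inode": ...

    @abstractmethod
    def read(self, inode: "Inode", offset: int, size: int) -> bytes: ...


class Inode:
    def __init__(self, fs: FileSystem, number: int):
        self.fs, self.number = fs, number


class OpenFile:
    """An open file: an inode plus a current position."""

    def __init__(self, inode: Inode):
        self.inode, self.pos = inode, 0

    def read(self, size: int) -> bytes:
        # The switch: dispatch to the filesystem that owns this inode.
        data = self.inode.fs.read(self.inode, self.pos, size)
        self.pos += len(data)
        return data


class RamFS(FileSystem):
    """Toy in-memory filesystem used to exercise the switch."""

    def __init__(self, files):
        self.names = {name: i for i, name in enumerate(files)}
        self.blobs = {i: files[name] for name, i in self.names.items()}

    def lookup(self, path):
        return Inode(self, self.names[path])

    def read(self, inode, offset, size):
        return self.blobs[inode.number][offset:offset + size]


mounts = {"/ram": RamFS({"hello.txt": b"hello, vfs"})}


def vfs_open(path: str) -> OpenFile:
    """Resolve the mount point, then let that filesystem look up the rest."""
    mount_point, _, rest = path[1:].partition("/")
    return OpenFile(mounts["/" + mount_point].lookup(rest))


f = vfs_open("/ram/hello.txt")
first, second = f.read(5), f.read(100)
```

Adding another filesystem to this model means writing one more `FileSystem` subclass and mounting it; the generic `OpenFile` code stays unchanged.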
Extended File System (ext)
 Designed by Rémy Card
 Max. file/filesystem size: 2 GB, max. file name size was 255 chars
 Metadata structure inspired by the traditional Unix File System
(UFS)
 Added to Linux 0.96c in April 1992
 Issues remained (bad performance, missing time stamps,
fragmentation)
Second Extended File System (ext2)
 Also implemented by Rémy Card
 Introduced in Linux Kernel 0.99 (January 1993)
 Designed with extensibility in mind
 Adopted advanced ideas from other file systems (e.g. BSD Fast File System),
such as mtime/ctime/atime, file attributes, BSD/SysV semantics, different block
sizes, immutable/append-only files
 Initially supported file/file system sizes up to 2TB (limitation of the block
device layer)
 Kernel version 2.6.17 (March 2006) extended max. file system size to 32TB
(using 8kB Blocks)
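Both figures can be read as an address width multiplied by a unit size. A sketch of that arithmetic, assuming 32-bit sector numbers with 512-byte sectors for the block-layer limit and 32-bit block numbers for the file system limit (the slide gives only the resulting sizes, so these widths are assumptions):

```python
ADDRESS_BITS = 32  # assumed width of sector and block numbers


def limit_tib(unit_bytes: int, bits: int = ADDRESS_BITS) -> float:
    """Addressable capacity in TiB for a given unit size."""
    return (2 ** bits) * unit_bytes / 2 ** 40


# Block device layer: 512-byte sectors.
block_layer_limit = limit_tib(512)

# File system: capacity grows with the block size (in KiB).
fs_limits = {kib: limit_tib(kib * 1024) for kib in (1, 2, 4, 8)}
```

Under these assumptions `block_layer_limit` comes out at 2 TiB and the 8 KiB entry of `fs_limits` at 32 TiB, matching the two numbers on the slide.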
FAT/MSDOS
 Added to Linux in 1992/1993 by Werner Almesberger
 VFAT support was later developed by Gordon Chaffee
 VFAT filesystem is compatible with Windows 95/NT long filenames on the
FAT filesystem
 Initially called xmsdos
 Patches for Linux 1.2.x and 1.3.x.
 As of Linux 1.3.60, the vfat filesystem is part of the Linux kernel distribution
 Mtools as a userland-only alternative
NTFS
 NTFS driver for Linux by Martin von Löwis (started around 1996)
 From June 2001, Legato Systems sponsored Anton Altaparmakov to further
develop NTFS on Linux
 Read-only mode only, with no fault-tolerance supported
 NTFS-TNG replaced old NTFS driver in Linux 2.5.11 (April 29th,
2002)
 NTFS-3G (FUSE-based) by Tuxera (read-write support)
The Age of Journaling
Filesystems
Fsck vs. Journaling
 Unclean unmounts, too many mount counts, or remounts after
a long time period triggered file system checks
 Disk drives got bigger
 A Journaling file system keeps track of changes not yet
committed to the file system's main part in a Journal
 Keep track of just metadata changes or data as well
 Several file systems were developed in parallel, to alleviate this
shortcoming of ext2, namely ext3, XFS, JFS and ReiserFS.
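How a journal avoids a full file system check can be shown with a toy write-ahead log. This is a sketch under simplifying assumptions, not the ext3/JBD design: the class and method names are hypothetical, and a dictionary stands in for on-disk metadata.

```python
# Toy write-ahead journal; names are hypothetical, not a kernel API.
class JournaledStore:
    def __init__(self):
        self.main = {}     # the file system's "main part"
        self.journal = []  # committed transactions not yet written to main

    def commit(self, changes: dict) -> None:
        """Record a transaction in the journal before touching main."""
        self.journal.append(dict(changes))

    def checkpoint(self) -> None:
        """Copy journaled changes into main, then clear the journal."""
        for txn in self.journal:
            self.main.update(txn)
        self.journal.clear()

    def recover(self) -> None:
        """After an unclean shutdown, replay the journal instead of
        scanning the whole file system."""
        self.checkpoint()


store = JournaledStore()
store.commit({"inode 12 size": 4096})
# Simulated crash here: main was never updated, but the journal survived.
assert store.main == {}
store.recover()
```

Recovery time in this model depends on the size of the journal, not on the size of the disk, which is the property that mattered as drives grew.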
Journaling Block Device layer (JBD)
 JBD established as a filesystem-independent service, to be used
by any file system
 First incarnation of JBD developed by Stephen C. Tweedie
together with the ext3 file system
 OCFS2 and later ext4 also used JBD and its successor JBD2
Third extended filesystem (ext3)
 Originally released in September 1999
 Written by Stephen Tweedie for the 2.2 branch
 Ported to 2.4 kernels by Peter Braam, Andreas Dilger, Andrew
Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie
 Merged with the mainline Linux kernel 2.4.15 (November 2001)
 Basically ext2 with journaling capabilities, easy conversion
 Max filesystem size: 8TB, Max 32k subdirs/directory
IBM JFS
 Rooted in AIX and OS/2 Warp Server (new design in 1995)
 Port to Linux started in December 1999 (Dave Kleikamp, Steve Best)
 Uses own journaling implementation (metadata only)
 Max volume size: 32PB, Max file size: 4PB
 Later ported to AIX 5L as JFS2 (April 2001)
 JFS 0.0.1 released in Feb. 2000, 0.1.0 (Beta) in August 2000
 Version 1.0.0 was released in June 2001
 Kernel module since 2.4.18pre9-ac4, Version 1.1.0 was included by Marcelo
Tosatti in Linux 2.4.20.
ReiserFS
 Supported early on by SuSE; introduced in Linux 2.4.1 (2001)
 The first journaling file system to be included in mainline
 Max volume size: 16TB
 Based on B+ trees
 Metadata-only journaling (block journaling since 2.6.8)
 Online resizing
 Tail packing (block suballocation)
 Reiser4 still under active development (Edward Shishkin)
SGI XFS
 64-bit journaling file system created by Silicon Graphics
 SGI IRIX since 1994, GPLed in 2000
 Version 1.0 for Linux in May 2001 as Patch against 2.4.2
 Merged in 2.6.x and 2.4.25 (Feb 2004)
 Steve Lord, Russell Cattelan, Nathan Scott, Jim Mostek
 Advanced features, high performance
 Max volume size: 16EB
Volume Management
The need for Logical Volume Management
 Initially, Linux could only address disks/partitions
 Changes to the layout required downtime and shuffling of data
 Logical Volume Management abstracts physical disk drives
 First incarnation of Linux LVM was introduced in Kernel version
2.4
 Heinz Mauelshagen wrote the original LVM code in 1998,
inspired by HP-UX's volume manager.
Device Mapper (DM)
 A kernel framework for mapping physical block devices onto higher-
level virtual block devices
 Added in Linux 2.6
 Passes data from a virtual block device, which is provided by the
device mapper itself, to another block device
 Pluggable design
 Data can also be modified in transit
 Forms the foundation of LVM2/EVMS, RAID and dm-crypt disk
encryption and many other useful features
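The core operation, translating a sector on the virtual device into a sector on an underlying device, can be sketched as a table lookup. The example models a simple linear mapping only; the `Segment` layout and the device names are made up for illustration and are not the actual device-mapper table format.

```python
# Hypothetical sketch of a linear mapping table.
from bisect import bisect_right
from typing import NamedTuple


class Segment(NamedTuple):
    start: int    # first virtual sector covered by this segment
    length: int   # number of sectors in the segment
    device: str   # underlying block device (made-up names)
    offset: int   # first sector used on the underlying device


TABLE = [
    Segment(0, 1000, "/dev/sda1", 2048),
    Segment(1000, 500, "/dev/sdb1", 0),
]


def map_sector(table, sector):
    """Return (device, sector) on the underlying device for a virtual sector."""
    starts = [seg.start for seg in table]
    idx = bisect_right(starts, sector) - 1
    seg = table[idx] if idx >= 0 else None
    if seg is None or sector >= seg.start + seg.length:
        raise ValueError(f"sector {sector} is outside the virtual device")
    return seg.device, seg.offset + (sector - seg.start)
```

In this model, `map_sector(TABLE, 1200)` resolves to sector 200 on the second device, so a single virtual device spans two underlying ones.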
DM Multipath (DM-MPIO)
 Consists of kernel components and user-space components
 Provides input-output (I/O) fail-over and load-balancing within Linux
for block devices
 Handles the rerouting of block I/O to an alternate path in the event of
a path failure
 Can also balance the I/O load across all of the available paths in Fibre
Channel (FC) or iSCSI SAN environments
 Started as part of a patchset created by Joe Thornber, later
maintained by Alasdair G Kergon at Red Hat. Christophe Varoqui
maintains the userland multipath tools
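Fail-over and load balancing can be pictured as a path selector that rotates over healthy paths and skips failed ones. The sketch below is a simplified round-robin model with hypothetical names; it does not reproduce the selectors or the configuration of the real multipath tools.

```python
class Multipath:
    """Round-robin over healthy paths; failed paths are skipped."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.failed = set()
        self._next = 0

    def fail(self, path):
        self.failed.add(path)

    def reinstate(self, path):
        self.failed.discard(path)

    def select(self):
        """Pick the path for the next I/O request."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise IOError("all paths failed")


mp = Multipath(["sdc", "sdd"])   # two paths to the same LUN (made-up names)
balanced = [mp.select() for _ in range(4)]
mp.fail("sdc")                   # simulated path failure
after_failure = [mp.select() for _ in range(3)]
```

While both paths are healthy the requests alternate; after the simulated failure all I/O is rerouted to the remaining path.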
DM-Cache
 Allows a fast device (e.g. an SSD) to be used as a cache for a slower device
(e.g. a rotating disk)
 Different policy plugins can be used to change the algorithms used to select
which blocks are promoted, demoted, cleaned etc.
 Supports writeback and writethrough modes
 Requires three physical storage devices to separately store actual data,
cache data and required metadata
 Joe Thornber, Heinz Mauelshagen and Mike Snitzer
 Inclusion into the Linux mainline kernel version 3.9, released on April 28,
2013
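The difference between the two modes, and the role of the three devices, can be sketched with a toy model. It is an illustration under simplifying assumptions: every block that is read or written is cached (real policies decide which blocks to promote), and all names are hypothetical.

```python
class CachedDevice:
    def __init__(self, mode="writethrough"):
        if mode not in ("writethrough", "writeback"):
            raise ValueError(mode)
        self.mode = mode
        self.origin = {}    # slow device holding the actual data
        self.cache = {}     # fast device holding cached blocks
        self.dirty = set()  # metadata: cached blocks newer than the origin

    def write(self, block, data):
        self.cache[block] = data
        if self.mode == "writethrough":
            self.origin[block] = data   # origin is always up to date
        else:
            self.dirty.add(block)       # origin is updated later

    def read(self, block):
        if block in self.cache:
            return self.cache[block]    # cache hit
        data = self.origin[block]
        self.cache[block] = data        # promote on miss (simplified)
        return data

    def clean(self):
        """Write dirty blocks back to the origin device."""
        for block in sorted(self.dirty):
            self.origin[block] = self.cache[block]
        self.dirty.clear()


wt = CachedDevice("writethrough")
wt.write(1, b"a")
wb = CachedDevice("writeback")
wb.write(1, b"a")
origin_before_clean = dict(wb.origin)
wb.clean()
```

In writeback mode the origin lags behind until `clean()` runs, which is why the dirty-block metadata has to be stored reliably.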
LVM2
 Based on DM
 Flexible storage management
 Add/remove disks
 Resize/move logical volumes
 Move LVs between PVs
 Span volumes across multiple physical devices
 RAID
 Thin provisioning
 Cluster Volume Manager
IBM EVMS
 IBM-sponsored effort to provide volume management services for
Linux
 A single, unified system for handling all storage management tasks
 Despite many of the features and GUI management tools found in
EVMS, LVM2 was preferred
 As a result, IBM dropped their kernel driver and reworked their tools
to work with LVM2 instead
 Development stopped in 2006
Storage Services
NFS
 Rick Sladkey was the original author of the NFS client and also ported the NFS server
and the RPC library code. Doug Quale helped extend the kernel to
support networking filesystems
 NFS Version 2 since 1.2 kernel series
 Kernel 2.2.18 a major milestone: mixing Linux NFS with other operating
systems' NFS, use file locking reliably over NFS, and NFS Version 3.
 NFS Versions 2, 3, and 4 are supported on 2.6 and later kernels. Version 4.1
(client) requires at least kernel 2.6.31
 NFSv4 for Linux has been under development at CITI and NetApp since 2001
Samba
 A free-software re-implementation of the SMB/CIFS networking protocol
 Andrew Tridgell started development of Samba in 1992, Jeremy Allison
joined early on
 Volker Lendecke founded SerNet in 1997, to provide commercial support
 Version 3 (2003): file and print services for Microsoft Windows clients and can
integrate with a Windows NT 4.0 server domain, either as a Primary Domain
Controller (PDC) or as a domain member
 Samba4 installations can act as an Active Directory domain controller or
member server, at Windows 2008 domain and forest functional levels.
SMB vs. CIFS
 SMB ("Server Message Block") and CIFS ("Common Internet File System")
are protocols. CIFS is an extension of the SMB protocol
 “smbfs” was an older file system originating from the Samba project, heavily
coupled with the Samba tools (smb.conf, smbmount, etc.). Removed
in Linux 2.6.27
 CIFS VFS was added to mainline Linux kernels in 2.5.42. Supports
advanced network file system features such as locking, Unicode
(advanced internationalization), hardlinks, dfs (hierarchical,
replicated name space), distributed caching and uses native TCP
names. All key network functions implemented in kernel
Current Filesystems
Fourth Extended Filesystem (ext4)
 Advanced version of ext3, led by Ted Ts'o et al
 Incorporated scalability and reliability enhancements for supporting
large filesystems up to 1EB.
 First experimental support for ext4 was merged into Linux 2.6.19,
which was released on 29 November 2006.
 Ext4 was marked as experimental until Linux 2.6.27
 Starting with 2.6.28 (December 2008), ext4 was marked as stable
 New extent format reduced metadata overhead (RAM, IO for access,
transactions)
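Why extents reduce metadata can be seen by collapsing a per-block map into runs of contiguous blocks. The sketch below is illustrative only: the tuple layout `(logical_start, physical_start, length)` is an assumption for this example and not the ext4 on-disk extent format.

```python
def to_extents(blocks):
    """Collapse a list of physical block numbers (indexed by logical block)
    into (logical_start, physical_start, length) runs."""
    extents = []
    for logical, physical in enumerate(blocks):
        if extents:
            l0, p0, n = extents[-1]
            if physical == p0 + n:          # continues the previous run
                extents[-1] = (l0, p0, n + 1)
                continue
        extents.append((logical, physical, 1))
    return extents


# A 1024-block file stored in two contiguous pieces (made-up block numbers).
block_map = list(range(5000, 6000)) + list(range(9000, 9024))
extents = to_extents(block_map)
```

Here a map with 1024 entries shrinks to two extents, which is the kind of saving in RAM, I/O and journal transactions the slide refers to.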
Btrfs
 Chris Mason (Oracle) in 2007
 COW (Snapshots)
 Checksums, Compression
 RAID, Volume management
 Conversion of ext3/4 file systems
 Merged into mainline Linux 2.6.29 (March 2009)
 Florian Winkler talks about Btrfs today (11:15, HS7)
ZFS
 Filesystem and logical volume manager combined
 Designed and implemented at Sun Microsystems (Jeff Bonwick, Matthew
Ahrens)
 Development started in 2001, officially announced in 2004
 128bit, COW, Snapshots, Deduplication, RAID
 OpenSolaris (CDDL)
 Early port based on FUSE
 Kernel modules based on OpenZFS (2013)
 Not included in mainline Linux due to license incompatibilities
Network Storage
Network Block Device (NBD)
 Remotely access a block device attached to another system
 Userspace Server/Client, Client kernel module
 Issues arise if network goes down or server crashes
 Markus Pargmann talks about NBD on Sunday (16:30, HS6)
Distributed Replicated Block Device (DRBD)
 A shared-nothing, synchronously replicated block device
 “RAID1 over Network”
 Writes to the primary node are transferred to the lower-level block device and
simultaneously propagated to the secondary node
 The secondary node then transfers data to its corresponding lower-level block
device. All read I/O is performed locally
 Fail-over capabilities (Secondary/Primary)
 Lars Ellenberg and Philipp Reisner originally submitted code in July 2007
 DRBD was merged on 8 December 2009 during the "merge window" for Linux
kernel version 2.6.33
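The write and read paths described above can be sketched as follows. This is a toy model of fully synchronous replication with hypothetical names; dictionaries stand in for the lower-level block devices, and the network transfer is reduced to a direct assignment.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.disk = {}  # stands in for the lower-level block device


class ReplicatedDevice:
    def __init__(self, primary, secondary):
        self.primary, self.secondary = primary, secondary

    def write(self, block, data):
        self.primary.disk[block] = data     # local write
        self.secondary.disk[block] = data   # propagated to the peer
        return True  # acknowledged only once both copies exist

    def read(self, block):
        return self.primary.disk[block]     # reads are served locally

    def failover(self):
        """Promote the secondary; the old primary becomes the secondary."""
        self.primary, self.secondary = self.secondary, self.primary


dev = ReplicatedDevice(Node("alpha"), Node("bravo"))
dev.write(0, b"boot")
dev.failover()
survivor_data = dev.read(0)
```

Because every acknowledged write exists on both nodes, the promoted node can serve the same data after a fail-over.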
Cluster Filesystems
OCFS/OCFS2
 Shared disk file system by Oracle
 Main focus of OCFS was to accommodate Oracle clustered databases,
not POSIX-compliant
 OCFS2 designed as a Linux filesystem from scratch
 On-disk filesystem implementation heavily inspired by ext3, uses JBD
for journaling
 OCFS2 integrated into version 2.6.16 of mainline Linux
 Max Volume/File Size 4PB (currently limited to 16TB)
 Trivia question: what feature do OCFS2 and Btrfs have in common?
GFS/GFS2
 Shared disk filesystem, allows concurrent access to the same block storage
 Development of GFS began in 1995, led by University of Minnesota
professor Matthew O'Keefe and a group of students
 Originally for SGI IRIX, ported to Linux in 1998
 Acquired by Sistina in 2000, turned into proprietary product
 OpenGFS fork
 Red Hat acquired Sistina in 2003 and released GFS under GPL in June 2004
 GFS2 and the DLM merged into Linux 2.6.19 (29 November 2006)
Storage Requirements and Challenges
 Amount of data to be stored grows exponentially
 Today, Storage has to be:
 Fault tolerant, reliable
 Scalable without limitations or service interruptions
 Distributable
 Easy to manage / automate
 Previous approaches do not address these requirements
Distributed Filesystems
GlusterFS
 Aggregates various storage servers over Ethernet or Infiniband RDMA
interconnect into one large parallel network file system
 Storage bricks export local file systems as volumes
 GlusterFS clients create composite virtual volumes from multiple remote
servers using stackable "translators"
 Translators provide Mirroring, Replication, Striping, etc.
 Final volume mounted by client host using its own native protocol via FUSE,
or using the NFS v3 protocol (via built-in server translator)
 Originally developed by Gluster, Inc., which was acquired by Red Hat in 2011
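The stacking idea can be sketched with objects that share one interface and wrap each other. The classes below are hypothetical stand-ins, not GlusterFS code: `Brick` plays a storage server, and the CRC32 placement in `Distribute` is an assumption chosen for brevity rather than the hashing GlusterFS actually uses.

```python
import zlib


class Brick:
    """Stands in for a storage server exporting a local file system."""

    def __init__(self):
        self.files = {}

    def write(self, name, data):
        self.files[name] = data

    def read(self, name):
        return self.files[name]


class Replicate:
    """Mirror every write to all children; read from the first that has the file."""

    def __init__(self, *children):
        self.children = children

    def write(self, name, data):
        for child in self.children:
            child.write(name, data)

    def read(self, name):
        for child in self.children:
            try:
                return child.read(name)
            except KeyError:
                continue
        raise KeyError(name)


class Distribute:
    """Place each file on exactly one child, chosen by a hash of its name."""

    def __init__(self, *children):
        self.children = children

    def _pick(self, name):
        return self.children[zlib.crc32(name.encode()) % len(self.children)]

    def write(self, name, data):
        self._pick(name).write(name, data)

    def read(self, name):
        return self._pick(name).read(name)


bricks = [Brick() for _ in range(4)]
# Composite volume: two mirrored pairs, with files distributed across the pairs.
volume = Distribute(Replicate(bricks[0], bricks[1]),
                    Replicate(bricks[2], bricks[3]))
volume.write("a.txt", b"data")
copies = sum("a.txt" in brick.files for brick in bricks)
```

Each file ends up on exactly one mirrored pair, so `copies` is 2 regardless of which pair the hash selects.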
Ceph
 Initially created by Sage Weil, who founded Inktank in 2012
 First release in July 2012
 Object, block, and file storage from a single distributed computer cluster
 Reliable autonomic distributed object store (RADOS)
 RADOS Block Device (RBD), Snapshots
 RadosGW provides REST API (Amazon S3/OpenStack Swift)
 Completely distributed without a single point of failure
 Replicates data for fault tolerance (CRUSH)
 Ceph client code was merged into mainline Linux version 2.6.34
 Red Hat acquired Inktank in April 2014
Lustre
 Parallel distributed file system, generally used for large-scale cluster computing
 Widely used in TOP500 supercomputers
 Max. volume size: 100 PB (production), over 16 EB (theoretical)
 Max. file size: 2.5 PB (ext4), 16 EB (ZFS)
 Started as a research project in 1999 by Peter Braam at CMU, who founded Cluster Filesystems Inc. in
2001 to work on Intermezzo, Coda and Lustre
 First installed in March 2003 on the MCR Linux Cluster (Lawrence Livermore National Laboratory).
Lustre 1.0.0 was released in December 2003.
 Acquired by Sun Microsystems in 2007
 Oracle acquired Sun in 2010 and discontinued the development
 Whamcloud (later acquired by Intel), Open Scalable File Systems, Inc. (OpenSFS), Xyratex Inc.
Shameless plug: openATTIC
 Unified Storage: manage XFS, ZFS, Btrfs, NFS, Samba
 Modern GUI (AngularJS/Bootstrap)
 REST API
 Built-in Monitoring
 Clustering (Pacemaker/Corosync, DRBD)
 http://www.openattic.org/
 Find us in the exhibition hall
Wanted: PHP developer (m/f) with Linux know-how
You are passionate about development and feel at home in the
open source world?
Then we should get to know each other!
These tasks await you with us:
• Development of our system monitoring tool
openITCOCKPIT, for frontend and/or backend
• Design and implementation of projects as part of a
team
• Testing of the developed applications
• Maintenance and expansion of the existing development and
test environment
For more information, see:
www.it-novum.com/karriere
Thank you!

More Related Content

What's hot

Unix and shell programming | Unix File System | Unix File Permission | Blocks
Unix and shell programming | Unix File System | Unix File Permission | BlocksUnix and shell programming | Unix File System | Unix File Permission | Blocks
Unix and shell programming | Unix File System | Unix File Permission | BlocksLOKESH KUMAR
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architectureSHAJANA BASHEER
 
Linux kernel Architecture and Properties
Linux kernel Architecture and PropertiesLinux kernel Architecture and Properties
Linux kernel Architecture and PropertiesSaadi Rahman
 
Mca ii os u-5 unix linux file system
Mca  ii  os u-5 unix linux file systemMca  ii  os u-5 unix linux file system
Mca ii os u-5 unix linux file systemRai University
 
Red Hat System Administration
Red Hat System AdministrationRed Hat System Administration
Red Hat System AdministrationRafi Rahimov
 
Linux fundamentals Training
Linux fundamentals TrainingLinux fundamentals Training
Linux fundamentals TrainingLove Steven
 
Architecture Of The Linux Kernel
Architecture Of The Linux KernelArchitecture Of The Linux Kernel
Architecture Of The Linux Kernelguest547d74
 
A beginners introduction to unix
A beginners introduction to unixA beginners introduction to unix
A beginners introduction to unixzafarali1981
 
Unix operating system
Unix operating systemUnix operating system
Unix operating systemABhay Panchal
 

What's hot (19)

Unix and shell programming | Unix File System | Unix File Permission | Blocks
Unix and shell programming | Unix File System | Unix File Permission | BlocksUnix and shell programming | Unix File System | Unix File Permission | Blocks
Unix and shell programming | Unix File System | Unix File Permission | Blocks
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
Unix Introduction
Unix IntroductionUnix Introduction
Unix Introduction
 
Ubuntu OS Presentation
Ubuntu OS PresentationUbuntu OS Presentation
Ubuntu OS Presentation
 
Linux kernel Architecture and Properties
Linux kernel Architecture and PropertiesLinux kernel Architecture and Properties
Linux kernel Architecture and Properties
 
Introduction to Unix
Introduction to UnixIntroduction to Unix
Introduction to Unix
 
Mca ii os u-5 unix linux file system
Mca  ii  os u-5 unix linux file systemMca  ii  os u-5 unix linux file system
Mca ii os u-5 unix linux file system
 
Red Hat System Administration
Red Hat System AdministrationRed Hat System Administration
Red Hat System Administration
 
Ubuntu
UbuntuUbuntu
Ubuntu
 
Linux introduction
Linux introductionLinux introduction
Linux introduction
 
Linux fundamentals Training
Linux fundamentals TrainingLinux fundamentals Training
Linux fundamentals Training
 
Architecture Of The Linux Kernel
Architecture Of The Linux KernelArchitecture Of The Linux Kernel
Architecture Of The Linux Kernel
 
OSCh20
OSCh20OSCh20
OSCh20
 
A beginners introduction to unix
A beginners introduction to unixA beginners introduction to unix
A beginners introduction to unix
 
Linux lecture5
Linux lecture5Linux lecture5
Linux lecture5
 
Case study windows
Case study windowsCase study windows
Case study windows
 
Introduction to unix
Introduction to unixIntroduction to unix
Introduction to unix
 
Unix operating system
Unix operating systemUnix operating system
Unix operating system
 
Ubuntu File System
Ubuntu File SystemUbuntu File System
Ubuntu File System
 

Similar to The evolution of storage on Linux

Similar to The evolution of storage on Linux (20)

L2(1).PPT
L2(1).PPTL2(1).PPT
L2(1).PPT
 
Assignment On Linux Unix Life Cycle And Its Commands Course Title System Pro...
Assignment On Linux Unix Life Cycle And Its Commands Course Title  System Pro...Assignment On Linux Unix Life Cycle And Its Commands Course Title  System Pro...
Assignment On Linux Unix Life Cycle And Its Commands Course Title System Pro...
 
Os
OsOs
Os
 
Operating System
Operating SystemOperating System
Operating System
 
Presentation on linux
Presentation on linuxPresentation on linux
Presentation on linux
 
5231 140-hellwig
5231 140-hellwig5231 140-hellwig
5231 140-hellwig
 
Linux kernel
Linux kernelLinux kernel
Linux kernel
 
Studies
StudiesStudies
Studies
 
Ospresentation 120112074429-phpapp02 (1)
Ospresentation 120112074429-phpapp02 (1)Ospresentation 120112074429-phpapp02 (1)
Ospresentation 120112074429-phpapp02 (1)
 
OSOS SEM 4 Chapter 2 part 1
OSOS SEM 4 Chapter 2 part 1OSOS SEM 4 Chapter 2 part 1
OSOS SEM 4 Chapter 2 part 1
 
introduction.pdf
introduction.pdfintroduction.pdf
introduction.pdf
 
ubantu ppt.pptx
ubantu ppt.pptxubantu ppt.pptx
ubantu ppt.pptx
 
CS8493-OS-Unit-5.pdf
CS8493-OS-Unit-5.pdfCS8493-OS-Unit-5.pdf
CS8493-OS-Unit-5.pdf
 
Cs8493 unit 5
Cs8493 unit 5Cs8493 unit 5
Cs8493 unit 5
 
Linux technology
Linux technologyLinux technology
Linux technology
 
UNIX Operating System ppt
UNIX Operating System pptUNIX Operating System ppt
UNIX Operating System ppt
 
Linux OS presentation
Linux OS presentationLinux OS presentation
Linux OS presentation
 
OS(ch16)-LinuxSystem.pptx
OS(ch16)-LinuxSystem.pptxOS(ch16)-LinuxSystem.pptx
OS(ch16)-LinuxSystem.pptx
 
Ch22
Ch22Ch22
Ch22
 
OS_Ch20
OS_Ch20OS_Ch20
OS_Ch20
 

More from it-novum

openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07
openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07
openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07it-novum
 
openATTIC Ceph Management @ OpenSuse Con - 2016-06-23
openATTIC Ceph Management @ OpenSuse Con - 2016-06-23openATTIC Ceph Management @ OpenSuse Con - 2016-06-23
openATTIC Ceph Management @ OpenSuse Con - 2016-06-23it-novum
 
openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23
openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23
openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23it-novum
 
openATTIC Technology Overview - Ceph Management
openATTIC Technology Overview - Ceph ManagementopenATTIC Technology Overview - Ceph Management
openATTIC Technology Overview - Ceph Managementit-novum
 
Tweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnen
Tweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnenTweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnen
Tweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnenit-novum
 
Flexible storage management with Linux and openATTIC
Flexible storage management with Linux and openATTICFlexible storage management with Linux and openATTIC
Flexible storage management with Linux and openATTICit-novum
 
Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015
Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015
Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015it-novum
 
Open Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit ParisOpen Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit Parisit-novum
 
OpenStack Day Italy: openATTC as an open storage platform for OpenStack
OpenStack Day Italy: openATTC as an open storage platform for OpenStackOpenStack Day Italy: openATTC as an open storage platform for OpenStack
OpenStack Day Italy: openATTC as an open storage platform for OpenStackit-novum
 
Building an open source cloud storage platform for OpenStack - openATTIC
Building an open source cloud storage platform for OpenStack - openATTICBuilding an open source cloud storage platform for OpenStack - openATTIC
Building an open source cloud storage platform for OpenStack - openATTICit-novum
 
130213 itn webcast_sap
130213 itn webcast_sap130213 itn webcast_sap
130213 itn webcast_sapit-novum
 

More from it-novum (11)

openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07
openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07
openATTIC & Ceph Management @ Suse Monthly Open Source Talks - 2016-06-07
 
openATTIC Ceph Management @ OpenSuse Con - 2016-06-23
openATTIC Ceph Management @ OpenSuse Con - 2016-06-23openATTIC Ceph Management @ OpenSuse Con - 2016-06-23
openATTIC Ceph Management @ OpenSuse Con - 2016-06-23
 
openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23
openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23
openATTIC Ceph Management @ Ceph Tech Talks - 2016-06-23
 
openATTIC Technology Overview - Ceph Management
openATTIC Technology Overview - Ceph ManagementopenATTIC Technology Overview - Ceph Management
openATTIC Technology Overview - Ceph Management
 
Tweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnen
Tweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnenTweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnen
Tweets und Aktienkurse? Wertvolle Erkenntnisse durch Data Blending gewinnen
 
Flexible storage management with Linux and openATTIC
Flexible storage management with Linux and openATTICFlexible storage management with Linux and openATTIC
Flexible storage management with Linux and openATTIC
 
Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015
Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015
Closing the Storage gap - presentation from OpenStack Summit in Vancouver 2015
 
Open Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit ParisOpen Cloud Storage @ OpenStack Summit Paris
Open Cloud Storage @ OpenStack Summit Paris
 
OpenStack Day Italy: openATTC as an open storage platform for OpenStack
OpenStack Day Italy: openATTC as an open storage platform for OpenStackOpenStack Day Italy: openATTC as an open storage platform for OpenStack
OpenStack Day Italy: openATTC as an open storage platform for OpenStack
 
Building an open source cloud storage platform for OpenStack - openATTIC
Building an open source cloud storage platform for OpenStack - openATTICBuilding an open source cloud storage platform for OpenStack - openATTIC
Building an open source cloud storage platform for OpenStack - openATTIC
 
130213 itn webcast_sap
130213 itn webcast_sap130213 itn webcast_sap
130213 itn webcast_sap
 

Recently uploaded

Communication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptxCommunication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptxkb31670
 
Juan Pablo Sugiura - eCommerce Day Bolivia 2024
Juan Pablo Sugiura - eCommerce Day Bolivia 2024Juan Pablo Sugiura - eCommerce Day Bolivia 2024
Juan Pablo Sugiura - eCommerce Day Bolivia 2024eCommerce Institute
 
The Real Story Of Project Manager/Scrum Master From Where It Came?!
The Real Story Of Project Manager/Scrum Master From Where It Came?!The Real Story Of Project Manager/Scrum Master From Where It Came?!
The Real Story Of Project Manager/Scrum Master From Where It Came?!Loay Mohamed Ibrahim Aly
 
Dynamics of Professional Presentationpdf
Dynamics of Professional PresentationpdfDynamics of Professional Presentationpdf
Dynamics of Professional Presentationpdfravleel42
 
Communication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptxCommunication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptxkb31670
 
Burning Issue presentation of Zhazgul N. , Cycle 54
Burning Issue presentation of Zhazgul N. , Cycle 54Burning Issue presentation of Zhazgul N. , Cycle 54
Burning Issue presentation of Zhazgul N. , Cycle 54ZhazgulNurdinova
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8Access Innovations, Inc.
 
Machine learning workshop, CZU Prague 2024
Machine learning workshop, CZU Prague 2024Machine learning workshop, CZU Prague 2024
Machine learning workshop, CZU Prague 2024Gokulks007
 

Recently uploaded (8)

Communication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptxCommunication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptx
 
Juan Pablo Sugiura - eCommerce Day Bolivia 2024
Juan Pablo Sugiura - eCommerce Day Bolivia 2024Juan Pablo Sugiura - eCommerce Day Bolivia 2024
Juan Pablo Sugiura - eCommerce Day Bolivia 2024
 
The Real Story Of Project Manager/Scrum Master From Where It Came?!
The Real Story Of Project Manager/Scrum Master From Where It Came?!The Real Story Of Project Manager/Scrum Master From Where It Came?!
The Real Story Of Project Manager/Scrum Master From Where It Came?!
 
Dynamics of Professional Presentationpdf
Dynamics of Professional PresentationpdfDynamics of Professional Presentationpdf
Dynamics of Professional Presentationpdf
 
Communication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptxCommunication Accommodation Theory Kaylyn Benton.pptx
Communication Accommodation Theory Kaylyn Benton.pptx
 
Burning Issue presentation of Zhazgul N. , Cycle 54
Burning Issue presentation of Zhazgul N. , Cycle 54Burning Issue presentation of Zhazgul N. , Cycle 54
Burning Issue presentation of Zhazgul N. , Cycle 54
 
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
ISO 25964-1Working Group ISO/TC 46/SC 9/WG 8
 
Machine learning workshop, CZU Prague 2024
Machine learning workshop, CZU Prague 2024Machine learning workshop, CZU Prague 2024
Machine learning workshop, CZU Prague 2024
 

The evolution of storage on Linux

  • 1. The Evolution of Storage on Linux Lenz Grimmer <lenz.grimmer@it-novum.com> FrOSCON 2015, Sankt Augustin 22. August 2015
  • 2. 2 Agenda  A trip down memory lane (pun intended)  Overview of how storage on Linux has evolved  Local file systems and related concepts/technologies  Network Services  Distributed / Cluster filesystems
  • 3. 3 Introduction  40+ file systems in /fs/  Focus on the most popular/widely used systems  Primary focus on the software side  High-level Descriptions only
  • 4. 4 Noteworthy Observations / Conclusions  The role of SourceForge.net today  Distribution kernels vs. mainline Linux  Honorable mention: Christoph Hellwig  Don‘t miss his talk about the Linux Storage Stack tomorrow (14:00, HS6)  Big Thanks to: LWN, Kernelnewbies.org, Thorsten Leemhuis (Heise) and Wikipedia
  • 6. 6 MINIX file system  While developing Linux in 1991, Linus required some form of persistent storage  A Minix-compatible file system was the canonical choice:  Well-documented, robust  Exchange data with the host OS (and vice versa)  Severely limited  Max. file/filesystem size: 64MB (16bit block addresses)  14 char file names  Only one time stamp (mtime)
  • 7. 7 Virtual File System Switch (VFS)  Abstraction / indirection layer to route file oriented system calls to necessary functions in the physical filesystem code to do the I/O  Eased the addition of new file systems  Initially written by Chris Provenzano  Integrated into Linux 0.96  Defines a set of functions that every filesystem has to implement  Three kinds of objects: filesystems, inodes, and open files
  • 8. 8 Extended File System (ext)  Designed by Rémy Card  Max. file/filesystem size: 2 GB, max. file name size was 255 chars  Metadata structure inspired by the traditional Unix File System (UFS)  Added to Linux 0.96c in April 1992  Issues remained (bad performance, missing time stamps, fragmentation)
  • 9. 9 Second Extended File System (ext2)  Also implemented by Rémy Card  Introduced in Linux Kernel 0.99 (January 1993)  Designed with extensibility in mind  Adopted advanced ideas from other file systems (e.g. BSD Fast File System), e.g. mtime/ctime/atime, file attributes, BSD/SysV semantics, different block sizes, immutable/append-only files  Initially supported file/file systems sizes up to 2TB (limitation of the block device layer)  Kernel version 2.6.17 (March 2006) extended max. file system size to 32TB (using 8kB Blocks)
  • 10. 10 FAT/MSDOS  Added to Linux in 1992/1993 by Werner Almesberger  VFAT support was later developed by Gordon Chaffee  VFAT filesystem is compatible with Windows 95/NT long filenames on the FAT filesystem  Initially called xmsdos  Patches for Linux 1.2.x and 1.3.x.  As of Linux 1.3.60, the vfat filesystem is part of the Linux kernel distribution  Mtools as a userland-only alternative
  • 11. 11 NTFS  NTFS driver for Linux by Martin von Löwis (started around 1996)  From June 2001, Legato Systems sponsored Anton Altaparmakov to further develop NTFS on Linux  Read-only mode only, with no fault-tolerance supported  NTFS-TNG replaced old NTFS driver in Linux 2.5.11 (April 29th, 2002)  NTFS-3G (FUSE-based) by Tuxera (read-write support)
  • 12. The Age of Journaling Filesystems
  • 13. 13 Fsck vs. Journaling  Unclean unmounts, too many mount counts, or remounts after a long time period triggered file system checks  Disk drives got bigger  A Journaling file system keeps track of changes not yet committed to the file system's main part in a Journal  Keep track of just metadata changes or data as well  Several file systems were developed in parallel, to alleviate this shortcoming of ext2, namely ext3, XFS, JFS and ReiserFS.
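Why a journal removes the need for a full check after a crash can be shown with a toy model. This is a minimal sketch and not how ext3, XFS, JFS or ReiserFS implement their journals; `JournaledStore` and its methods are hypothetical names.

```python
class JournaledStore:
    """Toy metadata journal: changes are logged first, applied later."""

    def __init__(self):
        self.main = {}       # the file system's "main part"
        self.journal = []    # transactions not yet applied to main

    def log(self, changes: dict) -> int:
        """Record intended changes in the journal; returns a transaction id."""
        self.journal.append({"changes": dict(changes), "committed": False})
        return len(self.journal) - 1

    def commit(self, txid: int) -> None:
        """Mark a transaction as complete and therefore safe to replay."""
        self.journal[txid]["committed"] = True

    def recover(self) -> None:
        """After a crash: replay committed transactions, drop the rest."""
        for tx in self.journal:
            if tx["committed"]:
                self.main.update(tx["changes"])
        self.journal.clear()


store = JournaledStore()
tx1 = store.log({"/a": "inode 12"})
store.commit(tx1)
tx2 = store.log({"/b": "inode 13"})  # simulated crash before commit
store.recover()
print(store.main)
```

Recovery only has to walk the journal, not the whole disk, so its cost no longer grows with the size of the drive.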
  • 14. 14 Journaling Block Device layer (JBD)  JBD established as a filesystem-independent service, to be used by any file system  First incarnation of JBD developed by Stephen C. Tweedie together with the ext3 file system  OCFS2 and later ext4 also used JBD and its successor JBD2
  • 15. 15 Third extended filesystem (ext3)  Originally released in September 1999  Written by Stephen Tweedie for the 2.2 branch  Ported to 2.4 kernels by Peter Braam, Andreas Dilger, Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie  Merged with the mainline Linux kernel 2.4.15 (November 2001)  Basically ext2 with journaling capabilities, easy conversion  Max filesystem size: 8TB, Max 32k subdirs/directory
  • 16. 16 IBM JFS  Rooted in AIX and OS/2 Warp Server (new design in 1995)  Port to Linux started in December 1999 (Dave Kleikamp, Steve Best)  Uses own journaling implementation (metadata only)  Max volume size: 32PB, Max file size: 4PB  Later ported to AIX 5L as JFS2 (April 2001)  JFS 0.0.1 released in Feb. 2000, 0.1.0 (beta) in August 2000  Version 1.0.0 was released in June 2001  Kernel module since 2.4.18pre9-ac4, Version 1.1.0 was included by Marcelo Tosatti in Linux 2.4.20.
  • 17. 17 ReiserFS  Supported early on by SuSE, introduced in Linux 2.4.1 (2001)  The first journaling file system to be included in mainline  Max volume size: 16TB  Based on B+ trees  Metadata-only journaling (block journaling since 2.6.8)  Online resizing  Tail packing block suballocation  Reiser4 still under active development (Edward Shishkin)
  • 18. 18 SGI XFS  64-bit journaling file system created by Silicon Graphics  SGI IRIX since 1994, GPLed in 2000  Version 1.0 for Linux in May 2001 as Patch against 2.4.2  Merged in 2.6.x and 2.4.25 (Feb 2004)  Steve Lord, Russell Cattelan, Nathan Scott, Jim Mostek  Advanced features, high performance  Max volume size: 16EB
  • 20. 20 The need for Logical Volume Management  Initially, Linux could only address disks/partitions  Changes to the layout required downtime and shuffling of data  Logical Volume Management abstracts physical disk drives  First incarnation of Linux LVM was introduced in Kernel version 2.4  Heinz Mauelshagen wrote the original LVM code in 1998, inspired by HP-UX's volume manager.
  • 21. 21 Device Mapper (DM)  A kernel framework for mapping physical block devices onto higher- level virtual block devices  Added in Linux 2.6  Passes data from a virtual block device, which is provided by the device mapper itself, to another block device  Pluggable design  Data can be also modified in transition  Forms the foundation of LVM2/EVMS, RAID and dm-crypt disk encryption and many other useful features
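The core of such a mapping is a table lookup from a virtual sector to a (device, sector) pair. The sketch below is loosely modeled on a simple linear mapping; the table contents and the device names `diskA` and `diskB` are made up for illustration.

```python
from bisect import bisect_right
from typing import NamedTuple


class Segment(NamedTuple):
    start: int    # first virtual sector covered by this segment
    length: int   # number of sectors in the segment
    device: str   # backing device (hypothetical name)
    offset: int   # first sector used on the backing device


# Hypothetical table: one virtual device spanning two backing devices.
TABLE = [
    Segment(start=0, length=1000, device="diskA", offset=2048),
    Segment(start=1000, length=500, device="diskB", offset=0),
]


def map_sector(table: list[Segment], sector: int) -> tuple[str, int]:
    """Translate a virtual sector into (backing device, sector)."""
    starts = [seg.start for seg in table]
    index = bisect_right(starts, sector) - 1
    if index < 0:
        raise ValueError(f"sector {sector} is not mapped")
    seg = table[index]
    if sector >= seg.start + seg.length:
        raise ValueError(f"sector {sector} is not mapped")
    return seg.device, seg.offset + (sector - seg.start)


print(map_sector(TABLE, 0), map_sector(TABLE, 1200))
```

Swapping the translation function for one that encrypts, mirrors or caches is what the "pluggable design" bullet refers to.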
  • 22. 22 DM Multipath (DM-MPIO)  Consists of kernel components and user-space components  Provides input-output (I/O) fail-over and load-balancing within Linux for block devices  Handles the rerouting of block I/O to an alternate path in the event of a path failure  Can also balance the I/O load across all of the available paths in Fibre Channel (FC) or iSCSI SAN environments  Started as part of a patchset created by Joe Thornber, later maintained by Alasdair G Kergon at Red Hat. Christophe Varoqui maintains the userland multipath tools
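Fail-over and load balancing can be illustrated with a round-robin path selector that skips failed paths. This is a toy model, not the DM-MPIO implementation; `MultipathDevice` and the path names are hypothetical.

```python
class MultipathDevice:
    """Toy path selector: round-robin over healthy paths."""

    def __init__(self, paths: list[str]):
        self.paths = list(paths)
        self.failed: set[str] = set()
        self._next = 0

    def fail_path(self, path: str) -> None:
        self.failed.add(path)

    def reinstate_path(self, path: str) -> None:
        self.failed.discard(path)

    def select_path(self) -> str:
        """Return the next healthy path, or raise if none is left."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next % len(self.paths)]
            self._next += 1
            if path not in self.failed:
                return path
        raise IOError("all paths have failed")


mp = MultipathDevice(["path-a", "path-b"])
balanced = [mp.select_path() for _ in range(4)]   # alternates between paths
mp.fail_path("path-a")
after_failure = [mp.select_path() for _ in range(2)]  # only path-b remains
print(balanced, after_failure)
```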
  • 23. 23 DM-Cache  Allows a fast device (e.g. an SSD) to be used as a cache for a slower device (e.g. a rotating disk)  Different policy plugins can be used to change the algorithms used to select which blocks are promoted, demoted, cleaned etc.  Supports writeback and writethrough modes  Requires three physical storage devices to separately store actual data, cache data and required metadata  Joe Thornber, Heinz Mauelshagen and Mike Snitzer  Inclusion into the Linux mainline kernel version 3.9, released on April 28, 2013
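The difference between the two write modes comes down to when the slow device sees a write. A minimal sketch, assuming an unbounded cache with no promotion or demotion policy (real policies are far more involved); `CachedDevice` is a hypothetical name.

```python
class CachedDevice:
    """Toy block cache in front of a slow origin device."""

    def __init__(self, mode: str = "writethrough"):
        if mode not in ("writethrough", "writeback"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.origin: dict[int, bytes] = {}  # slow device
        self.cache: dict[int, bytes] = {}   # fast device
        self.dirty: set[int] = set()        # cached blocks newer than origin

    def write(self, block: int, data: bytes) -> None:
        self.cache[block] = data
        if self.mode == "writethrough":
            self.origin[block] = data       # origin is updated immediately
        else:
            self.dirty.add(block)           # origin is updated on flush()

    def read(self, block: int) -> bytes:
        if block not in self.cache:         # miss: fetch and keep a copy
            self.cache[block] = self.origin[block]
        return self.cache[block]

    def flush(self) -> None:
        """Write all dirty blocks back to the origin device."""
        for block in sorted(self.dirty):
            self.origin[block] = self.cache[block]
        self.dirty.clear()


wt = CachedDevice("writethrough")
wt.write(7, b"x")
wb = CachedDevice("writeback")
wb.write(7, b"x")
print(7 in wt.origin, 7 in wb.origin)
```

Writeback acknowledges writes sooner but leaves data only on the cache device until it is flushed, which is the trade-off between the two modes.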
  • 24. 24 LVM2  Based on DM  Flexible storage management  Add/remove disks  Resize/move logical volumes  Move LVs between PVs  Span volumes across multiple physical devices  RAID  Thin provisioning  Cluster Volume Manager
  • 25. 25 IBM EVMS  IBM-sponsored effort to provide volume management services for Linux  A single, unified system for handling all storage management tasks  Despite many of the features and GUI management tools found in EVMS, LVM2 was preferred  As a result, IBM dropped their kernel driver and reworked their tools to work with LVM2 instead  Development stopped in 2006
  • 27. 27 NFS  Rick Sladkey was the original author of the NFS client and also ported the NFS server and the RPC library code. Doug Quale helped extend the kernel to support networking filesystems  NFS Version 2 since 1.2 kernel series  Kernel 2.2.18 was a major milestone: interoperating with other operating systems' NFS, reliable file locking over NFS, and NFS Version 3.  NFS Versions 2, 3, and 4 are supported on 2.6 and later kernels. The Version 4.1 client requires at least kernel 2.6.31  NFSv4 for Linux has been under development at CITI and NetApp since 2001
  • 28. 28 Samba  A free-software re-implementation of the SMB/CIFS networking protocol  Andrew Tridgell started development of Samba in 1992, Jeremy Allison joined early on  Volker Lendecke founded SerNet in 1997, to provide commercial support  Version 3 (2003): provides file and print services for Microsoft Windows clients and can integrate with a Windows NT 4.0 server domain, either as a Primary Domain Controller (PDC) or as a domain member  Samba4 installations can act as an Active Directory domain controller or member server, at Windows 2008 domain and forest functional levels.
  • 29. 29 SMB vs. CIFS  SMB ("Server Message Block") and CIFS ("Common Internet File System") are protocols. CIFS is an extension of the SMB protocol  "smbfs" was an older FS originating from the Samba project, heavily coupled with the Samba tools (smb.conf, smbmount, etc.). Removed in Linux 2.6.27  CIFS VFS was added to mainline Linux kernels in 2.5.42  Supports advanced network file system features such as locking, Unicode (advanced internationalization), hardlinks, dfs (hierarchical, replicated name space), distributed caching and uses native TCP names. All key network functions implemented in kernel
  • 31. 31 Fourth Extended Filesystem (ext4)  Advanced version of ext3, led by Ted Ts'o et al.  Incorporated scalability and reliability enhancements for supporting large filesystems up to 1EB.  First experimental support for ext4 was merged into Linux 2.6.19, which was released on 29 November 2006.  Ext4 was marked as experimental until Linux 2.6.27  Starting with 2.6.28 (December 2008), ext4 was marked as stable  New extent format reduced metadata overhead (RAM, IO for access, transactions)
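The metadata saving from extents is easy to see when a list of individual block numbers is collapsed into (start, length) runs. The sketch below illustrates the idea only and does not reflect the ext4 on-disk format; `to_extents` is a hypothetical helper.

```python
def to_extents(blocks: list[int]) -> list[tuple[int, int]]:
    """Collapse an ordered block list into (start, length) extents."""
    extents: list[tuple[int, int]] = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)   # block continues the last run
        else:
            extents.append((block, 1))          # block starts a new run
    return extents


# A file stored in two contiguous runs of blocks (made-up numbers).
blocks = list(range(1000, 1500)) + list(range(9000, 9100))
extents = to_extents(blocks)
print(len(blocks), "block pointers ->", extents)
```

For a mostly contiguous file, a handful of extents replaces hundreds of per-block pointers, which is where the reduction in RAM and I/O comes from.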
  • 32. 32 Btrfs  Chris Mason (Oracle) in 2007  COW (Snapshots)  Checksums, Compression  RAID, Volume management  Conversion of ext3/4 file systems  Merged into mainline Linux 2.6.29 (March 2009)  Florian Winkler talks about Btrfs today (11:15, HS7)
  • 33. 33 ZFS  Filesystem and logical volume manager combined  Designed and implemented at Sun Microsystems (Jeff Bonwick, Matthew Ahrens)  Development started in 2001, officially announced in 2004  128bit, COW, Snapshots, Deduplication, RAID  OpenSolaris (CDDL)  Early port based on FUSE  Kernel modules based on OpenZFS (2013)  Not included in mainline Linux due to license incompatibilities
  • 35. 35 Network Block Device (NBD)  Remotely access a block device attached to another system  Userspace Server/Client, Client kernel module  Issues arise if network goes down or server crashes  Markus Pargmann talks about NBD on Sunday (16:30, HS6)
  • 36. 36 Distributed Replicated Block Device (DRBD)  A shared-nothing, synchronously replicated block device  “RAID1 over Network”  Writes to the primary node are transferred to the lower-level block device and simultaneously propagated to the secondary node  The secondary node then transfers data to its corresponding lower-level block device. All read I/O is performed locally  Fail-over capabilities (Secondary/Primary)  Lars Ellenberg and Philipp Reisner originally submitted code in July 2007  DRBD was merged on 8 December 2009 during the "merge window" for Linux kernel version 2.6.33
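The write and read paths described above can be modeled in a few lines. This is a toy sketch of synchronous replication with local reads and a manual role swap, not DRBD's protocol; `Node` and `ReplicatedDevice` are hypothetical names.

```python
class Node:
    """One cluster node with its own lower-level block device."""

    def __init__(self, name: str):
        self.name = name
        self.disk: dict[int, bytes] = {}


class ReplicatedDevice:
    """Toy 'RAID1 over network': every write lands on both nodes."""

    def __init__(self, primary: Node, secondary: Node):
        self.primary, self.secondary = primary, secondary

    def write(self, block: int, data: bytes) -> None:
        # Synchronous: the call returns only after both copies are written.
        self.primary.disk[block] = data
        self.secondary.disk[block] = data

    def read(self, block: int) -> bytes:
        # Reads never touch the network; they are served by the primary.
        return self.primary.disk[block]

    def fail_over(self) -> None:
        """Promote the secondary, e.g. after the primary has failed."""
        self.primary, self.secondary = self.secondary, self.primary


dev = ReplicatedDevice(Node("alpha"), Node("beta"))
dev.write(0, b"data")
dev.fail_over()
print(dev.primary.name, dev.read(0))
```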
  • 38. 38 OCFS/OCFS2  Shared disk file system by Oracle  Main focus of OCFS was to accommodate Oracle clustered databases, not POSIX-compliant  OCFS2 designed as a Linux filesystem from scratch  On-disk filesystem implementation heavily inspired by ext3, uses JBD for journaling  OCFS2 integrated into version 2.6.16 of mainline Linux  Max Volume/File Size 4PB (currently limited to 16TB)  Trivia question: what feature do OCFS2 and Btrfs have in common?
  • 39. 39 GFS/GFS2  Shared disk filesystem, allows concurrent access to the same block storage  GFS development began in 1995, led by University of Minnesota professor Matthew O'Keefe and a group of students  Originally for SGI IRIX, ported to Linux in 1998  Acquired by Sistina in 2000, turned into proprietary product  OpenGFS fork  Red Hat acquired Sistina in 2003 and released GFS under GPL in June 2004  GFS2 and the DLM merged into Linux 2.6.19 (29 November 2006)
  • 40. 40 Storage Requirements and Challenges  Amount of data to be stored grows exponentially  Today, Storage has to be:  Fault tolerant, reliable  Scalable without limitations or service interruptions  Distributable  Easy to manage / automate  Previous approaches do not address these requirements
  • 42. 42 GlusterFS  Aggregates various storage servers over Ethernet or Infiniband RDMA interconnect into one large parallel network file system  Storage bricks export local file systems as volumes  GlusterFS clients create composite virtual volumes from multiple remote servers using stackable "translators"  Translators provide Mirroring, Replication, Striping, etc.  Final volume mounted by client host using its own native protocol via FUSE, or using the NFS v3 protocol (via built-in server translator)  Originally developed by Gluster, Inc., which was acquired by Red Hat in 2011
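"Stackable" means every translator exposes the same interface as the layer below it, so they can be composed freely. A minimal sketch of that idea follows; the classes `Brick`, `Replicate` and `Distribute` are illustrative stand-ins and not GlusterFS code, and the CRC32-based placement is an assumption made for the demo.

```python
import zlib


class Brick:
    """Leaf of the stack: stores files locally (in memory here)."""

    def __init__(self, name: str):
        self.name = name
        self.files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def read(self, path: str) -> bytes:
        return self.files[path]


class Replicate:
    """Translator that mirrors every write to all of its children."""

    def __init__(self, children):
        self.children = list(children)

    def write(self, path: str, data: bytes) -> None:
        for child in self.children:
            child.write(path, data)

    def read(self, path: str) -> bytes:
        return self.children[0].read(path)


class Distribute:
    """Translator that places each file on one child, chosen by hash."""

    def __init__(self, children):
        self.children = list(children)

    def _pick(self, path: str):
        return self.children[zlib.crc32(path.encode()) % len(self.children)]

    def write(self, path: str, data: bytes) -> None:
        self._pick(path).write(path, data)

    def read(self, path: str) -> bytes:
        return self._pick(path).read(path)


# Four bricks combined into a distributed volume of two mirrored pairs.
bricks = [Brick(f"brick{i}") for i in range(4)]
volume = Distribute([Replicate(bricks[0:2]), Replicate(bricks[2:4])])
volume.write("/docs/a.txt", b"hello")
holders = [b.name for b in bricks if "/docs/a.txt" in b.files]
print(holders)
```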
  • 43. 43 Ceph  Initially created by Sage Weil, who founded Inktank in 2012  First release in July 2012  Object, block, and file storage from a single distributed computer cluster  Reliable autonomic distributed object store (RADOS)  RADOS Block Device (RBD), Snapshots  RadosGW provides REST API (Amazon S3/OpenStack Swift)  Completely distributed without a single point of failure  Replicates data for fault tolerance (CRUSH)  Ceph client code was merged into mainline Linux version 2.6.34  Red Hat acquired Inktank in April 2014
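Placement without a central lookup table is what removes the single point of failure: every client computes where an object lives. CRUSH itself is considerably more elaborate, so the sketch below substitutes plain rendezvous (highest-random-weight) hashing to show the principle; the node names and the `place` function are hypothetical.

```python
import hashlib


def place(obj: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` nodes for an object by rendezvous hashing.

    Stand-in for illustration only; this is not the CRUSH algorithm.
    """
    def score(node: str) -> str:
        return hashlib.sha256(f"{obj}:{node}".encode()).hexdigest()

    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = [f"osd{i}" for i in range(6)]   # made-up storage node names
chosen = place("volume1/object42", nodes)
print(chosen)
```

Any client with the same node list computes the same answer, and removing a node that holds no replica of an object leaves that object's placement unchanged.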
  • 44. 44 Lustre  Parallel distributed file system, generally used for large-scale cluster computing  Widely used in TOP500 supercomputers  Max. volume size: 100 PB (production), over 16 EB (theoretical)  Max. file size: 2.5 PB (ext4), 16 EB (ZFS)  Started as a research project in 1999 by Peter Braam at CMU, who founded Cluster File Systems Inc. in 2001 to work on InterMezzo, Coda and Lustre  First installed in March 2003 on the MCR Linux Cluster (Lawrence Livermore National Laboratory). Lustre 1.0.0 was released in December 2003.  Acquired by Sun Microsystems in 2007  Oracle acquired Sun in 2010 and discontinued development  Whamcloud (later acquired by Intel), Open Scalable File Systems, Inc. (OpenSFS), Xyratex Inc.
  • 45. 45 Shameless plug: openATTIC  Unified Storage: manage XFS, ZFS, Btrfs, NFS, Samba  Modern GUI (AngularJS/Bootstrap)  REST API  Built-in Monitoring  Clustering (Pacemaker/Corosync, DRBD)  http://www.openattic.org/  Find us in the exhibition hall
  • 46. 46 PHP DEVELOPER (M/F) with Linux know-how  Are you passionate about developing software and do you feel at home in the open source world? Then we should get to know each other!  These tasks await you with us:  • Development of our system monitoring tool openITCOCKPIT for frontend and/or backend  • Design and implementation of projects in a team  • Testing of the developed applications  • Maintenance and expansion of the existing development and test environment  For more information, see: www.it-novum.com/karriere  Wanted: PHP developer (m/f) with Linux know-how