Directory Write Leases in MagFS

Published in: Engineering

  1. Directory Write Leases in MagFS. Deepti Chheda, Staff Engineer @Maginatics; Nate Rosenblum, Architect @Maginatics. © mekuria getinet / www.mekuriageti.net
  2. Maginatics Cloud Storage Platform
      • Strongly consistent
      • Geo-distributed
      • Secure
      • Mobile-enabled
      • Layered on object stores
      • POSIX compliant
      ✗ Storage Gateway
      ✗ SMB/NFS compatible
  3. Maginatics File System [architecture diagram; labels: MagFS Clients, Metadata Servers, Object Storage (public, on-premises, or hybrid), Metadata*, Data]. *MagFS proprietary WAN-optimized protocol
  4. [architecture diagram; labels: MagFS Clients, Metadata Server, Object Storage (public, on-premises, or hybrid), Metadata, Data]
  5. Problems with geo-distributed file systems: synchronous calls; WAN link latencies; unsuitable for global distribution; enforcing consistency
  6. How do traditional network file systems alleviate this problem?
  7. Hiding latencies in network file systems: leases / caching; clients serve reads & writes locally; changes later propagated to server
      • Performance improvements
      • Strong consistency guarantees
  8. Example: SMB Leases
      Read Lease
        • File reads
        • Directory enumeration
        • Metadata & file attributes
        • Shared
      Write Lease
        • File modifications
        • Exclusive, single-writer
      Handle Lease
        • Open handles after application has closed them
        • Shared
  9. Strong Consistency
  10. SMB Valid Leases
                   Read Lease   Handle Lease   Write Lease
      File         ✔            ✔              ✔
      Directory    ✔            ✔              ✗
  11. Common FS operations optimized
                   Read Lease          Handle Lease         Write Lease
      File         read(), stat()      open(), close()      write()
      Directory    readdir(), stat()   opendir(), close()   -
  12. Namespace modifying operations? create(), mkdir(), rename(), unlink(), rmdir(), chmod(). Synchronous ops => incur a network RTT!
  13. Create and delete workload [bar chart: time in hours, 0:00 to 4:19, vs. network RTT of 5, 50, 100, and 150 msec; series: SMB]
  14. Can we safely delegate namespace modifying operations to clients?
  15. Create & delete workload [bar chart: time in hours, 0:00 to 4:19, vs. network RTT of 5, 50, 100, and 150 msec; series: SMB, MagFS]
  16. Directory Write Leases (DWL)
  17. Semantics
  18. MagFS Lease states
                   Read Lease   Handle Lease   Write Lease
      File         ✔            ✔              ✔
      Directory    ✔            ✔              ✔
  19. File Write Lease
        • Gives authority over a single file
        • Exclusive, single-writer
        • Client can cache file modifications locally
        • Must flush dirty data on lease break
      Dir Write Lease
        • Gives authority over a single directory (not subtree!)
        • Exclusive, single-writer
        • Client can cache namespace-modifying ops in that directory
        • Must replay directory modifications on lease break
  20. Lease grant conditions
      • Client must request DWL on the directory
      • When to issue?
        • Detect pattern and request lease upgrade in background
      • Exclusivity is
        • No other client has opens on this directory AND its children
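  21. Home directory use-case [directory tree diagram; nodes: home, user1, foo, bar, user2, file]
  22. Home directory use-case [directory tree diagram; nodes: home, user1, foo, bar, user2, file]
  23. Home directory use-case [directory tree diagram; nodes: home, user1, foo, bar, user2, file, baz]
  24. Home directory use-case [directory tree diagram; nodes: home, user1, quux, bar, user2, file, baz]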
  25. Lease break semantics
      • Server must issue a lease break when another client tries to:
        • Open this directory
        • Open anywhere in this sub-tree
        • Rename into this directory
      • Client must drain all pending ops on this directory, AND on all children in that directory
  26. Directory Write Lease break [sequence diagram; participants: Client 1, Server, Client 2; labels: Open(user1), Handle (RWH), Create(user1, bar) (P1), Create(bar, baz) (P2), Rename(foo, quux) (P3), Open(user1), Lease Break (RWH->RH), P1 P2 P3 + ACK, Handle (RH)]
  27. Client support for Directory Write Leases [client transition slide]
  28. Client responsibilities
      • File system consistency semantics
      • Security, integrity, correct behavior of local file system operations
      • Performance
  29. Lifetime of DWL: pattern detection / trigger; write-behind; opportunistic replay; full queue; lease break; forced replay
  30. Limits of burst performance: full queue; in-queue dependencies
  31. Operational dependencies
      $ mkdir foo
      $ touch foo/bar{1,2,3}
      $ mkdir foo/baz
      $ rm foo/bar1
      $ mv foo quux
  32. Exploiting parallelism
      $ mkdir foo
      $ touch foo/bar{1,2,3}
      $ mkdir foo/baz
      $ rm foo/bar1
      $ mv foo quux
  33. Replaying operations faster
  34. Uncommitted operations: operations reported complete but at risk; minimize [timeline diagram with marks t0, tk, tn; labels: application-visible completion, durable / committed operations]
  35. Results
  36. Finding workload parallelism [plot: duration (s), 0 to 150, vs. network latency (ms), 0 to 50; series: sync, async]
      # Populate a subtree
      #
      for i in {1..10}; do
        mkdir p${i}
        for j in {1..100}; do
          touch p${i}/f${j}
        done
      done
  37. Mixed data + metadata: extracting archives [plot: duration (s), 0 to 400, vs. network latency (ms), 0 to 50; series: sync, async]
      • target: openssl-1.0.1i.tar.gz
      • Combines namespace mutation + data operations
      • Intractable over WAN for even modest archives
  38. Mixed data + metadata: extracting archives [plot: duration (s), 0 to 600, vs. network latency (ms), 0 to 50; series: sync, async, smb]
      • target: openssl-1.0.1i.tar.gz
      • Combines namespace mutation + data operations
      • Intractable over WAN for even modest archives
  39. Multi-phase workloads (building OpenSSL) [plot: duration (s), 0 to 1000, per phase (untar, config, build, clean), one panel per latency (0.5, 5, 10, 50); series: async, sync]
  40. Directory write leases in MagFS [plot: duration (s), 0 to 400, vs. network latency (ms), 0 to 50; series: sync, async]
      • WAN-optimized file system for global enterprise
      • Directory write leases delegate namespace responsibility to clients
      • Leasing helps performance scale with latency
  41. Try MagFS at http://maginatics.com
  42. Backup
  43. Simple extension: compounding
  44. Advanced optimization: cancellation
      • collapsing redundant operation: mv foo bar ; mv bar baz
      • operation cancellation: touch foo ; rm foo
  45. Simple dependency graph
  46. Extract OpenSSL dependency graph. * tiny fraction of dependency graph
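
Slide 12 notes that every synchronous namespace operation incurs a network round trip. As a rough illustration, the sketch below applies that to the slide 36 workload (10 mkdir calls and 1000 touch calls). It assumes exactly one round trip per operation and no other costs, so it gives a lower bound only and is not expected to match the plotted measurements; `sync_lower_bound_s` is a hypothetical helper name, not part of MagFS.

```python
def sync_lower_bound_s(n_ops, rtt_ms):
    """Lower bound in seconds when every op waits for one full round trip (assumption)."""
    return n_ops * rtt_ms / 1000.0


# Slide 36 workload: 10 directories, each populated with 100 files.
N_OPS = 10 + 10 * 100

if __name__ == "__main__":
    for rtt_ms in (5, 10, 50):
        print(f"{rtt_ms} ms RTT: at least {sync_lower_bound_s(N_OPS, rtt_ms)} s")
```

Under this assumption the synchronous cost grows linearly with RTT, which is the behavior write-behind under a lease is meant to avoid.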
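
The grant conditions on slide 20 and the break conditions on slide 25 can be sketched as server-side bookkeeping. This is a minimal illustration, not the MagFS implementation: `LeaseTable`, `in_subtree`, and the client ids are hypothetical, and the sketch assumes that "children" on slide 20 means the whole subtree, matching the break rule on slide 25.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


def in_subtree(path, directory):
    """True if `path` is `directory` itself or lies anywhere beneath it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


@dataclass
class LeaseTable:
    """Hypothetical server-side state for directory write leases (DWL)."""
    opens: dict = field(default_factory=dict)       # path -> set of client ids
    dwl_holder: dict = field(default_factory=dict)  # directory -> client id
    breaks: list = field(default_factory=list)      # (holder, directory) breaks issued

    def request_dwl(self, client, directory):
        """Grant only if no other client holds the lease or has opens in the subtree."""
        if self.dwl_holder.get(directory) not in (None, client):
            return False
        for path, clients in self.opens.items():
            if in_subtree(path, directory) and clients - {client}:
                return False
        self.dwl_holder[directory] = client
        return True

    def open_path(self, client, path):
        """Record an open, first breaking any conflicting lease held by another client."""
        self._break_conflicts(client, path)
        self.opens.setdefault(path, set()).add(client)

    def rename_into(self, client, dst):
        """A rename whose target lies in a leased directory also breaks the lease."""
        self._break_conflicts(client, dst)

    def _break_conflicts(self, client, path):
        for directory, holder in list(self.dwl_holder.items()):
            if holder != client and in_subtree(path, directory):
                # In the deck, the holder replays its queued ops and then ACKs.
                self.breaks.append((holder, directory))
                del self.dwl_holder[directory]
```

For example, if client `c1` holds the lease on `/home/user1` and client `c2` opens `/home/user1/bar`, the table records a break for `c1`, mirroring the exchange on slide 26.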
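
Slides 31 and 32 use one command sequence to show both in-queue dependencies and the parallelism left once those are respected. The sketch below groups that queue into replay waves, where every operation in a wave is independent of the others. The conflict rule is an assumption chosen for illustration (two operations conflict when a path of one equals or is an ancestor of a path of the other); the deck does not state the rule MagFS uses, and `conflicts` and `replay_waves` are hypothetical names.

```python
from pathlib import PurePosixPath


def conflicts(a_paths, b_paths):
    """Assumed rule: ops conflict if one touches a path equal to or above the other's."""
    for a in map(PurePosixPath, a_paths):
        for b in map(PurePosixPath, b_paths):
            if a == b or a in b.parents or b in a.parents:
                return True
    return False


def replay_waves(ops):
    """Group (label, paths) ops, given in issue order, into dependency-free waves."""
    level = []
    for i, (_, paths) in enumerate(ops):
        # An op must run after every earlier op it conflicts with.
        deps = [level[j] for j in range(i) if conflicts(ops[j][1], paths)]
        level.append(1 + max(deps) if deps else 0)
    waves = [[] for _ in range(max(level, default=-1) + 1)]
    for (label, _), lvl in zip(ops, level):
        waves[lvl].append(label)
    return waves


# The command sequence from slides 31 and 32, with brace expansion written out.
queue = [
    ("mkdir foo", ["foo"]),
    ("touch foo/bar1", ["foo/bar1"]),
    ("touch foo/bar2", ["foo/bar2"]),
    ("touch foo/bar3", ["foo/bar3"]),
    ("mkdir foo/baz", ["foo/baz"]),
    ("rm foo/bar1", ["foo/bar1"]),
    ("mv foo quux", ["foo", "quux"]),
]

if __name__ == "__main__":
    for n, wave in enumerate(replay_waves(queue)):
        print(n, wave)
```

With this rule the seven operations fall into four waves, so a client that sends each wave concurrently would wait for about four round trips instead of seven.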
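
Slide 44 names two queue optimizations: collapsing redundant operations and cancelling operations that undo each other. A minimal peephole sketch over adjacent queued operations follows. It rests on assumptions the slide does not spell out: `create` stands for a touch that created a new file, the intermediate rename target did not exist beforehand, and only adjacent operations are examined. The tuple encoding and the `optimize` function are hypothetical.

```python
def optimize(queue):
    """Collapse and cancel adjacent queued ops.

    Ops are tuples: ("create", path), ("rm", path), or ("mv", src, dst).
    """
    out = []
    for op in queue:
        prev = out[-1] if out else None
        if prev and prev[0] == "mv" and op[0] == "mv" and prev[2] == op[1]:
            # Collapse: mv a b ; mv b c  ->  mv a c
            out.pop()
            if prev[1] != op[2]:  # mv a b ; mv b a leaves nothing to replay
                out.append(("mv", prev[1], op[2]))
        elif prev and prev[0] == "create" and op == ("rm", prev[1]):
            # Cancel: create a ; rm a  ->  nothing to replay
            out.pop()
        else:
            out.append(op)
    return out
```

Both rewrites shrink the queue before replay, which reduces the number of uncommitted operations a lease break has to drain.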
