The document summarizes testing of storage and metadata backends for a cloud synchronization framework called ClawIO. Unlike most sync platforms, ClawIO aims to avoid synchronization between the metadata database and the filesystem data. The authors benchmarked several metadata storage configurations: a local filesystem with MySQL, a local filesystem with MySQL and xattrs, and a local filesystem with Redis and xattrs. Stat and upload benchmarks showed that storing metadata in in-memory databases improved performance and scalability over using the local filesystem alone. Future work includes implementing more backends such as EOS, S3, and Swift and running additional benchmarks.
Benchmarking storage and metadata backends for cloud synchronization
1. Testing storage and metadata backends
Hugo González Labrador, Arno Formella
LIA2, University of Vigo
CS3: Cloud Storage Services for Novel Applications and Workflows
Zürich, January 2016
2. Outline
• Origin of the project
• Architecture
• Storage backends
• Benchmark results
• Conclusion
• Outlook
3. • Cloud Synchronisation Benchmarking Framework
• Curiosity about testing data and metadata backends that are novel
for synchronisation platforms.
• Flexible: plug in your own implementations in any language and technology
• Experiment with a new design that:
• Avoids the synchronisation between the DB (holding the metadata)
and the filesystem (holding the data) that the majority of sync
platforms on a local filesystem have to perform.
• Experience from being a Technical student at CERN working on the
CERNBox project.
Why did we develop ClawIO ?
4. [Architecture diagram: ownCloud clients, 3rd-party apps and
technical users talk to ClawIO over HTTP (ownCloud Sync Protocol,
REST API) and gRPC; SHARE, AUTH and CLI services sit alongside;
metadata managers (MGM) front the storage backends: local FS with or
without xattrs, EOS, and S3/Swift/RadosGW.]
Architecture
5. Data
Metadata used by ownCloud

Metadata Key | Metadata Value
CHECKSUM     | md5:8c8d357b5e872bbacd45197626bd5759
MTIME        | 104857600
PATH         | /local/users/d/demo/file
FILEID       | a8584c90-ae2a-4dd8-84a7-f18ced109cce
ETAG         | 956494f8-5120-4165-afba-ad5f8d13b8ef

What metadata do we keep?
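The table above can be sketched as a small record type. This is a minimal illustration, not ClawIO's actual code: the class and helper names are assumptions, and only the "md5:<hex>" checksum convention and the example values come from the slide.

```python
import hashlib
import uuid
from dataclasses import dataclass

@dataclass
class FileMetadata:
    """The five keys ownCloud-style sync needs per file (from the slide)."""
    path: str
    checksum: str   # e.g. "md5:<hex digest>" of the file content
    mtime: int      # modification time
    fileid: str     # stable identifier, survives renames
    etag: str       # changes on every content update

def make_metadata(path: str, content: bytes, mtime: int) -> FileMetadata:
    # Checksum follows the "md5:<hex>" convention shown in the table.
    checksum = "md5:" + hashlib.md5(content).hexdigest()
    # FILEID and ETAG are opaque UUIDs in the example values above.
    return FileMetadata(path=path,
                        checksum=checksum,
                        mtime=mtime,
                        fileid=str(uuid.uuid4()),
                        etag=str(uuid.uuid4()))

md = make_metadata("/local/users/d/demo/file", b"hello", 104857600)
```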
6. [Diagram: the data unit stores file contents on an Ext4 FS; the
meta unit keeps FILEID, CHECKSUM, MTIME, ETAG and PATH in MySQL.]
SETUP ONE: Local FS with MySQL as the metadata store
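A rough sketch of what the meta unit's table could look like in this setup. The schema, table and column names are assumptions (not ClawIO's real schema), and sqlite3 stands in for MySQL so the example is self-contained; the point is that a STAT becomes a single indexed DB lookup with no filesystem call.

```python
import sqlite3

# sqlite3 stands in for MySQL here so the sketch runs anywhere;
# the schema is hypothetical, not ClawIO's actual one.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metadata (
        fileid   TEXT PRIMARY KEY,
        path     TEXT UNIQUE,     -- STAT looks files up by path
        checksum TEXT,
        mtime    INTEGER,
        etag     TEXT
    )
""")
conn.execute(
    "INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
    ("a8584c90-ae2a-4dd8-84a7-f18ced109cce",
     "/local/users/d/demo/file",
     "md5:8c8d357b5e872bbacd45197626bd5759",
     104857600,
     "956494f8-5120-4165-afba-ad5f8d13b8ef"))

def stat(path):
    # A STAT is one indexed lookup; the data unit is never touched.
    return conn.execute(
        "SELECT fileid, checksum, mtime, etag FROM metadata WHERE path = ?",
        (path,)).fetchone()
```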
7. [Diagram: the data unit stores file contents on an Ext4 FS and
mirrors FILEID and CHECKSUM in xattrs; the meta unit keeps FILEID,
CHECKSUM, MTIME, ETAG and PATH in MySQL.]
SETUP TWO: Local FS with MySQL and XATTRs as metadata stores
8. [Diagram: the data unit stores file contents on an Ext4 FS and
mirrors FILEID and CHECKSUM in xattrs; the meta unit keeps FILEID,
CHECKSUM, MTIME, ETAG and PATH in Redis.]
SETUP THREE: Local FS with REDIS and XATTRs as metadata stores
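The split in setups two and three can be sketched as follows. Plain dicts stand in for the real stores (extended attributes on the Ext4 file, a Redis hash per file) so the sketch runs anywhere; the `user.cio.*` attribute names and the function names are invented for illustration.

```python
# Dicts stand in for per-file xattrs and for Redis; key names are
# assumptions, not ClawIO's actual ones.
xattrs = {}   # stand-in for extended attributes on the data-unit file
memdb = {}    # stand-in for a Redis hash keyed by path

def put_metadata(path, fileid, checksum, mtime, etag):
    # FILEID and CHECKSUM travel with the file as xattrs ...
    xattrs[path] = {"user.cio.fileid": fileid,
                    "user.cio.checksum": checksum}
    # ... while the full record lives in the in-memory store.
    memdb[path] = {"FILEID": fileid, "CHECKSUM": checksum,
                   "MTIME": mtime, "ETAG": etag, "PATH": path}

def stat(path):
    # Fast path: everything is served from memory.
    return memdb[path]

def verify(path):
    # Consistency check: the xattrs stay attached to the data, so a
    # rebuilt cache can be validated (or repopulated) from them.
    return xattrs[path]["user.cio.fileid"] == memdb[path]["FILEID"]

put_metadata("/local/users/d/demo/file",
             "fid-1", "md5:abc", 104857600, "etag-1")
```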
9. VM machine specs, deployment scenario and operations to benchmark

CPU  | 64 cores, Intel(R) Xeon(R) CPU E5-4640 v2 @ 2.20GHz
RAM  | 64 GB
DISK | SAS-3 12 Gb/s, 4 TB Seagate Constellation Enterprise ST4000NM3401 (RAID6)

Deployment: 16 services (servers) on the same VM; the bench client
also runs on the same VM. VM infrastructure provided by [logo]
Operations: Stat, Upload
10. • The STAT operation is similar to a Unix file stat operation or a
WebDAV PROPFIND.
• The objective of this operation is to retrieve the metadata
associated with a particular resource (file or folder).
• For each level of concurrency, 10 000 requests are triggered
and the test is repeated 5 times.
• The operation uses gRPC over HTTP/2.
STAT benchmark
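The procedure described above can be sketched as a small driver loop. This is a hypothetical stand-in for the real bench client, not ClawIO's code: the function names are invented, and a no-op lambda replaces the actual gRPC stat call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def run_stat_benchmark(stat_fn, concurrency_levels,
                       requests=10_000, repeats=5):
    """Per the slide: for each concurrency level, fire `requests`
    stat calls, repeat `repeats` times, report the mean wall time."""
    results = {}
    for workers in concurrency_levels:
        runs = []
        for _ in range(repeats):
            start = time.perf_counter()
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # Drain the iterator so all requests complete.
                list(pool.map(stat_fn, range(requests)))
            runs.append(time.perf_counter() - start)
        results[workers] = mean(runs)
    return results

# A no-op stand-in for the real stat RPC; small counts keep it quick.
timings = run_stat_benchmark(lambda _: None, [1, 4],
                             requests=100, repeats=2)
```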
14. • The upload operation uploads a file randomly
chosen from a fixed set of 100 files that follow the
distribution of files observed on CERNBox.
• The chosen file is uploaded 5 000 times per
concurrency level, to random target destinations
to avoid overwrites. The benchmark is repeated 5
times.
• The operation uses HTTP/1.1.
• Each upload triggers metadata propagation.
UPLOAD benchmark
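A sketch of the driver's file selection and target naming. The weights below are made up (the real fixed set follows the distribution observed on CERNBox, which the slides do not give), and the path prefix is hypothetical; the randomized target suffix is what prevents overwrites across the 5 000 uploads.

```python
import random

# Hypothetical size mix standing in for the 100-file CERNBox-derived
# set: mostly small files, a few large ones. Weights are invented.
SIZES = [1_000] * 90 + [1_000_000] * 9 + [100_000_000]

def pick_upload(rng):
    size = rng.choice(SIZES)
    # Random 64-bit suffix so repeated uploads never overwrite each
    # other; the path prefix is an assumption for illustration.
    target = f"/local/users/d/demo/up-{rng.getrandbits(64):016x}"
    return size, target

rng = random.Random(42)  # seeded for reproducibility
targets = {pick_upload(rng)[1] for _ in range(5000)}
```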
21. • Retrieving the file hierarchy from the FS favours novel
use cases and access to existing data
repositories.
• In-memory databases increase performance
and can scale to a high number of records
(with a ~70-byte memory footprint per file,
64 GiB ⇒ ≈ 981 million files).
• Use of XATTRs makes the system more consistent.
What can we extract from these results?
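A back-of-the-envelope check of the memory claim above, taking the slide's ~70 bytes of metadata per file at face value:

```python
# How many per-file metadata records fit in 64 GiB of RAM at
# ~70 bytes per record?
BYTES_PER_FILE = 70
RAM = 64 * 1024**3            # 64 GiB in bytes
files = RAM // BYTES_PER_FILE
print(files)                  # roughly 9.8e8, i.e. close to a billion
```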
22. WE THINK WE ARE ON THE RIGHT TRACK
Piotr’s Analytics System
24. What comes next ?
• Improve performance (more parallelisation)
• Run more benchmarks: upload with checksums, overwrites, remove, move, fuzz
• Perform benchmark on cluster (ClawIO design scales out)
• Implement more backends: EOS, S3/SWIFT/RADOSGW
(plug your backend, suggestions are welcome)
• Implement sharing and run benchmarks on shared folders
• Test other sync protocols: Seafile, StorageSync?
26. Acknowledgements
• Centro de Investigación, Transferencia e Innovación
(CITI)
• Dr Jakub T. Mościcki @ CERN
• Piotr Mrowczynski @Technical University of Denmark
• LIA2, University of Vigo