Towards a Scalable File System
Progress on adapting BlobSeer to WAN scale
for the HGMDS distributed metadata system

Viet-Trung Tran, Gabriel Antoniu, Alexandru Costan (INRIA - Rennes)
In collaboration with Kohei Hiraga, Osamu Tatebe (U Tsukuba)



FP3C meeting
Bordeaux, 2 – 3 September 2011
Plan

1. Background and context
2. Goal
3. Approach and solution
4. Preliminary evaluation
5. Conclusion




1. Background: BlobSeer & HGMDS




BlobSeer: A large-scale data management service
Generic data-management platform for huge, unstructured data
•  Huge data (TB): BLOBs
•  Highly concurrent, fine-grained access (MB): read/write/append
•  Prototype available

Key design features
•  Decentralized metadata management
•  Beyond MVCC: multiversioning exposed to the user
•  Lock-free write access through versioning

A back-end for higher-level, sophisticated data management systems
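The "lock-free write access through versioning" idea can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not BlobSeer's implementation or API: the class and method names are hypothetical, each write is kept as an immutable (offset, data) patch under a fresh version number, and only version assignment and publication are serialized. The bulk data is stored without holding any lock, and every published version remains readable, which mirrors multiversioning being exposed to the user.

```python
import itertools
import threading


class VersionedBlob:
    """Hypothetical in-memory sketch of versioning-based writes.

    Each write becomes an immutable (offset, data) patch under a new version
    number. Only the version counter and the publish bookkeeping are
    synchronized; the data itself is stored without holding a lock.
    """

    def __init__(self):
        self._patches = {}                 # version -> (offset, data), never mutated
        self._next_id = itertools.count(1)
        self._id_lock = threading.Lock()   # serializes version assignment only
        self._pub_lock = threading.Lock()  # serializes publish bookkeeping only
        self._done = set()                 # versions whose patch is stored
        self._latest = 0                   # highest version with no gaps below it

    def write(self, offset, data):
        with self._id_lock:
            version = next(self._next_id)
        self._patches[version] = (offset, bytes(data))  # no lock held here
        with self._pub_lock:
            self._done.add(version)
            # Publish in version order so readers never observe a gap.
            while self._latest + 1 in self._done:
                self._latest += 1
        return version

    def latest_version(self):
        return self._latest

    def read(self, version=None):
        """Rebuild the snapshot of `version` (default: latest published)."""
        if version is None:
            version = self._latest
        if version > self._latest:
            raise ValueError(f"version {version} is not published yet")
        buf = bytearray()
        for v in range(1, version + 1):
            offset, data = self._patches[v]
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))
            buf[offset:end] = data
        return bytes(buf)
```

For example, after `blob.write(0, b"hello")` and `blob.write(5, b" world")`, `blob.read()` returns `b"hello world"` while `blob.read(1)` still returns `b"hello"`. Replaying patches on every read keeps the sketch short; a real system would instead keep per-version metadata that points at the pages of each snapshot.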




BlobSeer: Architecture

Clients
•  Perform fine-grained blob accesses
Providers
•  Store the pages of the blob
Provider manager
•  Monitors the providers
•  Favours data load balancing
Metadata providers
•  Store information about page location
Version manager
•  Ensures concurrency control

[Figure: architecture diagram linking clients, providers, the provider manager, metadata providers and the version manager]
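The provider manager is described as favouring data load balancing. As a hypothetical illustration only (the slides do not say which policy BlobSeer uses), a least-loaded placement policy can be sketched with a heap, assuming a provider's load is simply the number of pages it already stores. The function name and the load model are assumptions.

```python
import heapq


def allocate_pages(provider_loads, num_pages):
    """Place `num_pages` new pages, each on the currently least-loaded provider.

    `provider_loads` maps provider name -> pages already stored (assumed load
    metric). Ties are broken by provider name, so placement is deterministic.
    """
    heap = [(load, name) for name, load in provider_loads.items()]
    heapq.heapify(heap)
    placement = []
    for _ in range(num_pages):
        load, name = heapq.heappop(heap)
        placement.append(name)
        # The chosen provider now holds one more page.
        heapq.heappush(heap, (load + 1, name))
    return placement
```

With `allocate_pages({"p1": 0, "p2": 2, "p3": 1}, 4)` the result is `["p1", "p1", "p3", "p1"]`, after which every provider holds the same number of pages.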


HGMDS: A distributed metadata management system for global file systems

•  Multi-master file system metadata server (MDS)
•  Manages the inode structure
•  High-latency networks do not affect metadata operation performance
      - for both reads and writes
•  One MDS per site
•  Metadata versioning using vector clocks for collision detection
•  Automatic collision resolution on the system side

[Figure: sites A, B and C connected over the Internet, each running an HGMDS server with local file system clients; clients issue mkdir/rmdir/create/stat/unlink to their local HGMDS, which propagates updates to the other sites in the background]
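Collision detection with vector clocks comes down to comparing two clocks. A minimal sketch, assuming a clock is a dict mapping a site name to a counter; the function names are hypothetical and not HGMDS's interface. Two updates collide when neither clock has seen the other.

```python
def increment(clock, site):
    """Return a copy of `clock` with `site`'s counter advanced (one local update)."""
    updated = dict(clock)
    updated[site] = updated.get(site, 0) + 1
    return updated


def compare(a, b):
    """Order two vector clocks: 'equal', 'before', 'after' or 'concurrent'."""
    sites = set(a) | set(b)
    a_le_b = all(a.get(s, 0) <= b.get(s, 0) for s in sites)
    b_le_a = all(b.get(s, 0) <= a.get(s, 0) for s in sites)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither update saw the other: a collision
```

For instance, if an entry is created at site A (`{"A": 1}`), then modified again at A (`{"A": 2}`) and independently at B (`{"A": 1, "B": 1}`), comparing the two modified clocks yields `"concurrent"`, which is the case that needs resolution.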

2. Goal
A joint architecture integrating BlobSeer and HGMDS




Goal
•  BlobSeer: data management, typically on a single site
•  HGMDS: metadata management, at global scale across multiple sites

Idea: build a global file system deployed on multiple sites by integrating BlobSeer with HGMDS

Potential benefits:
•  HGMDS: efficient multi-site file metadata management
•  BlobSeer: concurrency-optimized access to globally shared data




3. Our approach and solution




Two approaches

Multiple BlobSeer instances
•  One BlobSeer instance per site

One single BlobSeer-WAN instance over geographically distributed sites




1st approach: 1 BlobSeer instance / site








1st approach: Zoom




High latency when accessing remote BLOBs:
•  Too many remote requests for small metadata
2nd approach: one BlobSeer-WAN instance over geographically distributed sites

Multiple version managers
•  1 version manager per site
Multiple provider managers
•  1 provider manager per site

On each site
•  Multiple data providers and metadata servers
•  Data providers are under the control of the local provider manager




Idea: leverage locality for remote metadata accesses

Metadata I/O is resolved locally
2nd approach: I/O scheme in BlobSeer-WAN

Writing
•  Publish the version on the local version manager
•  Write metadata to the local metadata servers
•  Write data to the local data providers

Reading ("read your writes" holds in many cases)
•  Request a version from the local version manager
•  Access metadata locally
•  Access remote or local providers as needed
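The I/O scheme above can be modelled in a few lines to show which steps stay on the local site. This is a hypothetical sketch, not the BlobSeer-WAN code: plain dicts stand in for the version manager, metadata servers and data providers, the class names and the `propagate` helper are assumptions, and the site names come from the testbed. As a design choice of the sketch, a write stores data and metadata before publishing the version, so a local reader never sees a version whose pages are missing. Per-site integer versions can clash when two sites update the same blob concurrently, which is the situation vector clocks are meant to detect.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Services of one site; plain dicts stand in for real servers (assumption)."""
    name: str
    versions: dict = field(default_factory=dict)   # version manager: blob -> latest version
    metadata: dict = field(default_factory=dict)   # metadata servers: (blob, version) -> (owner, page key)
    providers: dict = field(default_factory=dict)  # data providers: page key -> bytes


class WanSketch:
    def __init__(self, site_names):
        self.sites = {name: Site(name) for name in site_names}
        self.remote_fetches = 0  # reads whose data had to leave the local site

    def write(self, site_name, blob, data):
        site = self.sites[site_name]
        version = site.versions.get(blob, 0) + 1
        key = f"{blob}@{site_name}:{version}"
        site.providers[key] = bytes(data)                  # data on local providers
        site.metadata[(blob, version)] = (site_name, key)  # metadata on local servers
        site.versions[blob] = version                      # publish on local version manager
        return version

    def read(self, site_name, blob):
        site = self.sites[site_name]
        version = site.versions[blob]                # ask the local version manager
        owner, key = site.metadata[(blob, version)]  # local metadata lookup
        if owner != site_name:
            self.remote_fetches += 1                 # only the data crosses the WAN
        return self.sites[owner].providers[key]

    def propagate(self, src_name, dst_name, blob):
        """Background metadata propagation: copies version and metadata, never data."""
        src, dst = self.sites[src_name], self.sites[dst_name]
        version = src.versions[blob]
        dst.metadata[(blob, version)] = src.metadata[(blob, version)]
        if version > dst.versions.get(blob, 0):
            dst.versions[blob] = version
```

A write at `"rennes"` followed by a read at `"rennes"` touches only local services. After `propagate("rennes", "grenoble", ...)`, a read at `"grenoble"` resolves version and metadata locally and fetches only the page from Rennes.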




Vector clocks and optimistic metadata replication
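Only the title of this slide survives, so the following is a generic sketch of optimistic replication with vector clocks rather than the BlobSeer-WAN protocol. It assumes that each site applies a metadata update locally at once, ships the (value, clock) pair to other sites in the background, and on receipt either ignores a stale update, adopts a strictly newer one, or flags a collision when the two are concurrent. The class and function names are hypothetical, and the sketch only records collisions; it does not resolve them.

```python
from dataclasses import dataclass, field


def dominates(a, b):
    """True if clock `a` has seen everything clock `b` has (a >= b on every site)."""
    return all(a.get(s, 0) >= b.get(s, 0) for s in set(a) | set(b))


def merge(a, b):
    """Per-site maximum of two clocks."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}


@dataclass
class MetadataReplica:
    site: str
    value: object = None
    clock: dict = field(default_factory=dict)
    collisions: list = field(default_factory=list)

    def local_update(self, value):
        """Apply immediately (optimistic); return the message to propagate later."""
        self.clock = merge(self.clock, {self.site: self.clock.get(self.site, 0) + 1})
        self.value = value
        return value, dict(self.clock)

    def receive(self, value, clock):
        if dominates(self.clock, clock):
            return                                       # stale or duplicate update
        if dominates(clock, self.clock):
            self.value, self.clock = value, dict(clock)  # strictly newer: adopt it
        else:
            self.collisions.append((self.value, value))  # concurrent: keep local, flag
            self.clock = merge(self.clock, clock)
```

Because duplicates and stale messages are ignored, background propagation can be retried freely; replicas that received the same non-conflicting updates end up with the same value.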




Expected benefits

•  On WAN: BlobSeer coordinates with HGMDS to provide a
   global versioning file system
     - Low-latency metadata I/O
     - Eventual consistency model
     - Load balancing/fault tolerance
•  On LAN:
     - Distributed version management
     - Load balancing/fault tolerance




4. Preliminary evaluation
BlobSeer-WAN on Grid'5000 (G5K)




Testbed

Using two Grid'5000 sites
•  Rennes: 40 nodes
     • 30 nodes reserved for BlobSeer services
     • 10 nodes for clients
•  Grenoble: 40 nodes
     • 30 nodes reserved for BlobSeer services
     • 10 nodes for clients
•  Inter-site interconnect: 10 Gbps




Concurrent appending: 512 MB/client




5. Conclusion
Ongoing work




Summary
Discussed the integration of BlobSeer and HGMDS:
•  A BlobSeer-WAN extension is required

BlobSeer-WAN
•  Preliminary results look encouraging
•  Performance of BlobSeer-WAN on two sites is similar to that of
   vanilla BlobSeer on a single site
•  Prototype available in BlobSeer's repository/branches/BlobSeer-WAN-dev/

HGMDS
•  Implementation almost complete
•  Works across multiple sites
•  Collisions are automatically resolved by a rule
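The slides do not say which rule HGMDS applies to colliding updates. Purely as a hypothetical example of what a deterministic rule can look like, the sketch below uses last-writer-wins on a timestamp with the site name as tie-breaker; the record fields, the names and the assumption of roughly synchronized clocks are all illustrative. The key property is that the outcome does not depend on the order in which a site sees the two updates, so every site converges on the same winner.

```python
from typing import NamedTuple


class MetadataUpdate(NamedTuple):
    site: str         # site that issued the update
    timestamp: float  # issuing site's wall-clock time (assumed roughly synchronized)
    entry: str        # new metadata value, e.g. an inode attribute


def resolve(a: MetadataUpdate, b: MetadataUpdate) -> MetadataUpdate:
    """Hypothetical rule: later timestamp wins, site name breaks ties.

    The result is independent of argument order, so every site that sees the
    same pair of colliding updates picks the same winner.
    """
    return max(a, b, key=lambda u: (u.timestamp, u.site))
```

For two updates stamped at the same time by `"rennes"` and `"tsukuba"`, `resolve` returns the `"tsukuba"` update regardless of argument order.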
Next steps

•  A more extensive evaluation of BlobSeer-WAN
•  Integrate BlobSeer-WAN with HGMDS
•  Preliminary evaluation of HGMDS with BlobSeer-WAN on
   Grid'5000 and on the Japanese clusters
•  Submit a co-authored paper by spring 2012
•  Next internships: Kohei at Inria Rennes




Thank you!




TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Scalable File System for WAN with BlobSeer and HGMDS

  • 1. Towards a Scalable File System: Progress on adapting BlobSeer to WAN scale for the HGMDS distributed metadata system. Viet-Trung Tran, Gabriel Antoniu, Alexandru Costan (INRIA - Rennes), in collaboration with Kohei Hiraga, Osamu Tatebe (U Tsukuba). FP3C meeting, Bordeaux, 2 – 3 September 2011
  • 2. Plan: 1. Background and context; 2. Goal; 3. Approach and solution; 4. Preliminary evaluation; 5. Conclusion
  • 3. Section 1: Background (BlobSeer & HGMDS)
  • 4. BlobSeer: A large-scale data management service. A generic data-management platform for huge, unstructured data • Huge data (TB): BLOBs • Highly concurrent, fine-grain access (MB): read/write/append • Prototype available. Key design features • Decentralized metadata management • Beyond MVCC: multiversioning exposed to the user • Lock-free write access through versioning. Designed as a back-end for higher-level, sophisticated data management systems.
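The multiversioning idea can be illustrated with a small in-memory model. This is a hypothetical sketch, not BlobSeer's code: `VersionedBlob`, `Snapshot` and the page-splitting logic are assumptions. It shows only that every write or append produces a new immutable snapshot while earlier versions remain readable; it does not model BlobSeer's distributed metadata or its concurrent, lock-free write protocol.

```python
"""Toy model of user-visible multiversioning (hypothetical, in-memory)."""
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    version: int
    pages: tuple  # page contents; never modified once the snapshot exists


@dataclass
class VersionedBlob:
    page_size: int
    # versions[i] is the snapshot with version number i (0 = empty blob).
    versions: list = field(default_factory=lambda: [Snapshot(0, ())])

    def latest(self) -> int:
        return self.versions[-1].version

    def read(self, version: int) -> bytes:
        # Readers name an explicit version, so later writes never disturb them.
        return b"".join(self.versions[version].pages)

    def write(self, offset_page: int, data: bytes) -> int:
        # Copy-on-write: start from the latest snapshot's pages.
        base = list(self.versions[-1].pages)
        chunks = [data[i:i + self.page_size] for i in range(0, len(data), self.page_size)]
        for i, chunk in enumerate(chunks):
            idx = offset_page + i
            if idx < len(base):
                base[idx] = chunk          # overwrite an existing page
            elif idx == len(base):
                base.append(chunk)         # extend the blob by one page
            else:
                raise ValueError("this toy model does not allow holes")
        snap = Snapshot(self.latest() + 1, tuple(base))
        self.versions.append(snap)
        return snap.version

    def append(self, data: bytes) -> int:
        return self.write(len(self.versions[-1].pages), data)
```

For example, `blob.append(b"abcd")` followed by `blob.write(0, b"WXYZ")` yields versions 1 and 2, and `blob.read(1)` still returns `b"abcd"` after the overwrite.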
  • 5. BlobSeer: Architecture • Clients: perform fine-grain blob accesses • Providers: store the pages of the blob • Provider manager: monitors the providers and favours data load balancing • Metadata providers: store information about page location • Version manager: ensures concurrency control
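The division of roles on this slide can be sketched as three small classes. All names, and the least-loaded placement policy, are hypothetical stand-ins: the slide says only that the provider manager favours load balancing, that metadata providers store page locations, and that the version manager ensures concurrency control. The step order inside `client_write` is likewise an assumption.

```python
"""Hypothetical sketch of the roles on the BlobSeer architecture slide."""
from itertools import count


class ProviderManager:
    """Tracks provider load and places each new page on the least-loaded one."""

    def __init__(self, providers):
        self.load = {p: 0 for p in providers}  # pages placed per provider

    def allocate(self, n_pages):
        chosen = []
        for _ in range(n_pages):
            # Least-loaded provider first; ties broken by name for determinism.
            p = min(self.load, key=lambda q: (self.load[q], q))
            self.load[p] += 1
            chosen.append(p)
        return chosen


class MetadataProvider:
    """Records which provider holds each (blob, version, page)."""

    def __init__(self):
        self.location = {}

    def record(self, blob, version, page, provider):
        self.location[(blob, version, page)] = provider


class VersionManager:
    """Hands out increasing version numbers per blob (the serialization point)."""

    def __init__(self):
        self.counters = {}

    def next_version(self, blob):
        return next(self.counters.setdefault(blob, count(1)))


def client_write(blob, n_pages, pm, mp, vm):
    """Assumed order: place pages, (send data), get a version, record locations."""
    providers = pm.allocate(n_pages)
    # ... the page contents would be sent to `providers` here ...
    version = vm.next_version(blob)
    for page, provider in enumerate(providers):
        mp.record(blob, version, page, provider)
    return version, providers
```

With three idle providers, allocating four pages spreads them as `p1, p2, p3, p1`, which is the kind of balancing the provider manager is meant to favour.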
  • 6. HGMDS: A distributed metadata management system for global file systems • Multi-master file system metadata server (MDS) • Manages the inode structure • High-latency networks do not affect metadata operation performance, for both reading and writing • One MDS per site • Metadata versioning using vector clocks for collision detection • Automatic collision resolution on the system side. [Diagram: sites A, B and C connected through the Internet, each with file system clients and an HGMDS server; clients issue mkdir/rmdir/create/stat/unlink locally and updates are propagated in the background]
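HGMDS detects collisions with vector clocks. The sketch below shows the standard vector-clock comparison, assuming one counter per MDS site (consistent with "one MDS per site"); the function names and the dict representation are hypothetical, since the slides do not describe HGMDS's internal data structures.

```python
"""Vector-clock comparison for collision detection (hypothetical sketch)."""


def tick(clock: dict, site: str) -> dict:
    """Return a copy of `clock` with `site`'s counter incremented (a local update)."""
    new = dict(clock)
    new[site] = new.get(site, 0) + 1
    return new


def compare(a: dict, b: dict) -> str:
    """Return 'equal', 'before', 'after' or 'concurrent' for clocks a and b.

    Clocks map a site name to that site's update counter; a missing
    entry counts as 0.
    """
    sites = set(a) | set(b)
    a_le_b = all(a.get(s, 0) <= b.get(s, 0) for s in sites)
    b_le_a = all(b.get(s, 0) <= a.get(s, 0) for s in sites)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither clock dominates: a collision
```

Two sites that each update the same inode starting from the same state produce clocks `{"A": 1}` and `{"B": 1}`; neither dominates, so the update pair is flagged as a collision.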
  • 7. Section 2: Goal. A joint architecture integrating BlobSeer and HGMDS
  • 8. Goal • BlobSeer: data management, typically on a single site • HGMDS: metadata management, at global scale over multiple sites. Idea: build a global file system deployed on multiple sites by integrating BlobSeer with HGMDS. Potential benefits: • HGMDS: efficient multi-site file metadata management • BlobSeer: concurrency-optimized access to globally shared data
  • 9. Section 3: Our approach and solution
  • 10. Two approaches • Multiple BlobSeer instances: one BlobSeer instance per site • A single BlobSeer-WAN instance spanning geographically distributed sites
  • 11. 1st approach: one BlobSeer instance per site. [Diagram: a client accessing the BlobSeer instances of several sites]
  • 12. 1st approach (zoom): high latency when accessing remote BLOBs • Too many remote requests for small pieces of metadata
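Why many small remote metadata requests hurt can be estimated with a back-of-the-envelope model. BlobSeer's published design organizes metadata as a distributed segment tree over the blob's pages; this sketch assumes a read walks that tree with one sequential round trip per level and ignores data transfer. The function names and the RTT values in the example are hypothetical.

```python
"""Rough cost model for remote metadata traversal (hypothetical sketch)."""


def metadata_levels(n_pages: int) -> int:
    """Levels of a complete binary segment tree with at least n_pages leaves."""
    if n_pages < 1:
        raise ValueError("n_pages must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) exactly for n >= 1.
    return (n_pages - 1).bit_length() + 1


def read_latency_ms(n_pages: int, rtt_ms: float) -> float:
    """Lower bound on metadata latency: one sequential round trip per level.

    Assumes all nodes of one level are fetched in a single parallel batch
    and ignores the time needed to transfer the data pages themselves.
    """
    return metadata_levels(n_pages) * rtt_ms
```

With 1,024 pages the tree has 11 levels, so a 0.1 ms LAN round trip gives about 1.1 ms, while an assumed 20 ms WAN round trip gives 220 ms: the same number of small requests becomes two orders of magnitude slower when the metadata is remote.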
  • 13. 2nd approach: one BlobSeer-WAN instance over geographically distributed sites • Multiple version managers: one version manager per site • Multiple provider managers: one provider manager per site • On each site: multiple data providers and metadata servers; the data providers are controlled by the local provider manager
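The per-site layout of the second approach can be written down as a deployment description. This is a hypothetical sketch: the `Site` fields, host names and the `validate` and `local_services` helpers are assumptions used only to make the structure explicit (one version manager and one provider manager per site, with that site's data providers and metadata servers grouped under it).

```python
"""Hypothetical deployment description for a BlobSeer-WAN instance."""
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    version_manager: str   # exactly one per site
    provider_manager: str  # exactly one per site; controls this site's providers
    data_providers: list = field(default_factory=list)
    metadata_servers: list = field(default_factory=list)


def validate(sites):
    """Check the per-site layout described on the slide."""
    names = [s.name for s in sites]
    if len(names) != len(set(names)):
        raise ValueError("each site must appear only once")
    for s in sites:
        if not s.data_providers or not s.metadata_servers:
            raise ValueError(f"site {s.name} needs data providers and metadata servers")


def local_services(sites, client_site):
    """Return the (version manager, provider manager) a client contacts first."""
    for s in sites:
        if s.name == client_site:
            return s.version_manager, s.provider_manager
    raise KeyError(client_site)
```

Grouping the providers under their site entry mirrors the rule that data providers answer only to their local provider manager.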
  • 14. Idea: leverage locality for remote metadata accesses • Metadata I/O is resolved locally
  • 15. 2nd approach: I/O scheme in BlobSeer-WAN. Writing • Publish the version on the local version manager • Write metadata to the local metadata servers • Write data to the local data providers. Reading (read-your-writes in many cases) • Ask the local version manager for a version • Access metadata locally • Access remote or local data providers as needed
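The writing and reading steps of this I/O scheme can be traced with a toy single-process model. Everything here is hypothetical (`SiteNode`, the dict-based "servers", `replicate_to`). The sketch performs the writing steps in the order data, metadata, publication so that a version never becomes visible before its pages; the slide lists the steps as bullets without stating an order. Version numbers are simple per-site counters, so concurrent writes to the same blob at different sites are not handled here.

```python
"""Toy, single-process model of the BlobSeer-WAN I/O scheme (hypothetical)."""


class SiteNode:
    def __init__(self, name):
        self.name = name
        self.published = {}  # stands in for the local version manager: blob -> version
        self.metadata = {}   # local metadata servers: (blob, version) -> [(site, key)]
        self.pages = {}      # local data providers: key -> bytes

    def write(self, blob, chunks):
        version = self.published.get(blob, 0) + 1
        locations = []
        for i, chunk in enumerate(chunks):
            key = (blob, version, i)
            self.pages[key] = chunk                  # 1. data on local providers
            locations.append((self.name, key))
        self.metadata[(blob, version)] = locations   # 2. metadata on local servers
        self.published[blob] = version               # 3. publish on local version manager
        return version

    def read(self, blob, sites):
        version = self.published[blob]               # ask the local version manager
        out = []
        for site_name, key in self.metadata[(blob, version)]:  # local metadata lookup
            if site_name == self.name:
                out.append(self.pages[key])          # local provider
            else:
                out.append(sites[site_name].pages[key])  # remote provider
        return b"".join(out)

    def replicate_to(self, other, blob):
        """Hypothetical background propagation of version and metadata (not data)."""
        version = self.published[blob]
        other.metadata[(blob, version)] = list(self.metadata[(blob, version)])
        other.published[blob] = max(other.published.get(blob, 0), version)
```

A client at the writing site reads its own write immediately; a client at another site sees the version only after the background propagation, and then fetches the pages from the remote providers while its metadata lookups stay local.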
  • 16. Vector clocks and optimistic metadata replication
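This slide names two mechanisms: vector clocks and optimistic metadata replication. A minimal sketch under stated assumptions: each site applies its own metadata updates at once, queues them for background propagation, and uses vector-clock comparison to detect concurrent updates. The summary slide says HGMDS resolves collisions "by a rule" without giving it, so the "larger site name wins" rule below is a hypothetical stand-in, as are all class and method names.

```python
"""Optimistic metadata replication with vector clocks (hypothetical sketch)."""


def compare(a, b):
    """'equal', 'before', 'after' or 'concurrent'; missing entries count as 0."""
    sites = set(a) | set(b)
    a_le_b = all(a.get(s, 0) <= b.get(s, 0) for s in sites)
    b_le_a = all(b.get(s, 0) <= a.get(s, 0) for s in sites)
    if a_le_b and b_le_a:
        return "equal"
    return "before" if a_le_b else "after" if b_le_a else "concurrent"


def merge(a, b):
    """Element-wise maximum of two vector clocks."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}


class MetadataReplica:
    def __init__(self, site):
        self.site = site
        self.entries = {}  # path -> (value, clock, origin_site)
        self.outbox = []   # local updates waiting for background propagation

    def local_update(self, path, value):
        # Optimistic: apply locally at once, propagate later.
        _, clock, _ = self.entries.get(path, (None, {}, None))
        clock = dict(clock)
        clock[self.site] = clock.get(self.site, 0) + 1
        self.entries[path] = (value, clock, self.site)
        self.outbox.append((path, value, clock, self.site))

    def receive(self, path, value, clock, origin):
        if path not in self.entries:
            self.entries[path] = (value, clock, origin)
            return "applied"
        cur_value, cur_clock, cur_origin = self.entries[path]
        order = compare(clock, cur_clock)
        if order in ("before", "equal"):
            return "ignored"  # already known or superseded
        if order == "after":
            self.entries[path] = (value, clock, origin)
            return "applied"
        # Concurrent updates: a collision. Hypothetical rule: larger site name wins.
        if origin > cur_origin:
            cur_value, cur_origin = value, origin
        self.entries[path] = (cur_value, merge(clock, cur_clock), cur_origin)
        return "collision"

    def flush_to(self, other):
        """Background propagation of the queued updates to another replica."""
        for update in self.outbox:
            other.receive(*update)
        self.outbox.clear()
```

Two sites that create `/f` concurrently converge to the same value and the same merged clock after exchanging their queued updates, in either order, because the resolution rule is deterministic.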
  • 17. Expected benefits • On WAN: BlobSeer coordinates with HGMDS to provide a global versioning file system, with low-latency metadata I/O, an eventual consistency model, and load balancing/fault tolerance • On LAN: distributed version management and load balancing/fault tolerance
  • 18. Section 4: Preliminary evaluation. BlobSeer-WAN on G5K (Grid'5000)
  • 19. Testbed: 2 sites of G5K • Rennes: 40 nodes (30 reserved for BlobSeer services, 10 for clients) • Grenoble: 40 nodes (30 reserved for BlobSeer services, 10 for clients) • Inter-site interconnect: 10 Gbps
  • 20. Concurrent appending: 512 MB per client
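For scale, the data volume of this experiment can be bounded from the testbed slide. Assuming all 20 client nodes (10 per site) append concurrently, which the slide does not state, and taking 1 MB = 10^6 bytes and 10 Gbps = 10^10 bit/s:

```latex
20 \times 512\,\text{MB} = 10\,240\,\text{MB},
\qquad
t_{\min} = \frac{10\,240 \times 10^{6}\,\text{B} \times 8\,\text{bit/B}}{10^{10}\,\text{bit/s}} \approx 8.2\,\text{s}
```

This transfer time applies only if all of that data had to cross the inter-site link; since BlobSeer-WAN writes data to local providers, it indicates the traffic that locality can avoid, not a measured result.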
  • 21. Section 5: Conclusion. Ongoing work
  • 22. Summary. We discussed the integration of BlobSeer and HGMDS: • A BlobSeer-WAN extension is required. BlobSeer-WAN: • Preliminary results look encouraging • Performance of BlobSeer-WAN on two sites is similar to that of vanilla BlobSeer on a single site • Prototype available at BlobSeer's repository/branches/BlobSeer-WAN-dev/. HGMDS: • Implementation almost done • Works across multiple sites • Collisions are resolved automatically by a rule
  • 23. Next steps • A more extensive evaluation of BlobSeer-WAN • Integrate BlobSeer-WAN into HGMDS • Preliminary evaluation of HGMDS with BlobSeer-WAN on Grid5000 and on the Japanese clusters • Submit a co-authored paper by spring 2012 • Next internships: Kohei at Inria Rennes
  • 24. Thank you!