D2MCE
Speaker: 呂宗螢 (Zong-Ying Lyu)
Adviser: Prof. 梁文耀 (Wen-Yew Liang)
Date : 2008/07/14
Embedded and Parallel Systems Lab 2
D2MCE
Wireless Network
DSM
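Three State
[Figure: state-transition diagram for a shared block with three states: Invalid, Shared, and Exclusive.]
Transition labels:
• read miss: fetch; sharers = sharers + {node}
• write miss: sharers = {node}
• write hit: sharers = {node}
• invalidate (appears on two edges)
• read hit
• read hit / write hit
Legend: black = processed by every node; red = processed only by the home node.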
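The three-state diagram can be read as a small state machine. The following is a minimal sketch in C, assuming the usual reading of the labels (a read miss leads to Shared, any write leads to Exclusive, an invalidate leads to Invalid). The type and function names are hypothetical and are not part of D2MCE.

```c
#include <assert.h>

/* The three states named on the slide. */
typedef enum { PAGE_INVALID, PAGE_SHARED, PAGE_EXCLUSIVE } page_state;

/* Events named on the slide; this enum is a hypothetical encoding. */
typedef enum {
    EV_READ_HIT, EV_READ_MISS, EV_WRITE_HIT, EV_WRITE_MISS, EV_INVALIDATE
} page_event;

/* Assumed transitions for the local copy of a block. */
page_state next_state(page_state s, page_event e)
{
    switch (e) {
    case EV_READ_MISS:  return PAGE_SHARED;     /* fetch a copy */
    case EV_WRITE_MISS: return PAGE_EXCLUSIVE;  /* become the only holder */
    case EV_WRITE_HIT:  return PAGE_EXCLUSIVE;  /* Shared -> Exclusive, or stay */
    case EV_INVALIDATE: return PAGE_INVALID;    /* another node wrote */
    case EV_READ_HIT:   return s;               /* no change */
    }
    return s;
}

/* Home-side bookkeeping: sharers is a bitmask with one bit per node id. */
unsigned update_sharers(unsigned sharers, unsigned node, page_event e)
{
    assert(node < 32);
    unsigned bit = 1u << node;
    switch (e) {
    case EV_READ_MISS:  return sharers | bit;   /* sharers = sharers + {node} */
    case EV_WRITE_MISS:
    case EV_WRITE_HIT:  return bit;             /* sharers = {node} */
    default:            return sharers;
    }
}
```

A bitmask keeps the sharer set compact for small clusters; a larger system would need a different set representation.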
Invalidate & update
[Figure: two message-sequence diagrams over Node 1, Node 2, Node 3, and Node 4.]
• Update: store(A) sends an update to each of the other three nodes; load(A) follows.
• Invalidate: store(A) sends an invalidate to each of the other three nodes; a later load(A) is answered with an update.
Release Consistency Definition
1. Before an ordinary access is allowed to
perform with respect to any other processor,
all previous acquires must be performed.
2. Before a release is allowed to perform with
respect to any other processor, all previous
ordinary reads and writes must be performed.
3. Special accesses are sequentially consistent
with respect to one another.
ERC & LRC
[Figure: two timelines over Node 1, Node 2, and Node 3; in each, the nodes take turns running acquire, store(A), release.]
• Eager RC: modifications are propagated to the other nodes at each release.
• Lazy RC: propagation is deferred until the next acquire and goes only to the acquiring node.
Home-based & Homeless
• Homeless
  • Diffs are scattered across all nodes
  • Diffs must be stored
  • Garbage collection is needed
• Home-based
  • Processing is centralized and the home copy is always updated
  • No diff storage
  • No garbage collection
  • The home node accesses the shared memory without communication
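HLRC
[Figure: message-sequence diagram over Node 1, Node 2, Home, and Node 3.]
• Writer: acquire, store(A), release. A twin of the page is made before the write; at release a diff is computed and sent to the home, which applies the diff. Invalidate(A) is sent out.
• Reader: acquire, Load(A), release. The reader fetches the page from the home.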
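The twin and diff steps can be sketched in a few lines of C. This is an illustration under stated assumptions, not the D2MCE implementation: the page is a tiny array of words, a diff is a list of (index, value) pairs, and all names (`make_twin`, `make_diff`, `apply_diff`, `PAGE_WORDS`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_WORDS 8  /* tiny page for illustration; real pages are larger */

/* One changed word: its position in the page and its new value. */
typedef struct { size_t index; unsigned value; } diff_entry;

/* Writer side: copy the page before the first write. */
void make_twin(const unsigned *page, unsigned *twin)
{
    memcpy(twin, page, PAGE_WORDS * sizeof *page);
}

/* Writer side, at release: record every word that differs from the twin.
 * Returns the number of entries written to out (at most PAGE_WORDS). */
size_t make_diff(const unsigned *page, const unsigned *twin, diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            out[n].index = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Home side: patch the home copy with the received diff. */
void apply_diff(unsigned *home, const diff_entry *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(d[i].index < PAGE_WORDS);
        home[d[i].index] = d[i].value;
    }
}
```

Sending only the changed words keeps the release message small when a writer touches a small part of the page.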
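Send only to nodes that are not invalid
[Figure: message-sequence diagram over Node 1, Home, Node 2, Node 3, and Node 4, with each node's copy of A marked invalid or not invalid. After acquire, store(A), release, the invalidate goes only to a node whose copy is not invalid. Later acquire, load(A), release sequences send a req and receive an update or a reply.]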
HERC Worst Case
Cost: 4*W count, 8*W byte
[Figure: message-sequence diagram over Node 1, Home, Node 2, Node 3, and Node 4. The nodes take turns running acquire, store(A), release; the state of A on each node moves between invalid, shared, and exclusive, with Invalidate and reply messages on each store.]
Traditional ERC Worst Case
Cost: 2(n-1) count, 8*W byte
[Figure: message-sequence diagram over Node 1, Node 2, Node 3, and Node 4. The nodes take turns running acquire, store(A), release; the state of A on each node moves between invalid, shared, and exclusive, with Invalidate and reply messages.]
HLRC Worst Case
Cost: 1 count, 3*4*n+8*sm byte
[Figure: message-sequence diagram over Node 1, Home, Node 2, Node 3, and Node 4. The nodes take turns running acquire, store(A), release; copies of A are marked invalid or shared, with Invalidate(A) messages and a reply at acquire.]
HERC Best Case
Cost: 4 count, 8*W byte
[Figure: message-sequence diagram over Node 1, Home, Node 2, and Node 3. Repeated acquire, store(A), release sequences; copies of A are marked exclusive or invalid, with one Invalidate and reply exchange.]
Traditional ERC Best Case
Cost: 2(n-1) count, 8*W byte
[Figure: message-sequence diagram over Node 1, Node 2, Node 3, and Node 4. Repeated acquire, store(A), release sequences; copies of A are marked exclusive or invalid, with Invalidate and reply messages.]
HLRC Best Case
Cost: 1 count, 3*4*n+8*sm W byte
[Figure: message-sequence diagram over Node 0, Home, and Node 1. Repeated acquire, store(A), release sequences; copies of A are marked invalid or exclusive, with Invalidate(A) and a reply at acquire.]
D2MCE Architecture
[Figure: layered block diagram, top to bottom.]
• Application
• D2MCE Libraries: Join / Leave, Share Memory, Barrier, Mutex, Semaphore
• Thread Manager
• Resource Manager, Share Memory Manager, Barrier Manager, Mutex Manager, Semaphore Manager, …
• Communication: Sender, Receiver
• TCP/IP Based Socket
Processing framework
[Figure: inside a node's communication process, the Receiver assigns each incoming request to one of several Queues; each Queue is served by its own Thread pool.]
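Communication
[Figure: a node contains a Computing Thread (Application) and a Communication Process. The Communication Process holds the Resource, Share Memory, Barrier, Mutex, and Semaphore services together with a Receiver and a Sender, and exchanges Request and Reply messages with the other nodes.]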
Thread pool processing requests
[Figure: in a node's Communication Process, the Receiver places incoming requests in a Queue served by four Share Memory threads, alongside a Sender. Threads 1 and 3 are busy; threads 2 and 4 are sleeping.]
Memory Pool
[Figure: memory laid out from low address to high address as regions of 64, 1024, 10240, and other block sizes, followed by free space; one list per block size (64, 1024, 10240, other).]
Memory Pool
struct memory_info{
    size_t size;
};
Table 1: memory information structure
Figure 5: memory pool memory block
• mem_malloc
• mem_free
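A size-class pool along these lines can be sketched as follows. This is only an illustration: the slide gives the names `mem_malloc` and `mem_free` and the `memory_info` header but not their signatures or internals, so the signatures, the free-list layout, and the `block_header` wrapper below are assumptions. Locking is omitted, so this sketch is not thread safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct memory_info { size_t size; };  /* header from the slide */

/* Hypothetical wrapper that pads the header so the payload stays aligned. */
typedef union { struct memory_info info; max_align_t pad; } block_header;

#define NUM_CLASSES 3
static const size_t class_size[NUM_CLASSES] = { 64, 1024, 10240 };
static void *free_list[NUM_CLASSES];  /* one free list per fixed size class */

/* Smallest class that fits size; NUM_CLASSES stands for "other". */
static int size_class(size_t size)
{
    for (int c = 0; c < NUM_CLASSES; c++)
        if (size <= class_size[c])
            return c;
    return NUM_CLASSES;
}

/* Assumed signature: behaves like malloc, but reuses freed blocks. */
void *mem_malloc(size_t size)
{
    int c = size_class(size);
    block_header *h;
    if (c < NUM_CLASSES && free_list[c] != NULL) {
        h = free_list[c];
        free_list[c] = *(void **)(h + 1);  /* next link lives in the payload */
    } else {
        size_t cap = (c < NUM_CLASSES) ? class_size[c] : size;
        h = malloc(sizeof *h + cap);
        if (h == NULL)
            return NULL;
        h->info.size = cap;                /* remembered for mem_free */
    }
    return h + 1;
}

/* Assumed signature: returns a block to its class list, or to the system. */
void mem_free(void *p)
{
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    int c = size_class(h->info.size);
    if (c < NUM_CLASSES) {
        *(void **)p = free_list[c];
        free_list[c] = h;
    } else {
        free(h);
    }
}
```

Storing the size in a header in front of each block lets `mem_free` find the right list without the caller passing the size back.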
Thread safe
• All functions are thread safe
struct request_header{
    unsigned short msg_type;  // message type
    unsigned int size;        // package size
    unsigned int src_node;    // source node id
    unsigned int src_index;   // source index number
    unsigned int des_index;   // destination index number
};
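One way such a header could be put on a TCP stream is a fixed big-endian layout, sketched below. The slides do not say how D2MCE encodes the header, so the wire format, `HEADER_WIRE_SIZE`, `pack_header`, and `unpack_header` are all assumptions made for illustration; the sketch also assumes the `unsigned int` fields fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

struct request_header {
    unsigned short msg_type;  /* message type */
    unsigned int size;        /* package size */
    unsigned int src_node;    /* source node id */
    unsigned int src_index;   /* source index number */
    unsigned int des_index;   /* destination index number */
};

#define HEADER_WIRE_SIZE 18   /* 2 + 4 * 4 bytes, no padding (assumed layout) */

static void put32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (24 - 8 * i));  /* most significant first */
}

static uint32_t get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Serialize the header field by field, independent of struct padding. */
void pack_header(const struct request_header *h, unsigned char *buf)
{
    buf[0] = (unsigned char)(h->msg_type >> 8);
    buf[1] = (unsigned char)(h->msg_type & 0xff);
    put32(buf + 2, h->size);
    put32(buf + 6, h->src_node);
    put32(buf + 10, h->src_index);
    put32(buf + 14, h->des_index);
}

/* Inverse of pack_header. */
void unpack_header(const unsigned char *buf, struct request_header *h)
{
    assert(buf != 0 && h != 0);
    h->msg_type = (unsigned short)((buf[0] << 8) | buf[1]);
    h->size = get32(buf + 2);
    h->src_node = get32(buf + 6);
    h->src_index = get32(buf + 10);
    h->des_index = get32(buf + 14);
}
```

Packing field by field avoids sending compiler-dependent padding and keeps nodes with different byte orders compatible.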
Two Level Parallel
[Figure: a Job is first divided among the CPUs (parallel on the cluster) and then, inside each CPU, among its cores (parallel on multi-core or multi-CPU).]
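Multiple threads calling d2mce functions
[Figure: message-sequence diagram between Node 1 (thread1 and thread2) and the home node, Node 2. Both threads call load(A) while A is invalid; one thread blocks while the request is served. Once A's state is shared, no further request is sent. A store(A) and a barrier follow, after which A is exclusive.]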
False Sharing
[Figure: Node1 and Node2 access different parts of the same Page.]
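Whether two accesses falsely share can be checked from their byte ranges and the page size. The sketch below is an illustration only; `page_of` and `falsely_shares` are hypothetical helpers, and the page size is a parameter because the slides do not state the one D2MCE uses.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the page that contains byte offset off. */
size_t page_of(size_t off, size_t page_size)
{
    assert(page_size > 0);
    return off / page_size;
}

/* Ranges [a, a+len_a) and [b, b+len_b) falsely share when they have no
 * byte in common but still touch at least one common page. */
int falsely_shares(size_t a, size_t len_a, size_t b, size_t len_b,
                   size_t page_size)
{
    assert(len_a > 0 && len_b > 0);
    int overlap = a < b + len_b && b < a + len_a;
    size_t a_first = page_of(a, page_size);
    size_t a_last = page_of(a + len_a - 1, page_size);
    size_t b_first = page_of(b, page_size);
    size_t b_last = page_of(b + len_b - 1, page_size);
    int common_page = a_first <= b_last && b_first <= a_last;
    return !overlap && common_page;
}
```

With a 4096-byte page, two nodes that each work on one half of the same page falsely share it, while two nodes that each own a whole page do not.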
Multiple-Writer Protocols
int d2mce_mload(void *share_memory, unsigned int offset,
                unsigned int length);
int d2mce_mstore(void *share_memory, unsigned int offset,
                 unsigned int length);
Table 3: Multiple-writer protocol functions
Figure 8: Multiple-writer protocol
multiple-writer protocol
if (node_id == 0)
    d2mce_store(SM);             // SM = share memory
d2mce_barrier(&barrier, nodes);  // nodes = number of nodes
d2mce_mload(SM, start*sizeof(TYPE), end*sizeof(TYPE));
Table 4: Scatter program pattern

d2mce_mstore(SM, start*sizeof(TYPE), end*sizeof(TYPE));
d2mce_barrier(&barrier, nodes);
if (node_id == 0)
    d2mce_load(SM);
Table 5: Gather program pattern
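The scatter and gather patterns leave open how each node picks its `start` and `end`. One common choice is an even block partition, sketched below; `block_range` is a hypothetical helper, not a D2MCE function, and it treats `end` as exclusive.

```c
#include <assert.h>
#include <stddef.h>

/* Split n elements among `nodes` nodes as evenly as possible.
 * Node `id` receives the half-open element range [*start, *end). */
void block_range(size_t n, unsigned nodes, unsigned id,
                 size_t *start, size_t *end)
{
    assert(nodes > 0 && id < nodes);
    size_t base = n / nodes;    /* every node gets at least this many */
    size_t extra = n % nodes;   /* the first `extra` nodes get one more */
    *start = id * base + (id < extra ? id : extra);
    *end = *start + base + (id < extra ? 1 : 0);
}
```

For example, 10 elements over 4 nodes give the ranges [0, 3), [3, 6), [6, 8), and [8, 10); multiplying by `sizeof(TYPE)` converts them to byte positions.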
Dynamic manager migration
int d2mce_sethome(void *share_memory);
int d2mce_ibarrier_manager();
int d2mce_isem_manager();
int d2mce_imutex_manager();
int d2mce_iresource_manager();
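Manager migration
[Figure: message-sequence diagram over Node 0 (new manager), Node 1 (old manager), Node 2, and Node 3. Labels: "I home" request; lock & wait service; manage information; init & set manage information; ok; new manager; unlock & forward; forward request; block. The new manager sends the request to the old manager, the management information is handed over, the other nodes are told about the new manager, and requests that still reach the old manager are forwarded.]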
HRC broadcast
[Figure: message-sequence diagram over Node 1, Home, Node 2, Node 3, and Node 4. One node runs acquire, store(A), release; the other nodes then run acquire, load(A), release one after another. The span of these sequential loads is marked "latency".]
HRC broadcast barrier
[Figure: same nodes. One node runs acquire, store(A), release; the other nodes run acquire and load(A) around a barrier. The span is marked "latency".]
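Home based Disseminate Update
[Figure: message-sequence diagram over Node 1, Node 2, Home, Node 3, Node 4, Node 5, and Node 6. After store(A), the home sends an update to each registered node ("register node") and an invalidate to a node whose copy is not invalid; a node that is already invalid receives nothing. Four nodes then run load(A).]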
Broadcast coding pattern
Store node                          All nodes that need to load
Use mutex
d2mce_mutex_lock(&m1);              d2mce_mutex_lock(&m1);
d2mce_store(A);                     d2mce_load(A);
d2mce_mutex_unlock(&m1);            d2mce_mutex_unlock(&m1);
Use barrier
d2mce_store(A);                     d2mce_barrier(&b1, neednodes);
d2mce_barrier(&b1, neednodes);      d2mce_load(A);
Use semaphore
d2mce_store(A);                     d2mce_sem_wait(&m1);
for (i = 0; i < neednodes; i++)     d2mce_load(A);
    d2mce_sem_post(&m1);
Home based Disseminate Update
int d2mce_update_register(void* share_memory);
int d2mce_update_unregister(void* share_memory);
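Home based Disseminate Register
[Figure: two message-sequence diagrams over Node 1, Home, and Node 2.]
• Register update: the node sends a register request and the home enters the node in its table.
• Unregister update: the node sends an unregister request and the home clears the node from its table.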
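The home-side table can be pictured as a set of node ids per shared block. The sketch below uses a bitmask and is only an illustration; `update_table`, `table_register`, `table_unregister`, and `update_targets` are hypothetical names, and the slides do not describe the actual table layout.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 32

/* One table per shared block: bit i is set when node i is registered. */
typedef struct { uint32_t registered; } update_table;

/* "Input the table": remember that node wants updates. */
void table_register(update_table *t, unsigned node)
{
    assert(node < MAX_NODES);
    t->registered |= (uint32_t)1 << node;
}

/* "Clear the node": stop sending updates to node. */
void table_unregister(update_table *t, unsigned node)
{
    assert(node < MAX_NODES);
    t->registered &= ~((uint32_t)1 << node);
}

/* After a store by `writer`, list the registered nodes that should get an
 * update. The writer itself is skipped. Returns the number of targets. */
unsigned update_targets(const update_table *t, unsigned writer,
                        unsigned out[MAX_NODES])
{
    unsigned n = 0;
    for (unsigned node = 0; node < MAX_NODES; node++)
        if (node != writer && (t->registered >> node & 1u))
            out[n++] = node;
    return n;
}
```

Registered nodes receive the new data with the store, so their next load needs no request to the home.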
Event driven
int d2mce_checkUpdate(void* share_memory);
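Event driven (update)
[Figure: message-sequence diagram between Node 1 and Node 2. One node runs store(A) and an update is sent to the other node. There the ShareMemory thread updates A and signals the Computing thread waiting in checkupdate(A), which then runs load(A).]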
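Event driven (invalid)
[Figure: message-sequence diagram between Node 1 and Node 2. One node runs store(A) and an invalid message is sent to the other node. There the Share Memory thread invalidates A and signals the Computing thread waiting in checkupdate(A); the following load(A) sends a request and receives an update.]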
Write and immediately load coding pattern
Store node                          Load node
Use mutex
d2mce_mutex_lock(&m1);              while (1) {
d2mce_store(A);                         d2mce_mutex_lock(&m1);
d2mce_mutex_unlock(&m1);                d2mce_load(A);
                                        d2mce_mutex_unlock(&m1);
                                    }
Use barrier
d2mce_store(A);                     while (1) {
d2mce_barrier(&b1, neednodes);          d2mce_barrier(&b1, neednodes);
                                        d2mce_load(A);
                                    }
Use semaphore
d2mce_store(A);                     while (1) {
for (i = 0; i < neednodes; i++)         d2mce_sem_wait(&m1);
    d2mce_sem_post(&m1, neednodes);     d2mce_load(A);
                                    }
Use event driven
d2mce_store(A);                     while (1) {
                                        d2mce_checkUpdate(A);
                                        d2mce_load(A);
                                    }
Evaluation
MM
Values in brackets are the column-1 value divided by that column's value.
Size        1             2                          4
128*128     0.0224598     0.0150916 [1.488231864]    0.0149468 [1.502649397]
256*256     0.1624132     0.09476025 [1.71393807]    0.07156825 [2.269347092]
512*512     1.3165244     0.6979126 [1.886374311]    0.438122 [3.004926482]
1024*1024   38.787176     20.96464 [1.850123637]     10.51557 [3.688547173]
2048*2048   362.6819634   184.635501 [1.964313263]   91.1462238 [3.979122209]
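The bracketed figures match the usual speedup ratio, the single-worker time divided by the p-worker time. The sketch below recomputes it; reading the column headings 1, 2, and 4 as worker counts is an assumption, and `speedup` and `efficiency` are hypothetical helper names.

```c
#include <assert.h>

/* Speedup: time with one worker divided by time with p workers. */
double speedup(double t1, double tp)
{
    assert(tp > 0.0);
    return t1 / tp;
}

/* Parallel efficiency: speedup divided by the number of workers. */
double efficiency(double t1, double tp, unsigned p)
{
    assert(p > 0);
    return speedup(t1, tp) / (double)p;
}
```

For the 2048*2048 row, `speedup(362.6819634, 91.1462238)` reproduces the bracketed 3.979 to within rounding, which corresponds to an efficiency close to 1 under the four-worker reading.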

More Related Content

Viewers also liked (12)

Acc
Acc Acc
Acc
 
Creative & Digital Business Briefing - October 2016
Creative & Digital Business Briefing - October 2016Creative & Digital Business Briefing - October 2016
Creative & Digital Business Briefing - October 2016
 
Everyone needs life insurance
Everyone needs life insuranceEveryone needs life insurance
Everyone needs life insurance
 
tik icha smpit rpi
tik icha smpit rpi tik icha smpit rpi
tik icha smpit rpi
 
Cs437 lecture 13
Cs437 lecture 13Cs437 lecture 13
Cs437 lecture 13
 
Forever Living Products… where ordinary people achieve extraordinary results
Forever Living Products… where ordinary people achieve extraordinary resultsForever Living Products… where ordinary people achieve extraordinary results
Forever Living Products… where ordinary people achieve extraordinary results
 
Programme on Ms Excel For Managerial Computing
Programme on Ms Excel For Managerial ComputingProgramme on Ms Excel For Managerial Computing
Programme on Ms Excel For Managerial Computing
 
How to do Spirometry in the Workplace
How to do Spirometry in the WorkplaceHow to do Spirometry in the Workplace
How to do Spirometry in the Workplace
 
Obesity
ObesityObesity
Obesity
 
Appul
AppulAppul
Appul
 
x town report
x town reportx town report
x town report
 
Epc slides part 2
Epc slides part 2Epc slides part 2
Epc slides part 2
 

More from ZongYing Lyu

Architecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemArchitecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemZongYing Lyu
 
A deep dive into energy efficient multi core processor
A deep dive into energy efficient multi core processorA deep dive into energy efficient multi core processor
A deep dive into energy efficient multi core processorZongYing Lyu
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixZongYing Lyu
 
Device Driver - Chapter 6字元驅動程式的進階作業
Device Driver - Chapter 6字元驅動程式的進階作業Device Driver - Chapter 6字元驅動程式的進階作業
Device Driver - Chapter 6字元驅動程式的進階作業ZongYing Lyu
 
Device Driver - Chapter 3字元驅動程式
Device Driver - Chapter 3字元驅動程式Device Driver - Chapter 3字元驅動程式
Device Driver - Chapter 3字元驅動程式ZongYing Lyu
 
Web coding principle
Web coding principleWeb coding principle
Web coding principleZongYing Lyu
 
提高 Code 品質心得
提高 Code 品質心得提高 Code 品質心得
提高 Code 品質心得ZongYing Lyu
 
Consistency protocols
Consistency protocolsConsistency protocols
Consistency protocolsZongYing Lyu
 
Compiler optimization
Compiler optimizationCompiler optimization
Compiler optimizationZongYing Lyu
 
MPI use c language
MPI use c languageMPI use c language
MPI use c languageZongYing Lyu
 
Parallel program design
Parallel program designParallel program design
Parallel program designZongYing Lyu
 

More from ZongYing Lyu (16)

Vue.js
Vue.jsVue.js
Vue.js
 
Architecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemArchitecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory system
 
A deep dive into energy efficient multi core processor
A deep dive into energy efficient multi core processorA deep dive into energy efficient multi core processor
A deep dive into energy efficient multi core processor
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unix
 
Device Driver - Chapter 6字元驅動程式的進階作業
Device Driver - Chapter 6字元驅動程式的進階作業Device Driver - Chapter 6字元驅動程式的進階作業
Device Driver - Chapter 6字元驅動程式的進階作業
 
Device Driver - Chapter 3字元驅動程式
Device Driver - Chapter 3字元驅動程式Device Driver - Chapter 3字元驅動程式
Device Driver - Chapter 3字元驅動程式
 
Web coding principle
Web coding principleWeb coding principle
Web coding principle
 
提高 Code 品質心得
提高 Code 品質心得提高 Code 品質心得
提高 Code 品質心得
 
SCRUM
SCRUMSCRUM
SCRUM
 
Consistency protocols
Consistency protocolsConsistency protocols
Consistency protocols
 
Compiler optimization
Compiler optimizationCompiler optimization
Compiler optimization
 
MPI use c language
MPI use c languageMPI use c language
MPI use c language
 
Cvs
CvsCvs
Cvs
 
Parallel program design
Parallel program designParallel program design
Parallel program design
 
MPI
MPIMPI
MPI
 
OpenMP
OpenMPOpenMP
OpenMP
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 

D2MCE

  • 2. Embedded and Parallel Systems Lab 2 D2MCE Wireless Network
  • 3. Embedded and Parallel Systems Lab 3 DSM
  • 4. Three State Invalid Shared Exclusive writemiss shares={node} invalidate invalidate read miss sharers = shares + {node} fetch w rite hit sharers = {node} read hit / write hit read hit Black = all node process Red = only home node process
  • 5. Embedded and Parallel Systems Lab 5 Invalidate & update Node 1 Node2 Node 3 Node 4 store(A) update update update load(A) Node 1 Node2 Node 3 Node 4 store(A) invalidate load(A) invalidateinvalidate update Invalidate Update
  • 6. Embedded and Parallel Systems Lab 6 Release Consistency Definition 1. Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed. 2. Before a release is allowed to perform with respect to any other processor, all previous ordinary read and writes must be performed. 3. Special accesses are sequentially consistent with respect to one another.
  • 7. Embedded and Parallel Systems Lab 7 ERC & LRC Lazy RC Node 1 Node 2 Node 3 store(A) store(A ) release acquire store(A ) release acquire release acquire Eager RC Node 1 Node 2 Node 3 store(A) release store(A ) release acquire store(A ) release acquire acquire
  • 8. Embedded and Parallel Systems Lab 8 Home-base & Homeless  Homeless  Diff scattered in all the nodes  Diff store  Garbage collection  Home-base  Centralize processing & always update  No diff store  No garbage collection  Home node access the share memory no communication
  • 9. Embedded and Parallel Systems Lab 9 HLRC Node 1 Node 2 Home Node 3 store(A) acquire release Load(A) acquire release Invalidate(A) twin diff apply diff fetch page
  • 10. Only send not invalid node Invalid Node 1 Home Node2 Not invalid Node 3 Invalid Node 4 store(A ) acquire release invalidate acquire release req update acquire release req update load(A) load(A) reply
  • 11. HERC Worst Case 4*W count 8*W byte Node 1 Home Node 2 Node 3 Node 4 acquire release store(A) A (exclusive) A (invalid) A (invalid) A (shared) A (invalid) A (invalid) acquire release store(A) A (invalid) acquire release store(A) A (exclusive) A (exclusive) A (invalid) acquire release store(A) A (invalid) A (exclusive) acquire release store(A) A (exclusive) A (invalid) Invalidate reply
  • 12. Tradition ERC Worst Case 2(n-1) count 8*W byte Node 1 Node 2 Node 3 Node 4 acquire release store(A) A (invalid) A (invalid) A (shared) A (invalid) A (invalid) Invalidate reply acquire release store(A) release store(A) acquire release acquire store(A) A (invalid) A (invalid) A (invalid) A (exclusive) A (exclusive) A (exclusive) A (exclusive)
  • 13. HLRC Worst Case 1 count 3*4*n+8*sm byte Node 1 Home Node 2 Node 3 Node 4 acquire release store(A) A (invalid) A (shared) A (invalid) A (invalid) acquire release store(A) acquire release store(A) acquire release store(A) acquire reply A(invalid) Invalidate(A) Invalidate(A) Invalidate(A)
  • 14. HERC Best Case Node 1 Home Node 2 acquire release store(A) A (exclusive) A (invalid) A (invalid) A (invalid) 4 count 8*W byte Invalidate reply acquire release store(A) acquire release store(A) Node 3 A (exclusive)
  • 15. Tradition ERC Best Case 2(n-1) count 8*W byte Node 1 Node 2 Node 3 Node 4 acquire release store(A) A (exclusive) A (invalid) A (invalid) A (invalid) A (exclusive) A (invalid) Invalidate reply acquire release store(A) acquire release store(A)
  • 16. HLRC Best Case Node 0 Home Node1 A (invalid) A (exclusive) release store(A) 1 count 3*4*n+8*sm W byte acquire reply acquire release store(A) acquire release store(A) acquire Invalidate(A)
  • 17. Embedded and Parallel Systems Lab 17 Application D2CME Libraries Join / Leave Share Memory Barrier Mutex Semaphore Thread Manager Communication Sender Receiver Resource Manager Share Memory Manager Barrier Manager Mutex Manager TCP/IP Based Socket Semaphore Manager … D2MCE ArchitectureD2MCE
  • 18. Processing framework Node Process Communication Receiver Thread pool Thread pool request request Queue Queue Queue Thread pool assignment
  • 19. Embedded and Parallel Systems Lab 19 Node CommunicationProcess Computing Thread (Application) Resource Share Memory Barrier Mutex Semphore Receiver Sender Node Node Node …… Request Reply Communication
  • 20. Thread pool process request Node CommunicationProcess Share Memory thread 1 busying Receiver Sender Share Memory thread 2 sleeping Share Memory thread 3 busying Share Memory thread 4 sleeping request request Queue request request request
  • 21. Embedded and Parallel Systems Lab 21 Low Memory Pool HighMemory Address 64 1024 10240 Other Free 64 1024 10240 other
  • 22. Embedded and Parallel Systems Lab 22 Memory Pool struct memory_info{ size_t size; }; 表格 1 memory information structure 圖表 5 memory pool memory block  mem_malloc  mem_free
  • 23. Thread safe: all functions are thread safe.
      struct request_header {
          unsigned short msg_type;   // message type
          unsigned int size;         // package size
          unsigned int src_node;     // source node id
          unsigned int src_index;    // source index number
          unsigned int des_index;    // destination index number
      };
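The slide gives the fields of `request_header` but not how the header travels over the TCP/IP socket. As an illustration only, the sketch below assumes a hypothetical fixed wire format (big-endian, 2 bytes for `msg_type` and 4 bytes for each other field), which avoids depending on struct padding or host byte order. `HEADER_WIRE_SIZE`, `header_pack` and `header_unpack` are invented names, not part of D2MCE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header layout copied from the slide. */
struct request_header {
    unsigned short msg_type;   /* message type */
    unsigned int   size;       /* package size */
    unsigned int   src_node;   /* source node id */
    unsigned int   src_index;  /* source index number */
    unsigned int   des_index;  /* destination index number */
};

/* Hypothetical wire size: 2 bytes + 4 fields of 4 bytes each. */
#define HEADER_WIRE_SIZE 18

/* Store a 32-bit value in big-endian order. */
static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read a 32-bit big-endian value. */
static uint32_t get32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Serialize h into buf, which must hold HEADER_WIRE_SIZE bytes. */
void header_pack(const struct request_header *h, unsigned char *buf)
{
    assert(h != NULL && buf != NULL);
    buf[0] = (unsigned char)(h->msg_type >> 8);
    buf[1] = (unsigned char)(h->msg_type & 0xff);
    put32(buf + 2,  (uint32_t)h->size);
    put32(buf + 6,  (uint32_t)h->src_node);
    put32(buf + 10, (uint32_t)h->src_index);
    put32(buf + 14, (uint32_t)h->des_index);
}

/* Rebuild a header from HEADER_WIRE_SIZE bytes produced by header_pack. */
void header_unpack(const unsigned char *buf, struct request_header *h)
{
    assert(h != NULL && buf != NULL);
    h->msg_type  = (unsigned short)((buf[0] << 8) | buf[1]);
    h->size      = get32(buf + 2);
    h->src_node  = get32(buf + 6);
    h->src_index = get32(buf + 10);
    h->des_index = get32(buf + 14);
}
```

With this layout a receiver can first read exactly `HEADER_WIRE_SIZE` bytes, unpack them, and then read `size` more bytes, assuming `size` counts the payload that follows.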
  • 24. Two Level Parallel: parallel on cluster, and parallel on multi-core or CPU. [Diagram: a Job is split across CPUs, each CPU with several cores.]
  • 25. Multiple threads calling d2mce functions. [Diagram: on Node 1, thread1 and thread2 both call load(A) while A is invalid, and one call blocks. Once A's state is shared, no request is sent. Home (Node 2) runs store(A), with A exclusive, and a barrier.]
  • 26. False Sharing. [Diagram: Node1 and Node2 access different parts of the same page.]
  • 27. Multiple-Writer Protocols
  • 29. Multiple-writer protocol. Table 3, Multiple-writer protocol functions: int d2mce_mload(void *share_memory, unsigned int offset, unsigned int length); int d2mce_mstore(void *share_memory, unsigned int offset, unsigned int length); Figure 8: Multiple-writer protocol.
  • 30. Multiple-writer protocol.
      Table 4, Scatter program pattern:
          if (node_id == 0) d2mce_store(SM);      // SM = share memory
          d2mce_barrier(&barrier, nodes);         // nodes = number of nodes
          d2mce_mload(SM, start*sizeof(TYPE), end*sizeof(TYPE));
      Table 5, Gather program pattern:
          d2mce_mstore(SM, start*sizeof(TYPE), end*sizeof(TYPE));
          d2mce_barrier(&barrier, nodes);
          if (node_id == 0) d2mce_load(SM);
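The scatter and gather patterns use per-node `start` and `end` values, but the slides do not show how they are computed. One common choice, given here only as a sketch and not necessarily what the authors used, is a block partition that spreads the remainder over the first nodes. `struct range`, `block_range`, `range_offset_bytes` and `range_length_bytes` are hypothetical helpers. Because the declared third parameter of `d2mce_mload` is named `length`, the sketch returns a byte count for it, which is an assumption about the API.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open element range [start, end) owned by one node. */
struct range {
    size_t start;
    size_t end;
};

/* Split `total` elements over `nodes` nodes as evenly as possible.
 * The first (total % nodes) nodes each get one extra element. */
struct range block_range(size_t total, unsigned node_id, unsigned nodes)
{
    assert(nodes > 0 && node_id < nodes);
    size_t base  = total / nodes;
    size_t extra = total % nodes;
    struct range r;

    r.start = node_id * base + (node_id < extra ? node_id : extra);
    r.end   = r.start + base + (node_id < extra ? 1 : 0);
    return r;
}

/* Byte offset of the range inside the shared array. */
size_t range_offset_bytes(struct range r, size_t elem_size)
{
    return r.start * elem_size;
}

/* Byte length of the range. */
size_t range_length_bytes(struct range r, size_t elem_size)
{
    return (r.end - r.start) * elem_size;
}
```

With 10 elements on 4 nodes the ranges are [0,3), [3,6), [6,8) and [8,10). Each node would then pass its own offset and length to the multiple-writer load or store call, so only its slice of the shared memory is transferred.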
  • 31. Dynamic manager migration: int d2mce_sethome(void *share_memory); int d2mce_ibarrier_manager(); int d2mce_isem_manager(); int d2mce_imutex_manager(); int d2mce_iresource_manager();
  • 32. Manager migration. [Diagram: new manager (Node 0), old manager (Node 1), Node 2, Node 3. Labels: "I home" request; manage information; init & set manage information; ok; new manager; lock & wait service; forward; unlock & forward request; request; block.]
  • 33. HRC broadcast. [Diagram: Node 1, Home, Node 2, Node 3, Node 4. One node runs acquire, store(A), release; the other nodes run acquire, load(A), release; the latency is marked.]
  • 34. HRC broadcast with barrier. [Diagram: Node 1, Home, Node 2, Node 3, Node 4. One node runs acquire, store(A), release; the other nodes run acquire, load(A); a barrier and the latency are marked.]
  • 35. Home based Disseminate Update. [Diagram: Home and Node 1 to Node 6. After store(A): a registered node receives an update; a node that is not invalid receives an invalidate; a node that is already invalid is also shown. The nodes then call load(A).]
  • 36. Broadcast coding pattern (one store node; every node that needs the data loads it).
      Use mutex. Store node: d2mce_mutex_lock(&m1); d2mce_store(A); d2mce_mutex_unlock(&m1). Load node: d2mce_mutex_lock(&m1); d2mce_load(A); d2mce_mutex_unlock(&m1).
      Use barrier. Store node: d2mce_store(A); d2mce_barrier(&b1, neednodes). Load node: d2mce_barrier(&b1, neednodes); d2mce_load(A).
      Use semaphore. Store node: d2mce_store(A); for(i=0; i<neednodes; i++) d2mce_sem_post(&m1). Load node: d2mce_sem_wait(&m1); d2mce_load(A).
  • 37. Home based Disseminate Update int d2mce_update_register(void* share_memory); int d2mce_update_unregister(void* share_memory);
  • 38. Home based Disseminate Register. Register: a node sends "register update" to the home, and the home enters the node in its table. Unregister: a node sends "unregister update" to the home, and the home clears the node from the table.
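The slides say that the home keeps a table of registered nodes and that registered nodes receive updates, while the other diagrams suggest that invalidations go only to nodes whose copy is not already invalid. A minimal sketch of that bookkeeping follows, assuming one bitmask per shared page and at most 32 nodes. All names here (`page_entry`, `home_register`, `home_unregister`, `home_action_on_store`) are hypothetical and are not the D2MCE API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 32

/* What the home sends to a node after another node stores to the page. */
enum action { ACT_NONE, ACT_INVALIDATE, ACT_UPDATE };

/* Hypothetical per-page bookkeeping kept on the home node. */
struct page_entry {
    uint32_t registered;   /* bit i set: node i registered for updates */
    uint32_t invalid;      /* bit i set: node i's copy is already invalid */
};

/* Enter the node in the table. */
void home_register(struct page_entry *p, unsigned node)
{
    assert(p != NULL && node < MAX_NODES);
    p->registered |= (uint32_t)1 << node;
}

/* Clear the node from the table. */
void home_unregister(struct page_entry *p, unsigned node)
{
    assert(p != NULL && node < MAX_NODES);
    p->registered &= ~((uint32_t)1 << node);
}

/* Decide what to send to `node` after a store, and record the new state. */
enum action home_action_on_store(struct page_entry *p, unsigned node)
{
    assert(p != NULL && node < MAX_NODES);
    uint32_t bit = (uint32_t)1 << node;

    if (p->registered & bit) {     /* registered: push the new data */
        p->invalid &= ~bit;
        return ACT_UPDATE;
    }
    if (p->invalid & bit)          /* already invalid: nothing to send */
        return ACT_NONE;
    p->invalid |= bit;             /* valid copy: invalidate it once */
    return ACT_INVALIDATE;
}
```

A full protocol would also clear a node's invalid bit when that node fetches the page again; that step is left out to keep the sketch short.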
  • 40. Event driven (update). [Diagram: Node 1 runs store(A) and an update is sent to Node 2. On Node 2 the Share Memory thread updates A and signals the Computing thread, which calls checkupdate(A) and then load(A).]
  • 41. Event driven (invalid). [Diagram: Node 1 runs store(A) and an invalid message is sent to Node 2. On Node 2 the Share Memory thread invalidates A and signals the Computing thread, which calls checkupdate(A) and then load(A); the load issues an update request.]
  • 42. Write and immediately load coding pattern.
      Use mutex. Store node: d2mce_mutex_lock(&m1); d2mce_store(A); d2mce_mutex_unlock(&m1). Load node: while(1){ d2mce_mutex_lock(&m1); d2mce_load(A); d2mce_mutex_unlock(&m1); }
      Use barrier. Store node: d2mce_store(A); d2mce_barrier(&b1, neednodes). Load node: while(1){ d2mce_barrier(&b1, neednodes); d2mce_load(A); }
      Use semaphore. Store node: d2mce_store(A); for(i=0; i<neednodes; i++) d2mce_sem_post(&m1, neednodes). Load node: while(1){ d2mce_sem_wait(&m1); d2mce_load(A); }
      Use event driven. Store node: d2mce_store(A). Load node: while(1){ d2mce_checkUpdate(A); d2mce_load(A); }
  • 43. Evaluation: MM. Values in brackets are the column 1 value divided by that column's value (speedup).
      Size         1              2                            4
      128*128      0.0224598      0.0150916 [1.488231864]      0.0149468 [1.502649397]
      256*256      0.1624132      0.09476025 [1.71393807]      0.07156825 [2.269347092]
      512*512      1.3165244      0.6979126 [1.886374311]      0.438122 [3.004926482]
      1024*1024    38.787176      20.96464 [1.850123637]       10.51557 [3.688547173]
      2048*2048    362.6819634    184.635501 [1.964313263]     91.1462238 [3.979122209]
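The bracketed values in the evaluation can be reproduced by dividing the column 1 measurement by the measurement in the other column. The sketch below shows that arithmetic. The `efficiency` helper additionally assumes that the column headings 1, 2 and 4 count the workers used, which the slide does not state.

```c
#include <assert.h>

/* Speedup: baseline measurement divided by the parallel measurement. */
double speedup(double t1, double tn)
{
    assert(tn > 0.0);
    return t1 / tn;
}

/* Parallel efficiency, assuming n is the number of workers used. */
double efficiency(double t1, double tn, int n)
{
    assert(n > 0);
    return speedup(t1, tn) / n;
}
```

For the 128*128 row, `speedup(0.0224598, 0.0150916)` gives about 1.4882, matching the bracketed value for column 2. For the 512*512 row, `speedup(1.3165244, 0.438122)` gives about 3.0049 for column 4, which under the stated assumption corresponds to an efficiency of about 0.75.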