6. Embedded and Parallel Systems Lab 6
Release Consistency Definition
1. Before an ordinary access is allowed to perform with respect to any other processor, all previous acquires must be performed.
2. Before a release is allowed to perform with respect to any other processor, all previous ordinary reads and writes must be performed.
3. Special accesses are sequentially consistent with respect to one another.
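The three rules above can be illustrated on a single shared-memory machine with C11 atomics. This is only an analogy, not D2MCE code: the names `rc_acquire`, `rc_release`, and `increment_shared` are hypothetical, and a software DSM enforces the same ordering with messages rather than hardware fences.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names throughout; this is not part of D2MCE. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static int shared_value = 0;   /* ordinary (non-atomic) shared data */

/* Acquire: ordinary accesses that follow cannot be performed before it. */
static void rc_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
        /* spin until the previous holder releases */
    }
}

/* Release: ordinary reads and writes that precede it are performed first. */
static void rc_release(void) {
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Ordinary accesses bracketed by the two special accesses. */
static int increment_shared(void) {
    rc_acquire();
    int v = ++shared_value;
    rc_release();
    return v;
}
```

Calling `increment_shared()` from several threads would keep every update to `shared_value` visible to the next lock holder, which is the guarantee rules 1 and 2 describe.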
8. Home-based and Homeless Protocols
Homeless
  Diffs are scattered across all the nodes
  Requires a diff store
  Requires garbage collection
Home-based
  Processing is centralized at the home node, which is always kept up to date
  No diff store
  No garbage collection
  The home node accesses the shared memory without communication
9. HLRC
[Figure: HLRC message sequence across Node 1, Node 2, Home, and Node 3. Labels in the diagram: acquire, store(A), twin, release, diff, apply diff, Invalidate(A), acquire, Load(A), fetch page, release.]
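The twin, diff, and apply steps named in the HLRC diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not the D2MCE implementation: the page is only eight words long, a diff is a list of (index, value) pairs, and every function and type name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_WORDS 8  /* tiny page for illustration; real pages are much larger */

/* One changed word: its index in the page and its new value. */
struct diff_entry {
    size_t   index;
    uint32_t value;
};

/* Copy the page before the first write so changes can be found later. */
static void make_twin(const uint32_t *page, uint32_t *twin) {
    memcpy(twin, page, PAGE_WORDS * sizeof(uint32_t));
}

/* Compare page with twin and record each changed word. Returns entry count. */
static size_t create_diff(const uint32_t *page, const uint32_t *twin,
                          struct diff_entry *out) {
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            out[n].index = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Home side: patch the home copy with the received diff. */
static void apply_diff(uint32_t *home_page, const struct diff_entry *d, size_t n) {
    for (size_t i = 0; i < n; i++)
        home_page[d[i].index] = d[i].value;
}

/* Walk through one store/release cycle; returns the number of diff entries. */
static size_t demo_twin_diff(uint32_t *home_page) {
    uint32_t page[PAGE_WORDS], twin[PAGE_WORDS];
    struct diff_entry diff[PAGE_WORDS];

    memcpy(page, home_page, sizeof page);      /* writer fetched the page earlier */
    make_twin(page, twin);                     /* on the first store(A) */
    page[2] = 42;                              /* the writes themselves */
    page[5] = 7;
    size_t n = create_diff(page, twin, diff);  /* at release */
    apply_diff(home_page, diff, n);            /* at the home node */
    return n;
}
```

Only the two changed words travel to the home node, which is why the home copy can stay current without a separate diff store.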
10. Send Only to Nodes That Are Not Invalid
[Figure: message sequence across Node 1, Home, Node 2, Node 3, and Node 4, with the state labels Invalid, Not invalid, and Invalid above the nodes. Labels in the diagram: acquire, store(A), release, invalidate, req, update, load(A), reply.]
11. HERC Worst Case
Message count: 4*W
Bytes transferred: 8*W
[Figure: Node 1, Home, Node 2, Node 3, and Node 4 with repeated acquire, store(A), release sequences. The copies of A change among the exclusive, shared, and invalid states, with Invalidate and reply messages between stores.]
12. Traditional ERC Worst Case
Message count: 2(n-1)
Bytes transferred: 8*W
[Figure: Node 1, Node 2, Node 3, and Node 4 with repeated acquire, store(A), release sequences. The copies of A change among the exclusive, shared, and invalid states, with Invalidate and reply messages between stores.]
13. HLRC Worst Case
Message count: 1
Bytes transferred: 3*4*n+8*sm
[Figure: Node 1, Home, Node 2, Node 3, and Node 4 with repeated acquire, store(A), release sequences. The copies of A are shown as invalid or shared, with a reply on acquire and Invalidate(A) messages to the other nodes.]
14. HERC Best Case
Message count: 4
Bytes transferred: 8*W
[Figure: Node 1, Home, Node 2, and Node 3 with repeated acquire, store(A), release sequences. The copies of A are shown as exclusive or invalid, with Invalidate and reply messages.]
15. Traditional ERC Best Case
Message count: 2(n-1)
Bytes transferred: 8*W
[Figure: Node 1, Node 2, Node 3, and Node 4 with repeated acquire, store(A), release sequences. The copies of A are shown as exclusive or invalid, with Invalidate and reply messages.]
16. HLRC Best Case
Message count: 1
Bytes transferred: 3*4*n+8*sm W
[Figure: Node 0, Home, and Node1 with repeated acquire, store(A), release sequences. The copies of A are shown as invalid or exclusive, with a reply on acquire and an Invalidate(A) message.]
17. D2MCE Architecture
[Figure: layered architecture, top to bottom]
Application
D2MCE Libraries: Join / Leave, Share Memory, Barrier, Mutex, Semaphore
Thread Manager
Managers: Resource Manager, Share Memory Manager, Barrier Manager, Mutex Manager, Semaphore Manager, …
Communication: Sender, Receiver
TCP/IP Based Socket
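21. Memory Pool
[Figure: the memory pool laid out from low to high memory address, divided into regions for 64, 1024, 10240, and other block sizes, followed by free space. A second row repeats the size classes 64, 1024, 10240, and other.]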
22. Memory Pool
struct memory_info {
    size_t size;
};
Table 1: memory information structure
Figure 5: memory pool memory block
mem_malloc
mem_free
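A size-class pool with a `memory_info` header in front of each block might look like the sketch below. The slides do not show the signatures of `mem_malloc` and `mem_free`, so the functions here carry the hypothetical names `sketch_mem_malloc` and `sketch_mem_free`. The sketch also assumes that blocks come from `malloc` on demand rather than from a preallocated region, and it leaves out the locking that a thread-safe version would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block, as on the slide. */
struct memory_info {
    size_t size;   /* capacity of the block that follows the header */
};

/* Size classes from the slide; larger requests fall into "other". */
static const size_t class_size[] = { 64, 1024, 10240 };
#define NUM_CLASSES (sizeof class_size / sizeof class_size[0])

/* A free block reuses its payload to hold the link to the next free block. */
struct free_block {
    struct memory_info info;
    struct free_block *next;
};
#define HEADER_SIZE offsetof(struct free_block, next)

static struct free_block *free_list[NUM_CLASSES];

/* Index of the smallest class that fits, or NUM_CLASSES for "other". */
static size_t class_index(size_t size) {
    for (size_t i = 0; i < NUM_CLASSES; i++)
        if (size <= class_size[i])
            return i;
    return NUM_CLASSES;
}

static void *sketch_mem_malloc(size_t size) {
    size_t c = class_index(size);
    if (c < NUM_CLASSES && free_list[c] != NULL) {
        struct free_block *b = free_list[c];   /* reuse a cached block */
        free_list[c] = b->next;
        return (char *)b + HEADER_SIZE;
    }
    size_t cap = (c < NUM_CLASSES) ? class_size[c] : size;
    struct free_block *b = malloc(HEADER_SIZE + cap);
    if (b == NULL)
        return NULL;
    b->info.size = cap;
    return (char *)b + HEADER_SIZE;
}

static void sketch_mem_free(void *ptr) {
    if (ptr == NULL)
        return;
    struct free_block *b = (struct free_block *)((char *)ptr - HEADER_SIZE);
    size_t c = class_index(b->info.size);
    if (c < NUM_CLASSES) {
        b->next = free_list[c];   /* cache for the next request of this class */
        free_list[c] = b;
    } else {
        free(b);                  /* "other" sizes go back to the system */
    }
}

/* Read the capacity recorded in the header of an allocated block. */
static size_t sketch_mem_capacity(const void *ptr) {
    const struct free_block *b =
        (const struct free_block *)((const char *)ptr - HEADER_SIZE);
    return b->info.size;
}
```

Because the header records the class capacity rather than the requested size, a freed 50-byte block can satisfy a later 60-byte request without touching the system allocator.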
23. Thread safe
All functions are thread safe.
struct request_header {
    unsigned short msg_type;   // message type
    unsigned int size;         // package size
    unsigned int src_node;     // source node id
    unsigned int src_index;    // source index number
    unsigned int des_index;    // destination index number
};
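Before a header like this goes over a TCP socket, it usually has to be turned into a byte sequence that does not depend on the sender's padding or endianness. The slides do not describe the wire format D2MCE uses, so the layout below (big-endian, 18 bytes, 16-bit `msg_type`, 32-bit remaining fields) and the helper names `header_pack` and `header_unpack` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct request_header {
    unsigned short msg_type;   /* message type */
    unsigned int size;         /* package size */
    unsigned int src_node;     /* source node id */
    unsigned int src_index;    /* source index number */
    unsigned int des_index;    /* destination index number */
};

#define HEADER_WIRE_SIZE 18    /* 2 + 4 * 4 bytes, no padding (assumed layout) */

static void put_u16(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}

static void put_u32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xff);
    p[2] = (unsigned char)((v >> 8) & 0xff);
    p[3] = (unsigned char)(v & 0xff);
}

static uint16_t get_u16(const unsigned char *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_u32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write the header field by field in big-endian order. */
static void header_pack(const struct request_header *h, unsigned char *buf) {
    put_u16(buf,      (uint16_t)h->msg_type);
    put_u32(buf + 2,  (uint32_t)h->size);
    put_u32(buf + 6,  (uint32_t)h->src_node);
    put_u32(buf + 10, (uint32_t)h->src_index);
    put_u32(buf + 14, (uint32_t)h->des_index);
}

/* Read the fields back in the same order. */
static void header_unpack(const unsigned char *buf, struct request_header *h) {
    h->msg_type  = get_u16(buf);
    h->size      = get_u32(buf + 2);
    h->src_node  = get_u32(buf + 6);
    h->src_index = get_u32(buf + 10);
    h->des_index = get_u32(buf + 14);
}
```

Packing field by field avoids sending the compiler's padding bytes, which would otherwise differ between nodes built with different compilers or architectures.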
24. Two Level Parallel
[Figure: a job is distributed across the CPUs of a cluster (parallel on cluster), and within each CPU across its cores (parallel on multi-core or CPU).]
25. Multiple Threads Calling D2MCE Functions
[Figure: thread1 and thread2 on Node 1, and the home node, Node 2. Labels in the diagram: load(A), A(invalid), block, A(shared), "A's state is shared, don't send request", store(A), barrier, A(exclusive).]
26. False Sharing
[Figure: Node1 and Node2 access different parts of the same page.]
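Whether two accesses suffer from false sharing can be checked by comparing the pages they touch. The helper below is a small sketch with a hypothetical name, `falsely_shares`, and an assumed page size of 4096 bytes; neither comes from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size, not taken from the slides */

/*
 * Returns 1 when the byte ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) do not overlap but still touch a common page.
 */
static int falsely_shares(size_t a_off, size_t a_len,
                          size_t b_off, size_t b_len) {
    if (a_len == 0 || b_len == 0)
        return 0;

    /* Overlapping ranges are true sharing, not false sharing. */
    int overlap = a_off < b_off + b_len && b_off < a_off + a_len;

    size_t a_first = a_off / SKETCH_PAGE_SIZE;
    size_t a_last  = (a_off + a_len - 1) / SKETCH_PAGE_SIZE;
    size_t b_first = b_off / SKETCH_PAGE_SIZE;
    size_t b_last  = (b_off + b_len - 1) / SKETCH_PAGE_SIZE;
    int common_page = a_first <= b_last && b_first <= a_last;

    return !overlap && common_page;
}
```

Two nodes writing bytes 0 to 99 and 100 to 199 touch disjoint data yet the same page, so a page-granularity protocol treats them as conflicting writers.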
29. Multiple-writer protocol
int d2mce_mload(void *share_memory, unsigned int offset,
                unsigned int length);
int d2mce_mstore(void *share_memory, unsigned int offset,
                 unsigned int length);
Table 3: Multiple-writer protocol functions
Figure 8: Multiple-writer protocol
30. Multiple-writer protocol
if (node_id == 0)
    d2mce_store(SM);                // SM = shared memory
d2mce_barrier(&barrier, nodes);     // nodes = number of nodes
d2mce_mload(SM, start*sizeof(TYPE), end*sizeof(TYPE));
Table 4: Scatter program pattern
d2mce_mstore(SM, start*sizeof(TYPE), end*sizeof(TYPE));
d2mce_barrier(&barrier, nodes);
if (node_id == 0)
    d2mce_load(SM);
Table 5: Gather program pattern
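The scatter and gather patterns need each node to know which part of the shared array it owns. One way to compute that is sketched below. The helper `node_range` and the `byte_range` type are hypothetical, and the sketch assumes the `offset` and `length` parameters of `d2mce_mload` and `d2mce_mstore` are both byte counts, which the slides do not confirm.

```c
#include <assert.h>

/* Byte offset and byte length of one node's slice of the shared array. */
struct byte_range {
    unsigned int offset;
    unsigned int length;
};

/*
 * Split `count` elements of `elem_size` bytes as evenly as possible over
 * `nodes` nodes. The first (count % nodes) nodes get one extra element.
 */
static struct byte_range node_range(unsigned int node_id, unsigned int nodes,
                                    unsigned int count, unsigned int elem_size) {
    unsigned int base  = count / nodes;
    unsigned int extra = count % nodes;
    unsigned int start = node_id * base + (node_id < extra ? node_id : extra);
    unsigned int len   = base + (node_id < extra ? 1u : 0u);

    struct byte_range r;
    r.offset = start * elem_size;
    r.length = len * elem_size;
    return r;
}
```

With 10 four-byte elements on 4 nodes, the slices are 12, 12, 8, and 8 bytes long and cover the array without gaps, so each node could pass its own `offset` and `length` to the multiple-writer calls.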
31. Dynamic manager migration
int d2mce_sethome(void *share_memory);
int d2mce_ibarrier_manager();
int d2mce_isem_manager();
int d2mce_imutex_manager();
int d2mce_iresource_manager();
32. Manager migration
[Figure: message sequence across Node 0, Node1, Node 2, and Node 3, with the labels New manager and Old manager above the nodes. Labels in the diagram: I home, request, init & set manage information, manage information, ok, new manager, lock & wait service, block, unlock & forward request, forward, request.]
35. Home based Disseminate Update
[Figure: Node 1, Node 2, Home, Node 3, Node 4, Node 5, and Node 6. After store(A), update messages go to the registered nodes, whose copies stay not invalid, and an invalidate message leaves the other copy invalid. The nodes then call load(A).]
36. Broadcast coding pattern
One node stores; every node that needs the data then loads it.
Use mutex
// storing node
d2mce_mutex_lock(&m1);
d2mce_store(A);
d2mce_mutex_unlock(&m1);
// each loading node
d2mce_mutex_lock(&m1);
d2mce_load(A);
d2mce_mutex_unlock(&m1);
Use barrier
// storing node
d2mce_store(A);
d2mce_barrier(&b1, neednodes);
// each loading node
d2mce_barrier(&b1, neednodes);
d2mce_load(A);
Use semaphore
// storing node
d2mce_store(A);
for (i = 0; i < neednodes; i++)
    d2mce_sem_post(&m1);
// each loading node
d2mce_sem_wait(&m1);
d2mce_load(A);
37. Home based Disseminate Update
int d2mce_update_register(void* share_memory);
int d2mce_update_unregister(void* share_memory);
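38. Home based Disseminate Register
[Figure: two diagrams with Node 1, Home, and Node 2. Register update: a register message (step 1) reaches Home, which enters the node in its table. Unregister update: an unregister message reaches Home, which clears the node from the table.]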
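The table kept at the home node can be pictured as a set of node ids. The sketch below stores it as a 32-bit mask, which is an assumption, as are the names `update_table`, `table_register`, `table_unregister`, and `table_targets`. It also assumes the node that performed the store does not need its own update. These helpers stand for the home-side bookkeeping only and are not the `d2mce_update_register` API itself.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 32   /* assumed limit so the table fits in one 32-bit mask */

/* Home-side table: bit i is set when node i registered for updates. */
struct update_table {
    uint32_t registered;
};

/* Enter the node in the table. Returns 0 on success, -1 for a bad id. */
static int table_register(struct update_table *t, unsigned node) {
    if (node >= MAX_NODES)
        return -1;
    t->registered |= (uint32_t)1 << node;
    return 0;
}

/* Clear the node from the table. Returns 0 on success, -1 for a bad id. */
static int table_unregister(struct update_table *t, unsigned node) {
    if (node >= MAX_NODES)
        return -1;
    t->registered &= ~((uint32_t)1 << node);
    return 0;
}

/*
 * List the nodes that should receive an update after `writer` stores.
 * The writer is skipped (an assumption). Returns the number of targets.
 */
static unsigned table_targets(const struct update_table *t, unsigned writer,
                              unsigned out[MAX_NODES]) {
    unsigned n = 0;
    for (unsigned node = 0; node < MAX_NODES; node++) {
        if (node != writer && (t->registered & ((uint32_t)1 << node)))
            out[n++] = node;
    }
    return n;
}
```

Nodes absent from the table would receive an invalidation instead of the data, which keeps update traffic limited to the nodes that asked for it.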