Yet Another Introduction to
Linux RCU
Viller Hsiao <villerhsiao@gmail.com>
May. 14, 2015
9/3/16 2/60
Who am I ?
Viller Hsiao
Embedded Linux / RTOS engineer
  
http://image.dfdaily.com/2012/5/4/634716931128751250504b050c1_nEO_IMG.jpg
http://www.anec.com/assets/images/call_before_you_dig.jpg
Presented For HCSM
What is RCU ?
●
Read-Copy Update
●
A kind of read/write synchronization
mechanism
Agenda
●
Synchronization inside Linux
●
RCU basic operations
●
Linux RCU internal
Synchronization inside Linux Kernel
R/W Synchronization in SMP System
●
Protect Shared data from concurrent access
●
Synchronization mechanism
●
atomic operation
●
spinlock
●
reader-writer spinlock (rwlock)
●
seqlock
●
RCU
Atomic Operation
●
Operations that read and change data within a
single, uninterruptible step
●
Architecture support
●
test-and-set (TAS)
●
compare-and-swap (CAS)
●
load-link/store-conditional (ll/sc)
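The primitives listed above can be tried from user space with C11 atomics. The following is a minimal sketch, not kernel code: `cas_add()` and `tas()` are hypothetical helper names, and the compiler decides whether they map to CAS or ll/sc instructions on a given architecture.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: atomically add `delta` to *counter with a CAS retry loop. */
static int cas_add(atomic_int *counter, int delta)
{
    int old = atomic_load(counter);

    /* On failure, `old` is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(counter, &old, old + delta))
        ;
    return old + delta;
}

/* Hypothetical helper: test-and-set; returns the flag's previous value. */
static bool tas(atomic_flag *flag)
{
    return atomic_flag_test_and_set(flag);
}
```

The weak compare-exchange may fail spuriously on ll/sc machines, which is why it sits in a loop.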
spinlock
●
Implemented by mutual exclusion
[Figure: timeline of three lock owners. Owner 1 and Owner 2 read and Owner 3 updates; only one owner holds the lock at a time while the others spin.]
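As an illustration of mutual exclusion by spinning, here is a toy user-space spinlock built on test-and-set. It is a sketch under the assumption of C11 atomics; `toy_spinlock` and its functions are hypothetical names, and the kernel's real spinlock_t is considerably more elaborate.

```c
#include <stdatomic.h>

/* Hypothetical toy spinlock; not the kernel's spinlock_t. */
struct toy_spinlock {
    atomic_flag locked;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
    /* Spin until the previous value was clear, meaning we took the lock. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;
}

/* Returns 1 if the lock was taken, 0 if someone else holds it. */
static int toy_spin_trylock(struct toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Readers and updaters alike must take the lock, so every owner serializes behind every other owner.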
rwlock
●
Allows multiple concurrent readers
●
Mutual exclusion between readers and the writer
[Figure: timeline of Reader1, Reader2, Reader3, and a Writer. The readers read concurrently; the writer spins until they finish, and readers spin while the writer updates.]
seqlock
●
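Gives readers a consistent view without starving writers.
[Figure: the reader's first trial starts at seq = 0, but the writer updates the data meanwhile (seq = 1 during the update, seq = 2 after), so the trial ends at seq = 2 and is discarded. The retry starts with an even seq (2) and ends with the same seq (2), so it succeeds.]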
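The retry protocol can be sketched in user space with C11 atomics. This is a toy under stated assumptions: a single writer, two protected integers, and the hypothetical names `toy_seq_pair`, `toy_seq_write()`, and `toy_seq_read()`. It is not the kernel's seqlock_t.

```c
#include <stdatomic.h>

/* Hypothetical toy seqlock protecting two integers; single writer assumed. */
struct toy_seq_pair {
    atomic_uint seq;    /* even: stable, odd: write in progress */
    atomic_int a, b;
};

static void toy_seq_write(struct toy_seq_pair *s, int a, int b)
{
    unsigned seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

    atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);  /* odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&s->a, a, memory_order_relaxed);
    atomic_store_explicit(&s->b, b, memory_order_relaxed);
    atomic_store_explicit(&s->seq, seq + 2, memory_order_release);  /* even */
}

static void toy_seq_read(struct toy_seq_pair *s, int *a, int *b)
{
    unsigned start, end;

    do {
        start = atomic_load_explicit(&s->seq, memory_order_acquire);
        *a = atomic_load_explicit(&s->a, memory_order_relaxed);
        *b = atomic_load_explicit(&s->b, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&s->seq, memory_order_relaxed);
    } while ((start & 1u) || start != end);  /* retry if a write overlapped */
}
```

The writer never waits for readers, which is why writers cannot starve; the cost is that readers may have to repeat their work.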
Architecture Support – Atomic Ops
●
Load-link store-conditional
– e.g. ARMv7 ldrex/strex
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0360f/graphics/exclusive_monitor_state_machine2.svg
Architecture Support – Barrier
●
Optimization in modern computer architecture
●
Optimizing compilers
●
Multi-issuing
●
Out-of-Order Execution
●
Load/Store optimization
●
… etc
CPU 1          CPU 2
======         =======
      { A = 1; B = 2 }
A = 3;         x = B;
B = 4;         y = A;
Without barriers, CPU 2 may observe the two stores in either order, e.g. x = 4 and y = 1.
Architecture Support – Barrier (Cont.)
●
Compiler barrier
●
CPU barrier instructions
●
Ensure the order of some operations
●
e.g. dmb/dsb/isb, ldar/stlr
void foo()
{
    A = B + 1;
    asm volatile("" ::: "memory");
    B = 0;
}
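The asm statement above only constrains the compiler. To also order accesses as seen by other CPUs, a release/acquire pair is one portable option in user space. The sketch below assumes C11 atomics; `toy_publish()` and `toy_consume()` are hypothetical names, and the compiler emits whatever barrier instructions (for example dmb or stlr/ldar on ARM) the target needs.

```c
#include <stdatomic.h>

static int payload;        /* plain data written before publication */
static atomic_int ready;   /* publication flag */

static void toy_publish(int value)
{
    payload = value;
    /* Release store: the write to payload is ordered before ready = 1. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Returns 1 and fills *out if the payload has been published, else 0. */
static int toy_consume(int *out)
{
    /* Acquire load pairs with the release store in toy_publish(). */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

A consumer that sees ready == 1 is then guaranteed to see the payload written before it.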
The problem
●
Poor scalability and performance
●
It can take multiple CPUs just to break even with a single CPU
http://www.rdrop.com/~paulmck/RCU/RCU.2014.05.18a.TU-Dresden.pdf
RCU Basic Operation
RCU Operations – Read
rcu_read_lock();
p = rcu_dereference(gp); /* p = gp */
if (p != NULL) {
        do_something(p->a, p->b);
}
rcu_read_unlock();
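The code between rcu_read_lock() and rcu_read_unlock() is the read-side critical section.
●
Blocking within an RCU read-side critical section is illegal; preemption is allowed only when the kernel is built with preemptible RCU (CONFIG_PREEMPT_RCU)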
RCU Operations – Update & Reclaim
q = kmalloc(sizeof(*q), GFP_KERNEL);
q->a = 1;
q->b = 2;
p = gp; /* old version; updaters are assumed to be serialized */
rcu_assign_pointer(gp, q); /* gp = q */
synchronize_rcu(); /* or defer the free with call_rcu() */
kfree(p);
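rcu_assign_pointer() is the removal (updater) step; synchronize_rcu() and kfree() form the reclaimer step.
●
Maintains multiple versions of a recently updated object
●
Multiple updaters must serialize among themselves, e.g. with a spinlock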
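The publish and subscribe halves can be imitated in user space to see what the two pointer primitives guarantee. This is a sketch under assumptions, not the kernel implementation: `toy_update()` and `toy_read_sum()` are hypothetical names, C11 atomics stand in for rcu_assign_pointer() and rcu_dereference(), and the grace-period wait is left to the caller.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
    int a, b;
};

/* Stand-in for the RCU-protected global pointer `gp` on the slides. */
static _Atomic(struct foo *) gp;

/*
 * Publish a new version. Returns 0 on success and stores the old version
 * in *old; the caller may free it only after a grace period has elapsed.
 */
static int toy_update(int a, int b, struct foo **old)
{
    struct foo *q = malloc(sizeof(*q));

    if (!q)
        return -1;
    q->a = a;
    q->b = b;
    /* Release ordering: the fields are initialized before q becomes visible. */
    *old = atomic_exchange_explicit(&gp, q, memory_order_acq_rel);
    return 0;
}

/* Reader: acquire is stronger than rcu_dereference() needs, but simple. */
static int toy_read_sum(void)
{
    struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);

    return p ? p->a + p->b : -1;
}
```

A reader sees either the old object or the fully initialized new one, never a half-built object. What the sketch omits is exactly the hard part of RCU: knowing when no reader can still hold the old pointer.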
RCU Primitives
READER
rcu_read_lock() / rcu_read_unlock(): preempt-disable, only if preemptible kernel
rcu_dereference(): rmb, only on DEC Alpha
UPDATER
rcu_assign_pointer(): wmb
RECLAIMER
synchronize_rcu()
call_rcu()
Re-painted from [13]
Quiz: Why does it improve scalability in
read side?
Why is RCU better?
●
The read-side lock does almost nothing (non-preemptible kernel)
static inline void rcu_read_lock(void)
{
        __asm__ __volatile__("": : :"memory");
        (void) 0;
        do { } while (0);
        do { } while (0);
}
Real content of rcu_read_lock() after preprocessing (!PREEMPT)
Read side Lock Overhead Comparison
http://lwn.net/images/ns/kernel/rcu/rwlockRCUperf.jpg
What's the benefit?
●
Near-zero overhead and wait-free on the read side
●
No memory barrier is required (except on DEC Alpha)
●
No lock is required
●
Read-side critical sections can nest
●
No deadlock between readers and writer
RCU List APIs [10]
Operation | list (circular doubly linked list) | hlist (linear doubly linked list)
Initialization | INIT_LIST_HEAD_RCU() |
Full traversal | list_for_each_entry_rcu() | hlist_for_each_entry_rcu(), hlist_for_each_entry_rcu_bh(), hlist_for_each_entry_rcu_notrace()
Resume traversal | list_for_each_entry_continue_rcu() | hlist_for_each_entry_continue_rcu(), hlist_for_each_entry_continue_rcu_bh()
Stepwise traversal | list_entry_rcu(), list_first_or_null_rcu(), list_next_rcu() | hlist_first_rcu(), hlist_next_rcu(), hlist_pprev_rcu()
Add | list_add_rcu(), list_add_tail_rcu() | hlist_add_after_rcu(), hlist_add_before_rcu(), hlist_add_head_rcu()
Delete | list_del_rcu() | hlist_del_rcu(), hlist_del_init_rcu()
Replacement | list_replace_rcu() | hlist_replace_rcu()
Splice | list_splice_init_rcu() |
RCU Model
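[Figure: timeline divided into Removal, Grace Period, and Reclamation. Readers that started before the removal finish during the grace period; reclamation begins only after all of them are done.]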
Repainted from https://lwn.net/images/ns/kernel/rcu/GracePeriodGood.png
RCU vs rwlock
●
RCU has lower overhead and better scalability
●
RCU readers see updated data faster
●
rwlock readers see consistent data only after the writer has finished its update
https://lwn.net/Articles/263130/
Replace rwlock by RCU[13]
http://en.wikipedia.org/wiki/Read-copy-update
What is RCU, again
●
Read-Copy Update
●
A kind of read-write synchronization mechanism
●
A publish-subscribe mechanism[5]
●
A poor man's garbage collector[5]
But
Quiz: How does the reclaimer know when it is safe to release the old object?
Linux RCU Internal
History and Contributors[9][13]
●
1980, H. T. Kung and Q. Lehman
●
Use of garbage collectors to defer destruction of nodes in a parallel binary search tree.
●
1986, Hennessy, Osisek, and Seigh
●
Passive serialization, an RCU-like mechanism that relies on the presence of "quiescent states" in the VM/XA hypervisor
●
1995, J. Slingwine and P. E. McKenney
●
US Patent 5,442,758; RCU implemented in the DYNIX/ptx kernel.
●
2002, D. Sarma
●
Added RCU to version 2.5.43 of the Linux kernel
●
2005, P. E. McKenney
●
Permitted preemption of RCU read-side critical sections (realtime RCU)
●
2009, P. E. McKenney
●
Introduced a user-level RCU implementation
●
Work of P. E. McKenney, Mathieu Desnoyers, Alan Stern, Michel Dagenais, Manish Gupta, Maged Michael, Phil Howard, Joshua Triplett, Jonathan Walpole, and the Linux kernel community
The Problem
●
How can we know when it's safe to reclaim
memory without paying too high a cost?
●
especially in the read path
●
Possible implementation
– Reference count
– Hazard pointer
~ The page is extracted and tweaked from [14]
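To see why reference counting is costly in the read path, consider the following user-space sketch. All names (`toy_obj`, `toy_obj_get()`, `toy_obj_put()`) are hypothetical, and the sketch ignores the separate problem of safely obtaining the pointer before the count is raised.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct toy_obj {
    atomic_int refs;   /* 1 for the global reference, +1 per active reader */
    int value;
};

static struct toy_obj *toy_obj_new(int value)
{
    struct toy_obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    atomic_init(&o->refs, 1);
    o->value = value;
    return o;
}

/* Every reader pays an atomic read-modify-write on a shared cache line. */
static void toy_obj_get(struct toy_obj *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/* Drops one reference; returns 1 if this call freed the object. */
static int toy_obj_put(struct toy_obj *o)
{
    if (atomic_fetch_sub(&o->refs, 1) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```

Reclamation is easy to decide here (the count hits zero), but each reader writes to memory shared with every other reader. RCU moves that bookkeeping out of the read path.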
Lock-based Synchronization Model
[Figure: Readers 1..n and Updaters 1..n all pass through a shared lock to reach Obj 1..Obj n.]
RCU Synchronization Model
[Figure: Readers 1..n, Updaters 1..n, and Reclaimers 1..n are all connected to a central RCU core instead of to per-object locks.]
Terms
●
Recall the constraints on read-side critical sections
●
No blocking inside the read lock (!PREEMPT)
●
No preemption (PREEMPT)
●
Regions with irqs or bottom halves disabled also act as read-side critical sections
Terms – Grace Period
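[Figure: timeline divided into Removal, Grace Period, and Reclamation. Readers that started before the removal finish during the grace period; reclamation begins only after all of them are done.]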
Repainted from https://lwn.net/images/ns/kernel/rcu/GracePeriodGood.png
Terms – Quiescent State
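[Figure: one CPU's timeline with three readers; the gaps between them are quiescent states.]
●
A state in which the CPU is outside any read-side critical section
●
A CPU that passes through one has finished its part of the current grace period; the grace period ends once every CPU has done so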
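The relationship between quiescent states and a grace period can be modeled with one pending bit per CPU. This is a single-threaded sketch for illustration only; `toy_gp`, `TOY_NR_CPUS`, and the function names are hypothetical and no locking or memory ordering is shown.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical bookkeeping: bit n set means CPU n still owes a quiescent state. */
struct toy_gp {
    unsigned long qs_pending;
};

/* Start a grace period: every CPU owes one quiescent state. */
static void toy_gp_start(struct toy_gp *gp)
{
    gp->qs_pending = (1UL << TOY_NR_CPUS) - 1;
}

/* Called when `cpu` is observed outside any read-side critical section. */
static void toy_note_qs(struct toy_gp *gp, int cpu)
{
    gp->qs_pending &= ~(1UL << cpu);
}

/* The grace period is over once no CPU owes a quiescent state. */
static bool toy_gp_done(const struct toy_gp *gp)
{
    return gp->qs_pending == 0;
}
```

Readers that start after toy_gp_start() do not matter: they can only see the new version, so the grace period never needs to wait for them.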
Toy RCU Implementation
Read
#define rcu_read_lock()
#define rcu_read_unlock()
#define rcu_dereference(p) \
({ \
        typeof(p) _p1 = (*(volatile typeof(p)*)&(p)); \
        smp_read_barrier_depends(); \
        _p1; \
})
Update
#define rcu_assign_pointer(p, v) \
({ \
        smp_wmb(); \
        (p) = (v); \
})
void synchronize_rcu(void)
{
        int cpu;

        /* Running on each CPU in turn forces a context switch there. */
        for_each_online_cpu(cpu)
                run_on(cpu);
}
RCU Core State
[Figure: CPU 0 calls call_rcu(cb). The RCU state holds one callback list per CPU (list 0, list 1, ... list n, each a chain of cb entries) and a quiescent-state recorder with one slot per CPU (CPU 0, CPU 1, ... CPU n).]
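The callback lists can be sketched as a simple queue that is drained after a grace period. The code below is a single-threaded illustration with hypothetical names (`toy_rcu_head`, `toy_call_rcu()`, `toy_do_batch()`); it assumes the caller already knows the grace period has ended.

```c
#include <stddef.h>

/* Hypothetical analogue of a callback header embedded in a protected object. */
struct toy_rcu_head {
    struct toy_rcu_head *next;
    void (*func)(struct toy_rcu_head *head);
};

struct toy_cblist {
    struct toy_rcu_head *head;
    struct toy_rcu_head **tail;   /* points at the last `next` field */
};

static void toy_cblist_init(struct toy_cblist *l)
{
    l->head = NULL;
    l->tail = &l->head;
}

/* Queue a callback to run after the grace period. */
static void toy_call_rcu(struct toy_cblist *l, struct toy_rcu_head *h,
                         void (*func)(struct toy_rcu_head *))
{
    h->next = NULL;
    h->func = func;
    *l->tail = h;
    l->tail = &h->next;
}

/* Invoke and remove all queued callbacks; returns how many ran. */
static int toy_do_batch(struct toy_cblist *l)
{
    struct toy_rcu_head *h = l->head;
    int n = 0;

    toy_cblist_init(l);
    while (h) {
        struct toy_rcu_head *next = h->next;  /* func may free h */

        h->func(h);
        h = next;
        n++;
    }
    return n;
}

/* Sample callback for demonstration: counts invocations. */
static int toy_invoked;

static void toy_count_cb(struct toy_rcu_head *h)
{
    (void)h;
    toy_invoked++;
}
```

Unlike synchronize_rcu(), queuing a callback does not block the updater, which is why call_rcu() is usable from contexts that cannot sleep.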
Quiescent State
●
Condition of quiescent state
●
Context switch
●
Dynticks or idle
●
User mode execution
●
Check RCU state and execute RCU operations
in system background
RCU Implementation – Classic RCU
●
The original Linux implementation (not the same as Tiny RCU, which targets uniprocessor builds)
●
A single data structure records quiescent states
●
Does not scale to large numbers of CPUs,
e.g. 4096 CPUs
http://lwn.net/Articles/305782/
RCU Implementation – Hierarchical RCU
●
a.k.a tree RCU
●
Towards a more scalable RCU implementation
●
Default solution in Linux kernel
http://lwn.net/Articles/305782/
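The scalability gain comes from reporting quiescent states up a tree, so that most reports touch only a leaf node. The sketch below is a single-threaded model with hypothetical names (`toy_rcu_node`, `toy_report_qs()`); the field names are chosen for illustration and the per-node locking a real implementation needs is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_rcu_node {
    unsigned long qsmask;          /* children or CPUs still owing a quiescent state */
    unsigned long grpmask;         /* this node's bit in parent->qsmask */
    struct toy_rcu_node *parent;   /* NULL for the root */
};

/*
 * Report quiescent states for the bits in `mask` at `node`.
 * Returns true if the whole tree is now quiescent (grace period can end).
 */
static bool toy_report_qs(struct toy_rcu_node *node, unsigned long mask)
{
    while (node) {
        node->qsmask &= ~mask;
        if (node->qsmask)
            return false;   /* siblings still pending: stop at this level */
        /* This node is done: clear its bit one level up. */
        mask = node->grpmask;
        node = node->parent;
    }
    return true;
}
```

Only the last CPU to report within a group propagates upward, so contention on the root stays low even with many CPUs.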
Tree RCU Core – List Operations
[Figure: CPU x calls call_rcu(cb), which appends cb to that CPU's nxtlist (cb0, cb1, cb2, ... cbx) in the per-CPU RCU data. Tail pointers labeled DONE TAIL, WAIT TAIL, NEXT READY TAIL, and NEXT TAIL split the list into segments; the DONE, WAIT, and NXTRDY segments are each labeled "Next Complete". The RCU state and RCU nodes each hold gpnum and complete.]
Tree RCU Core – System Components
[Figure: tree RCU core components, with these labels:
Updater: call_rcu(), put callback into list
tick_handle_periodic: rcu_check_callbacks()
Pass QSs: rcu_bh_qs(), rcu_sched_qs()
invoke_rcu_core()
RCU SOFTIRQ: rcu_process_callbacks()
Call callback: rcu_do_batch()
rcu_gp_kthread_invoke()
rcu_gp_kthread: process GP]
Tree RCU Core
http://lwn.net/images/ns/kernel/brcu/RCUbweBlock.png
RCU state: rcu-sched vs rcu-bh
●
What the #$I#@(&!!! is RCU-bh For???
●
Ran a DDoS workload that hung the system
– Load was so heavy that system never left irq!!!
●
No context switches, no quiescent states, no grace periods
– Eventually, OOM!!!
●
Dipankar created RCU-bh
●
Additional quiescent state in softirq execution
●
Routing cache converted to RCU-bh, then withstood DDoS
~ The page is extracted from [8]
Condition of Quiescent State
●
rcu_sched
●
Context switch
●
Dynticks or idle
●
User mode execution
●
rcu_bh
●
Any code outside of softirq with interrupt enabled
Condition of Quiescent State
●
When to check it?
●
Scheduler
●
__do_softirq()
●
Scheduler clock interrupt handler
– rcu_check_callbacks()
RCU Stall[16]
●
A long grace period delays reclamation and can exhaust memory
●
Force quiescent state
●
Some of the conditions that cause RCU stalls
●
Documentation/RCU/stallwarn.txt
●
A CPU looping in an RCU read-side critical section.
●
A CPU looping with interrupts disabled. This condition can result in RCU-sched and RCU-bh stalls.
●
A CPU looping with preemption disabled. This condition can result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh stalls.
●
A CPU looping with bottom halves disabled. This condition can result in RCU-sched and RCU-bh stalls.
Topic – Sleepable RCU[2]
●
Blocking or sleeping of any sort is strictly prohibited
in classical RCU. This has frequently been an obstacle
to the use of RCU
●
Sleepable RCU (SRCU) permits arbitrary sleeping (or blocking) within its read-side
critical sections.
Topic – Userspace RCU[7]
●
Use cases
●
LTTng
●
Atomic operation API utilities
●
Barrier
●
URCU protected hash
●
URCU stack/queue API
Other Topics
●
Dynticks
●
When a CPU is sleeping in dynticks mode
– Waking the CPU just to report a quiescent state wastes power
– Instead, it is treated as being in an extended quiescent state
●
Use RCU in kernel module
●
CPU hotplugs
●
nocb
●
realtime
●
RCU priority boost
RCU Uses in Linux Kernel
http://www2.rdrop.com/~paulmck/RCU/linuxusage.html
What is RCU's Area of Applicability?
●
Choose the suitable mechanism for your
application
https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/RCUAreaApp.html
Q & A
Reference
[1] McKenney, Paul E., “Introduction to RCU”
[2] McKenney, Paul E. (Oct. 2006), “Sleepable RCU”, LWN
[3] McKenney, Paul E. (Feb. 2007), “Priority-Boosting RCU Read-Side Critical Sections”, LWN
[4] McKenney, Paul E.; Walpole, Jonathan (Dec. 2007), “What is RCU, Fundamentally?”, LWN
[5] McKenney, Paul E. (Dec. 2007), “What is RCU? Part 2: Usage”, LWN
[6] McKenney, Paul E. (Dec. 2008), “Hierarchical RCU”, LWN
[7] McKenney, Paul E. (Nov. 2013), “User-space RCU”, LWN
[8] McKenney, Paul E. (Sep. 2009), “RCU and Breakage”, presented to Netconf 2009
[9] McKenney, Paul E. (May 2014), “What Is RCU?”, presented to TU Dresden Distributed OS class
[10] Jake (Sep. 2014), “The RCU API tables”, LWN
[11] Wiki: “Load-link/store-conditional”
[12] Wiki: “Memory Barrier”
[13] Wiki: “Read-Copy Update”
Reference (Cont.)
[12] 杨燚 (Jul. 2005), “The New Locking Mechanism in the Linux 2.6 Kernel: RCU” (in Chinese), IBM developerWorks
[13] Leiflindholm (Mar. 2011), “Memory access ordering - an introduction”, ARM Connected Community
[14] Walpole, Jonathan (2014), “CS510 Concurrent Systems: What is RCU, Fundamentally?”
[15] “What is RCU's Area of Applicability?”
[16] All Linux kernel documentation under Documentation/RCU/
Rights to Copy
copyright © 2015 Viller Hsiao
●
ARM is a trademark or registered trademark of ARM Holdings.
●
DYNIX (short for DYNamic unIX) is an operating system developed by Sequent Computer
Systems.
●
Linux is a registered trademark of Linus Torvalds.
●
RCU, spinlock, and seqlock are the joint work of their maintainers and the Linux kernel
community.
●
HCSM is the community of Hsinchu Coders in Taiwan.
●
Other company, product, and service names may be trademarks or service marks
of others.
●
Each figure is licensed by the website credited for it.
●
The rest of my work in these slides is licensed under a CC-BY-SA License.
●
License text: http://creativecommons.org/licenses/by-sa/4.0/legalcode
THE END

More Related Content

What's hot

Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
Ni Zo-Ma
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 
reference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysisreference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysis
Buland Singh
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
hugo lu
 

What's hot (20)

Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Ixgbe internals
Ixgbe internalsIxgbe internals
Ixgbe internals
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracing
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Kdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysisKdump and the kernel crash dump analysis
Kdump and the kernel crash dump analysis
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
QEMU - Binary Translation
QEMU - Binary Translation QEMU - Binary Translation
QEMU - Binary Translation
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
Network Drivers
Network DriversNetwork Drivers
Network Drivers
 
reference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysisreference_guide_Kernel_Crash_Dump_Analysis
reference_guide_Kernel_Crash_Dump_Analysis
 
Embedded Linux Kernel - Build your custom kernel
Embedded Linux Kernel - Build your custom kernelEmbedded Linux Kernel - Build your custom kernel
Embedded Linux Kernel - Build your custom kernel
 
LAS16-200: SCMI - System Management and Control Interface
LAS16-200:  SCMI - System Management and Control InterfaceLAS16-200:  SCMI - System Management and Control Interface
LAS16-200: SCMI - System Management and Control Interface
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel FernandesKernel Recipes 2019 - RCU in 2019 - Joel Fernandes
Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
 
Introduction to Linux Kernel by Quontra Solutions
Introduction to Linux Kernel by Quontra SolutionsIntroduction to Linux Kernel by Quontra Solutions
Introduction to Linux Kernel by Quontra Solutions
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 

Similar to Yet another introduction to Linux RCU

pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
Wei Shan Ang
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Jiannan Ouyang, PhD
 
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
Andrey Vagin
 
Checkpoint and Restore In Userspace
Checkpoint and Restore In UserspaceCheckpoint and Restore In Userspace
Checkpoint and Restore In Userspace
OpenVZ
 

Similar to Yet another introduction to Linux RCU (20)

Userspace RCU library : what linear multiprocessor scalability means for your...
Userspace RCU library : what linear multiprocessor scalability means for your...Userspace RCU library : what linear multiprocessor scalability means for your...
Userspace RCU library : what linear multiprocessor scalability means for your...
 
Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016Gluster d thread_synchronization_using_urcu_lca2016
Gluster d thread_synchronization_using_urcu_lca2016
 
Linux Synchronization Mechanism: RCU (Read Copy Update)
Linux Synchronization Mechanism: RCU (Read Copy Update)Linux Synchronization Mechanism: RCU (Read Copy Update)
Linux Synchronization Mechanism: RCU (Read Copy Update)
 
RCU
RCURCU
RCU
 
Glusterd_thread_synchronization_using_urcu_lca2016
Glusterd_thread_synchronization_using_urcu_lca2016Glusterd_thread_synchronization_using_urcu_lca2016
Glusterd_thread_synchronization_using_urcu_lca2016
 
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
CONFidence 2017: Escaping the (sand)box: The promises and pitfalls of modern ...
 
Thread synchronization in GlusterD using URCU
Thread synchronization in GlusterD using URCUThread synchronization in GlusterD using URCU
Thread synchronization in GlusterD using URCU
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPF
 
Eco-friendly Linux kernel development
Eco-friendly Linux kernel developmentEco-friendly Linux kernel development
Eco-friendly Linux kernel development
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stack
 
Swapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgrSwapping Pacemaker Corosync with repmgr
Swapping Pacemaker Corosync with repmgr
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
 
Java and Containers - Make it Awesome !
Java and Containers - Make it Awesome !Java and Containers - Make it Awesome !
Java and Containers - Make it Awesome !
 
OpenHPC: Community Building Blocks for HPC Systems
OpenHPC: Community Building Blocks for HPC SystemsOpenHPC: Community Building Blocks for HPC Systems
OpenHPC: Community Building Blocks for HPC Systems
 
Presentation 14 09_2012
Presentation 14 09_2012Presentation 14 09_2012
Presentation 14 09_2012
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
 
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
 
Андрей Вагин. Все что вы хотели знать о Criu, но стеснялись спросить...
Андрей Вагин. Все что вы хотели знать о Criu, но стеснялись спросить...Андрей Вагин. Все что вы хотели знать о Criu, но стеснялись спросить...
Андрей Вагин. Все что вы хотели знать о Criu, но стеснялись спросить...
 
Checkpoint and Restore In Userspace
Checkpoint and Restore In UserspaceCheckpoint and Restore In Userspace
Checkpoint and Restore In Userspace
 
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
Open WG Talk #2 Everything you wanted to know about CRIU (but were afraid to ...
 

More from Viller Hsiao (9)

Bpf performance tools chapter 4 bcc
Bpf performance tools chapter 4   bccBpf performance tools chapter 4   bcc
Bpf performance tools chapter 4 bcc
 
Prerequisite knowledge for shared memory concurrency
Prerequisite knowledge for shared memory concurrencyPrerequisite knowledge for shared memory concurrency
Prerequisite knowledge for shared memory concurrency
 
twlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsotwlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdso
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracing
 
mbed-os 3.0 modules dependency graph
mbed-os 3.0 modules dependency graphmbed-os 3.0 modules dependency graph
mbed-os 3.0 modules dependency graph
 
Introduction to ARM mbed-OS 3.0 uvisor
Introduction to ARM mbed-OS 3.0 uvisorIntroduction to ARM mbed-OS 3.0 uvisor
Introduction to ARM mbed-OS 3.0 uvisor
 
My first-crawler-in-python
My first-crawler-in-pythonMy first-crawler-in-python
My first-crawler-in-python
 
Trace kernel code tips
Trace kernel code tipsTrace kernel code tips
Trace kernel code tips
 
f9-microkernel-ktimer
f9-microkernel-ktimerf9-microkernel-ktimer
f9-microkernel-ktimer
 

Recently uploaded

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview

Yet another introduction to Linux RCU

  • 1. Yet Another Introduction to Linux RCU Viller Hsiao <villerhsiao@gmail.com> May. 14, 2015
  • 2. 9/3/16 2/60 Who am I ? Viller Hsiao Embedded Linux / RTOS engineer    http://image.dfdaily.com/2012/5/4/634716931128751250504b050c1_nEO_IMG.jpg
  • 4. What is RCU? ● Read-Copy Update ● A kind of read/write synchronization mechanism
  • 5. Agenda ● Synchronization inside Linux ● RCU basic operations ● Linux RCU internals
  • 7. R/W Synchronization in SMP Systems ● Protect shared data from concurrent access ● Synchronization mechanisms ● atomic operation ● spinlock ● reader-writer spinlock (rwlock) ● seqlock ● RCU
  • 8. Atomic Operation ● Operations that read and change data within a single, uninterruptible step ● Architecture support ● test-and-set (TAS) ● compare-and-swap (CAS) ● load-link/store-conditional (LL/SC)
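To make the compare-and-swap bullet concrete, here is a minimal userspace sketch using C11 `<stdatomic.h>` rather than the kernel's own atomic API. The helper name `add_unless_above` and its `limit` parameter are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically add `delta` to `*v` unless the result
 * would exceed `limit`. Returns 1 if the add happened, 0 otherwise.
 */
static int add_unless_above(atomic_int *v, int delta, int limit)
{
    int old = atomic_load(v);

    do {
        if (old + delta > limit)
            return 0;
        /* On failure, `old` is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(v, &old, old + delta));

    return 1;
}
```

The loop is the usual CAS pattern: read, compute, and commit only if nobody changed the value in between. On LL/SC architectures the compiler typically lowers it to a load-exclusive/store-exclusive pair.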
  • 9. spinlock ● Implemented by mutual exclusion [Figure: timeline of Owner 1 (read), Owner 2 (read) and Owner 3 (update), each spinning until its turn]
  • 10. rwlock ● Allows multiple readers ● Mutual exclusion between readers and writer [Figure: timeline of Reader1, Reader2, Reader3 and Writer; readers read concurrently, the writer spins while they read, and readers spin while the writer updates]
  • 11. seqlock ● Consistency mechanism that does not starve writers [Figure: the writer updates data, moving seq from 1 to 2; the reader's first trial starts at seq = 0 and ends at seq = 2, so it retries; a read succeeds when it starts with an even seq and ends with the same seq]
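The retry rule on the seqlock slide can be sketched in userspace C11 as follows. This is not the kernel's `seqlock_t` API. It assumes a single writer (real seqlocks serialize writers with a lock), and the names `seq`, `write_point` and `read_point` are invented for the example. All accesses use the default sequentially consistent atomics to keep the sketch simple.

```c
#include <assert.h>
#include <stdatomic.h>

struct point { int x, y; };

static atomic_uint seq;              /* even: stable, odd: write in progress */
static atomic_int shared_x, shared_y;

/* Single writer assumed: bump seq to odd, update, bump seq back to even. */
static void write_point(int x, int y)
{
    atomic_fetch_add(&seq, 1u);
    atomic_store(&shared_x, x);
    atomic_store(&shared_y, y);
    atomic_fetch_add(&seq, 1u);
}

/* Reader: retry if a write was in progress or completed during the read. */
static struct point read_point(void)
{
    struct point copy;
    unsigned start;

    do {
        start = atomic_load(&seq);
        copy.x = atomic_load(&shared_x);
        copy.y = atomic_load(&shared_y);
    } while ((start & 1u) || atomic_load(&seq) != start);

    return copy;
}
```

Readers never block the writer, which is why writers are not starved. The cost is that a reader may have to repeat its work.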
  • 12. Architecture Support – Atomic Ops ● Load-link/store-conditional – e.g. ARMv7 ldrex/strex [Figure: exclusive monitor state machine, http://infocenter.arm.com/help/topic/com.arm.doc.ddi0360f/graphics/exclusive_monitor_state_machine2.svg]
  • 13. Architecture Support – Barrier ● Optimizations in modern computer architecture ● Optimizing compilers ● Multi-issue ● Out-of-order execution ● Load/store optimization ● … etc.
      CPU 1           CPU 2
      ======          =======
            { A = 1; B = 2 }
      A = 3;          x = B;
      B = 4;          y = A;
  • 14. Architecture Support – Barrier (Cont.) ● Compiler barrier ● CPU barrier instructions ● Ensure the order of some operations ● e.g. dmb/dsb/isb, ldar/stlr
      void foo(void)
      {
              A = B + 1;
              asm volatile("" ::: "memory");
              B = 0;
      }
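As a userspace illustration of why ordering matters, the sketch below publishes a value with a C11 release store and consumes it with an acquire load. It is an analogy to the kernel's barrier primitives, not a copy of them, and the names `payload`, `ready`, `producer` and `consumer` are assumptions made for this example.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;          /* plain data, published through `ready` */
static atomic_int ready;     /* 0: not published yet, 1: published */

static void producer(int value)
{
    payload = value;
    /* Release store: the write to payload cannot be reordered after it. */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Returns 1 and fills *out if the payload has been published, else 0. */
static int consumer(int *out)
{
    /* Acquire load: the read of payload cannot be reordered before it. */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

On ARMv8 a compiler will typically emit `stlr` for the release store and `ldar` for the acquire load, the instruction pair named on the slide.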
  • 15. The problem ● Lock-based synchronization scales and performs poorly ● It can take multiple CPUs just to break even with a single CPU [Figure source: http://www.rdrop.com/~paulmck/RCU/RCU.2014.05.18a.TU-Dresden.pdf]
  • 17. RCU Operations – Read
      rcu_read_lock();                    /* read-side critical section begins */
      p = rcu_dereference(gp);            /* p = gp */
      if (p != NULL) {
              do_something(p->a, p->b);
      }
      rcu_read_unlock();                  /* read-side critical section ends */
    ● Blocking/preemption within an RCU read-side critical section is illegal
  • 18. RCU Operations – Update & Reclaim
      /* Removal (updater) */
      q = kmalloc(sizeof(*q), GFP_KERNEL);
      q->a = 1;
      q->b = 2;
      p = gp;                             /* assumed: save the old version so it can be freed below */
      rcu_assign_pointer(gp, q);          /* gp = q */
      /* Reclaimer */
      synchronize_rcu();                  /* or call_rcu() with a callback */
      kfree(p);
    ● Maintains multiple versions of a recently updated object
    ● A spinlock is acquired if there are multiple updaters
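The "multiple versions" point can be illustrated with a small userspace sketch in C11. It is not kernel code: `replace_foo` and `read_foo` are hypothetical stand-ins for the publish and dereference steps, and the grace period is left to the caller instead of being implemented. A single updater is assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo { int a, b; };

static _Atomic(struct foo *) gp;   /* globally published pointer, initially NULL */

/* Reader side: pick up whichever version is currently published. */
static struct foo *read_foo(void)
{
    return atomic_load_explicit(&gp, memory_order_acquire);
}

/*
 * Updater side: build a new version off to the side, then publish it.
 * Returns the previous version. The caller must free it only after every
 * reader that might still hold it has finished (the grace period).
 */
static struct foo *replace_foo(int a, int b)
{
    struct foo *q = malloc(sizeof(*q));

    if (q == NULL)
        abort();
    q->a = a;
    q->b = b;
    /* The release half of acq_rel orders the initialization before publication. */
    return atomic_exchange_explicit(&gp, q, memory_order_acq_rel);
}
```

A reader that fetched the pointer before the update keeps seeing a fully consistent old version, while new readers see the new one. Nothing is modified in place, which is what lets readers run without locks.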
  • 19. RCU Primitives (repainted from [13])
      READER:    rcu_read_lock() / rcu_read_unlock() (preempt-disable, only if preemptible kernel); rcu_dereference() (rmb, only on DEC Alpha)
      UPDATER:   rcu_assign_pointer() (wmb)
      RECLAIMER: call_rcu(), synchronize_rcu()
  • 20. Quiz: Why does it improve scalability on the read side?
  • 21. Why is RCU better? ● Almost nothing in the read-side lock (non-preemptible kernel)
      static inline void rcu_read_lock(void)
      {
              __asm__ __volatile__("": : :"memory");
              (void) 0;
              do { } while (0);
              do { } while (0);
      }
    Real content of rcu_read_lock() after the preprocessor (!PREEMPT)
  • 22. Read-side Lock Overhead Comparison [Figure: http://lwn.net/images/ns/kernel/rcu/rwlockRCUperf.jpg]
  • 23. What's the benefit? ● Zero overhead and wait-free on the read side ● No memory barrier is required ● No lock is required ● Read-side locks can be nested ● No deadlock between readers and writer
  • 24. RCU List APIs [10]
      Operation          | list (circular doubly linked list)                       | hlist (linear doubly linked list)
      Initialization     | INIT_LIST_HEAD_RCU()                                     |
      Full traversal     | list_for_each_entry_rcu()                                | hlist_for_each_entry_rcu(), hlist_for_each_entry_rcu_bh(), hlist_for_each_entry_rcu_notrace()
      Resume traversal   | list_for_each_entry_continue_rcu()                       | hlist_for_each_entry_continue_rcu(), hlist_for_each_entry_continue_rcu_bh()
      Stepwise traversal | list_entry_rcu(), list_first_or_null_rcu(), list_next_rcu() | hlist_first_rcu(), hlist_next_rcu(), hlist_pprev_rcu()
      Add                | list_add_rcu(), list_add_tail_rcu()                      | hlist_add_after_rcu(), hlist_add_before_rcu(), hlist_add_head_rcu()
      Delete             | list_del_rcu()                                           | hlist_del_rcu(), hlist_del_init_rcu()
      Replacement        | list_replace_rcu()                                       | hlist_replace_rcu()
      Splice             | list_splice_init_rcu()                                   |
  • 25. RCU Model [Figure: readers running across the removal, grace period and reclamation phases; repainted from https://lwn.net/images/ns/kernel/rcu/GracePeriodGood.png]
  • 26. RCU vs rwlock ● RCU has lower overhead and better scalability ● RCU readers see updated data sooner ● rwlock readers see consistent data only after the writer has finished updating [Figure source: https://lwn.net/Articles/263130/]
  • 27. Replace rwlock by RCU [13] http://en.wikipedia.org/wiki/Read-copy-update
  • 28. Replace rwlock by RCU [13] http://en.wikipedia.org/wiki/Read-copy-update
  • 29. What is RCU, again ● Read-Copy Update ● A kind of read-write synchronization mechanism ● A publish-subscribe mechanism [5] ● A poor man's garbage collector [5]
  • 30. Quiz: But how does the reclaimer know when it is safe to release the old object?
  • 32. History and Contributors [9][13]
    ● 1980, H. T. Kung and Q. Lehman: use of garbage collectors to defer destruction of nodes in a parallel binary search tree
    ● 1986, Hennessy, Osisek, and Seigh: passive serialization, an RCU-like mechanism that relies on the presence of "quiescent states" in the VM/XA hypervisor
    ● 1995, J. Slingwine and P. E. McKenney: US Patent 5,442,758, implementing RCU in the DYNIX/ptx kernel
    ● 2002, D. Sarma: added RCU to version 2.5.43 of the Linux kernel
    ● 2005, P. E. McKenney: permitting preemption of RCU realtime critical sections
    ● 2009, P. E. McKenney: introduced a user-level RCU implementation
    ● Work of P. E. McKenney, Mathieu Desnoyers, Alan Stern, Michel Dagenais, Manish Gupta, Maged Michael, Phil Howard, Joshua Triplett, Jonathan Walpole, and the Linux kernel community
  • 33. The Problem ● How can we know when it's safe to reclaim memory without paying too high a cost? ● Especially in the read path ● Possible implementations – Reference count – Hazard pointer (page extracted and tweaked from [14])
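To see why a reference count is costly in the read path, consider this minimal userspace sketch in C11. The names `obj_new`, `obj_get` and `obj_put` are hypothetical. The sketch also ignores the harder problem of safely taking a reference while another thread is removing the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
    atomic_int refs;    /* number of holders, including the creator */
    int a, b;
};

static struct obj *obj_new(int a, int b)
{
    struct obj *o = malloc(sizeof(*o));

    if (o == NULL)
        return NULL;
    atomic_init(&o->refs, 1);
    o->a = a;
    o->b = b;
    return o;
}

/* Read path: every access pays an atomic read-modify-write on shared memory. */
static void obj_get(struct obj *o)
{
    atomic_fetch_add(&o->refs, 1);
}

/* Returns 1 if this call dropped the last reference and freed the object. */
static int obj_put(struct obj *o)
{
    if (atomic_fetch_sub(&o->refs, 1) == 1) {
        free(o);
        return 1;
    }
    return 0;
}
```

Each reader writes to the same counter, so the cache line holding `refs` bounces between CPUs even when the data itself is read-only. RCU's read side avoids exactly this shared write.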
  • 34. Lock-based Synchronization Model [Figure: Reader 1 to Reader n and Updater 1 to Updater n access Obj 1 to Obj n through a lock]
  • 35. RCU Synchronization Model [Figure: Reader 1 to Reader n, Updater 1 to Updater n and Reclaimer 1 to Reclaimer n arranged around the RCU core]
  • 36. Terms ● Recall the constraints on read-side critical sections ● No blocking inside the read lock (!PREEMPT) ● No preemption (PREEMPT) ● Disabling IRQs or bottom halves implies a read-side critical section
  • 37. Terms – Grace Period [Figure: readers running across the removal, grace period and reclamation phases; repainted from https://lwn.net/images/ns/kernel/rcu/GracePeriodGood.png]
  • 38. Terms – Quiescent State ● A period outside any read-side critical section ● It implies the completion of one grace period for that CPU [Figure: a CPU timeline of readers with the quiescent state between them]
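The relation between quiescent states and a grace period can be sketched as a single-threaded simulation in C. This is a simplified model, not the kernel's bookkeeping. `NR_CPUS`, `note_quiescent_state`, `grace_period_start` and `grace_period_done` are names assumed for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4   /* hypothetical CPU count for the simulation */

static unsigned long qs_count[NR_CPUS];      /* bumped at each quiescent state */
static unsigned long gp_snapshot[NR_CPUS];   /* counts recorded at grace-period start */

/* Called when `cpu` passes a quiescent state, e.g. a context switch. */
static void note_quiescent_state(int cpu)
{
    qs_count[cpu]++;
}

/* Start a grace period: remember where every CPU currently stands. */
static void grace_period_start(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        gp_snapshot[cpu] = qs_count[cpu];
}

/*
 * The grace period is over once every CPU has passed at least one
 * quiescent state since grace_period_start().
 */
static bool grace_period_done(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (qs_count[cpu] == gp_snapshot[cpu])
            return false;
    return true;
}
```

Because a reader cannot block or be preempted inside its critical section, a CPU that has passed a quiescent state cannot still be holding a pointer obtained before the grace period began. Once all CPUs have done so, the old object is safe to free.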
  • 39. Toy RCU Implementation
      /* Read */
      #define rcu_read_lock()
      #define rcu_read_unlock()
      #define rcu_dereference(p) \
      ({ \
              typeof(p) _p1 = (*(volatile typeof(p) *)&(p)); \
              smp_read_barrier_depends(); \
              _p1; \
      })

      /* Update */
      #define rcu_assign_pointer(p, v) \
      ({ \
              smp_wmb(); \
              (p) = (v); \
      })

      void synchronize_rcu(void)
      {
              int cpu;

              for_each_online_cpu(cpu)
                      run_on(cpu);
      }
  • 40. RCU Core State [Figure: CPU 0 calls call_rcu(cb); the RCU state holds callback lists (list 0 to list n) and a quiescent state recorder covering CPU 0 to CPU n]
  • 41. Quiescent State ● Conditions for a quiescent state ● Context switch ● Dynticks or idle ● User-mode execution ● RCU state is checked and RCU operations are executed in the background of the system
  • 42. RCU Implementation – Classical RCU ● a.k.a. tiny RCU ● Single data structure to record quiescent states ● Scalability is not good for large numbers of CPUs, e.g. 4096 CPUs [Figure source: http://lwn.net/Articles/305782/]
  • 43. RCU Implementation – Hierarchical RCU ● a.k.a. tree RCU ● Towards a more scalable RCU implementation ● Default solution in the Linux kernel [Figure source: http://lwn.net/Articles/305782/]
  • 44. Tree RCU Core – List Operations [Figure: call_rcu(cb) on CPU x appends cb to the per-CPU nxtlist (cb0, cb1, cb2, ... cbx), which is divided into segments by the DONE, WAIT, NEXT READY and NEXT tail pointers; the DONE, WAIT and NXTRDY segments each carry a next-complete number, shown alongside the gpnum/complete fields of the RCU state and RCU nodes]
  • 45. Tree RCU Core – System Components [Figure: call flow among the updater's call_rcu() (puts the callback into the list), tick_handle_periodic and rcu_check_callbacks(), invoke_rcu_core(), the RCU softirq rcu_process_callbacks(), rcu_gp_kthread_invoke() and rcu_gp_kthread (processes grace periods), rcu_do_batch() (calls callbacks), and rcu_bh_qs() / rcu_sched_qs() (pass quiescent states)]
  • 46. Tree RCU Core [Figure: http://lwn.net/images/ns/kernel/brcu/RCUbweBlock.png]
  • 47. RCU state: rcu-sched vs rcu-bh “● What the #$I#@(&!!! is RCU-bh For??? ● Ran a DDoS workload that hung the system – Load was so heavy that system never left irq!!! ● No context switches, no quiescent states, no grace periods – Eventually, OOM!!! ● Dipankar created RCU-bh ● Additional quiescent state in softirq execution ● Routing cache converted to RCU-bh, then withstood DDoS” (page extracted from [8])
  • 48. Condition of Quiescent State ● rcu_sched ● Context switch ● Dynticks or idle ● User-mode execution ● rcu_bh ● Any code outside of softirq with interrupts enabled
  • 49. Condition of Quiescent State ● When to check it? ● Scheduler ● __do_softirq() ● Scheduler clock interrupt handler – rcu_check_callbacks()
  • 50. RCU Stall [16] ● Possible memory leak if a grace period takes too long ● Force quiescent state ● Some of the conditions under which an RCU stall happens (Documentation/RCU/stallwarn.txt): ● A CPU looping in an RCU read-side critical section. ● A CPU looping with interrupts disabled. This condition can result in RCU-sched and RCU-bh stalls. ● A CPU looping with preemption disabled. This condition can result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh stalls. ● A CPU looping with bottom halves disabled. This condition can result in RCU-sched and RCU-bh stalls.
  • 51. Topic – Sleepable RCU [2] ● Blocking or sleeping of any sort is strictly prohibited in classical RCU. This has frequently been an obstacle to the use of RCU ● Sleepable RCU (SRCU) permits arbitrary sleeping (or blocking) within RCU read-side critical sections.
  • 52. Topic – Userspace RCU [7] ● Use cases ● LTTng ● Atomic operation API utilities ● Barrier ● URCU-protected hash ● URCU stack/queue API
  • 53. Other Topics ● Dynticks ● When some CPU is sleeping in dynticks mode – Waking up the CPU for a quiescent state consumes power – Extended quiescent state ● Using RCU in kernel modules ● CPU hotplug ● nocb ● realtime ● RCU priority boost
  • 54. RCU Uses in Linux Kernel [Figure source: http://www2.rdrop.com/~paulmck/RCU/linuxusage.html]
  • 55. What is RCU's Area of Applicability? ● Choose the suitable mechanism for your application [Figure source: https://www.kernel.org/pub/linux/kernel/people/paulmck/Answers/RCU/RCUAreaApp.html]
  • 57. Reference
      [1] McKenney, Paul E., “Introduction to RCU”
      [2] McKenney, Paul E. (Oct. 2006), “Sleepable RCU”, LWN
      [3] McKenney, Paul E. (Feb. 2007), “Priority-Boosting RCU Read-Side Critical Sections”, LWN
      [4] McKenney, Paul E.; Walpole, Jonathan (Dec. 2007), “What is RCU, Fundamentally?”, LWN
      [5] McKenney, Paul E. (Dec. 2007), “What is RCU? Part 2: Usage”, LWN
      [6] McKenney, Paul E. (Dec. 2008), “Hierarchical RCU”, LWN
      [7] McKenney, Paul E. (Nov. 2013), “User-space RCU”, LWN
      [8] McKenney, Paul E. (Sep. 2009), “RCU and Breakage”, presented to Netconf 2009
      [9] McKenney, Paul E. (May 2014), “What Is RCU?”, presented to TU Dresden Distributed OS class
      [10] Jake (Sep. 2014), “The RCU API tables”, LWN
      [11] Wiki: “Load-link/store-conditional”
      [12] Wiki: “Memory Barrier”
      [13] Wiki: “Read-Copy Update”
  • 58. Reference (Cont.)
      [12] 杨燚 (Jul. 2005), “The new locking mechanism in the Linux 2.6 kernel: RCU”, IBM developerWorks
      [13] Leiflindholm (Mar. 2011), “Memory access ordering - an introduction”, ARM Connected Community
      [14] Walpole, Jonathan (2014), “CS510 Concurrent Systems: What is RCU, Fundamentally?”
      [15] “What is RCU's Area of Applicability?”
      [16] All Linux kernel documentation under Documentation/RCU/
  • 59. Rights to Copy, copyright © 2015 Viller Hsiao ● ARM is a trademark or registered trademark of ARM Holdings. ● DYNIX (short for DYNamic unIX) is an operating system developed by Sequent Computer Systems. ● Linux is a registered trademark of Linus Torvalds. ● RCU, spinlock and seqlock are the joint work of their maintainers and the Linux kernel community. ● HCSM is the community of Hsinchu Coders in Taiwan. ● Other company, product, and service names may be trademarks or service marks of others. ● The license of each graph belongs to the website listed with it. ● The rest of my work in these slides is licensed under a CC-BY-SA license. ● License text: http://creativecommons.org/licenses/by-sa/4.0/legalcode